Date: Thu, 23 Sep 2010 15:13:44 -0400
Reply-To: Ming Chen <chenming@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ming Chen <chenming@GMAIL.COM>
Subject: why random access is slower?
Content-Type: text/plain; charset=ISO-8859-1
I have to do a left join one table with 4 other tables.
Following is the table information:
Big1 : 45 millions rows with 10 vars, 3.5GB compressed in size
Big2 : 17 million rows with 5 vars, 1GB compressed
Big3 : 8 million rows with 5 vars, 800MB compressed
Small1 : 54 thousand rows with 12 vars
Small2 : 300 rows with 5 vars
I thought I came with a faster join methods by doing these following steps:
1. give each table a new column with the positions of each row. (for
example, Seq = _*n*_)
2. join the 5 tables together by only keeping the keys and position column
in each table
3. Based on the result of steps, using random access date set options point
to generate the final results.
What surprised me is that step 3 is very slow and it is going to take 24
hours to get the job done. I have tried the SGIO option but with very
limited improvement. the bottleneck is the I/O. Is there is way to improve?
I don't know whether direct proc sql can do any better if I have enough SAS
work space available to let me try out.
Thanks.