Date: Wed, 12 Oct 2005 10:16:47 -0600
Reply-To: Alan Churchill <SASL001@SAVIAN.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Alan Churchill <SASL001@SAVIAN.NET>
Subject: Re: sas Performance Enhancement
Paul,
Great analysis. I had assumed (that all too often dangerous thing) that
SASFILE would yield lots of improvements but the proof is in the pudding.
Really good work.
Alan Churchill
Savian "Bridging SAS and Microsoft Technologies"
www.savian.net
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dorfman, Paul
Sent: Wednesday, October 12, 2005 10:03 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: sas Performance Enhancement
Ben,
Since its inception, SASFILE has been benchmarked by at least one person I
know of. Originally (and understandably) I was highly enthused when
SASFILE appeared in V8, but even minimal testing quickly showed that, with
notable exceptions (see below), it has been something of a disappointment,
mainly because, following the common wisdom that "memory is 100 times faster
than disk", I had anticipated much more sizeable performance improvements.
To wit, say we have a 0.5 GB indexed SAS file:
data halfgig (index = (key)) ;
  array data [49] ;
  do key = 1 to 2 ** 30 / 2 / 8 / 50 ;
    output ;
  end ;
run ;
NOTE: The data set USER.HALFGIG has 1342177 observations and 50 variables.
NOTE: Simple index key has been defined.
NOTE: DATA statement used (Total process time):
real time 12.34 seconds
user cpu time 3.04 seconds
system cpu time 7.03 seconds
Memory 82118k
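The DO loop's upper bound is simply the number of 400-byte observations (50 numeric variables, DATA1-DATA49 plus KEY, at 8 bytes apiece) that fit into half a gigabyte (2**30/2 bytes). A quick check of that arithmetic, counting raw observation bytes only and ignoring page and index overhead; the variable names are illustrative:

```sas
data _null_ ;
  bytes_per_obs = 50 * 8 ;                       /* 50 numeric variables, 8 bytes each */
  nobs  = floor (2 ** 30 / 2 / bytes_per_obs) ;  /* same bound as the DO loop above    */
  bytes = nobs * bytes_per_obs ;                 /* raw data only, no page/index overhead */
  put nobs= comma12. bytes= comma15. ;
run ;
```

It should report 1,342,177 observations, matching the NOTE above, and 536,870,800 bytes of raw data.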
Let us see how fast SAS reads it (a) sequentially, (b) via the index, (c)
by observation number:
data _null_ ;
  set halfgig ;
run ;
NOTE: There were 1342177 observations read from the data set USER.HALFGIG.
NOTE: DATA statement used (Total process time):
real time 1.36 seconds
user cpu time 0.75 seconds
system cpu time 0.63 seconds
Memory 493k
data _null_ ;
  do _n_ = 1 to 1e5 ;
    key = ceil (ranuni (1) * n) ;
    set halfgig key = key nobs = n ;
  end ;
  stop ;
run ;
NOTE: DATA statement used (Total process time):
real time 6.83 seconds
user cpu time 1.37 seconds
system cpu time 5.46 seconds
Memory 509k
data _null_ ;
  do _n_ = 1 to 1e5 ;
    ptr = ceil (ranuni (1) * n) ;
    set halfgig point = ptr nobs = n ;
  end ;
  stop ;
run ;
NOTE: DATA statement used (Total process time):
real time 4.09 seconds
user cpu time 0.46 seconds
system cpu time 3.64 seconds
Memory 509k
Ok? Now let us open the file using SASFILE:
sasfile halfgig open ;
NOTE: The file USER.HALFGIG.DATA has been opened by the SASFILE statement.
data _null_ ;
  set halfgig ;
run ;
NOTE: There were 1342177 observations read from the data set USER.HALFGIG.
NOTE: DATA statement used (Total process time):
real time 4.75 seconds
user cpu time 0.82 seconds
system cpu time 3.93 seconds
Memory 493k
Note that most of the time consumed by this step is spent loading the file
into memory, which is evident both from comparing the run time with the
earlier read (without SASFILE) and from rereading the now-buffered file:
data _null_ ;
  set halfgig ;
run ;
NOTE: There were 1342177 observations read from the data set USER.HALFGIG.
NOTE: DATA statement used (Total process time):
real time 1.11 seconds
user cpu time 1.12 seconds
system cpu time 0.01 seconds
Memory 493k
Interestingly, the memory taken up by the buffered file is NOT reported in
the log by either step (both show Memory 493k), nor is it reported if one
elects to load the file beforehand using:
SASFILE HALFGIG LOAD ;
which you kind of know is executing because there are several seconds to
sit there twiddling thumbs before the next step kicks off. Now testing the
speeds of indexed and direct reads against the SASFILEd file reveals:
data _null_ ;
  do _n_ = 1 to 1e5 ;
    key = ceil (ranuni (1) * n) ;
    set halfgig key = key nobs = n ;
  end ;
  stop ;
run ;
NOTE: DATA statement used (Total process time):
real time 1.87 seconds
user cpu time 1.69 seconds
system cpu time 0.19 seconds
Memory 501k
data _null_ ;
  do _n_ = 1 to 1e5 ;
    ptr = ceil (ranuni (1) * n) ;
    set halfgig point = ptr nobs = n ;
  end ;
  stop ;
run ;
NOTE: DATA statement used (Total process time):
real time 0.59 seconds
user cpu time 0.60 seconds
system cpu time 0.00 seconds
Memory 501k
sasfile halfgig close ;
NOTE: The file USER.HALFGIG.DATA has been closed by the SASFILE statement.
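Since the log gives little sign that a SASFILE LOAD is running beyond the wait itself, one hedged way to measure it is to bracket the statement with DATETIME() stamps held in macro variables. A minimal sketch; the macro variable names T0 and T1 are arbitrary:

```sas
%let t0 = %sysfunc(datetime(), 20.3) ;   /* seconds since 01JAN1960, 3 decimals */
sasfile halfgig load ;
%let t1 = %sysfunc(datetime(), 20.3) ;
%put NOTE: SASFILE LOAD took %sysevalf(&t1 - &t0) seconds. ;

/* ... steps reading HALFGIG from memory go here ... */

sasfile halfgig close ;
```

Subtracting the 1.11-second second pass from the 4.75-second first pass above suggests the load on that machine cost roughly 3.6 seconds, but that is an inference from the two steps, not a direct measurement.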
So, what is the net result? This (S = with SASFILE, N = without SASFILE):
Time, sec  |     Real     |   User CPU   |  System CPU
-----------+------+-------+------+-------+------+------
READ       |  S   |   N   |  S   |   N   |  S   |   N
-----------+------+-------+------+-------+------+------
Sequential | 1.11 |  1.36 | 1.12 |  0.75 | 0.01 |  0.63
-----------+------+-------+------+-------+------+------
Indexed    | 1.87 |  6.83 | 1.69 |  1.37 | 0.19 |  5.46
-----------+------+-------+------+-------+------+------
Direct     | 0.59 |  4.09 | 0.60 |  0.46 | 0.00 |  3.64
-------------------------------------------------------
Not to say that the prebuffered file yields no improvement in reading
performance, but in real time, it is definitely not 3 or even 2 orders of
magnitude. Heck, not even 1.
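The real-time speed-up factors behind that verdict are just N/S from the table. A small DATA step that computes them, with the values typed in from the table above:

```sas
data speedup ;
  input access :$10. real_s real_n ;
  ratio = real_n / real_s ;   /* how many times faster the read is with SASFILE */
  datalines ;
Sequential 1.11 1.36
Indexed 1.87 6.83
Direct 0.59 4.09
;
run ;

proc print data = speedup noobs ;
  format ratio 5.2 ;
run ;
```

The ratios come out at about 1.23, 3.65 and 6.93, so even direct reads stay below the factor of 10 that a single order of magnitude would require.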
The most sizeable improvement, observed in the speed of direct reads, is
undoubtedly owing to the fact that with SASFILE, all file pages are
already preloaded to the buffer. Without SASFILE, if an observation is
requested and it is not on the currently buffered page, that page must be
released and the page containing the observation must be read into the
buffer; when the file is prebuffered, this obviously is not necessary and
does not happen. The observed performance differences are mainly owing to
the fact that, without SASFILE, random reads make the same pages get
buffered repeatedly. The proof is in the sequential read, where each page
is buffered only once, and hence the overall speed difference is
negligible. By the same token, if observations are read in ascending
order, for instance, as
data _null_ ;
  do key = 1 to n by 10 ;
    set halfgig key = key nobs = n ;
  end ;
  stop ;
run ;
data _null_ ;
  do ptr = 1 to n by 10 ;
    set halfgig point = ptr nobs = n ;
  end ;
  stop ;
run ;
the SASFILE performance improvements related to the indexed/random reads
dwindle to the almost why-bother level.
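The page-buffering argument can be illustrated with a toy model. This is a sketch under loud assumptions, not a model of SAS internals: a single one-page buffer (real SAS can use several buffers, and the operating system caches too) and a hypothetical 20 observations per page (the actual HALFGIG page size is not shown above). It counts how often a request lands on a page other than the one currently buffered:

```sas
data _null_ ;
  nobs         = 1342177 ;  /* observations in HALFGIG, from the log above       */
  obs_per_page = 20 ;       /* HYPOTHETICAL page capacity, not measured          */

  /* random access: 1e5 requests, as in the POINT= tests above                   */
  last_page    = . ;
  random_loads = 0 ;
  do i = 1 to 1e5 ;
    ptr  = ceil (ranuni (1) * nobs) ;
    page = ceil (ptr / obs_per_page) ;
    if page ne last_page then random_loads = random_loads + 1 ;  /* page swap   */
    last_page = page ;
  end ;
  random_rate = random_loads / 1e5 ;

  /* ordered access: every 10th observation, as in the loops above               */
  last_page     = . ;
  ordered_loads = 0 ;
  ordered_reads = 0 ;
  do ptr = 1 to nobs by 10 ;
    ordered_reads = ordered_reads + 1 ;
    page = ceil (ptr / obs_per_page) ;
    if page ne last_page then ordered_loads = ordered_loads + 1 ;
    last_page = page ;
  end ;
  ordered_rate = ordered_loads / ordered_reads ;

  put random_loads= random_rate= ordered_loads= ordered_rate= ;
run ;
```

Under these assumptions the random loop needs a page load for nearly every one of its 1e5 requests, while the ordered loop loads each of the 67,109 pages exactly once, i.e. on every other request (ORDERED_RATE=0.5). With the whole file prebuffered by SASFILE, the model would need no further page loads in either case.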
Kind regards
----------------
Paul M. Dorfman
Jacksonville, FL
----------------
On Wed, 12 Oct 2005 04:49:20 -0400, ben.powell@CLA.CO.UK wrote:
>You could try loading the dataset into memory using the SASFILE statement.
>Since you have 2 GB of RAM, a 500 MB file should not be a problem. If you do
>benchmark this, post back to the list!
>
>HTH.
>
>On Tue, 11 Oct 2005 14:45:55 -0700, docsms@gmail.com <docsms@GMAIL.COM> wrote:
>
>>Hello All,
>>
>>I am using a laptop (Centrino 1.8 GHz) and recently upgraded the RAM to 2 GB.
>>
>>I am wondering how to set up SAS to maximize speed. I have to do
>>lots of sorting with huge datasets (500 MB on average).
>>
>>Please give me specific ways to set up the SAS system.
>>
>>Thank you all in advance.