LISTSERV at the University of Georgia
Date:         Wed, 12 Oct 2005 10:16:47 -0600
Reply-To:     Alan Churchill <SASL001@SAVIAN.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Alan Churchill <SASL001@SAVIAN.NET>
Subject:      Re: sas Performance Enhancement
Comments: To: "Dorfman, Paul" <paul.dorfman@FCSO.COM>
In-Reply-To:  <200510121602.j9CF6VJ1013766@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

Paul,

Great analysis. I had assumed (that all too often dangerous thing) that SASFILE would yield lots of improvements but the proof is in the pudding.

Really good work.

Alan Churchill
Savian
"Bridging SAS and Microsoft Technologies"
www.savian.net

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dorfman, Paul
Sent: Wednesday, October 12, 2005 10:03 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: sas Performance Enhancement

Ben,

Since its inception, SASFILE has been benchmarked by at least one person I know of. Originally (and understandably) I was highly enthused when SASFILE appeared in V8, but minimal testing quickly showed that, with notable exceptions (see below), it has been something of a disappointment, mainly because, following the common wisdom that "memory is 100 times faster than disk", I had anticipated much more sizeable performance improvements.

To wit, say we have a .5g indexed SAS file:

data halfgig (index = (key)) ;
   array data [49] ;
   do key = 1 to 2 ** 30 / 2 / 8 / 50 ;
      output ;
   end ;
run ;

NOTE: The data set USER.HALFGIG has 1342177 observations and 50 variables.
NOTE: Simple index key has been defined.
NOTE: DATA statement used (Total process time):
      real time           12.34 seconds
      user cpu time       3.04 seconds
      system cpu time     7.03 seconds
      Memory              82118k
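
The loop bound in the step above can be read as a size calculation: half of 2**30 bytes, divided by 8 bytes per numeric variable and by 50 variables per observation (KEY plus DATA1-DATA49). A minimal sketch of that arithmetic follows; the variable names are illustrative and do not come from the original post.

```sas
/* Sketch: reproduce the observation count from the file-size arithmetic. */
/* Assumes all 50 variables are 8-byte numerics, as in the step above.    */
data _null_ ;
   bytes_total   = 2 ** 30 / 2 ;   /* 0.5 GB expressed in bytes           */
   bytes_per_obs = 8 * 50 ;        /* 50 numeric variables, 8 bytes each  */
   n_obs = floor (bytes_total / bytes_per_obs) ;
   put bytes_total= bytes_per_obs= n_obs= ;
run ;
```

The quotient is 1342177.28, so the DO loop runs 1342177 times, which agrees with the observation count in the log NOTE.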

Let us see how fast SAS reads it (a) sequentially, (b) via the index, (c) by observation number:

data _null_ ;
   set halfgig ;
run ;

NOTE: There were 1342177 observations read from the data set USER.HALFGIG.
NOTE: DATA statement used (Total process time):
      real time           1.36 seconds
      user cpu time       0.75 seconds
      system cpu time     0.63 seconds
      Memory              493k

data _null_ ;
   do _n_ = 1 to 1e5 ;
      key = ceil (ranuni (1) * n) ;
      set halfgig key = key nobs = n ;
   end ;
   stop ;
run ;

NOTE: DATA statement used (Total process time):
      real time           6.83 seconds
      user cpu time       1.37 seconds
      system cpu time     5.46 seconds
      Memory              509k

data _null_ ;
   do _n_ = 1 to 1e5 ;
      ptr = ceil (ranuni (1) * n) ;
      set halfgig point = ptr nobs = n ;
   end ;
   stop ;
run ;

NOTE: DATA statement used (Total process time):
      real time           4.09 seconds
      user cpu time       0.46 seconds
      system cpu time     3.64 seconds
      Memory              509k

Ok? Now let us open the file using SASFILE:

sasfile halfgig open ;
NOTE: The file USER.HALFGIG.DATA has been opened by the SASFILE statement.

data _null_ ;
   set halfgig ;
run ;

NOTE: There were 1342177 observations read from the data set USER.HALFGIG.
NOTE: DATA statement used (Total process time):
      real time           4.75 seconds
      user cpu time       0.82 seconds
      system cpu time     3.93 seconds
      Memory              493k

Note that most of the time consumed by this step is spent loading the file into memory, which is evident both from comparing the run-time with the earlier read (without SASFILE) and from:

data _null_ ;
   set halfgig ;
run ;

NOTE: There were 1342177 observations read from the data set USER.HALFGIG.
NOTE: DATA statement used (Total process time):
      real time           1.11 seconds
      user cpu time       1.12 seconds
      system cpu time     0.01 seconds
      Memory              493k

Interestingly, real memory usage is NOT reported in the log by either step, nor is it reported if one elects to load the file beforehand using:

SASFILE HALFGIG LOAD ;

which you kind of know is executing because there are several seconds to sit there twiddling thumbs before the next step kicks off. Now testing the speeds of indexed and direct reads against the SASFILEd file reveals:
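
For readers who want to repeat the comparison, the sketch below shows one way the LOAD form might be bracketed around a read step. It assumes the HALFGIG data set built earlier in this message already exists in the current library. As far as the SAS documentation describes it, OPEN allocates the buffers but defers reading until the file is first referenced (consistent with the 4.75-second first read above), while LOAD reads the data into memory immediately.

```sas
/* Sketch only: assumes HALFGIG exists and that enough memory is available. */
sasfile halfgig load ;     /* open the file and read all pages into memory  */

data _null_ ;              /* this read should now find the pages in memory */
   set halfgig ;
run ;

sasfile halfgig close ;    /* release the buffers when finished             */
```

Comparing the real time of the DATA step here with the same step run without SASFILE gives the sequential row of the comparison.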

data _null_ ;
   do _n_ = 1 to 1e5 ;
      key = ceil (ranuni (1) * n) ;
      set halfgig key = key nobs = n ;
   end ;
   stop ;
run ;

NOTE: DATA statement used (Total process time):
      real time           1.87 seconds
      user cpu time       1.69 seconds
      system cpu time     0.19 seconds
      Memory              501k

data _null_ ;
   do _n_ = 1 to 1e5 ;
      ptr = ceil (ranuni (1) * n) ;
      set halfgig point = ptr nobs = n ;
   end ;
   stop ;
run ;

NOTE: DATA statement used (Total process time):
      real time           0.59 seconds
      user cpu time       0.60 seconds
      system cpu time     0.00 seconds
      Memory              501k

sasfile halfgig close ;
NOTE: The file USER.HALFGIG.DATA has been closed by the SASFILE statement.

So, what is the net result? This (S = with SASFILE, N = without SASFILE):

Time, sec  |    Real     |  User CPU   | System CPU
-----------+------+------+------+------+------+------
READ       |  S   |  N   |  S   |  N   |  S   |  N
-----------+------+------+------+------+------+------
Sequential | 1.11 | 1.36 | 1.12 | 0.75 | 0.01 | 0.63
-----------+------+------+------+------+------+------
Indexed    | 1.87 | 6.83 | 1.69 | 1.37 | 0.19 | 5.46
-----------+------+------+------+------+------+------
Direct     | 0.59 | 4.09 | 0.60 | 0.46 | 0.00 | 3.64
------------------------------------------------------

Not to say that the prebuffered file yields no improvement in reading performance, but in real time, it is definitely not 3 or even 2 orders of magnitude. Heck, not even 1.
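
To put numbers on that remark, the real-time speedups can be computed directly from the table. The sketch below is illustrative only; the data set and variable names are made up for the example, and the timings are copied from the real-time columns above.

```sas
/* Sketch: speedup factors from the real-time columns of the table.    */
/* real_s = seconds with SASFILE, real_n = seconds without SASFILE.    */
data speedup ;
   input read :$10. real_s real_n ;
   ratio  = real_n / real_s ;    /* how many times faster with SASFILE */
   orders = log10 (ratio) ;      /* the same, in orders of magnitude   */
   format ratio orders 5.2 ;
   datalines ;
Sequential 1.11 1.36
Indexed    1.87 6.83
Direct     0.59 4.09
;
run ;

proc print data = speedup noobs ;
run ;
```

The ratios come out at roughly 1.2 (sequential), 3.7 (indexed) and 6.9 (direct), so every one of them is below a single order of magnitude.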

The most sizeable improvement, observed in the speed of direct reads, is undoubtedly owing to the fact that with SASFILE, all file pages are already preloaded into the buffer. Without SASFILE, if a requested observation is not in the currently buffered page, that page must be unbuffered and the page containing the observation must be buffered in; when the file is prebuffered, this obviously is not necessary and does not happen. The observed performance differences are therefore mainly owing to the fact that, without SASFILE, the same pages get buffered repeatedly. The proof is in the sequential read, where each page is buffered only once, and hence the overall speed difference is negligible. By the same token, if pages are read in order, for instance, as

do key = 1 to n by 10 ;
   set halfgig key = key nobs = n ;
end ;

do ptr = 1 to n by 10 ;
   set halfgig point = ptr nobs = n ;
end ;

the SASFILE performance improvements related to the indexed/random reads dwindle to the almost why-bother level.
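
The two loop fragments above are not complete steps. A minimal sketch of how they might be wrapped for timing follows, mirroring the structure of the random-read steps earlier in this message; it assumes the HALFGIG data set already exists. To compare, run the pair once as is and once between SASFILE HALFGIG LOAD and SASFILE HALFGIG CLOSE.

```sas
/* Sketch: in-order indexed reads, every 10th key.                     */
/* NOBS= is set at compile time, so N is available for the loop bound. */
data _null_ ;
   do key = 1 to n by 10 ;
      set halfgig key = key nobs = n ;
   end ;
   stop ;                 /* KEY= reads never hit end of file on their own */
run ;

/* Sketch: in-order direct reads, every 10th observation. */
data _null_ ;
   do ptr = 1 to n by 10 ;
      set halfgig point = ptr nobs = n ;
   end ;
   stop ;                 /* POINT= requires an explicit STOP */
run ;
```

No timings are given for these steps because the original post does not report any.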

Kind regards
----------------
Paul M. Dorfman
Jacksonville, FL
----------------

On Wed, 12 Oct 2005 04:49:20 -0400, ben.powell@CLA.CO.UK wrote:

>You could try loading the dataset into memory using the SASFILE statement.
>Since you have 2GB ram a 500MB file should not be a problem. If you do
>benchmark this post back to the list!
>
>HTH.
>
>On Tue, 11 Oct 2005 14:45:55 -0700, docsms@gmail.com <docsms@GMAIL.COM> wrote:
>
>>Hello All,
>>
>>I am using Lap Top (Centrino 1.8G) and recently upgrade RAM to 2G.
>>
>>I am wondering how to set up SAS to maximize the speed. I have to do
>>lots of sorting with huge dataset (500Mb on average).
>>
>>Plese give me specific ways to set up SAS system.
>>
>>Thank you all in advance.

