Date: Wed, 12 Oct 2005 21:51:13 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: sas Performance Enhancement
In-Reply-To: <200510121602.j9CF6VJ3013766@malibu.cc.uga.edu>
Content-Type: text/plain; format=flowed
paul.dorfman@FCSO.COM expertly replied:
>Since its inception, SASFILE has been benchmarked at least by one person I
>know of. Originally (and understandably) I was highly enthused when
>SASFILE appeared in V8, but minimal testing quickly showed that, with
>notable exceptions (see below) it has been a sort of disappointment,
>mainly because, following the common wisdom of "memory is 100 times faster than
>disk", I had anticipated much more sizeable performance improvements.
>
>[really useful documentation of times elided by a lazy slob]
>
>Not to say that the prebuffered file yields no improvement in reading
>performance, but in real time, it is definitely not 3 or even 2 orders of
>magnitude. Heck, not even 1.
>
>The most sizeable improvement observed in the speed of direct reads is
>undoubtedly owing to the fact that with SASFILE, all file pages are
>already preloaded to the buffer. Without SASFILE, if an observation is
>requested and it is not in the currently buffered page, it must be
>unbuffered, and the page containing the observation must be buffered in,
>while when the file is prebuffered, this obviously is not necessary and
>does not happen. The observed performance differences are mainly owing to
>the fact that same pages get buffered repeatedly. The proof is in the
>sequential read, where each page is buffered only once, and hence the
>overall speed difference is negligible. By the same token, if pages are
>read in order, for instance, as
>
>   do key = 1 to n by 10 ;
>      set halfgig key = key nobs = n ;
>   end ;
>
>   do ptr = 1 to n by 10 ;
>      set halfgig point = ptr nobs = n ;
>   end ;
>
>the SASFILE performance improvements related to the indexed/random reads
>dwindle to the almost why-bother level.
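For anyone who wants to repeat that kind of comparison, here is a minimal sketch of a random-read test, not the benchmark quoted above. It assumes a data set named halfgig in WORK (borrowed from the quoted fragment); the loop count of 1e6, the seed, and the variable names i and ptr are arbitrary choices of mine. FULLSTIMER writes real and CPU time to the log for each step.

```sas
options fullstimer;   /* detailed timing notes in the log */

/* Random direct reads WITHOUT SASFILE: pages are swapped in and out */
data _null_;
   do i = 1 to 1e6;                       /* assumed number of reads */
      ptr = ceil(ranuni(1) * n);          /* random observation number in 1..n */
      set halfgig point = ptr nobs = n;
   end;
   stop;                                  /* required with POINT=, or the step never ends */
run;

/* Same reads WITH the file preloaded into memory */
sasfile halfgig load;

data _null_;
   do i = 1 to 1e6;
      ptr = ceil(ranuni(1) * n);
      set halfgig point = ptr nobs = n;
   end;
   stop;
run;

sasfile halfgig close;                    /* release the buffers */
```

Replacing the random ptr with an ordered "do ptr = 1 to n by 10" loop should give the in-order case from the quoted text, where the gap is expected to shrink.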
Dale McLerran and I have found (Dale deserves most of the credit here) that
SASFILE makes a big difference in time when using PROC SURVEYSELECT to
do bootstrapping. That is:

    sasfile targetpopulationfile open;
    proc surveyselect data=targetpopulationfile out=bootstrapfile
                      method=urs
                      seed=4954734
                      outhits
                      reps=1000;
    run;

It appears that the proc doesn't cache the data set beforehand, so this
process saves a lot of I/O time.
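
A slightly fuller sketch of the same idea, under some assumptions of mine: the WORK libref, SAMPRATE=1 (each replicate the same size as the input, which is the usual bootstrap setup), and FULLSTIMER so the log shows the timing. LOAD reads the file into memory immediately, whereas OPEN defers the read until the first step that uses the file. The CLOSE at the end releases the buffers, which otherwise stay allocated until the session ends.

```sas
options fullstimer;                        /* report real and CPU time in the log */

sasfile work.targetpopulationfile load;    /* read the whole file into memory now */

proc surveyselect data=work.targetpopulationfile out=work.bootstrapfile
                  method=urs               /* unrestricted random sampling (with replacement) */
                  samprate=1               /* assumption: resample n out of n */
                  seed=4954734
                  outhits                  /* one output record per hit */
                  reps=1000;
run;

sasfile work.targetpopulationfile close;   /* free the memory when done */
```

Running the PROC once with and once without the two SASFILE statements, and comparing the real time in the log, is a simple way to see whether it pays off on a given machine.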
So I think that SASFILE needs to be used with restraint. It doesn't solve
all problems, and it can cause headaches if the file is too big to fit in
memory. But some cases, like your last example and the above, show that
there can be some merit in its use.

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330