LISTSERV at the University of Georgia
Date:         Tue, 28 Feb 2006 12:38:53 -0500
Reply-To:     "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Subject:      Re: IN operator and temporary arrays
Comments: To: J S Huang <Jiann-Shiun.Huang@AMERUS.COM>

On Tue, 28 Feb 2006 08:09:34 -0600, Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM> wrote (in small part):

> Here are my observations:
> (1) Declaring _temporary_ for array elements not to be saved as part of
> dataset can save significant amount of both CPU and real time.

Jiann-Shiun,

I bet this discovery did not take you completely by surprise. A couple of points: "not to be saved as part of dataset" is irrelevant to the issue. You can achieve the "not to be saved" effect by dropping unwanted variables belonging to a PDV array, but it does not mean that these:

array a[1000000] ;
array a[1000000] _temporary_ ;

will have the same compile time. This happens not because the temporary variables are "not to be saved as part of dataset", but because the compiler does not have to organize them in the PDV and store them in the compiler symbol table in the first place. The latter requires upwards of 100 bytes of RAM for each numeric variable, whilst a temporary array item requires almost exactly 8.

Elsewhere you write that _temporary_ is an important array "feature". Important indeed! Only I would not call it a feature, for in a DATA step a PDV array and a _temporary_ array are two quite distinct data structures, their only similarity being that both can be referenced by a subscript. In particular, all elements of a _temporary_ array are always contiguous in memory, while for a PDV array this may or may not be the case: for example, when its variables also belong to another array, when the same array lists the same variable more than once, or when the compiler saw the array's variables before the array itself.
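To see the compile-time difference for yourself, here is a minimal sketch, assuming SAS 9 with the FULLSTIMER system option. The data set WORK.DEMO and the variable X are hypothetical, and the dimension is cut to 100,000 to keep the PDV version manageable. Both steps write the same one-variable data set, so any difference in the time and memory figures in each step's log NOTE comes from how the array is declared, not from what gets saved.

```sas
options fullstimer ;

/* WORK.DEMO and X are hypothetical names used only for this comparison.   */
/* PDV array: the compiler creates A1-A100000 in the PDV and its symbol    */
/* table; DROP= merely keeps them out of the output data set.              */
data work.demo (drop = a1-a100000) ;
  array a [100000] ;
  x = 1 ;
run ;

/* _temporary_ array: one contiguous block of 100000 x 8 bytes, with no    */
/* PDV variables created and nothing to drop.                              */
data work.demo ;
  array a [100000] _temporary_ ;
  x = 1 ;
run ;
```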

>(2) Initialization of an array is costly in CPU time and other resource.

Not if it is necessary only once, either at compile or run time (say, at _n_=1). A real resource hog rears its head when a _temporary_ array needs to be initialized to anything, or a non-retained PDV array to something other than missing values, a great many times in the same step (e.g. before each observation or BY group).
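The asymmetry between "to anything" and "other than missing" can be seen in a small sketch, assuming SASHELP.CLASS is available (any three-observation input would do); the data set RETAIN_DEMO and the variables T_BEFORE and P_BEFORE are hypothetical. The _temporary_ element keeps whatever the previous iteration left in it, while the PDV variable P1 comes back missing at the top of every iteration.

```sas
/* RETAIN_DEMO, T_BEFORE and P_BEFORE are hypothetical names */
data work.retain_demo (keep = name t_before p_before) ;
  set sashelp.class (obs = 3) ;
  array t [1] _temporary_ ;  /* temporary: retained between iterations     */
  array p [1] ;              /* PDV variable P1: reset to missing each time */
  t_before = t[1] ;          /* value left over from the previous iteration */
  p_before = p[1] ;          /* always missing at this point                */
  t[1] = _n_ ;
  p[1] = _n_ ;
run ;
```

T_BEFORE should read missing, 1, 2 down the three rows, and P_BEFORE missing on every row. Resetting a _temporary_ array per observation or BY group therefore takes explicit work, such as a DO loop over every element, whereas the step itself resets a non-retained PDV array to missing.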

Here is a quiz: for run-time initialization of array A[4000] to consecutive natural numbers, devise a method about 100000-fold more efficient (time-wise) than

do i = 1 to dim (a) ; a[i] = i ; end ;

Test the relative speed by repeating both the above and the method you have come up with, say 100,000 times.
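One approach that exploits the contiguity of _temporary_ arrays, offered as a sketch and not necessarily the answer the quiz has in mind: fill a template array once, then copy its whole memory block into A with a single peek/poke instead of 4000 assignments. This assumes the SAS 9 functions ADDRLONG and PEEKCLONG and the CALL POKELONG routine; the template array TMPL and the variables REPS, T_LOOP and T_APP are hypothetical. The block size, 4000 items x 8 bytes = 32000 bytes, stays under the 32767-byte ceiling of a SAS character value, which is what lets one PEEKCLONG call carry the entire array.

```sas
/* TMPL, REPS, T_LOOP and T_APP are hypothetical names for this sketch */
data work.quiz (keep = first last t_loop t_app) ;
  array a    [4000] _temporary_ ;  /* target array, re-initialized REPS times */
  array tmpl [4000] _temporary_ ;  /* template array, filled only once        */
  reps = 100000 ;                  /* repetitions, as the quiz suggests       */

  do i = 1 to dim (tmpl) ; tmpl[i] = i ; end ;

  /* Baseline: element-by-element loop, repeated REPS times */
  start = datetime () ;
  do r = 1 to reps ;
    do i = 1 to dim (a) ; a[i] = i ; end ;
  end ;
  t_loop = datetime () - start ;

  /* Wipe A so the check below shows that the block copy refilled it */
  do i = 1 to dim (a) ; a[i] = . ; end ;

  /* Block copy: move 32000 contiguous bytes (4000 x 8) from TMPL to A */
  start = datetime () ;
  do r = 1 to reps ;
    call pokelong (peekclong (addrlong (tmpl[1]), 32000),
                   addrlong (a[1]), 32000) ;
  end ;
  t_app = datetime () - start ;

  first = a[1] ;
  last  = a[dim (a)] ;
  put t_loop= t_app= first= last= ;
run ;
```

FIRST=1 and LAST=4000 confirm that the copy filled A; the ratio of T_LOOP to T_APP is what the quiz asks you to measure, and no particular figure is claimed here. Lower REPS for a quick trial run.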

> Last time when I tried to run up to 5e6 array size, I received warning
> about virtual memory.

If your machine starts swapping 40 megabytes' worth of an array to virtual memory, you either have your RAM hogged by something else or... you need more RAM.
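The 40-megabyte figure is plain arithmetic for a 5e6-item _temporary_ array at 8 bytes per item. Setting it against this reply's own estimate of upwards of 100 bytes per PDV variable shows why the same dimension declared as a PDV array is a far likelier candidate for swapping:

```latex
\begin{aligned}
\text{\_temporary\_ array:} &\quad 5\times10^{6}\times 8\ \text{bytes} = 4\times10^{7}\ \text{bytes}\approx 40\ \text{MB}\\
\text{PDV array:} &\quad 5\times10^{6}\times(\geq 100\ \text{bytes}) \geq 5\times10^{8}\ \text{bytes}\approx 500\ \text{MB}
\end{aligned}
```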

Kind regards
------------
Paul Dorfman
Jax, FL
------------

>
> Any other thoughts?
>
>>>> "Richard A. DeVenezia" <rdevenezia@WILDBLUE.NET> 2/27/2006 8:51:00 PM >>>
>Tip: IN has operated with temporary arrays since v9 (thanks Joe).
>
>data _null_;
>  array XS[10] _temporary_ ( 1 2 3 5 6 7 10 11 12 15);
>
>  do _n_ = 1 to 1e6;
>    xs[_n_]=_n_;
>  end;
>
>  if 1 in XS then put 'Have 500000';
>
>  a = 1;
>  if a in XS then put 'Have ' a=;
>run;
>
>How does it scale? Haven't benchmarked anything, but 1e6 elements doesn't
>cause any problems. Note: The 1,000,000 element array initialization seems
>to have a severe impact on time. Not sure why a explicit loop would be so
>much faster than an implicit initialization loop
>
>data _null_;
>  array XS[1000000] _temporary_ (1:1000000); * implicit loop for initializer;
>
>  found_1 = (1 in XS);
>  put found_1=;
>
>  a = 1000000;
>  found_a = (a in XS);
>  put found_a= a=;
>run;
>---
>found_1=1
>found_a=1 a=1000000
>NOTE: DATA statement used (Total process time):
>      real time           3.82 seconds
>      cpu time            3.78 seconds
>
>data _null_;
>  array XS[1000000] _temporary_;
>
>  * explicit array fill;
>  do _n_ = 1 to dim(XS);
>    xs[_n_]=_n_;
>  end;
>
>  found_1 = (1 in XS);
>  put found_1=;
>
>  a = 1000000;
>  found_a = (a in XS);
>  put found_a= a=;
>run;
>---
>found_1=1
>found_a=1 a=1000000
>NOTE: DATA statement used (Total process time):
>      real time           0.07 seconds
>      cpu time            0.07 seconds
>
>Not knowing the mechanics or any benchmarks, it would be hard to say if one
>should use IN instead of Hash when dealing with lookup keys that are integers.
>
>--
>Richard A. DeVenezia
>http://www.devenezia.com/
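For anyone who wants to put numbers to the IN-versus-hash question in the quoted message, here is a sketch of a hash-object counterpart to the explicit-fill example, assuming the SAS 9 DATA step hash object. The table H, the key K and the data set WORK.HASH_LOOKUP are hypothetical. It performs the same two lookups as the quoted IN tests, so the log NOTE timings of the two steps can be compared directly; no claim is made here about which one wins.

```sas
/* H, K and WORK.HASH_LOOKUP are hypothetical names for this sketch */
data work.hash_lookup (keep = found_1 found_a a) ;
  declare hash h () ;
  h.definekey ('k') ;          /* key only: the data portion defaults to K */
  h.definedone () ;

  /* Load keys 1..1000000, mirroring the explicit array fill */
  do k = 1 to 1000000 ;
    h.add () ;
  end ;

  k = 1 ;
  found_1 = (h.check () = 0) ;         /* CHECK returns 0 when K is found */

  a = 1000000 ;
  found_a = (h.check (key: a) = 0) ;   /* look up A without copying to K  */
  put found_1= found_a= a= ;
run ;
```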

