LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2002, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 3 Dec 2002 18:53:59 +0000
Reply-To:     sashole@bellsouth.net
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject:      Re: Are array's 10 times slower?
Comments: To: DBoylan@HRBLOCK.COM
Content-Type: text/plain; format=flowed

----Original Message Follows---- From: "Boylan, David" <DBoylan@HRBLOCK.COM> >I re-wrote a SAS program using array references to make it "look" >cleaner >and the processing time increased 10 fold (from 3 minutes to 30 >minutes!). >The program I modified is a bit complex and involved, so >I've generated >two sample programs that approximate the issues I'm >having. The logs >demonstrate the increase in processing time. > >/*******************************************************/ >120 >121 data b; >122 >123 do i=1 to 1e7; >124 var1=1; 125 var2=1; ><snip> >153 var30=1; >154 >155 if mod(i,100) = 0 then output; >156 end; >157 >158 run; >NOTE: The data set WORK.B has 100000 observations and 31 variables. >NOTE: DATA statement used: > real time 2.83 seconds > cpu time 2.77 seconds >159 >160 >161 data a; >162 >163 array a1 (30) var1-var30; >164 >165 do i=1 to 1e7; >166 do j=1 to 30; >167 a1(j) = 1; >168 end; >169 if mod(i,100) = 0 then output; >170 end; >171 >172 run; > >NOTE: The data set WORK.A has 100000 observations and 32 variables. >NOTE: DATA statement used: > real time 33.21 seconds > cpu time 32.95 seconds > >/*******************************************************/ > >Are these results to be expected? Please advise.

David,

The answer is "yes, they are". What is most surprising to me, though, is that it can be surprising. Even SAS guru Ed Heaton has apparently fallen prey to this "surprise" if he writes he is *discovering* that array processing is associated with overhead.

Now why what you have observed is natural? Just let us see what the computer has to do in the two cases. When you assign 1 to each variable directly, all SAS executes for each iteration of the outer loop is 30 assignments. When you assign 1 to each array element, SAS does the same PLUS 30 of each of the the following operations:

1) increment J by 1 J 2) compare J to 30 and decide if J > 30

Each of these two are as costly or more than costly than the assignment itself, hence the result. In fact, I recall that in one of the old PL/I manuals almost this very example was used to explain how the PL/I macro processor could be used to enhance performance. There is no reason why the same concept should not be used in SAS:

%macro initialize (var=var, initval=0, n=0) / stmt ; %do j = 1 %to &n ; &var&j = &initval ; %end ; %mend initialize ;

option implmac mprint ;

data _null_ ; array var (30) ; do j = 1 to 1e7 ; initialize var=var initval=1 n=30 ; end ; run ;

This *both* gets rid of the run-time array overhead and eliminates typo-prone wall-paper. At the same time it demonstrates that often-heard blank statments like "macro is much slower than the Data step" is utter nonsense because the two facilities simply do different things. Macro can be used to generate efficient SAS code, and SAS code can be written inefficiently without any macro. Executing the step above along with

data _null_ ; array var (30) ; do j = 1 to 1e7 ; do i = 1 to 30 ; var (i) = 1 ; end ; end ; run ;

on my machine with V9.0 shows that the former and the latter take 1.4 and 25 seconds respectively, just as expected. As an alternative to the macro one could use

data _null_ ; array var (30) (30*1) ; a1 = addr(var(1)) ; initstr = peekc (a1, 30*8) ; do j = 1 to 1e6 ; call poke (initstr, a1, 240) ; end ; run ;

This takes about 4 seconds to run under the same circumstances, so it is still more than twice as slow as the direct assignment (that will change if favor or POKE if the number of elements is much larger, say, 3000), but surely eliminates some array overhead. Note that above, it is not necessary to use the initialized array and PEEKC to create INITSTR;

array var (30) ; length initstr $240 ; initstr = repeat (put(1, rb8.), 29) ;

will work just as well.

Finally, just as I lashed at the "macro-is-slow" moniker eaarlier, I would like to do the same in the case of arrays. Not so long ago, I was stunned to hear from a *very* experienced and accomplished SAS programmer literally the following: "Arrays are good for programming, but bad for performance".

Huh? Although judging from the example above, one could think it is true, let us think again and not generalize needlessly. A general statement like "Arrays are slow" is another instance of utter nonsense. Everything depends on what for and how they are used. Any algorithm where the cost of internal index calculation is low compared to the work performed on array elements being accessed will benefit from array usage. Binary search, where the index has to be computed only log2(N) times to find or reject a search key, is a good example.

For instance, consider a variable list v1=1, v2=2, ... v1000=1000, such as

300 data v ; 301 array v (1000) ; 302 do _n_ = 1 to 1000 ; 303 v (_n_) = _n_ ; 304 end ; 305 run ; NOTE: The data set WORK.V has 1 observations and 1000 variables

and imagine that we need to find if the search key k=0 is in the list (it is not). It can be done without any array, for example:

306 data _null_ ; 307 retain k 0 ; 308 set v ; 309 do i = 1 to 1e5 ; 310 found = 0 ; 311 select ; 312 %macro search ; 313 %do j = 1 %to 1000 ; 314 when (v&j = k) found = 1 ; 315 %end ; 316 %mend search ; 317 %search 318 otherwise ; 319 end ; 320 end ; 321 put found= ; 322 run ; found=0 NOTE: There were 1 observations read from the data set WORK.V. NOTE: DATA statement used (Total process time): real time 4.37 seconds cpu time 4.37 seconds

Or it can be done by placing the variable list in an array and using the binary search:

323 data _null_ ; 324 retain k 0 ; 325 set v ; 326 array v (1000) ; 327 do i = 1 to 1e5 ; 328 found = 0 ; 329 l = 1 ; 330 h = 1000 ; 331 do until (l > h or found) ; 332 m = floor ( (l+h) * .5) ; 333 if k < v(m) then h = m - 1 ; 334 else if k > v(m) then l = m + 1 ; 335 else found = 1 ; 336 end ; 337 end ; 338 put found= ; 339 run ; found=0 NOTE: There were 1 observations read from the data set WORK.V. NOTE: DATA statement used (Total process time): real time 0.26 seconds cpu time 0.26 seconds

Does the array look slow now? Moreover, if we reorganize the entire search concept and trade memory for speed in the key-indexed search, then the step

340 data _null_ ; 341 retain k 0 ; 342 set v ; 343 array v ( 1 : 1000) ; 344 array kx (-10000 : +10000) _temporary_ ; 345 do i = 1 to 1000 ; 346 kx (i) = v (i) ; 347 end ; 348 do i = 1 to 1e5 ; 349 found = 0 ; 350 if kx (k) then found = 1 ; 351 end ; 352 put found= ; 353 run ; found=0 NOTE: There were 1 observations read from the data set WORK.V. NOTE: DATA statement used (Total process time): real time 0.03 seconds cpu time 0.03 seconds

is not only the simplest of all, but also runs at 10 times the speed that can be achieved without array usage. If instead of looking for the k=0 we look for the existing key, for example, k=301, the three steps run at 1.5, 0.3, and 0.03 seconds, respectively, so the performance ratio basically remains the same.

The moral: Before making the conclusion that a knife is dull, one should carefully examine which side of the blade is used.

Kind regards, ------------------- Paul M. Dorfman Jacksonville, FL -------------------

_________________________________________________________________ The new MSN 8: smart spam protection and 2 months FREE* http://join.msn.com/?page=features/junkmail


Back to: Top of message | Previous page | Main SAS-L page