LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2004, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Sat, 17 Jul 2004 02:53:09 -0400
Reply-To:   sashole@bellsouth.net
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Paul M. Dorfman" <sashole@BELLSOUTH.NET>
Organization:   Sashole of Florida
Subject:   Re: a very crude macro on calculating factorial..
Comments:   To: Dale McLerran <stringplayer_2@YAHOO.COM>
In-Reply-To:   <20040716214057.81565.qmail@web21125.mail.yahoo.com>
Content-Type:   text/plain; charset="us-ascii"

Dale,

Calling 1E7 a billion must have been the end of a pretty strenuous day :-), but I am grateful, anyway, since if it WERE the billion, my patience waiting for those tests to finish would be stretched to a personally unbearable limit. More to the point, I am not sure you have me convinced SAS are using a sort of table look-up. Or, if they indeed do, then they have not chosen the right one, the is the naked truth.

Fancy that SAS used a table lookup to implement FACT. Its argument can realistically range from 0 to 170, which means that the most straightforward way of implementing the lookup is key-indexed search, right? No one in the right mind would implement such search as a TREE, or even as a general-purpose hash (for key-indexed search is by far the fastest hash). First, you prepare a lookup table:

data fact ; n = 0 ; fact = 1 ; output ; do n = 1 to 170 ; fact = fact (n) ; output ; end ; run ;

Then, when you compute a factorial, you load the key-indexed table F and search it (while the speed of search is pure O[1]):

204 %let k = 100 ; 205 data _null_ ; 206 array f [0 : 170] _temporary_ ; 207 do until (z) ; 208 set fact end = z ; 209 f [ n ] = fact ; 210 end ; 211 do repeat = 1 to 1e7 ; 212 fact = f [&k] ; 213 end ; 214 put fact= ; 215 run ;

fact=9.332622E157 NOTE: There were 170 observations read from the data set WORK.FACT. NOTE: DATA statement used (Total process time): real time 0.25 seconds cpu time 0.25 seconds

That is how fast (at least) table look-up *should* run. Of course, if it were coded in the underlying software, I would expect it to run much faster. Now let us look at the times returned by the FACT function:

216 data _null_ ; 217 do repeat = 1 to 1e7 ; 218 fact = fact (&k) ; 219 end ; 220 put fact= ; 221 run ;

fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.45 seconds cpu time 5.45 seconds

There are several ways ot explain the results:

1) SAS has FACT() do all the multiplications every time FACT(N) is to be computed, only, having been coded in the underlying software, it does it much faster that the same method coded in the SAS Language, which really shows at K>10.

2) SAS have chosen to use a maladroit table look-up algorithm. Suppose they use an algorithm similar to a format:

295 data cntlin ; 296 retain fmtname 'fact' type 'n' ; 297 do start = 1 to 170 ; 298 label = put (fact (start), rb8.) ; 299 output ; 300 end ; 301 hlo = 'o' ; 302 label = '.' ; 303 output ; 304 run ;

305 proc format cntlin = cntlin ; 306 run ;

307 %let k = 100 ; 308 data _null_ ; 309 do repeat = 1 to 1e7 ; 310 fact = input (put (&k, fact.), rb8.) ; 311 end ; 312 put fact= ; 313 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.71 seconds cpu time 5.70 seconds

314 data _null_ ; 315 do repeat = 1 to 1e7 ; 316 fact = fact (&k) ; 317 end ; 318 put fact= ; 319 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.45 seconds cpu time 5.45 seconds -------------------------------------

Well, judging from the similarity of the times, they might indeed have chosen to use a tree! If so, it definitely was not a decision made after having washed down a sturgeon steak with decent well-fermented grape juice (maybe M&Ms and Diet Coke instead). But suppose they went smart and used a hash table instead:

353 data _null_ ; 354 dcl hash f (dataset: 'fact') ; 355 f.definekey ('n') ; 356 f.definedata ('fact') ; 357 f.definedone () ; 358 retain n &k ; 359 do repeat = 1 to 1e7 ; 360 f.find () ; 361 end ; 362 put fact= ; 363 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.14 seconds cpu time 5.10 seconds

Well, a bit better, but compared to the 0.25 seconds of the key-indexed search, it is still starkly maladroit, is it not? As a matter of fact, one could compute all factorials up to an almost impossibly large number stored in a long text field, say FACT(10000), store them in a key-indexed table by N, and have exactly the same return times for EXACT factorials the FACT() function cannot even dream of. By the way, I should note with not a small degree of amazement, those having coded GAMMA() did a commendable job compared to those having coded FACT():

408 data _null_ ; 409 do repeat = 1 to 1e7 ; 410 fact = gamma (&k+1) ; 411 end ; 412 put fact= ; 413 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 4.56 seconds cpu time 4.54 seconds

414 data _null_ ; 415 do repeat = 1 to 1e7 ; 416 fact = fact (&k) ; 417 end ; 418 put fact= ; 419 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.45 seconds cpu time 5.45 seconds

Oh well :-).

Kind regards, ---------------- Paul M. Dorfman Jacksonville, FL ----------------

> -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On > Behalf Of Dale McLerran > Sent: Friday, July 16, 2004 5:41 PM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: a very crude macro on calculating factorial.. > > --- "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM> wrote: > > Paul M. Dorfman wrote: > > > %macro fact (n= , out=) ; > > > data fact ; > > > > Type in haste no doubt. > > data fact ; > > should be > > data &out ; > > > > > > > > do n = 1 to &n ; > > > fact = fact (n) ; > > > output ; > > > end ; > > > > > > Moreover, it is easy to see that computer-wise, this code is > > actually > > > less efficient, for each time FACT() is called, its guts have to > > > recalculate the entire product instead of just multiplying by the > > > next N. Of course, the difference is negligible and virtually > > > impossible to notice, unless just for the sake of curiosity we > > > recompute a factorial enough times (from N = 1 to 170, 1 million > > > times below): > > > > > > 04 %let n = 170 ; > > > 05 data _null_ ; > > > 06 do repeat = 1 to 1e6 ; > > > 07 fact = 1 ; > > > 08 do n = 1 to &n ; > > > 09 fact = fact * n ; > > > 10 end ; > > > 11 end ; > > > 12 run ; > > > NOTE: DATA statement used (Total process time): > > > real time 4.87 seconds > > > cpu time 4.87 seconds > > > > > > 13 data _null_ ; > > > 14 do repeat = 1 to 1e6 ; > > > 15 do n = 1 to &n ; > > > 16 fact = fact (n) ; > > > 17 end ; > > > 18 end ; > > > 19 run ; > > > NOTE: DATA statement used (Total process time): > > > real time 1:33.79 > > > cpu time 1:33.54 > > > > > > > If I had a hat, I would have to tip it to Paul. > > > > FACT is a function that, in Windows, accepts only 172 distinct > > inputs: The > > integers [0...171]. > > Q for SI: Why does the function do an internal looping calculation > > instead of a table lookup to return the factorial value ? > > Last I checked, factorial(I) is a constant over time :) The 172 > > constants should be computed at SAS system build time. > > > > Good point. A table lookup would seem very appropriate for > the FACT function. > > > I am too impatient to wait out a mere 1.5 minutes, but here is a > > visual you can use to examine the O() of the looping inside FACT(). > > Increase i & N upper limits if you are more patient than I. > Increase > > by step if you are less impatient. > > > > data timing; > > do N = 1 to 40 by 2; > > t0 = datetime(); > > do i = 1 to 5e6; > > fact = fact(N); > > end; > > t1 = datetime(); > > e = t1-t0; > > put n= z2. e= z10.6; > > output; > > end; > > run; > > > > symbol1 i=join v=dot; > > > > proc gplot data=timing; > > plot e*n; > > run; > > > > I expected to see E increasing linearly with N if additional > looping increased execution time. In fact, the plot of E is > flat over N on my system (SAS 8.12 (TS2M0), Windows 2000 > Workstation service pack 3, 1GB RAM, 2.4 Ghz). I added some > additional values of N from 41 to 171 by 10, and even for > large N the execution time for the FACT function was > constant. I take it that you observe behavior in which E > increases with N. Is that correct? > > I also ran Paul's code with macro variable N=10 and N=100. > Note that Paul's code computes every factorial from 1! to > &N!, so that the number of factorial computations when &N=100 > is 10 times the number of factorial computations when &N=10. > Paul's sequential algorithm for computing the factorials > should scale by a factor of 10. If the FACT function employs > looping from 1 to K, when computing K!, then the FACT > function should not scale by a factor of 10. Rather, the > ratio of execution times should be approximately > 112 times longer to compute 1! through 100! than it would be > to compute 1! through 10! since there would be 5050 loops for > N=100 and 45 loops for N=10. On the other hand, if the FACT > function employs a table lookup, that should scale > (approximately) by 10 since we have to look up 10 times as > many values. > > My log is shown below. Regardless of whether we employ the > FACT function or code a factorial algorithm, the time that it > takes to execute 1!, 2!, ..., 100! 1E6 times is ten times the > length of time that is required to compute 1!, 2!, ..., 10!. > > 77 %let n = 10 ; > 78 data _null_ ; > 79 do repeat = 1 to 1e6 ; > 80 fact = 1 ; > 81 do n = 1 to &n ; > 82 fact = fact * n ; > 83 end ; > 84 end ; > 85 run ; > > NOTE: DATA statement used: > real time 0.32 seconds > cpu time 0.32 seconds > > > 86 > 87 data _null_ ; > 88 do repeat = 1 to 1e6 ; > 89 do n = 1 to &n ; > 90 fact = fact (n) ; > 91 end ; > 92 end ; > 93 run ; > > NOTE: DATA statement used: > real time 6.51 seconds > cpu time 6.35 seconds > > > 94 > 95 > 96 %let n = 100 ; > 97 data _null_ ; > 98 do repeat = 1 to 1e6 ; > 99 fact = 1 ; > 100 do n = 1 to &n ; > 101 fact = fact * n ; > 102 end ; > 103 end ; > 104 run ; > > NOTE: DATA statement used: > real time 2.90 seconds > cpu time 2.82 seconds > > > 105 > 106 data _null_ ; > 107 do repeat = 1 to 1e6 ; > 108 do n = 1 to &n ; > 109 fact = fact (n) ; > 110 end ; > 111 end ; > 112 run ; > > NOTE: DATA statement used: > real time 1:02.67 > cpu time 1:01.20 > > > The factorial algorithm which Paul coded is faster than the > FACT function for computing every factorial value from 1! > through N!, no doubt about that. However, sequential > processing as Paul has coded it should be fast compared to a > table lookup if we have to compute every term from 1! to N!. > With a table lookup, we have to traverse the tree for every > new value passed to the FACT function. We can't take > advantage of knowlege of the value of (K-1)! when computing > K! as is done in Paul's factorial algorithm. > > Below is one more proof that the FACT function probably is > employing a lookup table, rather than looping from 1 to K > when computing K!. > The code is a variant of the code which Paul presents. We > fix K to either 10 or 100 and compute K! one billion times. > For K=10, Paul's sequential algorithm is faster than the FACT > function. > However, for K=100, the FACT function is faster than the > sequential algorithm. Note that the sequential approach is > exactly the same as Paul presented before. We have to > compute 1!, 2!, ..., (K-1)! > in order to compute K!. When we employ the FACT function, > the time required to compute 100! a billion times is in fact > less than the time required to compute 10! a billion times. > Apparently, it takes less time to traverse the lookup table > to find the factorial value for K=100 than it does to find > the factorial value for K=10. > > 170 %let k = 10 ; > 171 data _null_ ; > 172 do repeat = 1 to 1e7 ; > 173 fact = 1 ; > 174 do k = 1 to &k ; > 175 fact = fact * k ; > 176 end ; > 177 end ; > 178 run ; > > NOTE: DATA statement used: > real time 3.48 seconds > cpu time 3.37 seconds > > > 179 > 180 data _null_ ; > 181 do repeat = 1 to 1e7 ; > 182 fact = fact (&k) ; > 183 end ; > 184 run ; > > NOTE: DATA statement used: > real time 7.10 seconds > cpu time 6.92 seconds > > > 185 > 186 > 187 %let k = 100 ; > 188 data _null_ ; > 189 do repeat = 1 to 1e7 ; > 190 fact = 1 ; > 191 do k = 1 to &k ; > 192 fact = fact * k ; > 193 end ; > 194 end ; > 195 run ; > > NOTE: DATA statement used: > real time 29.32 seconds > cpu time 28.29 seconds > > > 196 > 197 data _null_ ; > 198 do repeat = 1 to 1e7 ; > 199 fact = fact (&k) ; > 200 end ; > 201 run ; > > NOTE: DATA statement used: > real time 6.18 seconds > cpu time 5.93 seconds > > > > Dale > > ===== > --------------------------------------- > Dale McLerran > Fred Hutchinson Cancer Research Center > mailto: dmclerra@fhcrc.org > Ph: (206) 667-2926 > Fax: (206) 667-5977 > --------------------------------------- > > > > __________________________________ > Do you Yahoo!? > Vote for the stars of Yahoo!'s next ad campaign! > http://advision.webevents.yahoo.com/yahoo/votelifeengine/ >


Back to: Top of message | Previous page | Main SAS-L page