Date: Sat, 17 Jul 2004 02:53:09 -0400 sashole@bellsouth.net "SAS(r) Discussion" "Paul M. Dorfman" Sashole of Florida Re: a very crude macro on calculating factorial.. To: Dale McLerran <20040716214057.81565.qmail@web21125.mail.yahoo.com> text/plain; charset="us-ascii"

Dale,

Calling 1E7 a billion must have been the end of a pretty strenuous day :-), but I am grateful, anyway, since if it WERE the billion, my patience waiting for those tests to finish would be stretched to a personally unbearable limit. More to the point, I am not sure you have me convinced SAS are using a sort of table look-up. Or, if they indeed do, then they have not chosen the right one, the is the naked truth.

Fancy that SAS used a table lookup to implement FACT. Its argument can realistically range from 0 to 170, which means that the most straightforward way of implementing the lookup is key-indexed search, right? No one in the right mind would implement such search as a TREE, or even as a general-purpose hash (for key-indexed search is by far the fastest hash). First, you prepare a lookup table:

data fact ; n = 0 ; fact = 1 ; output ; do n = 1 to 170 ; fact = fact (n) ; output ; end ; run ;

Then, when you compute a factorial, you load the key-indexed table F and search it (while the speed of search is pure O[1]):

204 %let k = 100 ; 205 data _null_ ; 206 array f [0 : 170] _temporary_ ; 207 do until (z) ; 208 set fact end = z ; 209 f [ n ] = fact ; 210 end ; 211 do repeat = 1 to 1e7 ; 212 fact = f [&k] ; 213 end ; 214 put fact= ; 215 run ;

fact=9.332622E157 NOTE: There were 170 observations read from the data set WORK.FACT. NOTE: DATA statement used (Total process time): real time 0.25 seconds cpu time 0.25 seconds

That is how fast (at least) table look-up *should* run. Of course, if it were coded in the underlying software, I would expect it to run much faster. Now let us look at the times returned by the FACT function:

216 data _null_ ; 217 do repeat = 1 to 1e7 ; 218 fact = fact (&k) ; 219 end ; 220 put fact= ; 221 run ;

fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.45 seconds cpu time 5.45 seconds

There are several ways ot explain the results:

1) SAS has FACT() do all the multiplications every time FACT(N) is to be computed, only, having been coded in the underlying software, it does it much faster that the same method coded in the SAS Language, which really shows at K>10.

2) SAS have chosen to use a maladroit table look-up algorithm. Suppose they use an algorithm similar to a format:

295 data cntlin ; 296 retain fmtname 'fact' type 'n' ; 297 do start = 1 to 170 ; 298 label = put (fact (start), rb8.) ; 299 output ; 300 end ; 301 hlo = 'o' ; 302 label = '.' ; 303 output ; 304 run ;

305 proc format cntlin = cntlin ; 306 run ;

307 %let k = 100 ; 308 data _null_ ; 309 do repeat = 1 to 1e7 ; 310 fact = input (put (&k, fact.), rb8.) ; 311 end ; 312 put fact= ; 313 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.71 seconds cpu time 5.70 seconds

314 data _null_ ; 315 do repeat = 1 to 1e7 ; 316 fact = fact (&k) ; 317 end ; 318 put fact= ; 319 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.45 seconds cpu time 5.45 seconds -------------------------------------

Well, judging from the similarity of the times, they might indeed have chosen to use a tree! If so, it definitely was not a decision made after having washed down a sturgeon steak with decent well-fermented grape juice (maybe M&Ms and Diet Coke instead). But suppose they went smart and used a hash table instead:

353 data _null_ ; 354 dcl hash f (dataset: 'fact') ; 355 f.definekey ('n') ; 356 f.definedata ('fact') ; 357 f.definedone () ; 358 retain n &k ; 359 do repeat = 1 to 1e7 ; 360 f.find () ; 361 end ; 362 put fact= ; 363 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.14 seconds cpu time 5.10 seconds

Well, a bit better, but compared to the 0.25 seconds of the key-indexed search, it is still starkly maladroit, is it not? As a matter of fact, one could compute all factorials up to an almost impossibly large number stored in a long text field, say FACT(10000), store them in a key-indexed table by N, and have exactly the same return times for EXACT factorials the FACT() function cannot even dream of. By the way, I should note with not a small degree of amazement, those having coded GAMMA() did a commendable job compared to those having coded FACT():

408 data _null_ ; 409 do repeat = 1 to 1e7 ; 410 fact = gamma (&k+1) ; 411 end ; 412 put fact= ; 413 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 4.56 seconds cpu time 4.54 seconds

414 data _null_ ; 415 do repeat = 1 to 1e7 ; 416 fact = fact (&k) ; 417 end ; 418 put fact= ; 419 run ; fact=9.332622E157 NOTE: DATA statement used (Total process time): real time 5.45 seconds cpu time 5.45 seconds

Oh well :-).

Kind regards, ---------------- Paul M. Dorfman Jacksonville, FL ----------------