Date: Fri, 16 Jul 2004 14:40:57 -0700 Dale McLerran "SAS(r) Discussion" Dale McLerran Re: a very crude macro on calculating factorial.. To: "Richard A. DeVenezia" <2lqmi7Ffrj9pU1@uni-berlin.de> text/plain; charset=us-ascii

--- "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM> wrote: > Paul M. Dorfman wrote: > > %macro fact (n= , out=) ; > > data fact ; > > Type in haste no doubt. > data fact ; > should be > data &out ; > > > > > do n = 1 to &n ; > > fact = fact (n) ; > > output ; > > end ; > > > > Moreover, it is easy to see that computer-wise, this code is > actually > > less efficient, for each time FACT() is called, its guts have to > > recalculate the entire product instead of just multiplying by the > > next N. Of course, the difference is negligible and virtually > > impossible to notice, unless just for the sake of curiosity we > > recompute a factorial enough times (from N = 1 to 170, 1 million > > times below): > > > > 04 %let n = 170 ; > > 05 data _null_ ; > > 06 do repeat = 1 to 1e6 ; > > 07 fact = 1 ; > > 08 do n = 1 to &n ; > > 09 fact = fact * n ; > > 10 end ; > > 11 end ; > > 12 run ; > > NOTE: DATA statement used (Total process time): > > real time 4.87 seconds > > cpu time 4.87 seconds > > > > 13 data _null_ ; > > 14 do repeat = 1 to 1e6 ; > > 15 do n = 1 to &n ; > > 16 fact = fact (n) ; > > 17 end ; > > 18 end ; > > 19 run ; > > NOTE: DATA statement used (Total process time): > > real time 1:33.79 > > cpu time 1:33.54 > > > > If I had a hat, I would have to tip it to Paul. > > FACT is a function that, in Windows, accepts only 172 distinct > inputs: The > integers [0...171]. > Q for SI: Why does the function do an internal looping calculation > instead > of a table lookup to return the factorial value ? > Last I checked, factorial(I) is a constant over time :) The 172 > constants > should be computed at SAS system build time. >

Good point. A table lookup would seem very appropriate for the FACT function.

> I am too impatient to wait out a mere 1.5 minutes, but here is a > visual you > can use to examine the O() of the looping inside FACT(). > Increase i & N upper limits if you are more patient than I. Increase > by > step if you are less impatient. > > data timing; > do N = 1 to 40 by 2; > t0 = datetime(); > do i = 1 to 5e6; > fact = fact(N); > end; > t1 = datetime(); > e = t1-t0; > put n= z2. e= z10.6; > output; > end; > run; > > symbol1 i=join v=dot; > > proc gplot data=timing; > plot e*n; > run; >

I expected to see E increasing linearly with N if additional looping increased execution time. In fact, the plot of E is flat over N on my system (SAS 8.12 (TS2M0), Windows 2000 Workstation service pack 3, 1GB RAM, 2.4 Ghz). I added some additional values of N from 41 to 171 by 10, and even for large N the execution time for the FACT function was constant. I take it that you observe behavior in which E increases with N. Is that correct?

I also ran Paul's code with macro variable N=10 and N=100. Note that Paul's code computes every factorial from 1! to &N!, so that the number of factorial computations when &N=100 is 10 times the number of factorial computations when &N=10. Paul's sequential algorithm for computing the factorials should scale by a factor of 10. If the FACT function employs looping from 1 to K, when computing K!, then the FACT function should not scale by a factor of 10. Rather, the ratio of execution times should be approximately 112 times longer to compute 1! through 100! than it would be to compute 1! through 10! since there would be 5050 loops for N=100 and 45 loops for N=10. On the other hand, if the FACT function employs a table lookup, that should scale (approximately) by 10 since we have to look up 10 times as many values.

My log is shown below. Regardless of whether we employ the FACT function or code a factorial algorithm, the time that it takes to execute 1!, 2!, ..., 100! 1E6 times is ten times the length of time that is required to compute 1!, 2!, ..., 10!.

77 %let n = 10 ; 78 data _null_ ; 79 do repeat = 1 to 1e6 ; 80 fact = 1 ; 81 do n = 1 to &n ; 82 fact = fact * n ; 83 end ; 84 end ; 85 run ;

NOTE: DATA statement used: real time 0.32 seconds cpu time 0.32 seconds

86 87 data _null_ ; 88 do repeat = 1 to 1e6 ; 89 do n = 1 to &n ; 90 fact = fact (n) ; 91 end ; 92 end ; 93 run ;

NOTE: DATA statement used: real time 6.51 seconds cpu time 6.35 seconds

94 95 96 %let n = 100 ; 97 data _null_ ; 98 do repeat = 1 to 1e6 ; 99 fact = 1 ; 100 do n = 1 to &n ; 101 fact = fact * n ; 102 end ; 103 end ; 104 run ;

NOTE: DATA statement used: real time 2.90 seconds cpu time 2.82 seconds

105 106 data _null_ ; 107 do repeat = 1 to 1e6 ; 108 do n = 1 to &n ; 109 fact = fact (n) ; 110 end ; 111 end ; 112 run ;

NOTE: DATA statement used: real time 1:02.67 cpu time 1:01.20

The factorial algorithm which Paul coded is faster than the FACT function for computing every factorial value from 1! through N!, no doubt about that. However, sequential processing as Paul has coded it should be fast compared to a table lookup if we have to compute every term from 1! to N!. With a table lookup, we have to traverse the tree for every new value passed to the FACT function. We can't take advantage of knowlege of the value of (K-1)! when computing K! as is done in Paul's factorial algorithm.

Below is one more proof that the FACT function probably is employing a lookup table, rather than looping from 1 to K when computing K!. The code is a variant of the code which Paul presents. We fix K to either 10 or 100 and compute K! one billion times. For K=10, Paul's sequential algorithm is faster than the FACT function. However, for K=100, the FACT function is faster than the sequential algorithm. Note that the sequential approach is exactly the same as Paul presented before. We have to compute 1!, 2!, ..., (K-1)! in order to compute K!. When we employ the FACT function, the time required to compute 100! a billion times is in fact less than the time required to compute 10! a billion times. Apparently, it takes less time to traverse the lookup table to find the factorial value for K=100 than it does to find the factorial value for K=10.

170 %let k = 10 ; 171 data _null_ ; 172 do repeat = 1 to 1e7 ; 173 fact = 1 ; 174 do k = 1 to &k ; 175 fact = fact * k ; 176 end ; 177 end ; 178 run ;

NOTE: DATA statement used: real time 3.48 seconds cpu time 3.37 seconds

179 180 data _null_ ; 181 do repeat = 1 to 1e7 ; 182 fact = fact (&k) ; 183 end ; 184 run ;

NOTE: DATA statement used: real time 7.10 seconds cpu time 6.92 seconds

185 186 187 %let k = 100 ; 188 data _null_ ; 189 do repeat = 1 to 1e7 ; 190 fact = 1 ; 191 do k = 1 to &k ; 192 fact = fact * k ; 193 end ; 194 end ; 195 run ;

NOTE: DATA statement used: real time 29.32 seconds cpu time 28.29 seconds

196 197 data _null_ ; 198 do repeat = 1 to 1e7 ; 199 fact = fact (&k) ; 200 end ; 201 run ;

NOTE: DATA statement used: real time 6.18 seconds cpu time 5.93 seconds

Dale

===== --------------------------------------- Dale McLerran Fred Hutchinson Cancer Research Center mailto: dmclerra@fhcrc.org Ph: (206) 667-2926 Fax: (206) 667-5977 ---------------------------------------

__________________________________ Do you Yahoo!? Vote for the stars of Yahoo!'s next ad campaign! http://advision.webevents.yahoo.com/yahoo/votelifeengine/

Back to: Top of message | Previous page | Main SAS-L page