|
--- "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM> wrote:
> Paul M. Dorfman wrote:
> > %macro fact (n= , out=) ;
> > data fact ;
>
> Type in haste no doubt.
> data fact ;
> should be
> data &out ;
>
> >
> > do n = 1 to &n ;
> > fact = fact (n) ;
> > output ;
> > end ;
> >
> > Moreover, it is easy to see that computer-wise, this code is
> actually
> > less efficient, for each time FACT() is called, its guts have to
> > recalculate the entire product instead of just multiplying by the
> > next N. Of course, the difference is negligible and virtually
> > impossible to notice, unless just for the sake of curiosity we
> > recompute a factorial enough times (from N = 1 to 170, 1 million
> > times below):
> >
> > 04 %let n = 170 ;
> > 05 data _null_ ;
> > 06 do repeat = 1 to 1e6 ;
> > 07 fact = 1 ;
> > 08 do n = 1 to &n ;
> > 09 fact = fact * n ;
> > 10 end ;
> > 11 end ;
> > 12 run ;
> > NOTE: DATA statement used (Total process time):
> > real time 4.87 seconds
> > cpu time 4.87 seconds
> >
> > 13 data _null_ ;
> > 14 do repeat = 1 to 1e6 ;
> > 15 do n = 1 to &n ;
> > 16 fact = fact (n) ;
> > 17 end ;
> > 18 end ;
> > 19 run ;
> > NOTE: DATA statement used (Total process time):
> > real time 1:33.79
> > cpu time 1:33.54
> >
>
> If I had a hat, I would have to tip it to Paul.
>
> FACT is a function that, in Windows, accepts only 172 distinct
> inputs: The
> integers [0...171].
> Q for SI: Why does the function do an internal looping calculation
> instead
> of a table lookup to return the factorial value ?
> Last I checked, factorial(I) is a constant over time :) The 172
> constants
> should be computed at SAS system build time.
>
Good point. A table lookup would seem very appropriate for the
FACT function.
> I am too impatient to wait out a mere 1.5 minutes, but here is a
> visual you
> can use to examine the O() of the looping inside FACT().
> Increase i & N upper limits if you are more patient than I. Increase
> by
> step if you are less impatient.
>
> data timing;
> do N = 1 to 40 by 2;
> t0 = datetime();
> do i = 1 to 5e6;
> fact = fact(N);
> end;
> t1 = datetime();
> e = t1-t0;
> put n= z2. e= z10.6;
> output;
> end;
> run;
>
> symbol1 i=join v=dot;
>
> proc gplot data=timing;
> plot e*n;
> run;
>
I expected to see E increasing linearly with N if additional
looping increased execution time. In fact, the plot of E is flat
over N on my system (SAS 8.12 (TS2M0), Windows 2000 Workstation
service pack 3, 1GB RAM, 2.4 Ghz). I added some additional values
of N from 41 to 171 by 10, and even for large N the execution time
for the FACT function was constant. I take it that you observe
behavior in which E increases with N. Is that correct?
I also ran Paul's code with macro variable N=10 and N=100. Note
that Paul's code computes every factorial from 1! to &N!, so that
the number of factorial computations when &N=100 is 10 times the
number of factorial computations when &N=10. Paul's sequential
algorithm for computing the factorials should scale by a factor
of 10. If the FACT function employs looping from 1 to K, when
computing K!, then the FACT function should not scale by a factor
of 10. Rather, the ratio of execution times should be approximately
112 times longer to compute 1! through 100! than it would be to
compute 1! through 10! since there would be 5050 loops for N=100
and 45 loops for N=10. On the other hand, if the FACT function
employs a table lookup, that should scale (approximately) by 10
since we have to look up 10 times as many values.
My log is shown below. Regardless of whether we employ the FACT
function or code a factorial algorithm, the time that it takes to
execute 1!, 2!, ..., 100! 1E6 times is ten times the length of time
that is required to compute 1!, 2!, ..., 10!.
77 %let n = 10 ;
78 data _null_ ;
79 do repeat = 1 to 1e6 ;
80 fact = 1 ;
81 do n = 1 to &n ;
82 fact = fact * n ;
83 end ;
84 end ;
85 run ;
NOTE: DATA statement used:
real time 0.32 seconds
cpu time 0.32 seconds
86
87 data _null_ ;
88 do repeat = 1 to 1e6 ;
89 do n = 1 to &n ;
90 fact = fact (n) ;
91 end ;
92 end ;
93 run ;
NOTE: DATA statement used:
real time 6.51 seconds
cpu time 6.35 seconds
94
95
96 %let n = 100 ;
97 data _null_ ;
98 do repeat = 1 to 1e6 ;
99 fact = 1 ;
100 do n = 1 to &n ;
101 fact = fact * n ;
102 end ;
103 end ;
104 run ;
NOTE: DATA statement used:
real time 2.90 seconds
cpu time 2.82 seconds
105
106 data _null_ ;
107 do repeat = 1 to 1e6 ;
108 do n = 1 to &n ;
109 fact = fact (n) ;
110 end ;
111 end ;
112 run ;
NOTE: DATA statement used:
real time 1:02.67
cpu time 1:01.20
The factorial algorithm which Paul coded is faster than the FACT
function for computing every factorial value from 1! through N!,
no doubt about that. However, sequential processing
as Paul has coded it should be fast compared to a table lookup
if we have to compute every term from 1! to N!. With a table
lookup, we have to traverse the tree for every new value passed
to the FACT function. We can't take advantage of knowlege of the
value of (K-1)! when computing K! as is done in Paul's factorial
algorithm.
Below is one more proof that the FACT function probably is employing
a lookup table, rather than looping from 1 to K when computing K!.
The code is a variant of the code which Paul presents. We fix K
to either 10 or 100 and compute K! one billion times. For K=10,
Paul's sequential algorithm is faster than the FACT function.
However, for K=100, the FACT function is faster than the sequential
algorithm. Note that the sequential approach is exactly the same
as Paul presented before. We have to compute 1!, 2!, ..., (K-1)!
in order to compute K!. When we employ the FACT function, the
time required to compute 100! a billion times is in fact less
than the time required to compute 10! a billion times. Apparently,
it takes less time to traverse the lookup table to find the
factorial value for K=100 than it does to find the factorial
value for K=10.
170 %let k = 10 ;
171 data _null_ ;
172 do repeat = 1 to 1e7 ;
173 fact = 1 ;
174 do k = 1 to &k ;
175 fact = fact * k ;
176 end ;
177 end ;
178 run ;
NOTE: DATA statement used:
real time 3.48 seconds
cpu time 3.37 seconds
179
180 data _null_ ;
181 do repeat = 1 to 1e7 ;
182 fact = fact (&k) ;
183 end ;
184 run ;
NOTE: DATA statement used:
real time 7.10 seconds
cpu time 6.92 seconds
185
186
187 %let k = 100 ;
188 data _null_ ;
189 do repeat = 1 to 1e7 ;
190 fact = 1 ;
191 do k = 1 to &k ;
192 fact = fact * k ;
193 end ;
194 end ;
195 run ;
NOTE: DATA statement used:
real time 29.32 seconds
cpu time 28.29 seconds
196
197 data _null_ ;
198 do repeat = 1 to 1e7 ;
199 fact = fact (&k) ;
200 end ;
201 run ;
NOTE: DATA statement used:
real time 6.18 seconds
cpu time 5.93 seconds
Dale
=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@fhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
__________________________________
Do you Yahoo!?
Vote for the stars of Yahoo!'s next ad campaign!
http://advision.webevents.yahoo.com/yahoo/votelifeengine/
|