Date: Fri, 12 Mar 2004 15:18:05 -0600
Reply-To: pudding man <pudding_man@MAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: pudding man <pudding_man@MAIL.COM>
Subject: Nested Loop: "v8.2 is about 5x _slower_ than v9.1" (was Super
Size ...)
Content-Type: text/plain; charset="iso-8859-1"
Richard sez:
> In this sample v8.2 is about 5x _slower_ than v9.1 !
5x? Surely you jest!
'Fraid I'll have to request more info on thisun ...
You have run tests on the same hardware/OS? You are perhaps
one of the "Infinitively Brave" who would dare to run both
8.2 and 9.1 on the same Windozy system? If so, is it
possible that 8.2 is 5x slower -because- you're running both
on the same system? <g>
> Kudos to the v9 dev teams.
I recall >= 1 sas-l thread on dubious performance in nested
loops. Could there have been a material problem that they
'fixed' in 9.1? If so, (please substantiate) we should
lavish praise in thousand-part harmony ... :-)
Prost,
Puddin'
*******************************************************
***** Puddin' Man **** Pudding_Man-at-mail.com ********
*******************************************************;
"Ain't nobody's business if I bark like a dawg!"
- Muddy Waters
----- Original Message -----
From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Date: Fri, 12 Mar 2004 11:48:27 -0500
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Super Size Dataset from Cartesian Join
pudding man wrote:
> Yes, I think that Richards approach, if practical, is
> advisable. Mayhap be the onliest approach to avoid the
> proverbial 40 daze/40 nites of runtime. :-)
>
> Not to nitpick, but I think that, for N x,y tuples,
> there are only N(N-1)/2 distances to be computed. So,
> instead of:
>
Please, nitpick. Both you and Chang (in private) pointed out some obvious
problems that are easily fixed and improve the runtime by ~50%. I do
appreciate being 'called to the mat' as it were, keeps the quality level
high.
Here's the updated version, runtime is spot on 3minutes (cpu only, no
output) in v9.1 on a fast windows workstation (circa jan.2004) 13.5M rows
needing output is ~0.8% of the comb(6e5,2) pairs examined.
Additional optimizations could be investigated if certain data constructs
are present (such as the points are sorted by x and y coordinate, etc...)
In this sample v8.2 is about 5x _slower_ than v9.1 ! Kudos to the v9 dev
teams.
-----
* not really cartesian, more like order independent pairings ;
%let n = 60000;
data points;
do id = 1 to &n;
x = ranuni(1);
y = ranuni(1);
output;
end;
run;
proc sql noprint;
select count(*) into :n from points;
quit;
%let maxdistance = 0.05;
%let maxdistance_squared = %sysevalf (&maxdistance**2); %* thanks Chang;
%let report_every = 5e7;
%let t0 = %sysfunc(datetime());
data close;
array _x[&n] _temporary_;
array _y[&n] _temporary_;
do until (eopoints);
set points end=eopoints;
_x[_n_] = x;
_y[_n_] = y;
_n_ + 1;
end;
do i = 1 to %eval(&n-1);
do j = i+1 to &n;
pairCount + 1;
reportCount + 1;
if reportCount >= &report_every then do;
reportCount = 0;
pctDone = pairCount / %sysevalf( (&n-1) * &n/2 * 1/100); %* % of n
things taken 2 at a time;
pctOut = outputCount / pairCount * 100;
walltime = datetime()-&t0;
put walltime=time8. outputCount=comma15. pctOut=4.1 pairCount=comma15.
pctDone=4.1;
end;
* less taxing tests for faster filtering;
if abs(_x[i] - _x[j]) > &maxdistance then continue;
if abs(_y[i] - _y[j]) > &maxdistance then continue;
* no real need for sqrt ing this;
distance_squared = (_x[i]-_x[j])**2 + (_y[i]-_y[j])**2 ;
if distance_squared <= &maxdistance_squared then do;
outputCount + 1;
* output;
end;
end;
end;
put outputCount=comma12. pairCount=comma12.;
stop;
keep i j distance_squared;
run;
-----
--
Richard
--
___________________________________________________________
Sign-up for Ads Free at Mail.com
http://promo.mail.com/adsfreejump.htm