LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2004, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 12 Mar 2004 15:18:05 -0600
Reply-To:     pudding man <pudding_man@MAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         pudding man <pudding_man@MAIL.COM>
Subject:      Nested Loop: "v8.2 is about 5x _slower_ than v9.1" (was Super
              Size  ...)
Comments: To: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Content-Type: text/plain; charset="iso-8859-1"

Richard sez:

> In this sample v8.2 is about 5x _slower_ than v9.1 !

5x? Surely you jest!

'Fraid I'll have to request more info on thisun ...

You have run tests on the same hardware/OS? You are perhaps one of the "Infinitively Brave" who would dare to run both 8.2 and 9.1 on the same Windozy system? If so, is it possible that 8.2 is 5x slower -because- you're running both on the same system? <g>

> Kudos to the v9 dev teams.

I recall >= 1 sas-l thread on dubious performance in nested loops. Could there have been a material problem that they 'fixed' in 9.1? If so, (please substantiate) we should lavish praise in thousand-part harmony ... :-)

Prost, Puddin'

******************************************************* ***** Puddin' Man **** Pudding_Man-at-mail.com ******** *******************************************************;

"Ain't nobody's business if I bark like a dawg!" - Muddy Waters

----- Original Message ----- From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM> Date: Fri, 12 Mar 2004 11:48:27 -0500 To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Super Size Dataset from Cartesian Join

pudding man wrote: > Yes, I think that Richards approach, if practical, is > advisable. Mayhap be the onliest approach to avoid the > proverbial 40 daze/40 nites of runtime. :-) > > Not to nitpick, but I think that, for N x,y tuples, > there are only N(N-1)/2 distances to be computed. So, > instead of: >

Please, nitpick. Both you and Chang (in private) pointed out some obvious problems that are easily fixed and improve the runtime by ~50%. I do appreciate being 'called to the mat' as it were, keeps the quality level high.

Here's the updated version, runtime is spot on 3minutes (cpu only, no output) in v9.1 on a fast windows workstation (circa jan.2004) 13.5M rows needing output is ~0.8% of the comb(6e5,2) pairs examined.

Additional optimizations could be investigated if certain data constructs are present (such as the points are sorted by x and y coordinate, etc...)

In this sample v8.2 is about 5x _slower_ than v9.1 ! Kudos to the v9 dev teams.

----- * not really cartesian, more like order independent pairings ;

%let n = 60000;

data points; do id = 1 to &n;

x = ranuni(1); y = ranuni(1); output;

end; run;

proc sql noprint; select count(*) into :n from points; quit;

%let maxdistance = 0.05; %let maxdistance_squared = %sysevalf (&maxdistance**2); %* thanks Chang; %let report_every = 5e7; %let t0 = %sysfunc(datetime());

data close;

array _x[&n] _temporary_; array _y[&n] _temporary_;

do until (eopoints); set points end=eopoints; _x[_n_] = x; _y[_n_] = y; _n_ + 1; end;

do i = 1 to %eval(&n-1); do j = i+1 to &n;

pairCount + 1; reportCount + 1; if reportCount >= &report_every then do; reportCount = 0; pctDone = pairCount / %sysevalf( (&n-1) * &n/2 * 1/100); %* % of n things taken 2 at a time; pctOut = outputCount / pairCount * 100; walltime = datetime()-&t0; put walltime=time8. outputCount=comma15. pctOut=4.1 pairCount=comma15. pctDone=4.1; end;

* less taxing tests for faster filtering; if abs(_x[i] - _x[j]) > &maxdistance then continue; if abs(_y[i] - _y[j]) > &maxdistance then continue;

* no real need for sqrt ing this; distance_squared = (_x[i]-_x[j])**2 + (_y[i]-_y[j])**2 ;

if distance_squared <= &maxdistance_squared then do; outputCount + 1; * output; end;

end; end;

put outputCount=comma12. pairCount=comma12.;

stop;

keep i j distance_squared; run; -----

-- Richard -- ___________________________________________________________ Sign-up for Ads Free at Mail.com http://promo.mail.com/adsfreejump.htm


Back to: Top of message | Previous page | Main SAS-L page