LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2006, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Mon, 29 May 2006 01:44:14 +0000
Reply-To:   toby dunn <tobydunn@HOTMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   toby dunn <tobydunn@HOTMAIL.COM>
Subject:   Re: Point function: something's wrong in my code
Comments:   To: EvilPettingZoo97@AOL.COM
In-Reply-To:   <422.282aaf1.31a9f957@aol.com>
Content-Type:   text/plain; format=flowed

Ken ,

Well done, well done indeed. I would however note that it may speed up the whole process if only slightly ( I havent done any comparison on a big data sets) at the expense of needing to run on V9 and higher and the need to fully understand hashing and hashing objects. Which as we both know arent easy things to teach everyone.

Toby Dunn

From: Ken Borowiak <EvilPettingZoo97@AOL.COM> Reply-To: EvilPettingZoo97@AOL.COM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Point function: something's wrong in my code Date: Sat, 27 May 2006 14:49:59 EDT

Kitty, Toby, etc. ,

Below is a similar type of solution Toby provided (using a disk resident index and the KEY statement option in a DATA step), but using the V9 hash object instead. I was momentarily stumped over this problem the other day b/c doing a table look up using the hash object requires the key and its attributes (parameter type and length) to match that of the key in the data set you are iterating over. The dataset LOOKUP has the field CASEID to key with CASEID1 and CASEID2 of the data set PAIR. Loading the table LOOKUP into a two separate hash objects with the key CASEID being renamed was tempting, but loading redundant information into memory is wasteful. Toby used his disk resident index twice, so I should be able to use my hash object twice. Creating the field CASEID which takes on the values of CASEID1 and CASEID2 from PAIR courtesy on an array to search the hash object twice seems to do the the trick.

data Need2(drop=_:) ; if 0 then set lookup;

/*- Define and load key & data into hash object (HO) -*/ declare hash lu(hashexp:4, dataset:'lookup'); lu.definekey('caseid'); lu.definedata(all:'Y'); lu.definedone() ;

/*- Iterate over PAIR with DoW loop and do multiple table look up -*/ do until(eof) ; set pair end=eof ;

array _caseid(*) caseid1 caseid2 ; array _fid(*) fid1 fid2;

/*- Do DIM(_CASEID) table look ups -*/ /*- Variables in _CASEID need to match key name and parameter attributes of HO -*/ do _i=1 to dim(_caseid) ; caseid=_caseid[_i]; /*- WHen a match is found, assign value of FID to _fid[_i] -*/ if lu.find()=0 then _fid[_i]=fid; end; output; end; stop; run;

proc print data=Need2(drop=caseid fid) ; run;

I also suggested using an SQL solution.

proc sql _method ; /*create table Need as*/ select T1.*, T2.fid as fid1, T3.fid as fid2 from Pair as T1, LookUp as T2, LookUp as T3 where T1.caseid1=T2.caseid and T1.caseid2=T3.caseid; quit;

I would view this solution as inferior to the DATA step hash solution above. With _method option turned on, you can peruse the log to see that hash joins (execution method SQXJHSH) were used to execute the query. However, LOOKUP was loaded twice into memory, which is what I wanted to avoid doing in the DATA step above.

Ken

In a message dated 5/24/2006 8:58:10 PM Eastern Standard Time, tobydunn@HOTMAIL.COM writes: Kitty ,

Another solution which is designed to work with the data step would be:

data lookup ( index = ( CaseId ) ) ; input caseid fid; cards; 5471290 1 5471292 2 5471294 3 5471296 4 5472960 5 5472966 6 5472990 7 5472992 8 5474030 9 5474032 10 ; run;

data pair; input caseid1 caseid2 amount; cards; 5471290 5471294 345 5471292 5472966 6423 5471296 5472992 5500 5472960 5474030 234 5472990 5472992 4532 5474032 5474030 3421 ; run;

data need ; set Pair ; set LookUp ( rename = ( CaseId = CaseId1 Fid = Fid1 ) ) Key = CaseId1 ; set LookUp ( rename = ( CaseId = CaseId2 Fid = Fid2 ) ) Key = CaseId2 ; run ;

proc print data = Need ; run ;

Toby Dunn


Back to: Top of message | Previous page | Main SAS-L page