| Date: | Mon, 29 May 2006 01:44:14 +0000 |
| Reply-To: | toby dunn <tobydunn@HOTMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | toby dunn <tobydunn@HOTMAIL.COM> |
| Subject: | Re: Point function: something's wrong in my code |
|
| In-Reply-To: | <422.282aaf1.31a9f957@aol.com> |
| Content-Type: | text/plain; format=flowed |
|---|
Ken ,
Well done, well done indeed. I would however note that it may speed up the
whole process if only slightly ( I havent done any comparison on a big data
sets) at the expense of needing to run on V9 and higher and the need to
fully understand hashing and hashing objects. Which as we both know arent
easy things to teach everyone.
Toby Dunn
From: Ken Borowiak <EvilPettingZoo97@AOL.COM>
Reply-To: EvilPettingZoo97@AOL.COM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Point function: something's wrong in my code
Date: Sat, 27 May 2006 14:49:59 EDT
Kitty, Toby, etc. ,
Below is a similar type of solution Toby provided (using a disk resident
index and
the KEY statement option in a DATA step), but using the V9 hash object
instead. I was
momentarily stumped over this problem the other day b/c doing a table look
up using the
hash object requires the key and its attributes (parameter type and length)
to match that
of the key in the data set you are iterating over. The dataset LOOKUP has
the
field CASEID to
key with CASEID1 and CASEID2 of the data set PAIR. Loading the table LOOKUP
into a two separate
hash objects with the key CASEID being renamed was tempting, but loading
redundant information
into memory is wasteful. Toby used his disk resident index twice, so I
should
be able to
use my hash object twice. Creating the field CASEID which takes on the
values of CASEID1 and CASEID2 from PAIR courtesy on an array to search the
hash object twice
seems to do the the trick.
data Need2(drop=_:) ;
if 0 then set lookup;
/*- Define and load key & data into hash object (HO) -*/
declare hash lu(hashexp:4, dataset:'lookup');
lu.definekey('caseid');
lu.definedata(all:'Y');
lu.definedone() ;
/*- Iterate over PAIR with DoW loop and do multiple table look up -*/
do until(eof) ;
set pair end=eof ;
array _caseid(*) caseid1 caseid2 ;
array _fid(*) fid1 fid2;
/*- Do DIM(_CASEID) table look ups -*/
/*- Variables in _CASEID need to match key name and parameter
attributes of HO -*/
do _i=1 to dim(_caseid) ;
caseid=_caseid[_i];
/*- WHen a match is found, assign value of FID to _fid[_i] -*/
if lu.find()=0 then _fid[_i]=fid;
end;
output;
end;
stop;
run;
proc print data=Need2(drop=caseid fid) ;
run;
I also suggested using an SQL solution.
proc sql _method ;
/*create table Need as*/
select T1.*,
T2.fid as fid1,
T3.fid as fid2
from Pair as T1,
LookUp as T2,
LookUp as T3
where T1.caseid1=T2.caseid and
T1.caseid2=T3.caseid;
quit;
I would view this solution as inferior to the DATA step hash solution above.
With _method option
turned on, you can peruse the log to see that hash joins (execution method
SQXJHSH)
were used to execute the query. However, LOOKUP was loaded twice into
memory,
which is what I
wanted to avoid doing in the DATA step above.
Ken
In a message dated 5/24/2006 8:58:10 PM Eastern Standard Time,
tobydunn@HOTMAIL.COM writes:
Kitty ,
Another solution which is designed to work with the data step would be:
data lookup ( index = ( CaseId ) ) ;
input caseid fid;
cards;
5471290 1
5471292 2
5471294 3
5471296 4
5472960 5
5472966 6
5472990 7
5472992 8
5474030 9
5474032 10
;
run;
data pair;
input caseid1 caseid2 amount;
cards;
5471290 5471294 345
5471292 5472966 6423
5471296 5472992 5500
5472960 5474030 234
5472990 5472992 4532
5474032 5474030 3421
;
run;
data need ;
set Pair ;
set LookUp ( rename = ( CaseId = CaseId1 Fid = Fid1 ) ) Key = CaseId1 ;
set LookUp ( rename = ( CaseId = CaseId2 Fid = Fid2 ) ) Key = CaseId2 ;
run ;
proc print
data = Need ;
run ;
Toby Dunn
|