Date: Sat, 2 Feb 2008 13:29:08 -0500
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: PROC SQL with alternate keys
In-Reply-To: <8494c0f9-75f3-4d61-ae27-499ca2e4ebc1@e23g2000prf.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
On Feb 2, 2008 10:45 AM, Paul <paulvonhippel@yahoo.com> wrote:
> I'm trying to join a small table A to a large table B where A.key can
> join B.key1 or (rarely) B.key2. Including key2 in the join slows the
> query substantially. That is.
>
> (1) select * from A inner join B where A.key=B.key1
> is pretty fast, but
> (2) select * from A inner join B where (A.key=B.key1 or A.key=B.key2)
> is much more than twice as slow.
>
> I don't know what's going on, but I'm thinking query (2) might not be
> optimized for the fact that table A is much smaller than table B.
>
> Any tips on speeding performance of query (2) would be most
> appreciated.
>
> Thanks!
> Paul
>
Paul,
Following Gregg, here is a Key Indexing method. Unlike HASH solution, Key
Indexing depends on a numeric value to look up the ARRAY( K[ ] ). If your
KEY takes value not more than 9-digit number, then your system can find
sufficient memory space.
In the following code I assume your key takes a maximum value of 50 million.
%let num = 50000000;
data wanted;
array k[&num] _temporary_;
do until(eofS);
set SMALL end = eofS;
k[key] = 1;
end;
do until(eofL);
set LARGE end = eofL;
if k[key1] or k[key2] then output;
end;
stop;
drop key;
run;
Regards,
Muthia Kachirayan
|