Date: Fri, 8 Nov 2002 17:52:56 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: SQL, join or union or...
I must confess that I have merely glanced at your message and may have
misinterpreted what you are asking. As best I understand what you want, try
something along these lines:
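proc sql;
create table comlist1 as
select keyvar,
       /* outer union corr stacks rows rather than joining them, */
       /* so max() collapses the rows for each key into one row  */
       case when max(inDS1) and not max(inDS2) then 1
            when not max(inDS1) and max(inDS2) then 2
            when max(inDS1) and max(inDS2) then 3
            else 0
       end as inWhich
from (select distinct 1 as inDS1, keyvar from ds1
      outer union corr
      select distinct 2 as inDS2, keyvar from ds2)
group by keyvar;
quit;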
That should take some of the klunk out of the SQL solution.
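For a quick check of any of the queries in this thread, a pair of tiny test tables is enough. The data below are made up for illustration only (the key values a, b, c are not from the original question): "a" appears only in ds1, "b" appears in both (twice in ds1, to exercise the de-duplication), and "c" appears only in ds2.

```sas
/* Hypothetical test data: keys a and b (duplicated) in ds1 */
data ds1;
  input keyvar $;
  datalines;
a
b
b
;
run;

/* Hypothetical test data: keys b and c in ds2 */
data ds2;
  input keyvar $;
  datalines;
b
c
;
run;
```

Under the flag definition in the original question, a correct result has one row per key, with a flagged 1, b flagged 3, and c flagged 2.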
Sig
-----Original Message-----
From: Talbot Katz [mailto:TopKatz@MSN.COM]
Sent: Friday, November 08, 2002 4:49 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SQL, join or union or...
Hey, gang.
Must be time for another one of my stupid sql stumpers (stumping me -- not
you!). I want a combined list of unique keys from two different files.
That's easy enough to do with a union --
proc sql;
create table comlist1 as
select * from
(select distinct keyvar from ds1)
union corr
(select distinct keyvar from ds2);
quit;
but, of course, I'm never satisfied with something quite so simple. I want
to add a membership flag -- 1 if keyvar is in ds1 only, 2 if keyvar is in
ds2 only, 3 if keyvar is in both datasets. I do this frequently with data
step merges as follows:
proc sort data = ds1 (keep = keyvar) out = ds1k nodupkey;
  by keyvar;
run;
proc sort data = ds2 (keep = keyvar) out = ds2k nodupkey;
  by keyvar;
run;
data comlist2;
  merge ds1k (in = in1) ds2k (in = in2);
  by keyvar;
  keep keyvar membership;
  if in1 then do;
    if in2 then do;
      membership = 3;
    end;
    else do;
      membership = 1;
    end;
  end;
  else if in2 then do;
    membership = 2;
  end;
run;
I have a way of doing this with proc sql, but it's extremely clunky. To
begin with, it requires that keyvar be a fixed-length character variable.
Then, you'll see that it concatenates the keyvar values from the two files
and lops off one of them. (If keyvar is numeric or not fixed length,
sometimes I can force it to behave.)
* &lk holds the fixed length of keyvar ;
proc sql;
  reset noprint;
  create table comlist3 as
  select distinct
         substr(compress(ds1.keyvar || ds2.keyvar),1,&lk.) as keyvar,
         sum(ds1.membership,ds2.membership) as membership
  from
    (select distinct keyvar, 1 as membership from ds1k) ds1
    full join
    (select distinct keyvar, 2 as membership from ds2k) ds2
    on ds1.keyvar = ds2.keyvar;
quit;
It seems to me there should be a way of doing this where the keyvar value
is the value from whichever data set it comes from on records which don't
match, and the matched value on records that do match.
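One way to get that behavior, offered as a sketch rather than a definitive answer, is to keep the full join but replace the concatenate-and-substr step with COALESCE, which returns its first nonmissing argument. On unmatched rows that is the key from whichever side has it; on matched rows both sides are equal, so either one will do. The output name comlist4 is made up for this example, and the sketch assumes keyvar has the same type in both data sets.

```sas
proc sql;
  create table comlist4 as   /* comlist4 is a hypothetical name */
  select coalesce(ds1.keyvar, ds2.keyvar) as keyvar,
         /* the SAS sum() function skips missing values, so this */
         /* gives 1 (ds1 only), 2 (ds2 only), or 3 (both)        */
         sum(ds1.membership, ds2.membership) as membership
  from (select distinct keyvar, 1 as membership from ds1) as ds1
       full join
       (select distinct keyvar, 2 as membership from ds2) as ds2
       on ds1.keyvar = ds2.keyvar;
quit;
```

Because COALESCE does not depend on the length of the key, this form should not need &lk or a fixed-length character key, and it reads directly from ds1 and ds2 without the preliminary sorts.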
Thanks!
-- TMK --