Date: Fri, 8 Nov 2002 17:20:01 -0500
Reply-To: "Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Droogendyk, Harry" <Harry.Droogendyk@CIBC.COM>
Subject: Re: SQL, join or union or...
Content-Type: text/plain; charset="iso-8859-1"
One more, utilizing a view to do some pseudo pre-processing. You may want
to test each of the approaches for efficiency:
proc sql;
create view inter as
select i, 1 as membership from a
union
select i, 2 as membership from b
;
select i, sum(membership)
from inter
group by i;
quit;
-----Original Message-----
From: Droogendyk, Harry
Sent: Friday, November 08, 2002 5:08 PM
To: 'SAS-L@LISTSERV.UGA.EDU'
Subject: RE: SQL, join or union or...
Using sub-queries:
data a;
do i = 1 to 10;
output;
end;
run;
data b;
do i = 7 to 12;
output;
end;
run;
proc sql;
select i, '1' as membership
from a
where i not in ( select i from b )
union
select i, '2' as membership
from b
where i not in ( select i from a )
union
select i, '3' as membership
from a
where i in ( select i from b )
;
quit;
Using EXCEPT / INTERSECT
proc sql;
select i, '1' as membership
from a
except
select i, '1' as membership
from b
union
select i, '2' as membership
from b
except
select i, '2' as membership
from a
union
select i, '3' as membership
from a
intersect
select i, '3' as membership
from b
;
quit;
-----Original Message-----
From: Talbot Katz [mailto:TopKatz@MSN.COM]
Sent: Friday, November 08, 2002 4:49 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SQL, join or union or...
Hey, gang.
Must be time for another one of my stupid
sql stumpers (stumping me -- not
you!). I want a combined list of unique
keys from two different files.
That's easy enough to do with a union --
proc sql;
create table comlist1 as
select * from
(select distinct keyvar from ds1)
union corr
(select distinct keyvar from ds2);
quit;
but, of course, I'm never satisfied with
something quite so simple. I want
to add a membership flag -- 1 if keyar is in
ds1 only, 2 if keyvar is in
ds2 only, 3 if keyvar is in both datasets.
I do this frequently with data
step merges as follows :
proc sort data = ds1 (keep = keyvar) out =
ds1k nodupkey;
by keyvar;
run;
proc sort data = ds2 (keep = keyvar) out =
ds2k nodupkey;
by keyvar;
run;
data comlist2;
merge ds1k (in = in1) ds2k (in = in2);
by keyvar;
keep keyvar membership;
if in1 then do;
if in2 then do;
membership = 3;
end;
else do;
membership = 1;
end;
end;
else if in2 then do;
membership = 2;
end;
run;
I have a way of doing this with proc sql,
but it's extremely clunky. To
begin with, it requires that keyvar is a
fixed length character variable.
Then, you'll see that it concatenates the
keyvar values from the two files,
and lops off one of them (If the keyvar is
numeric or non fixed length,
some times I can force it to behave.)
* &lk holds the fixed length of keyvar ;
proc sql;
reset noprint;
create table comlist3 as
select distinct substr(compress(ds1.keyvar
|| ds2.keyvar),1,&lk.) as
keyvar,
sum(ds1.membership,ds2.membership) as
membership
from
(select distinct keyvar, 1 as membership
from ds1k) ds1
full join
(select distinct keyvar, 2 as membership
from ds2k) ds2
on ds1.keyvar = ds2.keyvar;
quit;
It seems to me there should be a way of
doing this where the keyvar value
is the value from whichever data set it
comes from on records which don't
match, and the matched value on records that
do match.
Thanks!
-- TMK --