Date: Fri, 23 May 2008 11:58:18 -0400
Reply-To: Mike Rhoads <RHOADSM1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mike Rhoads <RHOADSM1@WESTAT.COM>
Subject: Re: Merging on either the ID or AltID
In-Reply-To: <941871A13165C2418EC144ACB212BDB04E15D8@dshsmxoly1504g.dshs.wa.lcl>
Content-Type: text/plain; charset="us-ascii"
The only potential problem I can see with Dan's solution is if the 2nd
data set has the same value for ID in one record as for ALTID in a
second record. In that case, the ID will occur twice in the output. As
an example, add the record below to the example data set:
002-00 002-01 XX
A slight modification merges two instances of the data set separately,
using the COALESCE function to "prefer" the Allele4 value from the
record where the value is in ID rather than ALTID:
create table want2 as
select a.*, coalesce(b1.a4,b2.a4)
from
have1 as a left join
have2 as b1
on
a.ID = b1.id left join
have2 as b2
on a.id = b2.alt_id
;
Mike Rhoads
Westat
RhoadsM1@Westat.com
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Nordlund, Dan (DSHS/RDA)
Sent: Thursday, May 22, 2008 5:08 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: RE: Merging on either the ID or AltID
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of Mary
> Sent: Thursday, May 22, 2008 1:53 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Merging on either the ID or AltID
>
> OK, new question!
>
> I have one data set that has one ID in it. Example, ID and 3
> variables (Allele results Allele1 Allele2 Allele3)
>
> 001-01 AA CC AC
> 002-01 AC AA AC
> 003-01 AA AC AC
> 004-01 AA AA AA
>
> I have another data set that has both an ID and sometimes a
> second Alternate ID in it (ID, ALTID, Allele4) I'm putting
> XXX-XX to show the place but in my data they are just missing.
>
> 001-01 XXX-XX GT
> 002-01 XXX-XX TT
> 003-00 003-01 GT
>
> I would like to create a data set that has all the records
> from the first data set, and merges on the records from the
> second data set, but if it can't find a match on the primary
> ID of the second data set, then I want to try to find a match
> on the second ID.
>
> Thus I'd get additonal data for 001-01, 002-01, and also for
> 003-01 (because it finds no primary ID matching, but finds
> the alternate ID matching), but I wouldn't have any
> additional data for 004-01. I do want all the data in the
> first table even if it can't find any matching data in the
> second table (i.e., a left outer join).
>
> So I'd like to get:
>
> 001-01 AA CC AC GT
> 002-01 AC AA AC TT
> 003-01 AA AC AC GT
> 004-01 AA AA AA .
>
Mary,
Does this work for you
data have1;
input id $ a1 $ a2 $ a3 $ ;
cards;
001-01 AA CC AC
002-01 AC AA AC
003-01 AA AC AC
004-01 AA AA AA
;
run;
data have2;
input id $ alt_id $ a4 $ ;
cards;
001-01 XXX-XX GT
002-01 XXX-XX TT
003-00 003-01 GT
;
run;
proc sql;
create table want as
select a.*, b.a4
from
have1 as a left join
have2 as b
on
a.ID = b.id
or a.id = b.alt_id
;
quit;
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
|