```Date: Fri, 20 Jan 2006 00:46:39 +0000 Reply-To: toby dunn Sender: "SAS(r) Discussion" From: toby dunn Subject: Re: Help with a SQL Join Comments: To: HERMANS1@WESTAT.COM, sashole@bellsouth.net In-Reply-To: Content-Type: text/plain; format=flowed Well Sig et... I had originally ask Paul Dorfman to help out with this. He came up with a solution but it didn't quit meet all th ecrazy requirments, but that more than likely has to do with me not explaining it well enough. However his code did start me down the track of coming up with the following solution see what you think and if there is any place I can make some logical improvements. data SIDR ; infile cards ; input SSN \$1. @3 AdmitDate date9. ; cards ; 1 17jan2005 1 15jul2005 1 24Dec2005 2 12Jul2005 ; run ; data DMDC ; infile cards ; input @1 SSN \$1. @3 StartDate date9. @13 EndDate date9. ; cards ; 1 01jan2005 30Mar2005 1 01apr2005 01jun2005 1 29jul2005 30nov2005 1 01oct2005 31dec2005 3 01jun2005 20oct2005 ; run ; proc sql ; create table Admits as select coalesce(DMDC.SSN , SIDR.SSN) as SSN , SIDR.AdmitDate , DMDC.StartDate , DMDC.EndDate , (SIDR.AdmitDate - DMDC.StartDate) as diff , case when calculated diff < 0 then 9000000 else calculated diff end as diff2 from DMDC as DMDC right join SIDR as SIDR on DMDC.SSN = SIDR.SSN group by DMDC.SSN , SIDR.AdmitDate having diff2 = min(diff2) order by ssn , AdmitDate ; create table need as select Coalesce(DMDC.SSN , Admits.SSN) as SSN , Admits.AdmitDate , DMDC.StartDate , DMDC.EndDate from DMDC as DMDC full join Admits as Admits on DMDC.SSN = Admits.SSN and DMDC.StartDate = Admits.StartDate and DMDC.EndDate = Admits.EndDate order by SSN , AdmitDate ; quit ; proc print data = need ; format AdmitDate StartDate EndDate date9. ; run ; Toby Dunn From: Sigurd Hermansen Reply-To: Sigurd Hermansen To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Help with a SQL Join Date: Thu, 19 Jan 2006 17:55:53 -0500 Toby: A solution would likely be the equivalent of unions of four subqueries. This close solution handles three. Two correspond to constraint 3) and one to constraint 1): /* 1). Now I need to join the two data sets together by SSN , and when the One.AdmitDate falls between Two.StartDate and Two.EndDate. 2). In the cases that One.AdmitDate does not fall nicely between these two dates I need it to be joined with the record that contains the earliest date in which the AdmitDate falls between. 3. I also need the Final data set to contain all records with no matching SSN in One and all Records with no matching record in Two. And for good measures there are no duplicate StartDate's, AdmitDates, or overlapping StartDates/EndDates per SSN. */ proc sql; create table close as select * from ( select distinct coalesce(t1.SSN,t2.SSN) as SSN,t1.AdmitDate,t2.StartDate,t2.EndDate from One as t1 full join Two as t2 on t1.SSN=t2.SSN ) where t1.AdmitDate between t2.StartDate and t2.EndDate or t1.AdmitDate is NULL or t2.StartDate is NULL ; quit; /* I'll have to look closer at the remaining subquery. If you figure it out first, let me know. The logic looks turgid. I find that when one part of a logical problem looks much more complicated than the other parts, that one part may change when reviewed. Sig -----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of toby dunn Sent: Thursday, January 19, 2006 4:58 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Help with a SQL Join Okay for some reason my brain can't find an easy solution to this join: Lets say I have two data sets: data One ; infile cards ; input SSN \$1. @3 AdmitDate date9. ; cards ; 1 17jan2005 1 15jul2005 1 24Dec2005 2 12Jul2005 ; run ; data Two ; infile cards ; input @1 SSN \$1. @3 StartDate date9. @13 EndDate date9. ; cards ; 1 01jan2005 30Mar2005 1 01apr2005 01jun2005 1 29jul2005 30nov2005 1 01oct2005 31dec2005 3 01jun2005 20oct2005 ; run ; Now I need to join the two data sets together by SSN , and when the One.AdmitDate falls between Two.StartDate and Two.EndDate. In the cases that One.AdmitDate does not fall nicely between these two dates I need it to be joined with the record that contains the earliest date in which the AdmitDate falls between. I also need the Final data set to contain all records with no matching SSN in One and all Records with no matching record in Two. And for good measures there are no duplicate StartDate's, AdmitDates, or overlapping StartDates/EndDates per SSN. From the above data the final data set should look like: Obs SSN StartDate EndDate AdmitDate 1 1 01JAN2005 30MAR2005 17JAN2005 2 1 01APR2005 01JUN2005 15JUL2005 3 1 29JUL2005 30NOV2005 . 4 1 01OCT2005 31DEC2005 24DEC2005 5 2 . . 12JUL2005 6 3 01JUN2005 20OCT2005 . Toby Dunn ```

Back to: Top of message | Previous page | Main SAS-L page