Date: Fri, 20 Jan 2006 00:46:39 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: Help with a SQL Join
In-Reply-To: <A1E198DD1F8D744089A93B0A90C3884F0484F3F3@MAILBE2.westat.com>
Content-Type: text/plain; format=flowed
Well Sig et...
I had originally ask Paul Dorfman to help out with this. He came up with a
solution but it didn't quit meet all th ecrazy requirments, but that more
than likely has to do with me not explaining it well enough. However his
code did start me down the track of coming up with the following solution
see what you think and if there is any place I can make some logical
improvements.
data SIDR ;
infile cards ;
input SSN $1. @3 AdmitDate date9. ;
cards ;
1 17jan2005
1 15jul2005
1 24Dec2005
2 12Jul2005
;
run ;
data DMDC ;
infile cards ;
input @1 SSN $1. @3 StartDate date9. @13 EndDate date9. ;
cards ;
1 01jan2005 30Mar2005
1 01apr2005 01jun2005
1 29jul2005 30nov2005
1 01oct2005 31dec2005
3 01jun2005 20oct2005
;
run ;
proc sql ;
create table Admits as
select coalesce(DMDC.SSN , SIDR.SSN) as SSN , SIDR.AdmitDate
, DMDC.StartDate , DMDC.EndDate
, (SIDR.AdmitDate - DMDC.StartDate) as diff
, case when calculated diff < 0 then 9000000 else calculated
diff end as diff2
from DMDC as DMDC
right join
SIDR as SIDR
on DMDC.SSN = SIDR.SSN
group by DMDC.SSN , SIDR.AdmitDate
having diff2 = min(diff2)
order by ssn , AdmitDate
;
create table need as
select Coalesce(DMDC.SSN , Admits.SSN) as SSN ,
Admits.AdmitDate , DMDC.StartDate , DMDC.EndDate
from DMDC as DMDC
full join
Admits as Admits
on DMDC.SSN = Admits.SSN
and DMDC.StartDate = Admits.StartDate
and DMDC.EndDate = Admits.EndDate
order by SSN , AdmitDate
;
quit ;
proc print
data = need ;
format AdmitDate StartDate EndDate date9. ;
run ;
Toby Dunn
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Help with a SQL Join
Date: Thu, 19 Jan 2006 17:55:53 -0500
Toby:
A solution would likely be the equivalent of unions of four subqueries.
This close solution handles three. Two correspond to constraint 3) and
one to constraint 1):
/*
1). Now I need to join the two data sets together by SSN , and when the
One.AdmitDate falls
between Two.StartDate and Two.EndDate.
2). In the cases that One.AdmitDate does not fall
nicely between these two dates I need it to be joined with the record
that contains the
earliest date in which the AdmitDate falls between.
3. I also need the Final data set to
contain all records with no matching SSN in One and all Records with no
matching record in Two.
And for good measures there are no duplicate StartDate's, AdmitDates, or
overlapping StartDates/EndDates per SSN.
*/
proc sql;
create table close as
select * from
(
select distinct coalesce(t1.SSN,t2.SSN) as
SSN,t1.AdmitDate,t2.StartDate,t2.EndDate
from One as t1 full join Two as t2
on t1.SSN=t2.SSN
)
where t1.AdmitDate between t2.StartDate and t2.EndDate
or t1.AdmitDate is NULL or t2.StartDate is NULL
;
quit;
/*
I'll have to look closer at the remaining subquery. If you figure it out
first, let me know. The logic looks turgid. I find that when one part of
a logical problem looks much more complicated than the other parts, that
one part may change when reviewed.
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of toby dunn
Sent: Thursday, January 19, 2006 4:58 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Help with a SQL Join
Okay for some reason my brain can't find an easy solution to this join:
Lets say I have two data sets:
data One ;
infile cards ;
input SSN $1. @3 AdmitDate date9. ;
cards ;
1 17jan2005
1 15jul2005
1 24Dec2005
2 12Jul2005
;
run ;
data Two ;
infile cards ;
input @1 SSN $1. @3 StartDate date9. @13 EndDate date9. ;
cards ;
1 01jan2005 30Mar2005
1 01apr2005 01jun2005
1 29jul2005 30nov2005
1 01oct2005 31dec2005
3 01jun2005 20oct2005
;
run ;
Now I need to join the two data sets together by SSN , and when the
One.AdmitDate falls between Two.StartDate and Two.EndDate. In the cases
that One.AdmitDate does not fall nicely between these two dates I need
it to be joined with the record that contains the earliest date in which
the AdmitDate falls between. I also need the Final data set to contain
all records with no matching SSN in One and all Records with no matching
record in Two. And for good measures there are no duplicate
StartDate's, AdmitDates, or overlapping StartDates/EndDates per SSN.
From the above data the final data set should look like:
Obs SSN StartDate EndDate AdmitDate
1 1 01JAN2005 30MAR2005 17JAN2005
2 1 01APR2005 01JUN2005 15JUL2005
3 1 29JUL2005 30NOV2005 .
4 1 01OCT2005 31DEC2005 24DEC2005
5 2 . . 12JUL2005
6 3 01JUN2005 20OCT2005 .
Toby Dunn