Date: Fri, 4 Mar 2011 10:59:05 -0800
Reply-To: Irin later <irinfigvam@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Irin later <irinfigvam@YAHOO.COM>
Subject: Re: Confusing Dedup code
In-Reply-To: <000a01cbda95$6e02f160$4a08d420$@com>
Content-Type: text/plain; charset=utf-8
Max,
The output actually contains the records for each scenario while I need just one record.
If tag1=1 and smallest timeframe met than we just get rid of others within the ID group.
However if it doesn't meet then tag2=1 +smallest takes it's turn. If meets then get rid of the rest
If both tags=1 then just smallest timeframe criteria takes place and we get rid of the rest.
Final result should contain just one record ("the best")
Sorry for the confusion.
--- On Fri, 3/4/11, bbser 2009 <bbser2009@GMAIL.COM> wrote:
From: bbser 2009 <bbser2009@GMAIL.COM>
Subject: Re: Confusing Dedup code
To: SAS-L@LISTSERV.UGA.EDU
Date: Friday, March 4, 2011, 12:56 PM
Irin
Below is the whole verified code based on the idea in my last email. Wish to be what you want.
-Max
data have;
infile cards;
input id $3. srv_date : mmddyy10. disch_date : mmddyy10. timeframe tag1 tag2;
cards;
12a 07/15/2010 07/05/2010 10 1 1
12a 08/31/2010 08/01/2010 30 0 1
12a 03/10/2010 02/16/2010 24 0 1
12a 06/24/2010 06/10/2010 14 1 1
12a 05/30/2010 03/30/2010 60 0 0
12a 09/25/2010 09/05/2010 20 0 1
12a 08/31/2010 08/01/2010 30 0 1
12a 03/10/2010 02/16/2010 24 0 1
12a 05/30/2010 03/30/2010 60 0 0
12a 09/25/2010 09/05/2010 20 0 1
12a 09/30/2010 08/01/2010 60 0 0
12a 03/14/2010 02/10/2010 34 0 0
12a 05/30/2010 03/25/2010 55 0 0
12a 10/25/2010 09/15/2010 40 0 0
;
data new;
set have;
indicator=-1;
if tag1=1 then indicator=1;
else if tag2=1 then indicator=0;
run;
proc sort data=new out=new_sort;
by id indicator timeframe;
run;
data need;
set new_sort;
by id indicator timeframe;
if first.indicator then output;
run;
proc print;
run;
-----Original Message-----
From: bbser 2009 [mailto:bbser2009@gmail.com]
Sent: March-04-11 12:33 PM
To: 'Irin later'; 'SAS-L@LISTSERV.UGA.EDU'
Subject: RE: Confusing Dedup code
I see. I was thinking of doing it like this:
In first data step, make a new variable to differentiate the three cases:
....
indicator=-1;
if tag1=1 then indicator=1;
else if tag2=1 then indicator=0;
...
In second data step, just use last. and first. technique like this:
...
by id indicator timeframe;
...
In this data step, you only output the observation where first.indicator=1.
I guess this should work.
Thanks.
-Max
From: Irin later [mailto:irinfigvam@yahoo.com]
Sent: March-04-11 11:32 AM
To: SAS-L@LISTSERV.UGA.EDU; bbser 2009
Subject: Re: Confusing Dedup code
Max, actually I do concern about values of tag1 and tag2
If tag1=1 I need this record but if there are couple of records w tag1=1 within the same ID group, I do need just those with smallest timeframe.
If tag1=0 I ignore this record and my concern is tag2=1 . within the same ID group, I do need just those with smallest timeframe.
If tag1 and tag2 both are 0 then I have nothing to do but to chose a record with o . However in such a case my concern still would me the smallest timeframe among those records with tag1=0 and tag2=0
--- On Fri, 3/4/11, bbser 2009 <bbser2009@GMAIL.COM> wrote:
From: bbser 2009 <bbser2009@GMAIL.COM>
Subject: Re: Confusing Dedup code
To: SAS-L@LISTSERV.UGA.EDU
Date: Friday, March 4, 2011, 11:23 AM
Irin
It seems to me that you always want the smallest timeframe for the same ID regardless of values of tag1 and tag2.
Did I misunderstand anything?
Thanks.
-Max
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Irin later
Sent: March-04-11 10:57 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: [SAS-L] Confusing Dedup code
I am confused with a dedup code…..
-Among the patients with the same ID and criteria tag1=1, one needs first to keep from records with tag1=1 and with the shortest timeframe between Disch_date and srv_date that follows the hospital discharge. This is the 1st I priority for dedup.
-In case tag1=0, the priority would be tag2=1 with the shortest timeframe
-In case both tag1=0 and tag2=0 one needs to keep the record with the shortest timeframe
for example:
Scenarion #1:
id srv_date disch_date timeframe tag1 tag2
12a 07/15/2010 07/05/2010 10 1 1
12a 08/31/2010 08/01/2010 30 0 1
12a 03/10/2010 02/16/2010 24 0 1
12a 06/24/2010 06/10/2010 14 1 1
12a 05/30/2010 03/30/2010 60 0 0
12a 09/25/2010 09/05/2010 20 0 1
I need the following record with tag1=1 and with the shortest timeframe=10
(rather than timeframe=14)
id SRV_date disch_date timeFrame tag1 tag2
12a 07/15/2010 07/05/2010 10 1 1
Scenario #2:
id srv_date disch_date timeFrame tag1 tag2
12a 08/31/2010 08/01/2010 30 0 1
12a 03/10/2010 02/16/2010 24 0 1
12a 05/30/2010 03/30/2010 60 0 0
12a 09/25/2010 09/05/2010 20 0 1
I need the following record with tag2=1 and the shortest timeframe=20
(rather than timeframe=24 or 30)
id srv_date disch_date timeFrame tag1 tag2
12a 09/25/2010 09/05/2010 20 0 1
Scenario #3 (both tag1 and tag2 =0):
id srv_date disch_date timeFrame tag1 tag2
12a 09/31/2010 08/01/2010 60 0 0
12a 03/14/2010 02/10/2010 34 0 0
12a 05/30/2010 03/25/2010 55 0 0
12a 10/25/2010 09/15/2010 40 0 0
I need the following record the shortest timeframe=34
id srv_date disch_date timeFrame tag1 tag2
12a 03/14/2010 02/10/2010 34 0 0
Could you, please, give me a hand?
Thank you for advance,
Irin