Date: Mon, 19 Jan 2009 20:29:47 -0600
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: collapse data while retain certain value: seek a better
solution than mine
In-Reply-To: <200901200212.n0JBkfth027956@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
Use the MIN option in proc summary to get the first admdate per patientid.
like so.
data sample;
input patientID:$1. lengthStay:3. admDate:mmddyy10. ;
format admDate mmddyy10.;
datalines;
A 3 08/18/2007
A 5 02/27/2005
A 1 09/08/2006
B 2 01/05/2005
B 3 05/31/2003
C 56 03/19/2006
C 8 05/26/2005
C 2 12/25/2004
D 9 05/15/2005
;
run;
proc summary data=sample nway;
class patientID;
var lengthStay admdate;
output
out=stay(keep=patientid lengthstay firststay)
sum(lengthstay)=
min(admdate)=firststay;
run;
-Joe
On Mon, Jan 19, 2009 at 8:12 PM, Jerry <i89rt5@gmail.com> wrote:
> /*
> Hi,
>
> I have a sample data (see below), each record of which is a hospital
> admission with patient ID (patientID) and hospitalization days(lengthStay).
> Each patient could have multiple admissions, differentiated by admission
> dates(admDate). My purpose is to collapse the sample data by patient ID to
> get the total hospitalization days for each patient, while retain the
> earliest admission date for each patient. In other words, the resulting
> data
> is at unique patient level, with that each unique patient has the total
> hospitalization days from his/her multiple admissions, and the earliest
> admission date among his/her multiple admissions, if s/he does have
> multiple
> admissions.
>
> It took me 3 steps (see below) to achieve this purpose. I know they look
> stupid and are inefficient but I can't help! :-)
>
> I bet there must be many MUCH BETTER ways to get the same result, probably
> all 3 steps can be combined/reduced to 1 step. Could anyone please show me
> how to do so with succinct codes and/or better running time efficiency?
>
> Your input and time is greatly appreciated!
>
> Jerry
> */
>
> /*create sample data*/
> data sample;
> input patientID:$1. lengthStay:3. admDate:mmddyy10. ;
> format admDate mmddyy10.;
> datalines;
> A 3 08/18/2007
> A 5 02/27/2005
> A 1 09/08/2006
> B 2 01/05/2005
> B 3 05/31/2003
> C 56 03/19/2006
> C 8 05/26/2005
> C 2 12/25/2004
> D 9 05/15/2005
> ;
> run;
>
> /*step 1:
> summerize length of stay by unique patient id*/
>
> proc summary data=sample nway;
> class patientID;
> output
> out=stay(drop=_:)
> sum(lengthStay)=
> /autoname;
> run;
>
> /* step 2:
> get the earliest admission date for each unique patient*/
>
> /*first sort the data by patient id ad admDate*/
> proc sort data=sample(drop=lengthStay) out=date;
> by patientID admDate;
> run;
>
> /*then for each unique patient, retain his/her earliest admission date, use
> "NODUPKEY' option*/
> proc sort data=date nodupkey;
> by patientID;
> run;
>
> /*step 3:
> comebine data "stay" and "date" by patient id so that the resulting data is
> at unique patient level,
> with each unique patient has the total hospitalization days and the
> earliest
> admission date.
>
> The resulting final data should be:
>
> patientID lengthStay_sum admDate
>
> A 9 02/27/2005
> B 5 05/31/2003
> C 66 12/25/2004
> D 9 05/15/2005
> */
>
> data final;
> merge stay date;
> by patientID;
> run;
>
|