Date: Wed, 9 Feb 2011 12:39:20 -0800
Reply-To: E P <evp_sasl@YAHOO.CA>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: E P <evp_sasl@YAHOO.CA>
Subject: Re: Filling in missing data with NA
In-Reply-To: <AANLkTin_cmZ3OVx_EEwfY56mO3DPQo7i++3sCLe0imNA@mail.gmail.com>
Content-Type: text/plain; charset=utf-8
And for those of us who don't use proc summary, this works too...
hosp1 15676 Jul10 35
hosp1 15676 Aug10 34
hosp1 15676 Sep10 40
hosp1 15678 Jul10 15
hosp1 15678 Aug10 13
hosp1 15678 Sep10 14
hosp2 15676 Jul10 45
hosp2 15676 Aug10 40
hosp2 15676 Sep10 35
hosp2 15677 Jul10 20
hosp2 15677 Aug10 25
hosp2 15677 Sep10 19
;
datalong;inputhosp $ meas $ month $ rank $;datalines;run;procsql;/* get
distinct meas values */createtableunique_meas asselectdistinct(meas)
asumeasfromlong;/* get distinct hosp values */createtableunique_hosp
asselectdistinct(hosp) asuhospfromlong;/* full merge to get all combinations of
meas and hosp*/;createtableall_combos asselectunique_hosp.uhosp,
unique_meas.umeasfromunique_hosp, unique_meas/* merge all combinations with data
you have */createtablelong_filled as;selectlong.*, all_combos.*fromall_combos
left join longonlong.meas = all_combos.umeas andlong.hosp=all_combos.uhospquit
/* fill in missing values */datalong_filled;setlong_filled;ifmeas = ' 'thenmeas
= umeas;ifhosp = ' 'thenhosp = uhosp;run
/* sort, transpose */
;;
________________________________
From: "Data _null_;" <iebupdte@GMAIL.COM>
To: SAS-L@LISTSERV.UGA.EDU
Sent: Wed, February 9, 2011 3:03:47 PM
Subject: Re: Filling in missing data with NA
I often use PROC SUMMARY for this type of data expansion.
proc summary data=long nway completetypes;
class hosp meas;
class month / order=data;
output out=long2(drop=_:) idgroup(out(rank)=);
run;
On Wed, Feb 9, 2011 at 1:42 PM, Kirby, Ted <ted.kirby@lewin.com> wrote:
> I have the following code (the "long" dataset is a mocked up version of the
>real dataset that has 33 hospitals and up to 19 measures for each hospital).
>
> data long;
> input hosp $ meas $ month $ rank $;
> datalines;
> hosp1 15676 Jul10 35
> hosp1 15676 Aug10 34
> hosp1 15676 Sep10 40
> hosp1 15678 Jul10 15
> hosp1 15678 Aug10 13
> hosp1 15678 Sep10 14
> hosp2 15676 Jul10 45
> hosp2 15676 Aug10 40
> hosp2 15676 Sep10 35
> hosp2 15677 Jul10 20
> hosp2 15677 Aug10 25
> hosp2 15677 Sep10 19
> ;
> run;
>
> Notice that hosp1 is missing data for measure 15677 and hosp2 is missing data
>for measure 15678.
>
> proc transpose data=long out=wide(drop=_NAME_);
> by hosp meas;
> id month;
> var rank;
> run;
>
> proc print; run;
>
> yields what I want (almost):
>
> hosp meas Jul10 Aug10 Sep10
> hosp1 15676 35 34 40
> hosp1 15678 15 13 14
> hosp2 15676 45 40 35
> hosp2 15677 20 25 19
>
> The "almost" part is that I would like the final dataset to be:
>
> hosp meas Jul10 Aug10 Sep10
> hosp1 15676 35 34 40
> hosp1 15677 NA NA NA
> hosp1 15678 15 13 14
> hosp2 15676 45 40 35
> hosp2 15677 20 25 19
> hosp2 15678 NA NA NA
>
>
> Thus, at some point (either before the transposition or after) I need to add
>data to fill in the value of NA the "rank" field for each month if the hospital
>did not have data for that measure (and thus would have no rank). (I converted
>"rank" from a number to a character variable because I want to use the value
>"NA" if hospitals are missing the data.)
>
> I have tried to figure out how to do this with a macro variable that is a list
>of the measures or with a hash iterator but could not figure out how to
>implement logic that says:
>
> if not exist (data for this hospital and this measure) then add (a record
>for this combination) and set (each month variable equal to "NA").
>
> I hope this request makes sense. If not, please let me know.
>
>
> Ted Kirby
> Consultant,
> The Lewin Group, Inc.
> 3130 Fairview Park Drive, Suite 800
> Falls Church, VA 22042
> Phone: (703)269-5507
> Fax: (703)269-5501
> e-mail: ted.kirby@lewin.com <mailto:ted.kirby@lewin.com>
>
> ************* IMPORTANT - PLEASE READ ********************
>
> This e-mail, including attachments, may include confidential and/or proprietary
>information,
> and may be used only by the person or entity to which it is addressed. If the
>reader of this
> e-mail is not the intended recipient or his or her authorized agent, the reader
>is hereby
> notified that any dissemination, distribution or copying of this e-mail is
>prohibited. If you
> have received this e-mail in error, please notify the sender by replying to
>this message
> and delete this e-mail immediately.
>
>
|