Date: Fri, 20 Mar 2009 07:34:52 -0700
Reply-To: Richard <rdevenezia@WILDBLUE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Richard <rdevenezia@WILDBLUE.NET>
Organization: http://groups.google.com
Subject: Re: proc summary help
Content-Type: text/plain; charset=ISO-8859-1
On Mar 19, 4:10 pm, joh...@GMAIL.COM (Jake) wrote:
> Dear SAS-L:
>
> I have the following data and would like
> to select the max for measurement 1 (mea_1)
> per subject per measurement date (mea_date),
> and I would like to know if there is a
> way to do this with proc summary, vs. the
> datasets which follow the proc summary that
> I currently have? I want to retain the id,
> mea_1, mea_2 and mea_3 values in the final
> dataset from proc summary. I seem to remember
> doing something similar using idgroup, but
> I now sure it that will work in this case.
>
> Any suggestions?
>
> Thanks! Jake
>
> data
> measurements;
>
> format mea_1 mea_2 mea_3 3. mea_date
> date9.;
> length patient
> $4;
>
> patient='0001';
>
> mea_date='10Oct08'd;
>
> id=0; mea_1=40; mea_2=90; mea_3=60;
> output;
> id=1; mea_1=34; mea_2=98; mea_3=72;
> output;
> id=2; mea_1=68; mea_2=70; mea_3=50;
> output;
>
> patient='0002';
>
> mea_date='03Oct08'd;
>
> id=0; mea_1=50; mea_2=90; mea_3=88;
> output;
> id=1; mea_1=30; mea_2=98; mea_3=70;
> output;
> id=2; mea_1=52; mea_2=70; mea_3=72;
> output;
> id=3; mea_1=72; mea_2=90; mea_3=68;
> output;
> id=4; mea_1=30; mea_2=60; mea_3=74;
> output;
>
> run;
>
> proc sort
> data=measurements;
>
> by patient mea_date
> id;
>
> run;
>
> proc summary data=measurements nway
> missing;
> where id ^=
> 0;
>
> by patient
> mea_date;
>
> class
> id;
>
> var mea_1 mea_2
> mea_3;
>
> output out=mea_sum(where=(_type_=1) drop=_freq_) min= max=;
> run;
>
> proc sort
> data=mea_sum;
>
> by patient DESCENDING
> mea_1;
> run;
>
> data
> need;
>
> set
> mea_sum;
>
> by patient DESCENDING
> mea_1;
> if
> first.patient;
>
> run;
>
> title 'This is what I
> need';
>
> proc print
> data=need;
>
> run;
Jake:
Proc SUMMARY is used to compute statistics of a variable over more
than one row.
Your SUMMARY code has statements
by patient mea_date;
class id;
Which will restrict the processing to a single row!
The code you show for NEED selects a patients row having the highest
mea_1 (regardless of date).
A simple SQL query can perform this type of data slicing:
proc sql;
create table NEED2 as
select * from measurements
group by patient
having mea_1 = max(mea_1)
;
quit;
The resultant can have multiple rows per patients if there are more
than one row having the max mea_1 value.
--
Richard A. DeVenezia
http://www.devenezia.com