Date: Wed, 28 Jun 2006 16:53:38 -0500
Reply-To: Yu Zhang <zhangyu05@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Yu Zhang <zhangyu05@GMAIL.COM>
Subject: Re: Data manipulation and criteria
In-Reply-To: <20060628160122.42080.qmail@web37110.mail.mud.yahoo.com>
Content-Type: text/plain; charset=WINDOWS-1252; format=flowed
hi, Dave,
Here is a Proc Tabulate solution for your problem:
proc format;
value $ qtrfmt '1'='1'
'2'='2'
'3'='3'
'4'='4';
run;
proc tabulate data=one;
class type yearqt yearroll/preloadfmt;
var wgt;
table type*yearroll*yearqt,wgt/printmiss;
format yearqt $qtrfmt.;
run;
HTH
Yu
On 6/28/06, dave crimkey <d_crimkey@yahoo.com> wrote:
>
> I'm having a problem manipulating a dataset to fit a criteria test.
> I need to take the following data and summarize it by type, year and
> yearqt.
>
> The first problem I'm encountering is that not all yearqt's are in the
> data. I need all year/quarters whether they have data or not and of course
> Proc Freq doesn't do that. I thought the Sparse & Missing options would
> help and they do…but only if the first observation has all of the categories
> in it.
>
> Then I need the sum of the yearqts and the sum of the years values so I
> can calculate a percentage by type. My code is as follows:
>
> data one;
> input type $ yearqt $ yearroll wgt;
> cards;
> MT 200101 2001 0.24
> MT 200101 2001 0.86
> MT 200102 2001 0.45
> MT 200103 2001 0.40
> MT 200104 2001 0.36
> MT 200201 2002 0.69
> MT 200202 2002 0.27
> MT 200202 2002 0.91
> MT 200202 2002 0.50
> MT 200202 2002 0.77
> MT 200204 2002 0.21
> PA 200101 2001 0.87
> PA 200102 2001 0.55
> PA 200103 2001 0.23
> PA 200104 2001 0.02
> PA 200201 2002 0.11
> PA 200204 2002 0.89
> PA 200204 2002 0.22
> PA 200301 2003 0.02
> PA 200302 2003 0.11
> PA 200303 2003 0.89
> PA 200304 2003 0.22
> NO 200101 2001 0.23
> NO 200102 2001 0.45
> NO 200201 2002 0.87
> NO 200202 2002 0.55
> NO 200301 2003 0.23
> NO 200302 2003 0.45
> NO 200303 2003 0.87
> NO 200304 2003 0.55
> ;
> run;
> proc freq data=one;
> table type*yearqt /noprint missing sparse out=testxy
> (keep=type yearqt count) ;
> table type*yearroll /noprint missing sparse out=testyz
> (rename=(count=rcount) keep=type yearroll count) ;
> weight wgt;
> run;
>
> data testxyz;
> set testxy;
> if yearqt in(200101 200102 200103 200104) then yearroll = 2001;
> else if yearqt in(200201 200202 200203 200204) then yearroll = 2002;
> else if yearqt in(200301 200302 200303 200304) then yearroll = 2003;
>
> run;
> proc sort data=testxyz;
> by type yearroll yearqt;
> run;
> proc sort data=testyz;
> by type yearroll;
> run;
> data testall;
> merge testxyz testyz;
> by type yearroll;
> run;
> proc means sum noprint data = testall nway;
> class yearqt yearroll;
> var count rcount;
> output out = testab sum=qtcount rlcount ;
> run;
> proc sort data=testab (drop=_type_);
> by yearroll yearqt;
> run;
> proc sort data=testall;
> by yearroll yearqt;
> run;
> data testall2;
> merge testall (in=a) testab (in=b);
> by yearroll yearqt;
>
> shrqt=ROUND(100*count/qtcount,.0001);
> shrrl=ROUND(100*rcount/rlcount,.0001);
>
> run;
> proc sort data=testall2 (drop=_freq_);
> by type yearqt yearroll;
> run;
>
> It seems like there's too many data steps in it and there must be an
> easier way.
>
> I eventually need to select out types where every yearqt in the file has 1
> or 0 observations or (the most recent yearqt has less than a 10% shrqt and
> every yearroll averages less than 10% shrrl.) Can anyone suggest a better
> way of getting the data together?
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam? Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
|