Date: Thu, 13 Jul 2000 17:16:25 -0700
Reply-To: Cassell.David@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject: Re: Summarizing and choosing most populous ...
Content-type: text/plain; charset=us-ascii
David, you wrote:
> I need to:
>
> 1.) Sum all rev for each id (the easy part).
> 2.) Identify the most populous value of wc for each id.
> 3.) Identify the most populous value of sic for each id.
> 4.) Output one obs per id value.
You tried this with a DATA step. I'm sure that you'll see DATA step
solutions and PROC SQL solutions, but there is a canned PROC which
will do all this for you. PROC MEANS has a couple little-known options
in its output statement: MAXID and MINID . PROC SUMMARY will do
this too. I'll use the CLASS statement in my example so you can see that
it can handle a dataset your size even if the input is unsorted. Given your
DATA step:
proc summary data=a;
var rev;
class id;
output out=out1 sum=sumrev
maxid( rev(wc) rev(sic) ) = bigwc bigsic;
run;
proc print noobs data=out1;
var id sumrev bigwc bigsic;
run;
That's all there is to it. Using a BY statement instead of a CLASS statement
when the data are sorted on the by-variable will let you do this with a much
larger
number of distinct by-variables, since then SAS won't have to hold everything
in memory.
David
--
David Cassell, OAO Corp. Cassell.David@epa.gov
Senior computing specialist
mathematical statistician