|
David, you wrote:
> I need to:
>
> 1.) Sum all rev for each id (the easy part).
> 2.) Identify the most populous value of wc for each id.
> 3.) Identify the most populous value of sic for each id.
> 4.) Output one obs per id value.
You tried this with a DATA step. I'm sure that you'll see DATA step
solutions and PROC SQL solutions, but there is a canned PROC which
will do all this for you. PROC MEANS has a couple little-known options
in its output statement: MAXID and MINID . PROC SUMMARY will do
this too. I'll use the CLASS statement in my example so you can see that
it can handle a dataset your size even if the input is unsorted. Given your
DATA step:
proc summary data=a;
var rev;
class id;
output out=out1 sum=sumrev
maxid( rev(wc) rev(sic) ) = bigwc bigsic;
run;
proc print noobs data=out1;
var id sumrev bigwc bigsic;
run;
That's all there is to it. Using a BY statement instead of a CLASS statement
when the data are sorted on the by-variable will let you do this with a much
larger
number of distinct by-variables, since then SAS won't have to hold everything
in memory.
David
--
David Cassell, OAO Corp. Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
|