|
Howard,
With the AUTONAME feature your structure 3.:
> 3. "List"
varname _stat_ value
var1 min 1
var1 max 3
var2 min 2
var2 max 8
is but a simple DATA step away, as long as your names are short enough to
allow this feature to work.
data w ( drop = i ) ;
array q (5) _abc1_ de2 fghij x y ;
do obs = 1 to 10 ;
do i = 1 to dim ( q ) ;
q[i] = ranuni(9554634) ;
end ;
output ;
end ;
run ;
proc summary data = w ;
var _all_ ;
output out=summary (drop = _freq_ _type_ ) mean= std= max= median= /
autoname ;
run ;
data q ( keep = varname _stat_ value ) ;
length varname $32 _stat_ $10 ;
set summary ;
array q (*) _numeric_ ;
do i = 1 to dim ( q ) ;
varname = vname ( q[i] ) ;
x = 33 - index ( reverse ( varname ) , "_" ) ;
_stat_ = substr ( varname , x + 1 ) ;
varname = substr ( varname , 1 , x - 1 ) ;
value = q[i] ;
output ;
end ;
run ;
In response to your question Mike Rhoads [MikeRhoads@westat.com] wrote in
part:
> Many, many years ago (1985 to be precise) I did a SUGI paper describing a
> macro I had written to get the PROC SUMMARY output in a better structure.
That paper was responsible for drawing my wife to Westat in 1986 and me a
few years later.
IanWhitlock@westat.com
-----Original Message-----
From: Howard_Schreier@ITA.DOC.GOV [mailto:Howard_Schreier@ITA.DOC.GOV]
Sent: Friday, October 25, 2002 4:14 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: MEANS/SUMMARY Output Datasets (Long)
Summary: Output datasets from MEANS/SUMMARY do not in general have a
convenient structure. ODS seems to have a parallel deficiency.
I started looking hard at this last week in an attempt to assist a
colleague. I thought I'd find an easy fix, but I don't see it yet.
Perhaps I'm missing something.
The problem arises when PROC MEANS (or SUMMARY) is asked to produce
multiple statistics for multiple analysis variables. The issue is the
shape of the output dataset.
Here is the input dataset I will use to illustrate.
data test;
input var1 var2;
cards;
1 2
3 8
3 6
;
Suppose I need the MIN and MAX stats for both variables. That's easy to do:
proc means data=test noprint;
output out=meansout(drop=_type_ _freq_) min= max= / autoname;
run;
The result:
var1_Min var2_Min var1_Max var2_Max
1 2 3 8
The output is in what I call the "spread" shape, with a single
observation holding all of the results (or, more generally, a single
observation for each CLASS level for each CLASS intersection type in
each BY group).
But that's not a particularly handy structure.
Any of of the following is usually preferable:
1. Variable x Statistic "Grid"
varname min max
var1 1 3
var2 2 8
2. Statistic x Variable "Grid"
_stat_ var1 var2
min 1 2
max 3 8
3. "List"
varname _stat_ value
var1 min 1
var1 max 3
var2 min 2
var2 max 8
These grid and list shapes lend themselves to subsetting, and it's
fairly easy to transpose any one into either of the other two.
One could post-process the output dataset from PROC MEANS. But that can
get messy. Consider what happens when the original variable names are so
long that when the statistic keywords are concatenated, the names may
exceed the 32-character limit. For example:
data demo;
retain ThisVariableNameIsTooLongA
ThisVariableNameIsTooLongB 0;
run;
proc means data=demo noprint;
output out=autonames sum= median= / autoname;
run;
Now display the generated variable names:
proc sql;
select name from dictionary.columns
where libname='WORK' and memname='AUTONAMES' and length(name)>10;
quit;
Result:
ThisVariableNameIsTooLongA_Sum
ThisVariableNameIsTooLongB_Sum
ThisVariableNameIsTooLo_Median
ThisVariableNameIsTooLo_Median2
AUTONAME has done its job, but some stems are truncated and the suffixes
are not consistent.
So it would be nice to be able to get PROC MEANS to build an output
dataset in one of the convenient shapes (grid or list).
SAS knows how to do this. After all, if one does not specify any
statistic keywords, and instead takes the default, the output dataset is
a Stat x Var grid:
proc means data=test noprint;
output out=meansout(drop=_type_ _freq_);
run;
Here's what MEANSOUT looks like:
_STAT_ var1 var2
N 3.00000 3.00000
MIN 1.00000 2.00000
MAX 3.00000 8.00000
MEAN 2.33333 5.33333
STD 1.15470 3.05505
But the documentation makes it pretty clear that this structure is not
available as an option when statistics keywords are coded. Too bad.
There are workarounds. Here's one:
proc means data=test noprint;
output out=minds(drop=_type_ _freq_) min=;
output out=maxds(drop=_type_ _freq_) max=;
run;
data both;
length _stat_ $ 8;
set minds(in=min)
maxds(in=max);
select;
when (min) _stat_ = 'min';
when (max) _stat_ = 'max';
otherwise;
end;
run;
BOTH looks like this:
_stat_ var1 var2
min 1 2
max 3 8
I consider this a little too wallpapery, and it clutters up the WORK
directory with a lot of datasets.
Another workaround transposes the original dataset (I would do it on the
fly via a VIEW, especially if it's really big) so that the original
variable names become levels of a CLASS variable:
data for_summ / view=for_summ;
set test;
keep varname varval;
length varname $ 32;
array numvar(*) _numeric_;
do i = 1 to dim(numvar);
call vname(numvar(i),varname);
varval = numvar(i) ;
output;
end;
run;
proc summary data=for_summ nway;
class varname;
var varval;
output out=vxsgrid(drop=_:) min=min max=max;
run;
Here's what VXSGRID looks like:
varname min max
var1 1 3
var2 2 8
VXSGRID can be transposed into either of the other target shapes.
To get a Stat x Var grid:
proc transpose data=vxsgrid out=sxvgrid(rename=(_name_=_stat_));
id varname;
var min max;
run;
Produces:
_stat_ var1 var2
min 1 2
max 3 8
To get a list structure:
proc transpose data=vxsgrid out=list(rename=(_name_=_stat_ col1=value));
by varname;
var min max;
run;
Produces:
varname _stat_ value
var1 min 1
var1 max 3
var2 min 2
var2 max 8
So there are workarounds. They could be streamlined and macro-ized.
But it occurred to me that ODS ought to be useful here. This code sends
the results of PROC MEANS to the listing destination (by default) and to
the output destination:
ods trace on;
ods output summary=stats_from_ods;
proc means data=test min max;
run;
ods output close;
ods trace off;
Here's what appears in the listing destination:
Variable Minimum Maximum
var1 1.0000000 3.0000000
var2 2.0000000 8.0000000
So I expected to get a Var x Stat grid structure in the output
destination. But in fact STATS_FROM_ODS looks like this:
VName_ VName_
var1 var1_Min var1_Max var2 var2_Min var2_Max
var1 1 3 var2 2 8
It is a dataset with the spread structure, very similar to the
one created with the OUTPUT statement and the AUTONAME option.
So the listing and output destinations have different structures,
despite the fact that there is but one ODS table definition, which can
be dumped with this step:
proc template; source base.summary; run;
My conclusion at this point is that ODS can't help here.
Comments and suggestions welcome.
|