LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (October 2002, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 29 Oct 2002 15:00:19 -0500
Reply-To:   Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:   Re: MEANS/SUMMARY Output Datasets (Long)
Comments:   To: "Howard_Schreier@ITA.DOC.GOV" <Howard_Schreier@ITA.DOC.GOV>
Content-Type:   text/plain; charset="iso-8859-1"

Howard,

With the AUTONAME feature your structure 3.:

> 3. "List"

varname _stat_ value

var1 min 1 var1 max 3 var2 min 2 var2 max 8

is but a simple DATA step away, as long as your names are short enough to allow this feature to work.

data w ( drop = i ) ; array q (5) _abc1_ de2 fghij x y ; do obs = 1 to 10 ; do i = 1 to dim ( q ) ; q[i] = ranuni(9554634) ; end ; output ; end ; run ;

proc summary data = w ; var _all_ ; output out=summary (drop = _freq_ _type_ ) mean= std= max= median= / autoname ; run ;

data q ( keep = varname _stat_ value ) ; length varname $32 _stat_ $10 ; set summary ; array q (*) _numeric_ ; do i = 1 to dim ( q ) ; varname = vname ( q[i] ) ; x = 33 - index ( reverse ( varname ) , "_" ) ; _stat_ = substr ( varname , x + 1 ) ; varname = substr ( varname , 1 , x - 1 ) ; value = q[i] ; output ; end ; run ;

In response to your question Mike Rhoads [MikeRhoads@westat.com] wrote in part:

> Many, many years ago (1985 to be precise) I did a SUGI paper describing a > macro I had written to get the PROC SUMMARY output in a better structure.

That paper was responsible for drawing my wife to Westat in 1986 and me a few years later.

IanWhitlock@westat.com -----Original Message----- From: Howard_Schreier@ITA.DOC.GOV [mailto:Howard_Schreier@ITA.DOC.GOV] Sent: Friday, October 25, 2002 4:14 PM To: SAS-L@LISTSERV.UGA.EDU Subject: MEANS/SUMMARY Output Datasets (Long)

Summary: Output datasets from MEANS/SUMMARY do not in general have a convenient structure. ODS seems to have a parallel deficiency.

I started looking hard at this last week in an attempt to assist a colleague. I thought I'd find an easy fix, but I don't see it yet. Perhaps I'm missing something.

The problem arises when PROC MEANS (or SUMMARY) is asked to produce multiple statistics for multiple analysis variables. The issue is the shape of the output dataset.

Here is the input dataset I will use to illustrate.

data test; input var1 var2; cards; 1 2 3 8 3 6 ;

Suppose I need the MIN and MAX stats for both variables. That's easy to do:

proc means data=test noprint; output out=meansout(drop=_type_ _freq_) min= max= / autoname; run;

The result:

var1_Min var2_Min var1_Max var2_Max

1 2 3 8

The output is in what I call the "spread" shape, with a single observation holding all of the results (or, more generally, a single observation for each CLASS level for each CLASS intersection type in each BY group).

But that's not a particularly handy structure.

Any of of the following is usually preferable:

1. Variable x Statistic "Grid"

varname min max

var1 1 3 var2 2 8

2. Statistic x Variable "Grid"

_stat_ var1 var2

min 1 2 max 3 8

3. "List"

varname _stat_ value

var1 min 1 var1 max 3 var2 min 2 var2 max 8

These grid and list shapes lend themselves to subsetting, and it's fairly easy to transpose any one into either of the other two.

One could post-process the output dataset from PROC MEANS. But that can get messy. Consider what happens when the original variable names are so long that when the statistic keywords are concatenated, the names may exceed the 32-character limit. For example:

data demo; retain ThisVariableNameIsTooLongA ThisVariableNameIsTooLongB 0; run;

proc means data=demo noprint; output out=autonames sum= median= / autoname; run;

Now display the generated variable names:

proc sql; select name from dictionary.columns where libname='WORK' and memname='AUTONAMES' and length(name)>10; quit;

Result:

ThisVariableNameIsTooLongA_Sum ThisVariableNameIsTooLongB_Sum ThisVariableNameIsTooLo_Median ThisVariableNameIsTooLo_Median2

AUTONAME has done its job, but some stems are truncated and the suffixes are not consistent.

So it would be nice to be able to get PROC MEANS to build an output dataset in one of the convenient shapes (grid or list).

SAS knows how to do this. After all, if one does not specify any statistic keywords, and instead takes the default, the output dataset is a Stat x Var grid:

proc means data=test noprint; output out=meansout(drop=_type_ _freq_); run;

Here's what MEANSOUT looks like:

_STAT_ var1 var2

N 3.00000 3.00000 MIN 1.00000 2.00000 MAX 3.00000 8.00000 MEAN 2.33333 5.33333 STD 1.15470 3.05505

But the documentation makes it pretty clear that this structure is not available as an option when statistics keywords are coded. Too bad.

There are workarounds. Here's one:

proc means data=test noprint; output out=minds(drop=_type_ _freq_) min=; output out=maxds(drop=_type_ _freq_) max=; run;

data both; length _stat_ $ 8; set minds(in=min) maxds(in=max); select; when (min) _stat_ = 'min'; when (max) _stat_ = 'max'; otherwise; end; run;

BOTH looks like this:

_stat_ var1 var2

min 1 2 max 3 8

I consider this a little too wallpapery, and it clutters up the WORK directory with a lot of datasets.

Another workaround transposes the original dataset (I would do it on the fly via a VIEW, especially if it's really big) so that the original variable names become levels of a CLASS variable:

data for_summ / view=for_summ; set test; keep varname varval; length varname $ 32; array numvar(*) _numeric_; do i = 1 to dim(numvar); call vname(numvar(i),varname); varval = numvar(i) ; output; end; run;

proc summary data=for_summ nway; class varname; var varval; output out=vxsgrid(drop=_:) min=min max=max; run;

Here's what VXSGRID looks like:

varname min max

var1 1 3 var2 2 8

VXSGRID can be transposed into either of the other target shapes.

To get a Stat x Var grid:

proc transpose data=vxsgrid out=sxvgrid(rename=(_name_=_stat_)); id varname; var min max; run;

Produces:

_stat_ var1 var2

min 1 2 max 3 8

To get a list structure:

proc transpose data=vxsgrid out=list(rename=(_name_=_stat_ col1=value)); by varname; var min max; run;

Produces:

varname _stat_ value

var1 min 1 var1 max 3 var2 min 2 var2 max 8

So there are workarounds. They could be streamlined and macro-ized.

But it occurred to me that ODS ought to be useful here. This code sends the results of PROC MEANS to the listing destination (by default) and to the output destination:

ods trace on; ods output summary=stats_from_ods; proc means data=test min max; run; ods output close; ods trace off;

Here's what appears in the listing destination:

Variable Minimum Maximum

var1 1.0000000 3.0000000 var2 2.0000000 8.0000000

So I expected to get a Var x Stat grid structure in the output destination. But in fact STATS_FROM_ODS looks like this:

VName_ VName_ var1 var1_Min var1_Max var2 var2_Min var2_Max

var1 1 3 var2 2 8

It is a dataset with the spread structure, very similar to the one created with the OUTPUT statement and the AUTONAME option.

So the listing and output destinations have different structures, despite the fact that there is but one ODS table definition, which can be dumped with this step:

proc template; source base.summary; run;

My conclusion at this point is that ODS can't help here.

Comments and suggestions welcome.


Back to: Top of message | Previous page | Main SAS-L page