I think the discrepancy between what one would expect from the MEANS/SUMMARY
ODS output data set (given the appearance of the report output) and the
actual structure of the output data set arises because the latter actually
represents the structure of the ODS table, while trickery is involved in the
former.
The table template gives each statistic its own column, which is the
"spread" layout that one gets in the output data set:
   column class nobs id type ways (varname) (label) (min) (max) (range) (n)
      (nmiss) (sumwgt) (sum) (mean) (uss) (css) (var) (stddev) (cv) (stderr)
      (t) (probt) (lclm) (uclm) (skew) (kurt) (median) (mode) (q1) (q3)
      (qrange) (p1) (p5) (p10) (p25) (p50) (p75) (p90) (p95) (p99);
Note the parens around the names in the column statement. According to
something I found in TFM, "If a column name appears in parentheses, PROC
TEMPLATE stacks the values of all variables that use that column definition
one below the other in the output object." This would explain why the
listing output looks as it does. Presumably the original column
definitions, rather than the "restacked" table, are used for the output data
set.
I agree with your conclusion ("ODS can't help here").
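For what it's worth, one workaround is to leave ODS out of it and reshape
the "spread" data set after the fact. The following is only a sketch, not
tested beyond the three-row example: it assumes MEANSOUT was built with
AUTONAME (so every name has the form variable_Stat), that no statistic
keyword contains an underscore, and that no name was truncated. The data
set names LONG and GRID are made up for illustration.

```sas
/* Step 1: one row per spread column; _NAME_ holds e.g. "var1_Min" */
proc transpose data=meansout out=long;
run;

/* Step 2: split each name at its last underscore (assumed convention) */
data long;
   set long;
   length varname stat $32;
   stat    = scan(_name_, -1, '_');
   varname = substr(_name_, 1, length(_name_) - length(stat) - 1);
   drop _name_;
run;

proc sort data=long;
   by varname;
run;

/* Step 3: pivot the statistics back out, one row per analysis variable */
proc transpose data=long out=grid(drop=_name_);
   by varname;
   id stat;
   var col1;
run;
```

If the assumptions hold, GRID should have one observation per analysis
variable, with one column per statistic, which is the Var x Stat grid that
the listing output suggests.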
From: Howard_Schreier@ITA.DOC.GOV [mailto:Howard_Schreier@ITA.DOC.GOV]
Sent: Tuesday, October 29, 2002 4:12 PM
Subject: Re: MEANS/SUMMARY Output Datasets (Long)
The thing I'm really wondering about is why ODS behaves as it does, and
whether there is a way to make it produce a different structure for the
On Fri, 25 Oct 2002 16:13:49 -0400, Howard_Schreier@ITA.DOC.GOV wrote:
>Summary: Output datasets from MEANS/SUMMARY do not in general have a
>convenient structure. ODS seems to have a parallel deficiency.
>I started looking hard at this last week in an attempt to assist a
>colleague. I thought I'd find an easy fix, but I don't see it yet.
>Perhaps I'm missing something.
>The problem arises when PROC MEANS (or SUMMARY) is asked to produce
>multiple statistics for multiple analysis variables. The issue is the
>shape of the output dataset.
>Here is the input dataset I will use to illustrate.
> data test;
> input var1 var2;
> cards;
> 1 2
> 3 8
> 3 6
> ;
>Suppose I need the MIN and MAX stats for both variables. That's easy to do:
> proc means data=test noprint;
> output out=meansout(drop=_type_ _freq_) min= max= / autoname;
> run;
>
> var1_Min  var2_Min  var1_Max  var2_Max
>        1         2         3         8
>The output is in what I call the "spread" shape, with a single
>observation holding all of the results (or, more generally, a single
>observation for each CLASS level for each CLASS intersection type in
>each BY group).
>But that's not a particularly handy structure.
>But it occurred to me that ODS ought to be useful here. This code sends
>the results of PROC MEANS to the listing destination (by default) and to
>the output destination:
> ods trace on;
> ods output summary=stats_from_ods;
> proc means data=test min max;
> run;
> ods output close;
> ods trace off;
>Here's what appears in the listing destination:
> Variable      Minimum      Maximum
> var1        1.0000000    3.0000000
> var2        2.0000000    8.0000000
>So I expected to get a Var x Stat grid structure in the output
>destination. But in fact STATS_FROM_ODS looks like this:
> VName_var1  var1_Min  var1_Max  VName_var2  var2_Min  var2_Max
> var1               1         3  var2               2         8
>It is a dataset with the spread structure, very similar to the
>one created with the OUTPUT statement and the AUTONAME option.
>So the listing and output destinations have different structures,
>despite the fact that there is but one ODS table definition, which can
>be dumped with this step:
> proc template; source base.summary; run;
>My conclusion at this point is that ODS can't help here.
>Comments and suggestions welcome.