Date: Tue, 30 Dec 2008 22:22:01 -0500
Reply-To: Ian Whitlock <iw1sas@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <iw1sas@GMAIL.COM>
Subject: Re: Output the Results of a Proc Freq To A SAS Dataset
Content-Type: text/plain; charset=ISO-8859-1
Summary: ODS or OUT= for multi freqs? (with code for OUT=)
#iw-value=1
In response to my ranting (APOLOGIES to Mary) Andrew wrote in part:
I also think we need to focus on a solution to the problem posed in
Gina's original post.
ODS may offer something of a solution here. It's not perfect, but
it may save some time and steps versus a completely manual
approach.
with the code snippet
ods listing close;
ods output crosstabfreqs(persist=proc match_all) =table;
proc freq data=elec;
by region;
tables office*(cesched serial trans);
run;
ods listing;
producing data sets TABLE and TABLE1-TABLE30. (I feel kind of dirty
when I have to work with the oddball TABLE and wonder at the
programmer too lazy to either have TABLE0 or go to TABLE31.
Moreover, it is a nightmare when you work interactively because the
second time you execute you get TABLE32-TABLE62 and have to remember
which is which.)
To the point of combining the sets Andrew remarks:
We now have 30 separate SAS data sets, each of which contains the
PROC FREQ-generated output from a cross-tabulation we requested.
These tables are created with a minimum of coding and if the number
of BY or TABLES statement variables changes, we don't have to do
very much to obtain all the tables we want after adding a variable
or two to the PROC FREQ "step," as our project needs change.
The difficulty is "lining up" which data set is associated with
which requested cross-tab. Unfortunately, the programmer will have
to figure that out by, most likely, using PROC EYEBALL to look at
each one. Additional data step programming may be necessary to
modify and then append the tables together, if need be. But, at
the very least, we have all the PROC FREQ-generated tables in SAS
data sets for subsequent use.
Yes, this is the crux of the problem - easy to make data sets, a
nightmare to combine them. Andrew's solution leaves the problem to
the end. Since I have put recent thought into the problem I decided
to push it to a complete general solution (Mary left out cross tabs
and Andrew left out one way freqs, and both omitted the hard part.)
The basic problem is that PROC FREQ was never designed to carry all
the information needed to combine tables. ODS makes the classic
mistake of burying information in the variable names instead of data.
So the question is where and when to get the combining information. I
decided to make it a user's problem, like Andrew, but to put it up
front where a macro can make use of it. Moreover, I decided to make a
macro for standardizing one OUT= data set at a time, so the code is
compromise between what could be done if a managing macro collected
required information from the user and used system information where
possible to set up and run the PROC FREQ.
To test the code I returned to the original specification with made up
data.
proc format ;
value yn 0 = "NO" 1 = "YES" ;
run ;
data abc ;
do plan = 1 to 3 ;
do obs = 1 to 20 * ranuni(123) ;
cnt = round ( ranuni(123) ) ;
h_cmc = round(ranuni(123)) ;
clm_dx_1_cd = round ( ranuni(123) ) ;
clm_dx_2_cd = round ( ranuni(123) ) ;
H_LDLC_SCRN = round ( ranuni(123) ) ;
H_LDLC_LT100 = substr("ABCDEF", 3*round(ranuni(123))+1, 3) ;
output ;
end ;
end ;
format hmc YN. ;
run ;
proc freq data=abc ;
by plan;
table cnt / out=f1 noprint missing ;
table h_cmc / out=f2 noprint missing ;
table h_cmc*clm_dx_1_cd / out=f3 noprint missing ;
table h_cmc*clm_dx_2_cd / out=f4 noprint missing ;
table H_LDLC_SCRN*H_LDLC_LT100 / out=f5 noprint missing ;
run ;
Note that both character and numeric data with formats have been
considered. The 5 tables are standardized with
%standard_freq(data = f1, tablenum = 1
, vars = plan cnt
, fmts = best12. best6.
)
%standard_freq(data = f2, tablenum = 2
, vars = plan h_cmc
, fmts = best12. yn6.
)
%standard_freq(data = f3, tablenum = 3
, vars = plan h_cmc clm_dx_1_cd
, fmts = best12. yn6. best8.
)
%standard_freq(data = f4, tablenum = 4
, vars = plan h_cmc clm_dx_2_cd
, fmts = best12. yn6. best8.
)
%standard_freq(data = f5, tablenum = 5
, vars = plan H_LDLC_SCRN H_LDLC_LT100
, fmts = best12. best6. $char8.
)
Moreover, with this set up it is easy to insure, by eyeball that each
column of values has a consistent width. The code is repetitive, but
it could be collected by parsing or generating the PROC FREQ and using
PROC CONTENTS information for data set ABC.
The advantage of having this structure is that it doesn't care where
or when the freq output sets were created, and it doesn't preclude
writing a managing macro to make the calls to STANDARD_FREQ. It would
be interesting to see the corresponding macro for the ODS data sets.
The one advantage that ODS has is that there is more information. On
the other hand, for me the cell counts are everything.
Here is the macro - surprisingly short and simple once one puts the
burden on the user to write the calls.
%macro standard_freq ( data = f3, outsuffix = _c
, tablenum = 1
, vars = plan h_cmc clm_dx_1_cd
, fmts = best12. $char12. best12.
) ;
%local vcnt out;
%let vcnt = %sysfunc(countw(&vars)) ;
%let out = &data&outsuffix ;
data &out ( keep = name: v: count ) ;
retain tablenum &tablenum ;
set &data ;
%do i = 1 %to &vcnt ;
name&i = "%scan(&vars,&i)" ;
v&i = put(%scan(&vars,&i),%scan(&fmts,&i,%str( )) -L) ;
%end ;
run ;
%mend standard_freq ;
--
Ian Whitlock