Date: Fri, 31 May 2002 08:48:01 -0700
Reply-To: Paul choate <pchoate@GSOS.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul choate <pchoate@GSOS.NET>
Organization: http://groups.google.com/
Subject: Re: Sums & Avgs of Variables in Dataset
Content-Type: text/plain; charset=ISO-8859-1
You may want to describe your situation a bit more clearly. I hope
I'm not over-simplifying, but here are a few thoughts.
I'm not clear why you need proc transpose; I'd use it as a last
resort. Usually whatever you need to do with your data can be done
without restructuring the dataset. In SAS you _usually_ use a datastep
to operate within individual observations and procedures to operate
across observations.
When working with a large number of variables in a row I'd use an
array statement, which lets you operate on many variables easily.
There are implicit and explicit methods of referencing the variables.
Generally, you define the array and then use a do-loop with
subscripted array elements. The results are computed within one
observation at a time.
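As a rough sketch of the explicit-array approach (the dataset names,
the x1-x5 variables and the -9 "not recorded" code are all made up for
illustration, not taken from your data):

```sas
/* Toy input: five numeric variables, with -9 used as a placeholder code */
data work.have;
    input x1-x5;
    datalines;
1 -9 3 4 5
-9 2 -9 4 5
;
run;

data work.want;
    set work.have;
    array xs{*} x1-x5;                 /* explicit array over x1..x5 */
    do i = 1 to dim(xs);               /* dim() gives the number of elements */
        if xs{i} = -9 then xs{i} = .;  /* recode the placeholder to missing */
    end;
    drop i;                            /* the loop counter is not wanted in the output */
run;
```

The same loop works unchanged for 55 or 200 variables; only the list
in the array statement changes.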
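The sum and mean (average) functions both accept variable lists too:
sumvars=sum(of x1-x55) or avevars=mean(of x1--a23). Note the two
styles of lists: a single dash gives a numbered range (x1, x2, ...,
x55), while a double dash gives every variable from x1 through a23 in
the order they are stored in the dataset. Again, this is on a single
observation (row).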
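A small illustration of the two list styles, using a made-up dataset
whose variables are stored in the order x1 x2 x3 a23:

```sas
data work.have;
    input x1 x2 x3 a23;
    datalines;
1 2 3 10
4 . 6 20
;
run;

data work.rowstats;
    set work.have;
    sumvars = sum(of x1-x3);     /* numbered range: x1, x2, x3 */
    avevars = mean(of x1--a23);  /* name range: x1 through a23 in stored order */
run;
/* Row 1: sumvars=6,  avevars=4  (16/4)                       */
/* Row 2: sumvars=10, avevars=10 (30/3, the missing x2 is ignored) */
```

One caution with the double-dash form: it picks up whatever sits
between the two names, so if character variables may be in that
stretch, x1-numeric-a23 restricts the list to the numeric ones.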
You need to be careful with how you handle missing values. The sum
and mean functions operate on the non-missing arguments and ignore
missing ones (returning missing only when every argument is missing).
Arithmetic operators return a missing value when any operand is
missing.
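To see the difference side by side, here is a throwaway step (the
variable names are arbitrary) that writes the results to the log:

```sas
data _null_;
    a = 5; b = .; c = 7;
    plus_total = a + b + c;       /* missing: the operator propagates b */
    func_total = sum(a, b, c);    /* 12: the missing b is ignored */
    func_mean  = mean(a, b, c);   /* 6: divides by the 2 non-missing values */
    all_miss   = sum(b, .);       /* missing: no non-missing argument at all */
    zero_floor = sum(0, b, .);    /* 0: adding a literal 0 guards against that */
    put plus_total= func_total= func_mean= all_miss= zero_floor=;
run;
```

Which behaviour is right depends on what a missing value means in your
data, so it is worth deciding that before picking one form.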
Working across many observations I'd use proc means or proc summary
(essentially the same procedure) to compute sums and averages. Again,
be sure you know how SAS is handling your missing values. Missing
values of the analysis variables are left out of the statistics.
There is a "missing" option, which makes missing values of class
variables count as a group of their own instead of dropping those
observations. You can either sum and average across the whole
dataset, or use "class" variables or sorted "by" variables to sum and
average within groups in your data. Note that you can define an
output dataset in the procedure, so you can run a dataset through
several proc summary steps to collapse it on one set of dimensions
and then another. This is very useful.
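Something along these lines, as a sketch only (region, year, sales1,
sales2 and rate1 are invented names; substitute your own lists):

```sas
data work.have;
    input region $ year sales1 sales2 rate1;
    datalines;
East 2001 10 20 0.5
East 2001 5 . 0.3
East 2002 7 8 0.6
West 2001 1 2 0.2
;
run;

/* Pass 1: one row per region*year; sum some variables, average others. */
/* NWAY keeps only the full class combination; MISSING keeps class      */
/* levels that are missing as their own group.                          */
proc summary data=work.have nway missing;
    class region year;
    var sales1 sales2 rate1;
    output out=work.by_region_year (drop=_type_ rename=(_freq_=n_obs))
           sum(sales1 sales2)= mean(rate1)=;
run;

/* Pass 2: collapse the pass-1 output again, this time to one row per region. */
proc summary data=work.by_region_year nway;
    class region;
    var sales1 sales2 n_obs;
    output out=work.by_region (drop=_type_ _freq_) sum=;
run;
```

Leaving out the class statement gives one row for the whole dataset.
Only the sums are carried into the second pass here, because
re-averaging rate1 from the first pass would give an unweighted mean
of means rather than the mean over the original observations.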
I would guess your problem can be solved with some operations in a
datastep and then a subsequent proc summary. By using datasteps and
proc summary steps in combination you can "crunch" data in almost any
way you need. Proc transpose is more commonly used for restructuring
a dataset in preparation for joining it to other data that is stored
in a different structure.
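Putting the two together might look roughly like this (again with
invented names, amt1-amt3 to be summed and rate1-rate2 to be averaged,
so treat it as a pattern rather than a finished program):

```sas
data work.have;
    input region $ amt1 amt2 amt3 rate1 rate2;
    datalines;
East 10 20 30 0.5 0.7
East 5 . 15 0.3 0.9
West 1 2 3 0.2 0.4
;
run;

/* Step 1: row-wise work in a datastep */
data work.step1;
    set work.have;
    amt_total = sum(of amt1-amt3);   /* per-observation total, ignoring missing */
run;

/* Step 2: column-wise work in proc summary */
proc summary data=work.step1 nway;
    class region;
    var amt1-amt3 amt_total rate1-rate2;
    output out=work.step2 (drop=_type_ _freq_)
           sum(amt1-amt3 amt_total)= mean(rate1-rate2)=;
run;
```

Because the variables are given as lists, nothing has to be typed out
name by name, and no transpose is involved.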
Hope that helps.
pchoate@gsos.net
wpr <wpr@midsouth.rr.nospam.com> wrote in message news:<3CF6C2BB.29EF520E@midsouth.rr.nospam.com>...
> I have a large dataset, about 90,000 rows and 200 variables. I need to
> get the summation of some of the variables and the average of the
> remaining variables, for each variable.
>
> Here's what I tried to do:
> 1. Proc Transpose; to get the variable names into the field named _name_
>
> 2. from the resulting dataset, create two more: one for fields to be
> summed, the other for fields to be averaged, using the value in _name_
> for screening
>
> This didn't work! The log says something about columns and lines, but I
> haven't a clue what this means.
>
> I have four datasets with this information that I need to do this for
> and I do not want to manually enter the variable names (about 1,000).
>
> Does anyone have any ideas either how to get my Proc Transpose idea to
> work or something else?
>
> Thanks very much.