Date: Sat, 1 Jul 2006 13:30:37 -0700
Reply-To: wardnine@HOTMAIL.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: wardnine@HOTMAIL.COM
Organization: http://groups.google.com
Subject: Re: Dataset Re-format Question
In-Reply-To: <200606301951.k5UIWkqU026042@mailgw.cc.uga.edu>
Content-Type: text/plain; charset="iso-8859-1"
"Howard Schreier <hs AT dc-sug DOT org>" wrote:
> >On Wed, 28 Jun 2006 19:17:32 -0700, wardnine@HOTMAIL.COM wrote:
> >
> >>Arthur Tabachneck wrote:
> >>> Julie,
> >>>
> >>> Try proc transpose. For example:
> >>>
> >>> data have;
> >>> input ID $ A B C D Z;
> >>> cards;
> >>> 12423 434 12.1 42 242 3.4
> >>> 04253 3.4 9.9 42 45 243
> >>> 64753 3.6 532 23 333 43
> >>> 74467 9.7 436 26 2.4 73
> >>> 09463 8.4 45 7.8 232 5.6
> >>> 25523 22 416 43 187 4.5
> >>> ;
> >>> run;
> >>>
> >>> proc sort data=have;
> >>> by id;
> >>> run;
> >>>
> >>> proc transpose data=have out=want;
> >>> by id;
> >>> run;
> >>>
> >>> HTH,
> >>> Art
> >>> ---------
> >>
> >>Yeah this worked, thanks!! I'm also wondering how I can get the
> >>original dataset I showed in my first post into a format where all the
> >>variable names and amounts for an id appear on one row instead of
> >>multiple rows? So instead of:
> >>
> >>ID VARIABLE AMOUNT
> >>12423 A 434
> >>12423 B 12.1
> >>12423 C 42
> >>04253 A 3.4
> >>04253 B 9.9
> >>04253 C 42
> >>
> >>I want something like this:
> >>
> >>ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
> >>12423 A 434 B 12.1 C 42
> >>04253 A 3.4 B 9.9 C 42
>
> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
> 12423 A 434 B 12.1 C 42
> 04253 A 3.4 B 9.9 C 42
>
> What if there are missing values? Suppose the 9.9 were a missing value.
> Would you want
>
>
> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
> 12423 A 434 B 12.1 C 42
> 04253 A 3.4 B . C 42
>
> or
>
> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
> 12423 A 434 B 12.1 C 42
> 04253 A 3.4 . C 42
>
> or
>
> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
> 12423 A 434 B 12.1 C 42
> 04253 A 3.4 C 42 .
>
I'd want the first scenario. Missing values won't be a problem because
all variables with missing values would have their amounts set to zero
in a prior program anyway. So the amount for B would be 0, not
missing... no variables will have any missing values at this point in
the coding.
> ?? In the first case, or if there are no missing values, the VARn variables
> do not vary; might not it be better to maintain the values A, B, etc. in
> metadata, or in a separate supporting table?
>
> How about the order? Is it OK if all of the VARn are together followed by
> all of the AMTn, or do you require the pairs adjacent?
>
I would want the pairs adjacent. I will ultimately be exporting this
SAS dataset to a (comma-delimited) text file so, as I said in my
previous post, the variable names really don't matter. What matters is
the position the A, B, C, etc. and numerical data is in so I would
really prefer if it was A, for example, in one column followed by the
amount for A in the next column, B in the column after that followed by
the amount for B in the next column and so on. We would then have a
record layout that would let us know what type of data is in which
position (e.g., the 20th column always has the amount for the variable
J, the 23rd column always contains the variable name K, etc.).
Actually, now that I think about it, in this situation since, as you
said, the VARn variables don't vary, I wouldn't need variable names in
the final dataset, it would be better to just have the amounts. You are
right. So in this case whether or not the pairs are adjacent is
irrelevant.
Is it possible to do this with SAS? I did look through the SAS-L
archives and took a look at that mr2rm macro but it appeared you have
to download 5-6 other programs besides the main macro to make things
work. Is it possible to get this done with fewer lines
lines of coding?
> Just about anything is possible. However, normalized data structures usually
> prove to be more useful. Where is this headed?
As I state above, the SAS dataset will be exported into a text file
with one ID in each row
of the text file. Without getting into specifics, there is more
processing that is done on this data after it is converted to a text
file and that processing is much easier if all the data for one ID is
in one row (so that there is only one occurence of the ID in the file)
rather than having numerous occurences of the same ID.
Again, any help with this is appreciated!
Julie
|