Date: Sat, 1 Jul 2006 19:29:19 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Dataset Re-format Question
On Sat, 1 Jul 2006 13:30:37 -0700, wardnine@HOTMAIL.COM wrote:
[snip]
>> >>I want something like this:
>> >>
>> >>ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
>> >>12423 A 434 B 12.1 C 42
>> >>04253 A 3.4 B 9.9 C 42
>>
>> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
>> 12423 A 434 B 12.1 C 42
>> 04253 A 3.4 B 9.9 C 42
>>
>> What if there are missing values? Suppose the 9.9 were a missing value.
>> Would you want
>>
>>
>> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
>> 12423 A 434 B 12.1 C 42
>> 04253 A 3.4 B . C 42
>>
>> or
>>
>> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
>> 12423 A 434 B 12.1 C 42
>> 04253 A 3.4 . C 42
>>
>> or
>>
>> ID VAR1 AMT1 VAR2 AMT2 VAR3 AMT3
>> 12423 A 434 B 12.1 C 42
>> 04253 A 3.4 C 42 .
>>
>
>I'd want the first scenario. Missing values won't be a problem because
>all variables with missing values would have their amounts set to zero
>in a prior program anyway. So the amount for B would be 0, not
>missing... no variables will have any missing values at this point in
>the coding.
>
>
>> ?? In the first case, or if there are no missing values, the VARn variables
>> do not vary; might not it be better to maintain the values A, B, etc. in
>> metadata, or in a separate supporting table?
>>
>> How about the order? Is it OK if all of the VARn are together followed by
>> all of the AMTn, or do you require the pairs adjacent?
>>
>
>I would want the pairs adjacent. I will ultimately be exporting this
>SAS dataset to a (comma-delimited) text file so, as I said in my
>previous post, the variable names really don't matter. What matters is
>the position the A, B, C, etc. and numerical data is in so I would
>really prefer if it was A, for example, in one column followed by the
>amount for A in the next column, B in the column after that followed by
>the amount for B in the next column and so on. We would then have a
>record layout that would let us know what type of data is in which
>position (e.g., the 20th column always has the amount for the variable
>J, the 23rd column always contains the variable name K, etc.).
>Actually, now that I think about it, in this situation since, as you
>said, the VARn variables don't vary, I wouldn't need variable names in
>the final dataset, it would be better to just have the amounts. You are
>right. So in this case whether or not the pairs are adjacent is
>irrelevant.
>
>Is it possible to do this with SAS? I did look through the SAS-L
>archives and took a look at that mr2rm macro but it appeared you have
>to download 5-6 other programs besides the main macro to make things
>work. Is it possible to get this done with fewer lines
>lines of coding?
>
>> Just about anything is possible. However, normalized data structures usually
>> prove to be more useful. Where is this headed?
>
>As I state above, the SAS dataset will be exported into a text file
>with one ID in each row
>of the text file. Without getting into specifics, there is more
>processing that is done on this data after it is converted to a text
>file and that processing is much easier if all the data for one ID is
>in one row (so that there is only one occurence of the ID in the file)
>rather than having numerous occurences of the same ID.
>
>Again, any help with this is appreciated!
>
>Julie
Isn't this pretty much what you said you had at the start?
On Tue, 27 Jun 2006 20:16:40 -0700, wardnine@HOTMAIL.COM wrote:
>Hey - I have a SAS dataset where the first column is a five digit
>identification number (variable name = ID). I then have 26 more columns
>that contain amounts associated with each ID in the file. For
>simplicity's sake, let's say the variable name for the second column is
>A, the third column is B, the fourth column is C, etc... until the
>final column which is named Z. So the dataset (when you view it in SAS)
>basically looks like the table below (the formatting of the table will
>probably be off but I hope you get the idea):
>
> ID A B C D ...... Z
>12423 434 12.1 42 242 3.4
>04253 3.4 9.9 42 45 243
>64753 3.6 532 23 333 43
>74467 9.7 436 26 2.4 73
>09463 8.4 45 7.8 232 5.6
>25523 22 416 43 187 4.5
The variable names are A, B, C etc. instead of AMT1, AMT2, AMT3 etc., but
you said that does not matter.
|