| Date: | Fri, 6 Oct 2006 13:22:58 -0700 |
| Reply-To: | David L Cassell <davidlcassell@MSN.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | David L Cassell <davidlcassell@MSN.COM> |
| Subject: | Re: Dataset transform |
| In-Reply-To: | <200610061632.k96FvKPG012530@mailgw.cc.uga.edu> |
| Content-Type: | text/plain; format=flowed |
|---|
rajasekhargo@YAHOO.COM wrote:
>
>Dear SAS Users:
>
>I am trying to transform a dataset in the following way.
>
>My input table is like this:
>
>ID date num1 num2 num3
>A 1 3 5 4
>B 1 8 3 2
>C 1 2 6 7
>A 2 4 8 9
>B 2 9 0 1
>C 2 0 2 3
>A 3 2 3 4
>B 3 8 6 7
>C 3 6 4 5
>
>The output I want is like this. The variable names become
>oldvarname_datevalue
>
>ID num1_1 num1_2 num1_3 num2_1 num2_2 num2_3 num3_1 num3_2 num3_3
>A 3 4 2 5 8 3 4 9 4
>B 8 9 8 3 0 6 2 1 7
>C 2 0 6 6 2 4 7 3 5
>
>One solution might be to use Proc Transpose. But my input table is very
>large (about 1/4 million records and hundreds of variables). So I am
>concerned that any procedures that work from disk access might be slower.
>Moreover, using proc transpose, can we create variable names like I
>described above?
>
>I considered using hash tables (storing the data from the input dataset in
>memory and writing it out to an output dataset). But I couldn't figure out
>how to transform the shape of the o/p dataset.
>
>If there is any other solution, that is also welcome (like
>multidimensional arrays?).
>
>I appreciate your assistance.
>
>Thanks,
>Raj
First, please re-consider doing this. What you are asking for is a really
bad database design that will make later analyses much harder and make
data steps much more complex. If you have these data already in
temproal order, then you ought to be leaving the numbers as is in
order to use them properly in later SAS time series analyses. If you
only want the numbers like this for *display* purposes, look into PROC
TABULATE and PROC REPORT to string variable levels across the top
of your table.
Second, since the data have to be read once from disk no matter what,
I think that PROC TRANSPOSE would be as fast as anything else. Since
it can take advantage of your by-variables, it will also be smart enough
not to need all your RAM either.
Third, if you want complete control over your variable names, then
a data step may be the way to go. I don't see that you need a
hash solution, but you could try one.
And finally, please don't do this. If you have a really good reason,
then you can use some of the code that has already been given to
you. But most of the time that people ask this question, they do NOT
have a really good reason to do it.
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
_________________________________________________________________
Share your special moments by uploading 500 photos per month to Windows Live
Spaces
http://clk.atdmt.com/MSN/go/msnnkwsp0070000001msn/direct/01/?href=http://www.get.live.com/spaces/features
|