Date: Sun, 25 Sep 2005 15:37:20 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: Hosed by Transpose
In-Reply-To: <3pnskvFb8o6eU1@individual.net>
Content-Type: text/plain; format=flowed
Richard,
Hmmmmmmm, makes me wonder why SAS even allows one to list the same variable in
the VAR statement more than once. What purpose does it serve?
Toby Dunn
From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Reply-To: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Hosed by Transpose
Date: Sun, 25 Sep 2005 11:06:00 -0400
Hosed
<jargon> A somewhat humorous variant of "down", used
primarily by Unix hackers. "Hosed" implies a condition
thought to be relatively easy to reverse. It is also widely
used of people in the mainstream sense of "in an extremely
unfortunate situation".
Proc transpose creates one row for _every_ variable listed in the VAR
statement...(and now the hosing) even when a variable is listed more than
once.
-------------------------------------
data foo;
  do rowId = 1 to 1;            /* a single row is enough for the demo */
    array V X Y Z;
    do _n_ = 1 to dim(V);
      counter + 1;
      V[_n_] = counter;         /* X=1, Y=2, Z=3 */
    end;
    output;
  end;
  label
    X = 'X factor'
    Y = 'Y factor'
    Z = 'Z factor'
  ;
  drop counter;
run;
proc print data=foo; run;
--- output ---
Obs    rowId    X    Y    Z
  1        1    1    2    3
--- output ---
proc transpose data=foo out=bar;
  by rowId;
  var x y x y x;
run;
proc print data=bar; run;
--- output ---
Obs    rowId    _NAME_    _LABEL_     COL1
  1        1    X         X factor       1
  2        1    Y         Y factor       2
  3        1    X         X factor       1
  4        1    Y         Y factor       2
  5        1    X         X factor       1
--- output ---
...Even when a variable is listed more than once.
This aspect of TRANSPOSE can transform legitimate data into head-scratching
misinformation.
Consider the case when TRANSPOSE is used to convert dozens of variables into
a categorical form. By unfortunate circumstance, one or more variables in
those dozens are listed more than once. Then a summary is computed,
grouping information based on the transposed _NAME_ or _LABEL_.
Since the variables listed in the VAR statement are not distinct, you will
get undesired repetition (as demonstrated above) and miscounts!
salient and tacitly presumed information

rowId    _NAME_    freq
    1    X            1
    1    Y            1

misinformation due to repetition of variables in transpose VAR statement

rowId    _NAME_    freq
    1    X            3
    1    Y            2
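
One possible after-the-fact repair, offered as a minimal sketch rather than a recommended fix: if each BY group in the input holds a single row, every (rowId, _NAME_) pair in the transposed output should be unique, so the repeats can be dropped with a NODUPKEY sort before summarizing. The output data set name bar_distinct is hypothetical.

```sas
/* Sketch only: assumes one input row per rowId, so any repeated        */
/* (rowId, _NAME_) pair in BAR can only come from the VAR statement.    */
/* bar_distinct is a hypothetical name for the cleaned-up copy.         */
proc sort data=bar out=bar_distinct nodupkey;
  by rowId _NAME_;
run;

proc print data=bar_distinct; run;
```

The log note from PROC SORT about deleted observations with duplicate key values then serves as an indirect hint that the VAR list had repeats.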
The SAS log provides _no_ warnings about repeated variables, so user beware.
One would hope that in a future release SAS will warn automatically, or
have an option to cause a warning or error when this situation arises.
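
Until something like that exists, a check along these lines might be run against the transposed output. This is a sketch that assumes the data set bar from the example above; the column alias copies is made up for illustration.

```sas
/* Sketch only: list every (rowId, _NAME_) pair that occurs more than   */
/* once in the transposed data set BAR. "copies" is a hypothetical      */
/* alias for the number of occurrences.                                 */
proc sql;
  select rowId, _NAME_, count(*) as copies
    from bar
    group by rowId, _NAME_
    having count(*) > 1;
quit;
```

For the example above this would report X with 3 copies and Y with 2. When the VAR list is clean and each BY group holds a single row, the query selects no rows.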
--
Richard A. DeVenezia
http://www.devenezia.com/