Date: Sun, 31 Dec 2006 10:51:09 -0500
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>" <nospam@HOWLES.COM>
Subject: Re: Comparison across columns??
On Sat, 30 Dec 2006 18:26:05 -0500, Margaret Asmani
<margaretasmani@HOTMAIL.CO.UK> wrote:
>Thanks very much everybody for the very useful suggestions.
>
>I use Florio's code as it's convenient not having to list the variables
>again considering that I have a large number. A follow-up question: is it
>possible to compute frequencies only for 1, i.e. the lowest values only?
>
>Happy New Year!!
>
>Margaret
Here's a variation on Florio's solution which should give you that flexibility.
data have;
input A B C D E;
cards;
25 32 65 32 89
65 32 47 89 96
41 52 63 96 45
12 45 65 98 78
45 56 89 78 63
45 89 56 89 56
56 89 56 45 78
45 56 56 78 89
12 45 98 65 78
12 12 32 35 56
;
Generate row numbers:
data numbered;
set have;
rownum + 1;
run;
Transpose, but into a normalized structure:
proc transpose data = numbered out = long(where = (_name_^='rownum') );
by rownum;
var _numeric_;
run;
Compute ranks:
proc rank data = long out=ranked ties=low;
by rownum;
var col1;
ranks Rank;
run;
Derive frequencies for entire set:
proc freq data = ranked;
tables _name_*rank / list nocum nopercent;
run;
Result:
_NAME_ Rank Frequency
---------------------------
A 1 8
A 2 1
A 3 1
B 1 2
B 2 5
B 3 1
B 4 1
B 5 1
C 2 4
C 3 2
C 4 2
C 5 2
D 1 1
D 2 1
D 3 1
D 4 5
D 5 2
E 2 2
E 3 1
E 4 3
E 5 4
To restrict output, just include the requisite WHERE statement:
proc freq data = ranked;
where rank=1;
tables _name_*rank / list nocum nopercent;
run;
Result:
_NAME_ Rank Frequency
---------------------------
A 1 8
B 1 2
D 1 1
On Fri, 29 Dec 2006 17:31:10 -0500, Florio Arguillas <foa2@CORNELL.EDU> wrote:
>Hi Margaret,
>
>Try this code. This takes into consideration the ties in your data set.
>
>The macro variable t in this code returns the total number of
>observations. If you do not want to use macros in your code, simply
>replace the &t's with the total number of observations in your
>dataset (in your example it is 10), and delete the call symput statement.
>
>The code first transposes the data to prep it for ranking to
>determine which variable has the lowest value, which variable has
>the next higher value, the next higher value etc. (as you
>requested). Once ranked, the dataset is re-transposed so that its
>structure will correspond with your source data set. Once transposed
>the source (HAVE) and output (NEED) data sets are merged
>(NEEDED). The Freq procedure displays how many times var A to E have
>been ranked 1 (lowest) to 5 (highest). Values of 1.5's, 2.5's, 3.5's
>and 4.5's indicate ties.
>
>HTH,
>
>Florio
>
>
>
>data have;
>input A B C D E;
>call symput ('t', compress(_n_));
>cards;
>25 32 65 32 89
>65 32 47 89 96
>41 52 63 96 45
>12 45 65 98 78
>45 56 89 78 63
>45 89 56 89 56
>56 89 56 45 78
>45 56 56 78 89
>12 45 98 65 78
>12 12 32 35 56
>;
>run;
>
>proc transpose data = have out = need;
>var a b c d e;
>run;
>
>proc rank data = need out=need;
>var col1-col&t;
>ranks rcol1-rcol&t;
>run;
>
>proc transpose data = need out = need (drop = _name_ _label_) prefix = RankOf;
>var rcol1-rcol&t;
>id _name_;
>run;
>
>data needed;
> merge have need;
>run;
>
>proc freq data = needed;
>table rankofa--rankofe;
>run;
>
>
>
>
>At 02:17 PM 12/29/2006, Margaret Asmani wrote:
>>Hi,
>>
>>I have 20 columns (variables) and want to compare their values.
>>For example,
>>
>>A B C D E
>>25 32 65 32 89
>>65 32 47 89 96
>>41 52 63 96 45
>>12 45 65 98 78
>>45 56 89 78 63
>>45 89 56 89 56
>>56 89 56 45 78
>>45 56 56 78 89
>>12 45 98 65 78
>>12 12 32 35 56
>>
>>and want to compare the values i.e 25,32,65,32,89 and at the end be able to
>>tell how many times variable A had the lowest value, which variable had the
>>next higher value, the next higher value etc.
>>
>>Any suggestions?