Date: Thu, 12 Jun 2008 19:43:20 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>"
<schreier.junk.mail@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>"
<schreier.junk.mail@GMAIL.COM>
Subject: Re: group count and plot
On Tue, 10 Jun 2008 14:27:18 -0400, Amit Sharma <amitshamit@REDIFFMAIL.COM>
wrote:
>Dear All,
>
>As I am new to the SAS world, am again stuck and need help.
>
>There is a dataset (name = ds) having var 'A' and 'B'.
>'A' is a binary var (attribute = 0 or 1). I want to plot a graph of
>fraction of '0's versus fraction of '1's of var A. These values of 'A' are
>already arranged (and have to be kept like that only) in increasing order
>of a var B (var B is a continuous variable).
>
>So, I want to find that "when one of the attributes(say '0') reaches a
>multiple of decile (10%, 20%, 30%...100%), what part of the other
>attribute had been covered till then".
>
>For plotting the graph, I would want this kind of a table:
>PLEASE NOTE THAT one of the attribute will surely reach 100% before the
>other. So, there have to be 11 pairs of x and y values.
>
>Dummy Table: plot_this
>Attribute=0 Attribute=1
>10% 3%
>20% 7%
>30% 12%
>40% 19%
>50% 24%
>60% 30%
>70% 37%
>80% 57%
>90% 74%
>100% 89%
>100% 100%
>
>Thanks and Regards,
>Amit
Test data:
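%let size=16;
data ds;
   do _n_ = 1 to &size;
      /* A is 0 or 1; 1's become more likely as _n_ grows */
      A = round(ranuni(23) + _n_ / (2 * &size) );
      /* B is a nondecreasing running total (sum statement) */
      B + floor(ranuni(23) * 5);
      output;
   end;
run;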
Note that you might have had more replies, and sooner, if you had provided
such data.
Next count the 1's and 0's:
proc summary data=ds nway;
   class a;
   /* One data set per level of A; _FREQ_ is the number of rows. */
   output out=count0(rename = (_freq_=count0) where = (a=0) );
   output out=count1(rename = (_freq_=count1) where = (a=1) );
run;
Now it's possible to compute the cumulative counts and percentages:
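data plotpoints(drop = count0 count1);
   /* Read the two totals once; variables read by SET are retained. */
   if _n_=1 then do;
      set count0(keep = count0);
      set count1(keep = count1);
   end;
   seq + 1;
   set ds;
   /* Running counts of 0's and 1's, in the existing order of DS. */
   A_0cum + (1 - a);
   A_1cum + a ;
   A_0cumpct = 100 * a_0cum / count0;
   A_1cumpct = 100 * a_1cum / count1;
run;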
Results:
seq  A   B  A_0cum  A_1cum  A_0cumpct  A_1cumpct
  1  0   1       1       0         20      0.000
  2  0   5       2       0         40      0.000
  3  0   6       3       0         60      0.000
  4  1   7       3       1         60      9.091
  5  0   9       4       1         80      9.091
  6  1   9       4       2         80     18.182
  7  1  11       4       3         80     27.273
  8  0  15       5       3        100     27.273
  9  1  19       5       4        100     36.364
 10  1  20       5       5        100     45.455
 11  1  22       5       6        100     54.545
 12  1  25       5       7        100     63.636
 13  1  27       5       8        100     72.727
 14  1  29       5       9        100     81.818
 15  1  32       5      10        100     90.909
 16  1  35       5      11        100    100.000
Then you can subset and interpolate, but I just let the last two columns go
into the plot:
proc gplot data=plotpoints;
   plot A_1cumpct * A_0cumpct;
run;
quit;
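If you do want the decile table, the subsetting step might look something like the sketch below. It is only a sketch and makes some assumptions: it does not interpolate, it simply takes the first observation at which the 0's have reached each decile, and the names DECILES, THRESHOLD and EMITTED are made up for illustration. It reads the PLOTPOINTS data set built above.

```sas
/* Sketch only: DECILES, THRESHOLD and EMITTED are made-up names. */
data deciles(keep = threshold A_0cumpct A_1cumpct);
   retain threshold 10;            /* next decile of the 0's to look for */
   set plotpoints end = last;
   emitted = 0;
   /* One row per decile reached at this observation. A single 0 can
      cross several deciles at once when there are few 0's. */
   do while (threshold <= 100 and A_0cumpct >= threshold);
      output;
      emitted = 1;
      threshold + 10;
   end;
   /* Closing (100, 100) point, unless the row just written is already it. */
   if last and not emitted then do;
      threshold = 100;
      output;
   end;
run;

/* Join the points in data order; SYMBOL1 stays in effect until reset. */
symbol1 interpol=join value=dot;
proc gplot data=deciles;
   plot A_1cumpct * threshold;
run;
quit;
```

With the 16-observation results shown above, this should give 11 rows: for example, threshold 70 pairs with 9.091 and threshold 90 with 27.273, and the last row is the 100/100 point. If the final observation happens to be a 0, the decile-100 row is already that point and you get 10 rows instead. With so few 0's each step covers two deciles, which is where interpolating would give a smoother curve.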