Date: Sun, 4 May 2008 22:07:40 -0400
Reply-To: "Howard Schreier <hs AT dc-sug DOT org>"
<schreier.junk.mail@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Howard Schreier <hs AT dc-sug DOT org>"
<schreier.junk.mail@GMAIL.COM>
Subject: Re: proc tabulate, sum/pctsum question
On Fri, 2 May 2008 18:49:56 +0300, Mugu Gai Pan <mugu@PCN.NO> wrote:
>suppose I had the following data
>
>data test;input color blend $ saturation day $;_x=1;
>datalines;
>8 M 1 2-Mm
>9 M 1 2-Mm
>8 M 1 1-Jn
>9 M 1 1-Jn
>;;
>data _null_;set ;put(_all_)(=);run;
>title;
>dm 'clear output';
>proc tabulate noseps formchar=' ';
>class color blend saturation day; var _x;
>table blend=' '*
> (day=' '* all=' '),
> color=' '*
> (_x=' ')*
> (N='#'*F=4. PCTSUM<blend*day*color>='%'*F=5.1) / rts=20;
>run;
>
>this produces the output of 4 cells, each contributing 25% of the data, that
>is fine.
>
>but, what if I wanted a sum of the color (subtype) and I wanted to know PCT
>subtype 1 (M1) of the sum(M1,M2) ?
>The goal would be to report only the color M1 but show it as a percent of
>the sum of M1 and M2.
>
>data test;input color blend $ saturation day $;_x=1;
>datalines;
>8 M1 1 2-Mm
>9 M1 1 2-Mm
>8 M1 1 1-Jn
>9 M1 1 1-Jn
>8 M2 1 2-Mm
>9 M2 1 2-Mm
>8 M2 1 1-Jn
>9 M2 1 1-Jn
>8 M2 1 2-Mm
>9 M2 1 2-Mm
>8 M2 1 1-Jn
>9 M2 1 1-Jn
>;;
I'm going to try a simpler example. First, data:
data simplertest;
input Color Blend $ Amount;
cards;
8 M1 1
8 M1 2
8 M3 3
9 M1 1
9 M3 2
9 M2 1
;
The code for the straightforward table:
proc tabulate data=simplertest formchar="|-+++|+|+++";
class color blend;
var amount;
table blend
,
color * ( n
amount * pctsum ) * f=best6.;
run;
It produces this table:
+----------------------+---------------------------+
| | Color |
| |-------------+-------------|
| | 8 | 9 |
| |------+------+------+------|
| | |Amount| |Amount|
| | |------| |------|
| | N |PctSum| N |PctSum|
|----------------------+------+------+------+------|
|Blend | | | | |
|----------------------| | | | |
|M1 | 2| 30| 1| 10|
|----------------------+------+------+------+------|
|M2 | .| .| 1| 10|
|----------------------+------+------+------+------|
|M3 | 1| 30| 1| 20|
+----------------------+------+------+------+------+
Wanted: a way to suppress one or more rows (ie, values of BLEND), while
keeping the remaining rows unchanged (am I right about the essence of the
problem?). For example, omitting M3:
+----------------------+---------------------------+
| | Color |
| |-------------+-------------|
| | 8 | 9 |
| |------+------+------+------|
| | |Amount| |Amount|
| | |------| |------|
| | N |PctSum| N |PctSum|
|----------------------+------+------+------+------|
|Blend | | | | |
|----------------------| | | | |
|M1 | 2| 30| 1| 10|
|----------------------+------+------+------+------|
|M2 | .| .| 1| 10|
+----------------------+------+------+------+------+
I've found that such problems can sometimes be solved by pre-aggregating the
data and then using PROC TABULATE for the presentation of the results. To
continue with the example, aggregate with PROC SQL:
proc sql;
create view fortabulate as
select *
from
(select color, sum(amount) as denom
from simplertest)
natural join
(select color, blend, count(*) as many, sum(amount) as Amount
from simplertest
group by color, blend);
quit;
Result:
Color Blend many Amount denom
------------------------------------------------
8 M1 2 3 10
8 M3 1 3 10
8 M1 2 3 10
8 M3 1 3 10
8 M1 2 3 10
8 M3 1 3 10
9 M1 1 1 10
9 M2 1 1 10
9 M3 1 2 10
9 M1 1 1 10
9 M2 1 1 10
9 M3 1 2 10
9 M1 1 1 10
9 M2 1 1 10
9 M3 1 2 10
Finally, the PROC TABULATE code:
proc tabulate data=fortabulate formchar="|-+++|+|+++";
class color blend;
where blend NE 'M3';
var many amount denom;
table blend
,
color * ( many='' * mean='N'
amount * pctsum<denom> ) * f=best6.;
run;