Date: Wed, 18 Jun 2008 13:38:41 -0500
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: How to dynamically run a array ??
Content-Type: text/plain; charset="iso-8859-1"
It looks like your question is how to produce a summary with percents and counts. I had asked this question on 6/4/2008, and Howard gave an elegant response- basically normalize your data then proceed. Here is is again below:
-Mary
**** Howard's response ***
If I understand, you start with a data set which can be modeled by:
data have(drop=qnum);
do ID = 100 to 109;
array qq(*) Q1-Q3;
do qnum = 1 to dim(qq);
qq(qnum) = ceil( 5 * ranuni(123) );
end;
output;
end;
run;
Result:
Obs ID Q1 Q2 Q3
1 100 4 2 1
2 101 5 2 2
3 102 4 2 1
4 103 1 4 3
5 104 5 2 4
6 105 3 3 5
7 106 1 5 4
8 107 4 4 2
9 108 3 5 1
10 109 4 4 2
You need this summary:
Obs Response Q1_N Q1_Pct Q2_N Q2_Pct Q3_N Q3_Pct
1 1 2 20 0 0 3 30
2 2 0 0 4 40 3 30
3 3 2 20 1 10 1 10
4 4 4 40 3 30 2 20
5 5 2 20 2 20 1 10
I would first normalize the given data:
proc transpose data=have out=long(rename=(_name_=Q col1=Response) );
by id;
var q : ;
run;
Result:
Obs ID Q Response
1 100 Q1 4
2 100 Q2 2
3 100 Q3 1
4 101 Q1 5
5 101 Q2 2
6 101 Q3 2
7 102 Q1 4
8 102 Q2 2
9 102 Q3 1
10 103 Q1 1
11 103 Q2 4
12 103 Q3 3
13 104 Q1 5
14 104 Q2 2
15 104 Q3 4
16 105 Q1 3
17 105 Q2 3
18 105 Q3 5
19 106 Q1 1
20 106 Q2 5
21 106 Q3 4
22 107 Q1 4
23 107 Q2 4
24 107 Q3 2
25 108 Q1 3
26 108 Q2 5
27 108 Q3 1
28 109 Q1 4
29 109 Q2 4
30 109 Q3 2
Next, aggregate:
proc sql;
create table aggregated as
select q, response, count(*) as N
from long
group by q, response;
Result:
Obs Q Response N
1 Q1 1 2
2 Q1 3 2
3 Q1 4 4
4 Q1 5 2
5 Q2 2 4
6 Q2 3 1
7 Q2 4 3
8 Q2 5 2
9 Q3 1 3
10 Q3 2 3
11 Q3 3 1
12 Q3 4 2
13 Q3 5 1
I find in every project there is inevitably some little semi-critical detail
that ends up requiring the most complicated code. In this case it is the two
non-existent intersections in this table. You probably want zeros for them
in your end result, not missing values. So, impute those zeroes:
create table withzeroes as
select q, response, coalesce(n,0) as N
from aggregated
natural right join
( select * from (select distinct q from aggregated)
cross join
(select distinct response from aggregated)
);
Result:
Obs Q Response N
1 Q1 1 2
2 Q1 2 0
3 Q1 3 2
4 Q1 4 4
5 Q1 5 2
6 Q2 1 0
7 Q2 2 4
8 Q2 3 1
9 Q2 4 3
10 Q2 5 2
11 Q3 1 3
12 Q3 2 3
13 Q3 3 1
14 Q3 4 2
15 Q3 5 1
Next, expand the table by computing the percents and package everything for
PROC TRANSPOSE:
create view fortranspose as
select response, catt(q,'_N') as Stat, n as Value
from withzeroes
union all
select response, catt(q,'_Pct'), 100 * n / sum(n)
from withzeroes
group by q
order by response, stat;
quit;
Result:
Obs Response Stat Value
1 1 Q1_N 2
2 1 Q1_Pct 20
3 1 Q2_N 0
4 1 Q2_Pct 0
5 1 Q3_N 3
6 1 Q3_Pct 30
7 2 Q1_N 0
8 2 Q1_Pct 0
9 2 Q2_N 4
10 2 Q2_Pct 40
11 2 Q3_N 3
12 2 Q3_Pct 30
13 3 Q1_N 2
14 3 Q1_Pct 20
15 3 Q2_N 1
16 3 Q2_Pct 10
17 3 Q3_N 1
18 3 Q3_Pct 10
19 4 Q1_N 4
20 4 Q1_Pct 40
21 4 Q2_N 3
22 4 Q2_Pct 30
23 4 Q3_N 2
24 4 Q3_Pct 20
25 5 Q1_N 2
26 5 Q1_Pct 20
27 5 Q2_N 2
28 5 Q2_Pct 20
29 5 Q3_N 1
30 5 Q3_Pct 10
Finally, re-shape:
proc transpose data=fortranspose out=want(drop=_name_);
by response;
id stat;
var value;
run;
Result:
Obs Response Q1_N Q1_Pct Q2_N Q2_Pct Q3_N Q3_Pct
1 1 2 20 0 0 3 30
2 2 0 0 4 40 3 30
3 3 2 20 1 10 1 10
4 4 4 40 3 30 2 20
5 5 2 20 2 20 1 10
The above is a series of relatively simple steps generating intermediate
results, but developing them depends on knowing the overall requirement.
Howard also added this:
I think I was too focused on SQL. The two CREATE TABLE statements (to
aggregate, then impute the zeros) can be replaced with just:
proc summary data=long completetypes nway;
class q response;
output out=withzeroes(rename=(_freq_=n) drop=_type_);
run;
And, if the end result is for presentation rather than subsequent
processing, the CREATE VIEW statement and the PROC TRANSPOSE which follows
can be replaced with:
proc tabulate data=withzeroes noseps formchar="|-+-+|+|+-+";
class q response;
var n;
table response
,
q='' * n='' * (sum='N' pctsum<response>='PctN') * f=best6.;
run;
Result:
+----------------------------------------------------------------+
| | Q1 | Q2 | Q3 |
| |-------------+-------------+-------------|
| | N | PctN | N | PctN | N | PctN |
|----------------------+------+------+------+------+------+------|
|Response | | | | | | |
|1 | 2| 20| 0| 0| 3| 30|
|2 | 0| 0| 4| 40| 3| 30|
|3 | 2| 20| 1| 10| 1| 10|
|4 | 4| 40| 3| 30| 2| 20|
|5 | 2| 20| 2| 20| 1| 10|
+----------------------------------------------------------------+
In other words, once the given table is normalized, the task can be
completed with a fairly simple PROC SUMMARY and a fairly simple PROC TABULATE.
----- Original Message -----
From: SAS_learner
To: SAS-L@LISTSERV.UGA.EDU
Sent: Wednesday, June 18, 2008 12:41 PM
Subject: How to dynamically run a array ??
hello guys,
I have a dataset like Cm3 like this now
_1 _2 _3
1 2 3
6 3 1
3 3 3
6 3 1
3 3 3
1 2 3
and I am following code to get the Percents and % to that count. Now my
problem is that this code has to changed every time there is increment in
the treatment group and I had to increase or decrease the row number ?? Can
I tell the SAS to look out how many variables that are starting _ and run
the following code any Ideas.
data aaa;
length grp1 grp2 grp3 allgrp $20.;
set cm3 (keep= _:) ;
_total=sum(_1,_2,_3);/*Need to Pick the number of individual treatment */
array _dgrp(4) _1 _2 _3 _total;
array _grppc(4) grppc1 - grppc3 allgrppc;
array _grp(4) $ grp1 - grp3 allgrp;
array _d (4) _d1 -_d4 (&d1 &d2 &d3 &dtotal);
do i=1 to 4;
_grppc(i)=round(_dgrp(i)/_d(i)*100,.1);
_grp(i)=put(_dgrp(i),3.)||' ('||put(_grppc(i),5.1)||'%)';
end;
run;
thanks