Date: Mon, 25 Jan 2010 09:25:45 -0400
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: creating cross-tab for multiple (38) variables
In-Reply-To: <hjcq7u$gvg$1@news.eternal-september.org>
Content-Type: text/plain; charset=ISO-8859-1
Greg,
I understand your problem as to find ,
[1] the combinations of choices when several of the 38 choices are selected
and indicated by the RESPONSE variable, where 1 stands for the particular
choice is being selected and 0, otherwise, for each of the ID.
[2] the frequency of the combinations for all IDs.
If this is true then go ahead.
Here is a small data set, HAVE, with 3 IDs each with a maximum of 5 possible
choices to select from and if a choice is made, the corresponding variable,
RESP, gets 1 else 0. The solutions that follow are still applicable when
each ID has all 5 choices; further, the data set need only be grouped by ID
but not necessarily be sorted by ID.
data have;
input id choice $4. resp;
cards;
10 AAAA 1
10 BB 0
10 CCC 0
10 D 1
10 EE 0
20 AAAA 1
20 CCC 0
20 D 1
20 EE 0
30 BB 1
30 EE 1
;
run;
The desired combination of choices by ID will be:
Choices_ Choices_
Obs id str combi
1 10 10010 AAAA|D
2 20 10010 AAAA|D
3 30 01001 BB|EE
The frequency of combinations are as:
Choices_ Choices_
Obs combi count
1 AAAA|D 2
2 BB|EE 1
The following data step creates a string variable, Choices_str, having a
length for 5 chracters corresponding to the maximum number of choices for an
ID in the data set. If the choice happens to be 'AAAA' then its first
character will be marked as '1' and if it is 'EE' then the 5-th character
will be marked as '1' and similarly for other choices. All those choices not
selected will receive '0's. This arrangement keeps the counting of
Choice_str across all IDs easy. The use of array K[ ] helps to locate the
choice to mark the Choice_str as well as to prepare the concatenated string,
Choices_combi , for the selected choices. Proc FORMAT or hash table can also
be used for this purpose.
data need;
array k[5] $4. _temporary_ ('AAAA','BB','CCC','D', 'EE');
length id 8. Choices_str $5.;
Choices_str = '00000';
do until(last.id);
set have;
by id notsorted;
length Choices_combi $15.;
if resp = 1 then do;
m = WHICHC(choice, of k[*]);
substr(Choices_str,m,1) = '1';
end;
if last.id then do;
do i = 1 to dim(k);
if substr(Choices_str,i,1) eq '1' then
Choices_combi = catx('|',Choices_combi,k[i]);
end;
output;
Choices_str = '00000';
Choices_combi = ' ';
end;
end;
keep id Choices_str Choices_combi ;
run;
Another data step can be used to count the number of occurrences of
Choices_str or Proc Summary can be used.
proc summary data = need nway;
class Choices_combi;
output out = want(rename = (_FREQ_ = Choices_count) drop = _TYPE_)
;
run;
The data set, NEED and the data set produced by Proc Summary, WANT are
displayed already.
Instead of the above two steps, 9.2 Hash object may be used to get the
results in one step. This step uses the enhanced hash object features like
SUMINC:'var' parameter specified when the hash object is created as well as
.REF() . More on the use of these enhancements is expected when Dorfman
presents his paper in the ensuing SAS Global Forum.
The data set, NEED, gives the count of Choices combination and the data set,
OUT_01, gives the Choices_str for each ID. One can transform this get the
concatenated string too.
data need;
if _n_ = 1 then do;
array k[5] $4. _temporary_ ('AAAA','BB','CCC','D', 'EE');
length id 8. Choices_combi $16. Choices_count 8. Choices_str $5.;
declare hash h(SUMINC:'count', hashexp:20);
h.definekey('Choices_str');
h.definedone();
declare hash hid(ordered:'a');
hid.definekey('id');
hid.definedata('id','Choices_str');
hid.definedone();
end;
do while (not done);
set have end = done;
by id notsorted;
if resp = 1 then do;
if hid.find() ne 0 then Choices_str = '00000';
m = WHICHC(choice, of k[*]);
substr(Choices_str,m,1) = '1';
hid.replace();
end;
if last.id then do; count = 1; h.REF(); end;
end;
declare hiter hi('h');
do rc = hi.first() by 0 while(rc = 0);
h.sum(sum:Choices_count);
do i = 1 to dim(k);
if substr(Choices_str,i,1) eq '1' then
Choices_combi = catx('|',Choices_combi,k[i]);
end;
output;
Choices_combi = ' ';
rc = hi.next();
end;
hid.output(dataset:'out_01');
stop;
keep Choices_combi Choices_count;
run;
Even with SAS 9.1 with no enhancements, it can be done as below:
data need;
if _n_ = 1 then do;
array k[5] $4. _temporary_ ('AAAA','BB','CCC','D', 'EE');
length id 8. Choices_str $5. Choices_combi $16. Choices_count 8. ;
declare hash h();
h.definekey('Choices_str');
h.definedata('Choices_str','Choices_count');
h.definedone();
declare hash hid(ordered:'a');
hid.definekey('id');
hid.definedata('id','Choices_str');
hid.definedone();
end;
do while (not done);
set have end = done;
by id notsorted;
if resp = 1 then do;
if hid.find() ne 0 then Choices_str = '00000';
m = WHICHC(choice, of k[*]);
substr(Choices_str,m,1) = '1';
hid.replace();
end;
if last.id then do;
if h.find() ne 0 then Choices_count = 0;
Choices_count + 1;
h.replace();
end;
end;
declare hiter hi('h');
do rc = hi.first() by 0 while(rc = 0);
do i = 1 to dim(k);
if substr(Choices_str,i,1) eq '1' then Choices_combi =
catx('|',Choices_combi,k[i]);
end;
output;
Choices_combi = ' ';
rc = hi.next();
end;
hid.output(dataset:'out_01');
stop;
keep Choices_combi Choices_count;
run;
For the 38 choices per ID, the following step generates a sample data. This
data set is used in the following data step and Proc Summary. Hash versions
can be modified(lengths as well as array) to run the new sample data set.
data have;
length id 8;
array k[38] $4. _temporary_
('e_01','e_02','e_03','e_04','e_05','e_06','e_07','e_08','e_09','e_10',
'e_11','e_12','e_13','e_14','e_15','e_16','e_17','e_18','e_19','e_20',
'e_21','e_22','e_23','e_24','e_25','e_26','e_27','e_28','e_29','e_30',
'e_31','e_32','e_33','e_34','e_35','e_36','e_37','e_38');
array in[38] _temporary_;
do id = 1 to 10;
do j = 1 to ceil(ranuni(123) * 38);
do while(1);
m = ceil(ranuni(123) * 38);
if in[m] = . then do;
choice = k[m];
resp = (ranuni(123) > 0.2);
in[m] = 1;
output;
leave;
end;
end;
end;
call missing(of in[*]);
end;
drop m j;
run;
data need;
array k[38] $4. _temporary_
('e_01','e_02','e_03','e_04','e_05','e_06','e_07','e_08','e_09','e_10',
'e_11','e_12','e_13','e_14','e_15','e_16','e_17','e_18','e_19','e_20',
'e_21','e_22','e_23','e_24','e_25','e_26','e_27','e_28','e_29','e_30',
'e_31','e_32','e_33','e_34','e_35','e_36','e_37','e_38');
length id 8.;
length Choices_str $38.;
Choices_str = '00000000000000000000000000000000000000';
do until(last.id);
set have;
by id;
length Choices_combi $180.;
if resp = 1 then do;
m = WHICHC(choice, of k[*]);
substr(Choices_str,m,1) = '1';
end;
if last.id then do;
do i = 1 to dim(k);
if substr(Choices_str,i,1) eq '1' then
Choices_combi = catx('|',Choices_combi,k[i]);
end;
output;
Choices_str = '00000000000000000000000000000000000000';
Choices_combi = ' ';
end;
end;
keep id Choices_str Choices_combi ;
run;
Kind regards,
Muthia Kachirayan
On Fri, Jan 22, 2010 at 2:19 PM, Greg Dubrow <dubrowg___@
___b*e*rk***el*e****y.e&d*u> wrote:
> data _null_; wrote:
>
> It is not clear to me what combinations you want. This example
>> program, does not model the data very will. It does count all
>> combinations that exist in the responses. A more detail example of
>> the desired output would be helpful.
>>
>
> What I'd love to get is a data set that produces all of the reported
> combinations and keeps it attached to the student id. e.g. -
>
> id combo
> 12345 thai cambodian
> 12346 thai cambodian vietnamese
> 12347 chicano korean
> 12348 white black
> 12349 black
> 12350 latino
>
> etc.
>
> If this isn't possible then I'd be happy producing output of each of the
> possible combos (and if possible put the label names in place of the
> variable names) that looks like:
>
> Combo count
> ethnic_a ethnic_b 25
> ethnic_ ethnic_b ethnic_c 3
> ethnic_c ethnic_d 8
> (or have the labels put in place of names)
>
> I'll try what you suggested, see what happens.
>
> thanks,
>
> -Greg
>
|