LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 25 Jan 2010 09:25:45 -0400
Reply-To:     Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:      Re: creating cross-tab for multiple (38) variables
In-Reply-To:  <hjcq7u$gvg$1@news.eternal-september.org>
Content-Type: text/plain; charset=ISO-8859-1

Greg,

I understand your problem as to find ,

[1] the combinations of choices when several of the 38 choices are selected and indicated by the RESPONSE variable, where 1 stands for the particular choice is being selected and 0, otherwise, for each of the ID.

[2] the frequency of the combinations for all IDs. If this is true then go ahead.

Here is a small data set, HAVE, with 3 IDs each with a maximum of 5 possible choices to select from and if a choice is made, the corresponding variable, RESP, gets 1 else 0. The solutions that follow are still applicable when each ID has all 5 choices; further, the data set need only be grouped by ID but not necessarily be sorted by ID.

data have; input id choice $4. resp; cards; 10 AAAA 1 10 BB 0 10 CCC 0 10 D 1 10 EE 0 20 AAAA 1 20 CCC 0 20 D 1 20 EE 0 30 BB 1 30 EE 1 ; run;

The desired combination of choices by ID will be:

Choices_ Choices_ Obs id str combi 1 10 10010 AAAA|D 2 20 10010 AAAA|D 3 30 01001 BB|EE

The frequency of combinations are as:

Choices_ Choices_ Obs combi count 1 AAAA|D 2 2 BB|EE 1

The following data step creates a string variable, Choices_str, having a length for 5 chracters corresponding to the maximum number of choices for an ID in the data set. If the choice happens to be 'AAAA' then its first character will be marked as '1' and if it is 'EE' then the 5-th character will be marked as '1' and similarly for other choices. All those choices not selected will receive '0's. This arrangement keeps the counting of Choice_str across all IDs easy. The use of array K[ ] helps to locate the choice to mark the Choice_str as well as to prepare the concatenated string, Choices_combi , for the selected choices. Proc FORMAT or hash table can also be used for this purpose.

data need; array k[5] $4. _temporary_ ('AAAA','BB','CCC','D', 'EE'); length id 8. Choices_str $5.; Choices_str = '00000'; do until(last.id); set have; by id notsorted; length Choices_combi $15.;

if resp = 1 then do; m = WHICHC(choice, of k[*]); substr(Choices_str,m,1) = '1'; end; if last.id then do; do i = 1 to dim(k); if substr(Choices_str,i,1) eq '1' then Choices_combi = catx('|',Choices_combi,k[i]); end; output; Choices_str = '00000'; Choices_combi = ' '; end; end; keep id Choices_str Choices_combi ; run;

Another data step can be used to count the number of occurrences of Choices_str or Proc Summary can be used.

proc summary data = need nway; class Choices_combi; output out = want(rename = (_FREQ_ = Choices_count) drop = _TYPE_) ; run;

The data set, NEED and the data set produced by Proc Summary, WANT are displayed already.

Instead of the above two steps, 9.2 Hash object may be used to get the results in one step. This step uses the enhanced hash object features like SUMINC:'var' parameter specified when the hash object is created as well as .REF() . More on the use of these enhancements is expected when Dorfman presents his paper in the ensuing SAS Global Forum. The data set, NEED, gives the count of Choices combination and the data set, OUT_01, gives the Choices_str for each ID. One can transform this get the concatenated string too.

data need; if _n_ = 1 then do; array k[5] $4. _temporary_ ('AAAA','BB','CCC','D', 'EE'); length id 8. Choices_combi $16. Choices_count 8. Choices_str $5.; declare hash h(SUMINC:'count', hashexp:20); h.definekey('Choices_str'); h.definedone(); declare hash hid(ordered:'a'); hid.definekey('id'); hid.definedata('id','Choices_str'); hid.definedone(); end; do while (not done); set have end = done; by id notsorted; if resp = 1 then do; if hid.find() ne 0 then Choices_str = '00000'; m = WHICHC(choice, of k[*]); substr(Choices_str,m,1) = '1'; hid.replace(); end; if last.id then do; count = 1; h.REF(); end; end; declare hiter hi('h'); do rc = hi.first() by 0 while(rc = 0); h.sum(sum:Choices_count); do i = 1 to dim(k); if substr(Choices_str,i,1) eq '1' then Choices_combi = catx('|',Choices_combi,k[i]); end; output; Choices_combi = ' '; rc = hi.next(); end; hid.output(dataset:'out_01'); stop; keep Choices_combi Choices_count; run;

Even with SAS 9.1 with no enhancements, it can be done as below:

data need; if _n_ = 1 then do; array k[5] $4. _temporary_ ('AAAA','BB','CCC','D', 'EE'); length id 8. Choices_str $5. Choices_combi $16. Choices_count 8. ;

declare hash h(); h.definekey('Choices_str'); h.definedata('Choices_str','Choices_count'); h.definedone(); declare hash hid(ordered:'a'); hid.definekey('id'); hid.definedata('id','Choices_str'); hid.definedone(); end; do while (not done); set have end = done; by id notsorted; if resp = 1 then do; if hid.find() ne 0 then Choices_str = '00000'; m = WHICHC(choice, of k[*]); substr(Choices_str,m,1) = '1'; hid.replace(); end; if last.id then do; if h.find() ne 0 then Choices_count = 0; Choices_count + 1; h.replace(); end; end; declare hiter hi('h'); do rc = hi.first() by 0 while(rc = 0); do i = 1 to dim(k); if substr(Choices_str,i,1) eq '1' then Choices_combi = catx('|',Choices_combi,k[i]); end; output; Choices_combi = ' '; rc = hi.next(); end; hid.output(dataset:'out_01'); stop; keep Choices_combi Choices_count; run;

For the 38 choices per ID, the following step generates a sample data. This data set is used in the following data step and Proc Summary. Hash versions can be modified(lengths as well as array) to run the new sample data set.

data have; length id 8; array k[38] $4. _temporary_ ('e_01','e_02','e_03','e_04','e_05','e_06','e_07','e_08','e_09','e_10',

'e_11','e_12','e_13','e_14','e_15','e_16','e_17','e_18','e_19','e_20',

'e_21','e_22','e_23','e_24','e_25','e_26','e_27','e_28','e_29','e_30', 'e_31','e_32','e_33','e_34','e_35','e_36','e_37','e_38'); array in[38] _temporary_; do id = 1 to 10; do j = 1 to ceil(ranuni(123) * 38); do while(1); m = ceil(ranuni(123) * 38); if in[m] = . then do; choice = k[m]; resp = (ranuni(123) > 0.2); in[m] = 1; output; leave; end; end; end; call missing(of in[*]); end; drop m j; run;

data need; array k[38] $4. _temporary_ ('e_01','e_02','e_03','e_04','e_05','e_06','e_07','e_08','e_09','e_10',

'e_11','e_12','e_13','e_14','e_15','e_16','e_17','e_18','e_19','e_20',

'e_21','e_22','e_23','e_24','e_25','e_26','e_27','e_28','e_29','e_30', 'e_31','e_32','e_33','e_34','e_35','e_36','e_37','e_38'); length id 8.; length Choices_str $38.; Choices_str = '00000000000000000000000000000000000000'; do until(last.id); set have; by id; length Choices_combi $180.; if resp = 1 then do; m = WHICHC(choice, of k[*]); substr(Choices_str,m,1) = '1'; end; if last.id then do; do i = 1 to dim(k); if substr(Choices_str,i,1) eq '1' then Choices_combi = catx('|',Choices_combi,k[i]); end; output; Choices_str = '00000000000000000000000000000000000000'; Choices_combi = ' '; end; end; keep id Choices_str Choices_combi ; run;

Kind regards,

Muthia Kachirayan

On Fri, Jan 22, 2010 at 2:19 PM, Greg Dubrow <dubrowg___@ ___b*e*rk***el*e****y.e&d*u> wrote:

> data _null_; wrote: > > It is not clear to me what combinations you want. This example >> program, does not model the data very will. It does count all >> combinations that exist in the responses. A more detail example of >> the desired output would be helpful. >> > > What I'd love to get is a data set that produces all of the reported > combinations and keeps it attached to the student id. e.g. - > > id combo > 12345 thai cambodian > 12346 thai cambodian vietnamese > 12347 chicano korean > 12348 white black > 12349 black > 12350 latino > > etc. > > If this isn't possible then I'd be happy producing output of each of the > possible combos (and if possible put the label names in place of the > variable names) that looks like: > > Combo count > ethnic_a ethnic_b 25 > ethnic_ ethnic_b ethnic_c 3 > ethnic_c ethnic_d 8 > (or have the labels put in place of names) > > I'll try what you suggested, see what happens. > > thanks, > > -Greg >


Back to: Top of message | Previous page | Main SAS-L page