| Date: | Tue, 16 Feb 2010 19:44:08 -0400 |
| Reply-To: | Muthia Kachirayan <muthia.kachirayan@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Muthia Kachirayan <muthia.kachirayan@GMAIL.COM> |
| Subject: | Re: sample random selection problem |
| In-Reply-To: | <2fc7f3341002161427i3cff00a1ka659cc96721584ae@mail.gmail.com> |
| Content-Type: | text/plain; charset=windows-1252 |
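The following changes are required so that the random numbers used to select
the Controls are drawn from all of the matching Controls. tot_num is counted
down inside the second selection loop, so it should not also set the range of
the random draws: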
[1] Add the statement
num_Controls = num;
between the following two statements:
tot_num = num;
CC_Flag = 1;
[2] Change
random1 = ceil(ranuni(123) * tot_num);
to
random1 = ceil(ranuni(123) * num_Controls);
[3] Change
random2 = ceil(ranuni(123) * tot_num);
to
random2 = ceil(ranuni(123) * num_Controls);
[4] Change
drop rc num tot_num random1 random2 ID CaseID selected found ;
to
drop rc num tot_num random1 random2 ID CaseID selected found num_Controls;
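For reference, here is a self-contained, untested sketch of the program with changes [1] to [4] applied. It also adds a guard that is NOT part of the posted fix and should be treated as an assumption. Before choosing Controls for a Case, it counts the Controls of that age/sex/race that are still unselected (the new variable avail), and it resets found to 0 for every Case. Without this guard, the first selection loop can keep drawing forever. That happens when every Control of a stratum has already been used and the previous Case left found at 0, for example two consecutive Cases in a stratum with exactly two Controls.

```sas
data controls;
   input ID age sex race;
   cards;
20 18 2 2
49 18 2 2
51 18 2 2
;
run;
data cases;
   input ID age sex race;
   cards;
1 18 2 2
2 18 2 2
3 18 2 2
4 20 1 1
;
run;
data need;
   if _n_ = 1 then do;
      length Old_ID CCID CC_Flag 8;
      declare hash h(ordered:'y', hashexp:20);
      h.definekey('age','sex','race','num');
      h.definedata('id','age','sex','race','num','selected');
      h.definedone();
      declare hash cont();
      cont.definekey('age','sex','race');
      cont.definedata('age','sex','race','num');
      cont.definedone();
      *** load the hash tables, numbering the Controls 1, 2, ... within age/sex/race;
      do until(z);
         set controls end = z;
         if cont.find() ne 0 then num = 0;
         num + 1;
         selected = 'N';
         h.add();
         cont.replace();
      end;
   end;
   do until(eof);
      set cases (rename = (ID = CaseID)) end = eof;
      CCID + 1;
      rc = cont.find();
      if rc ne 0 then do;
         CC_Flag = 0;                 *** no match for the Case;
         Old_ID = CaseID;
         output;
      end;
      else do;
         tot_num = num;
         num_Controls = num;          *** change [1];
         CC_Flag = 1;
         Old_ID = CaseID;
         output;                      *** Case;
         *** guard (not in the posted fix): count the Controls still unselected;
         avail = 0;
         do num = 1 to num_Controls;
            rc = h.find();
            if rc = 0 and selected = 'N' then avail + 1;
         end;
         found = 0;                   *** guard: do not reuse the value from the previous Case;
         *** Choose First Control that was not already selected;
         do while(avail >= 1);
            random1 = ceil(ranuni(123) * num_Controls);   *** change [2];
            num = random1;
            do rc = h.find() by 0 while(rc = 0 and selected = 'N');
               CC_Flag = 2;
               Old_ID = ID;
               output;                *** First Control;
               selected = 'Y';
               h.replace();
               found = 1;
            end;
            if found = 1 then leave;
         end;
         *** Choose Second Control that was not already selected;
         do while(tot_num > 1 and avail >= 2);
            random2 = ceil(ranuni(123) * num_Controls);   *** change [3];
            if random2 ne random1 then do;
               num = random2;
               do rc = h.find() by 0 while(rc = 0 and selected = 'N');
                  CC_Flag = 2;
                  Old_ID = ID;
                  output;             *** Second Control;
                  selected = 'Y';
                  h.replace();
                  found = 0;
               end;
               tot_num +- 1;
            end;
            if found = 0 then leave;
         end;
      end;
   end;
   stop;
   drop rc num tot_num random1 random2 ID CaseID selected found num_Controls avail;  *** change [4] plus avail;
run;
```

The second loop still gives up after tot_num - 1 draws that differ from random1. A Case can therefore occasionally end up with one Control even when two unselected Controls were available.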
On Tue, Feb 16, 2010 at 6:27 PM, Muthia Kachirayan <muthia.kachirayan@gmail.com> wrote:
> df ss,
>
> The following data sets have been used.
>
> data controls;
> input ID age sex race;
> cards;
> 20 18 2 2
> 49 18 2 2
> 51 18 2 2
> ;
> run;
>
> data cases;
> input ID age sex race;
> cards;
> 1 18 2 2
> 2 18 2 2
> 3 18 2 2
> 4 20 1 1
> ;
> run;
>
> Using the above datasets, the program that follows produced the output
> dataset:
>
> Old_ID  CCID  CC_Flag  age  sex  race
>      1     1        1   18    2     2
>     51     1        2   18    2     2
>     20     1        2   18    2     2
>      2     2        1   18    2     2
>     49     2        2   18    2     2
>      3     3        1   18    2     2
>      4     4        0   20    1     1
>
> The variable Old_ID holds the original ID of the Case or Control from the
> respective data set, which helps to link back to those data sets if needed.
> CCID is the newly generated ID for each Case/Controls set. CC_Flag is 1 for
> the Case and 2 for its matched Controls. A successful match has one 1 and
> two 2's for a given CCID (see CCID = 1). Sometimes there are not enough
> Controls to choose from, and only a single 2 is present (see CCID = 2). When
> all matching Controls have already been used by earlier Cases, only the
> Case row appears (see CCID = 3). If a Case has no Controls with the same
> age, sex and race, its CC_Flag is set to zero (see CCID = 4).
>
> The following program uses hash tables to store the Controls data and
> checks each randomly chosen Control so that no Control is selected twice.
> The program can also write out a data set of all Controls, ordered by age,
> sex, race and num, with the selection status of each, and a data set of the
> unique age/sex/race combinations among the Controls with the number of
> Controls in each. Uncomment the two statements at the end of the program to
> get them.
>
> Extensive testing has not been done, and I leave it to the OP to bring up
> any issues not addressed in this program.
>
> data need;
>
> if _n_ = 1 then do;
> length Old_ID CCID CC_Flag 8.;
> declare hash h(ordered:'y', hashexp:20);
> h.definekey('age','sex','race','num');
> h.definedata('id','age','sex','race','num','selected');
> h.definedone();
> declare hash cont();
> cont.definekey('age','sex','race');
> cont.definedata('age','sex','race','num');
> cont.definedone();
> *** load the hash tables;
> do until(z);
> set controls end = z;
> if cont.find() ne 0 then num = 0;
> num + 1;
> selected = 'N';
> h.add();
> cont.replace();
> end;
> end;
> do until(eof);
> set cases (rename = (ID = CaseID)) end = eof;
> CCID + 1;
> rc = cont.find();
> if rc ne 0 then do;
> CC_Flag = 0; **** no match for the Case;
> Old_ID = CaseID;
> output;
> end;
> else do;
> tot_num = num;
> CC_Flag = 1;
> Old_ID = CaseID;
> output; **** Case ;
> *** Choose First Control that was not already selected;
> do while(1);
> random1 = ceil(ranuni(123) * tot_num);
> num = random1;
> do rc = h.find() by 0 while( rc = 0 and selected = 'N');
> CC_Flag = 2;
> Old_ID = ID;
> output; **** First Control;
> selected = 'Y';
> h.replace();
> found = 1;
> end;
> if found = 1 then leave;
> end;
> *** Choose Second Control that was not already selected;
> do while(tot_num > 1);
> random2 = ceil(ranuni(123) * tot_num);
> if random2 ne random1 then do;
> num = random2;
> do rc = h.find() by 0 while(rc = 0 and selected = 'N');
> CC_Flag = 2;
> Old_ID = ID;
> output; **** Second Control;
> selected = 'Y';
> h.replace();
> found = 0;
> end;
> tot_num +- 1;
> end;
> if found = 0 then leave;
> end;
> end;
> end;
> *h.output(dataset:'out_01'); **** Ordered Controls ;
> *cont.output(dataset:'out_02'); **** Unique Controls with its frequency;
> stop;
> drop rc num tot_num random1 random2 ID CaseID selected found;
> run;
>
> proc print data = need ;
> run;
>
>
> Kind regards,
>
> Muthia Kachirayan
>
>
>
>
>
> On Mon, Feb 15, 2010 at 1:38 PM, df ss <tggsun@yahoo.com> wrote:
>
>> I have two data sets, one for cases and another for controls. They have an
>> identical structure, and the control data is much larger than the case data.
>> I want to randomly select 2 controls for each case based on the case's age,
>> sex, race, etc. The problem I have: I am using each case (age, sex,
>> race,...) to find two exactly matched controls (same age, sex, race,...) in
>> the control data set (same structure), randomly selected if there are more
>> than two matching controls, and with no duplicate controls. In my final
>> data set, I want both cases and controls, with one unique ID per
>> observation, a pair ID (the same pair ID for three records: one case and
>> its two matched controls), and one more variable indicating case/control
>> status (with 2 values, such as case=1, control=2).
>> Each control can only be used once.
>>
>> Case data:
>> ID age sex race
>> 1 12 1 1
>> 2 18 2 2
>> 3 30 1 3
>> 4 56 2 1
>> …
>> 100…
>>
>> Control data:
>> ID age sex race
>> 10 12 1 1
>> 20 18 2 2
>> 30 30 1 1
>> 40 50 2 2
>>
>> …
>>
>> 10000…
>>
>>
>>
>> My logic is: for each case, I select all controls who meet the selection
>> criteria from the control data, and then randomly pick two controls from
>> this pool and assign the pair ID and case/control status. Then I go on to
>> the next case and do the same thing. However, I used a fixed control data
>> set for each loop, and I have difficulty changing it (i.e., excluding the
>> controls picked in previous loops).
>>
>> Does anyone have experience with a similar problem?
>>
>> Thanks,
>>
>> dd sf
>>
>>
>>
>
>
>
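The approach described in the original question (take the pool of matching controls, pick two at random, and exclude them from later picks) can also be written without hash objects. The sketch below is a different technique from the hash program in this thread and is untested beyond the small example data. It shuffles the Controls within each age/sex/race stratum using a random sort key, numbers the Cases within the same stratum, and gives Controls 1-2 of a stratum to its first Case, 3-4 to its second, and so on, so no Control can be used twice. The data set and variable names ctrl_r, case_r, seq, CaseSeq, PairID and need2 are hypothetical. Unlike the hash program, a Case without any Control keeps CC_Flag = 1 and simply appears alone.

```sas
data controls;
   input ID age sex race;
   cards;
20 18 2 2
49 18 2 2
51 18 2 2
;
run;
data cases;
   input ID age sex race;
   cards;
1 18 2 2
2 18 2 2
3 18 2 2
4 20 1 1
;
run;
*** shuffle the Controls within each age/sex/race stratum;
data ctrl_r;
   set controls;
   r = ranuni(123);                 *** random sort key;
run;
proc sort data = ctrl_r;
   by age sex race r;
run;
*** Controls 1-2 of a stratum go to its 1st Case, 3-4 to its 2nd Case, and so on;
data ctrl_r;
   set ctrl_r;
   by age sex race;
   if first.race then seq = 0;
   seq + 1;
   CaseSeq = ceil(seq / 2);
run;
*** number the Cases within each stratum;
proc sort data = cases out = case_r;
   by age sex race ID;
run;
data case_r;
   set case_r;
   by age sex race;
   if first.race then CaseSeq = 0;
   CaseSeq + 1;
run;
*** stack each Case (CC_Flag = 1) with its Controls (CC_Flag = 2);
data need2;
   merge case_r (in = inCase keep = age sex race CaseSeq ID rename = (ID = CaseID))
         ctrl_r (in = inCtrl keep = age sex race CaseSeq ID);
   by age sex race CaseSeq;
   if inCase;                       *** drop Controls left over after all Cases;
   if first.CaseSeq then do;
      PairID + 1;
      Old_ID = CaseID;
      CC_Flag = 1;
      output;
   end;
   if inCtrl then do;
      Old_ID = ID;
      CC_Flag = 2;
      output;
   end;
   keep PairID Old_ID CC_Flag age sex race;
run;
```

Because the sort key comes from ranuni(123), the matching is reproducible; a different seed gives a different random matching. Within a stratum, Cases earlier in ID order are served first, so when Controls run short it is the later Cases that go without.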