Date: Tue, 16 Feb 2010 19:44:08 -0400
Reply-To: Muthia Kachirayan
Sender: "SAS(r) Discussion"
From: Muthia Kachirayan
Subject: Re: sample random selection problem
In-Reply-To: <2fc7f3341002161427i3cff00a1ka659cc96721584ae@mail.gmail.com>
Content-Type: text/plain; charset=windows-1252

The following changes are required to generate the random numbers for selecting the Controls.

[1] Insert

num_Controls = num;

between the following two statements:

tot_num = num;
CC_Flag = 1;

[2] Change

random1 = ceil(ranuni(123) * tot_num);

to

random1 = ceil(ranuni(123) * num_Controls);

[3] Change

random2 = ceil(ranuni(123) * tot_num);

to

random2 = ceil(ranuni(123) * num_Controls);

[4] Change

drop rc num tot_num random1 random2 ID CaseID selected found ;

to

drop rc num tot_num random1 random2 ID CaseID selected found num_Controls;
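
A note on why the range matters, as I read the quoted program (the post itself does not spell this out): tot_num is decremented inside the second-Control loop, so ceil(ranuni(123) * tot_num) draws from a shrinking range 1..tot_num, whereas num_Controls keeps every draw over all Controls in the matched group. A minimal sketch, assuming a hypothetical group of 3 Controls; the variable names shrinking and full are mine, for illustration only, and are not part of the program:

```sas
data _null_;
   num_Controls = 3;            /* hypothetical size of one matched group */
   tot_num = num_Controls;      /* countdown, as in the quoted program    */
   do while (tot_num > 1);
      /* old form: can only reach 1..tot_num, which shrinks each pass */
      shrinking = ceil(ranuni(123) * tot_num);
      /* changed form: can always reach 1..num_Controls */
      full = ceil(ranuni(123) * num_Controls);
      put tot_num= shrinking= full=;
      tot_num = tot_num - 1;
   end;
run;
```

The log lines should show shrinking never exceeding the current tot_num, while full can take any value from 1 to 3 on every pass.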

On Tue, Feb 16, 2010 at 6:27 PM, Muthia Kachirayan < muthia.kachirayan@gmail.com> wrote:

> df ss,
>
> The following data sets have been used.
>
> data controls;
> input ID age sex race;
> cards;
> 20 18 2 2
> 49 18 2 2
> 51 18 2 2
> ;
> run;
>
> data cases;
> input ID age sex race;
> cards;
> 1 18 2 2
> 2 18 2 2
> 3 18 2 2
> 4 20 1 1
> run;
>
> Using the above data sets, the program that follows produced the output
> data set:
>
> Old_ID  CCID  CC_Flag  age  sex  race
>      1     1        1   18    2     2
>     51     1        2   18    2     2
>     20     1        2   18    2     2
>      2     2        1   18    2     2
>     49     2        2   18    2     2
>      3     3        1   18    2     2
>      4     4        0   20    1     1
>
> The variable Old_ID gives the original ID of the Case and of its matched
> Controls from the respective data sets. This helps to link back to the
> data sets if needed. CCID is the newly generated ID for the Case/Controls.
> CC_Flag gives 1 for the Case and 2 for the matched Controls. For a
> successful match there should be one 1 and two 2's for a given CCID (see
> CCID = 1). In some cases there are not enough Controls to choose from -
> only a single 2 will be present (see CCID = 2). If a Case is present but
> has no matching Controls, then CC_Flag is given a value of zero (see
> CCID = 4).
>
> The following program uses hash tables to store the Controls data, and
> suitable checks are made to choose matched Controls.
> The program can give a data set with the ordered Controls and their
> selection status for the matching. Similarly, it can give the unique
> Control groups with their frequency of occurrence. Uncomment the two
> statements at the end of the program to get them.
>
> Extensive testing has not been done, and I leave it to the OP to bring up
> any issues not addressed in this program.
>
> data need;
>
>    if _n_ = 1 then do;
>       length Old_ID CCID CC_Flag 8.;
>       declare hash h(ordered:'y', hashexp:20);
>       h.definekey('age','sex','race','num');
>       h.definedata('id','age','sex','race','num','selected');
>       h.definedone();
>       declare hash cont();
>       cont.definekey('age','sex','race');
>       cont.definedata('age','sex','race','num');
>       cont.definedone();
>       *** load the hash tables;
>       do until(z);
>          set controls end = z;
>          if cont.find() ne 0 then num = 0;
>          num + 1;
>          selected = 'N';
>          h.add();
>          cont.replace();
>       end;
>    end;
>    do until(eof);
>       set cases (rename = (ID = CaseID)) end = eof;
>       CCID + 1;
>       rc = cont.find();
>       if rc ne 0 then do;
>          CC_Flag = 0; **** no match for the Case;
>          Old_ID = CaseID;
>          output;
>       end;
>       else do;
>          tot_num = num;
>          CC_Flag = 1;
>          Old_ID = CaseID;
>          output; **** Case ;
>          *** Choose First Control that was not already selected;
>          do while(1);
>             random1 = ceil(ranuni(123) * tot_num);
>             num = random1;
>             do rc = h.find() by 0 while( rc = 0 and selected = 'N');
>                CC_Flag = 2;
>                Old_ID = ID;
>                output; **** First Control;
>                selected = 'Y';
>                h.replace();
>                found = 1;
>             end;
>             if found = 1 then leave;
>          end;
>          *** Choose Second Control that was not already selected;
>          do while(tot_num > 1);
>             random2 = ceil(ranuni(123) * tot_num);
>             if random2 ne random1 then do;
>                num = random2;
>                do rc = h.find() by 0 while(rc = 0 and selected = 'N');
>                   CC_Flag = 2;
>                   Old_ID = ID;
>                   output; **** Second Control;
>                   selected = 'Y';
>                   h.replace();
>                   found = 0;
>                end;
>                tot_num +- 1;
>             end;
>             if found = 0 then leave;
>          end;
>       end;
>    end;
>    *h.output(dataset:'out_01'); **** Ordered Controls ;
>    *cont.output(dataset:'out_02'); **** Unique Controls with its frequency;
>    stop;
>    drop rc num tot_num random1 random2 ID CaseID selected found;
> run;
>
> proc print data = need ;
> run;
>
>
> Kind regards,
>
> Muthia Kachirayan
>
>
> On Mon, Feb 15, 2010 at 1:38 PM, df ss <tggsun@yahoo.com> wrote:
>
>> I have two datasets, one for cases, another for controls. They have
>> identical structure; the control data is much larger than the case data.
>> I want to randomly select 2 controls for each case based on the case’s
>> age, sex, race, etc. The problem I have is that I am using each case
>> (age, sex, race,...) to find two exactly matched (same age, sex,
>> race,...) controls in the control dataset (same structure) -- randomly
>> selected if there are more than two matching controls, and with no
>> duplicate controls. In my final dataset, I want both cases and controls,
>> where each observation has one unique ID, a pair ID (the same pair ID
>> for three records - one case and its two matched controls), and one more
>> variable indicating case/control status (with 2 values, such as case=1,
>> control=2).
>> Controls can only be used once in the control data set.
>>
>> Case data:
>> ID age sex race
>> 1 12 1 1
>> 2 18 2 2
>> 3 30 1 3
>> 4 56 2 1
>> …
>> 100…
>>
>> Control data:
>> ID age sex race
>> 10 12 1 1
>> 20 18 2 2
>> 30 30 1 1
>> 40 50 2 2
>>
>> …
>>
>> 10000…
>>
>>
>> My logic is: for each case, I select all controls that meet the selection
>> criteria from the control data, and then from this selected control pool
>> I randomly pick two controls and assign the pair ID and case/control
>> status. Then I go on to the next case and do the same thing -- but I used
>> a fixed control data set for each loop. I am having difficulty making the
>> control data change (i.e., excluding controls picked up in a previous
>> loop).
>>
>> Do you have experience with a similar problem?
>>
>> Thanks,
>>
>> dd sf
