LISTSERV at the University of Georgia
Date:   Tue, 16 Feb 2010 19:44:08 -0400
Reply-To:   Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject:   Re: sample random selection problem
In-Reply-To:   <2fc7f3341002161427i3cff00a1ka659cc96721584ae@mail.gmail.com>
Content-Type:   text/plain; charset=windows-1252

The following changes are required to generate the random numbers for selecting the Controls.

[1] Add the statement

    num_Controls = num;

between the following two statements:

    tot_num = num;
    CC_Flag = 1;

[2] Change

    random1 = ceil(ranuni(123) * tot_num);

to

    random1 = ceil(ranuni(123) * num_Controls);

[3] Change

    random2 = ceil(ranuni(123) * tot_num);

to

    random2 = ceil(ranuni(123) * num_Controls);

[4] Change

    drop rc num tot_num random1 random2 ID CaseID selected found ;

to

    drop rc num tot_num random1 random2 ID CaseID selected found num_Controls;
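
A small sketch of the idea behind changes [2] and [3]: ceil(ranuni(seed) * n) returns an integer from 1 to n, so multiplying by a count that stays fixed keeps every Control in the key group reachable, whereas tot_num is decremented inside the second loop of the program. The value num_Controls = 3 and the variables i and pick are assumptions for this demonstration only; they are not taken from the program.

```sas
/* Demonstration only: num_Controls = 3 is an assumed group size. */
data _null_;
   num_Controls = 3;
   do i = 1 to 10;
      /* ranuni returns a value strictly between 0 and 1,   */
      /* so pick is always an integer in 1..num_Controls.   */
      pick = ceil(ranuni(123) * num_Controls);
      put i= pick=;
   end;
run;
```

The log should show ten draws, each between 1 and 3. Only the first call's seed matters within a step; later calls continue the same stream.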

On Tue, Feb 16, 2010 at 6:27 PM, Muthia Kachirayan <muthia.kachirayan@gmail.com> wrote:

> df ss,
>
> The following data sets have been used.
>
> data controls;
> input ID age sex race;
> cards;
> 20 18 2 2
> 49 18 2 2
> 51 18 2 2
> ;
> run;
>
> data cases;
> input ID age sex race;
> cards;
> 1 18 2 2
> 2 18 2 2
> 3 18 2 2
> 4 20 1 1
> run;
>
> Using the above datasets, the program that follows produced the output
> dataset:
>
> Old_ID  CCID  CC_Flag  age  sex  race
>      1     1        1   18    2     2
>     51     1        2   18    2     2
>     20     1        2   18    2     2
>      2     2        1   18    2     2
>     49     2        2   18    2     2
>      3     3        1   18    2     2
>      4     4        0   20    1     1
>
> The variable Old_ID gives the original ID of the Case and of its matched
> Controls in their respective data sets. This helps to link back to the
> data sets if needed. CCID is the newly generated ID for the
> Case/Controls. CC_Flag is 1 for the Case and 2 for the matched Controls.
> For a successful match there should be one 1 and two 2's for a given CCID
> (see CCID = 1). In some cases there are not enough Controls to choose
> from, and only a single 2 will be present (see CCID = 2). If a Case has no
> matching Controls, CC_Flag is given a value of zero (see CCID = 4).
>
> The following program uses hash tables to store the Controls data, and
> suitable checks are made to choose matched Controls.
> The program can also write a data set with the ordered Controls and their
> selection status for the matching. Similarly, it can write the unique
> Control key combinations with their frequencies. Uncomment the two
> statements at the end of the program to get them.
>
> Extensive testing has not been done, and I leave it to the OP to bring up
> any issues not addressed in this program.
>
> data need;
>    if _n_ = 1 then do;
>       length Old_ID CCID CC_Flag 8.;
>       declare hash h(ordered:'y', hashexp:20);
>       h.definekey('age','sex','race','num');
>       h.definedata('id','age','sex','race','num','selected');
>       h.definedone();
>       declare hash cont();
>       cont.definekey('age','sex','race');
>       cont.definedata('age','sex','race','num');
>       cont.definedone();
>       *** load the hash tables;
>       do until(z);
>          set controls end = z;
>          if cont.find() ne 0 then num = 0;
>          num + 1;
>          selected = 'N';
>          h.add();
>          cont.replace();
>       end;
>    end;
>    do until(eof);
>       set cases (rename = (ID = CaseID)) end = eof;
>       CCID + 1;
>       rc = cont.find();
>       if rc ne 0 then do;
>          CC_Flag = 0; **** no match for the Case;
>          Old_ID = CaseID;
>          output;
>       end;
>       else do;
>          tot_num = num;
>          CC_Flag = 1;
>          Old_ID = CaseID;
>          output; **** Case ;
>          *** Choose First Control that was not already selected;
>          do while(1);
>             random1 = ceil(ranuni(123) * tot_num);
>             num = random1;
>             do rc = h.find() by 0 while( rc = 0 and selected = 'N');
>                CC_Flag = 2;
>                Old_ID = ID;
>                output; **** First Control;
>                selected = 'Y';
>                h.replace();
>                found = 1;
>             end;
>             if found = 1 then leave;
>          end;
>          *** Choose Second Control that was not already selected;
>          do while(tot_num > 1);
>             random2 = ceil(ranuni(123) * tot_num);
>             if random2 ne random1 then do;
>                num = random2;
>                do rc = h.find() by 0 while(rc = 0 and selected = 'N');
>                   CC_Flag = 2;
>                   Old_ID = ID;
>                   output; **** Second Control;
>                   selected = 'Y';
>                   h.replace();
>                   found = 0;
>                end;
>                tot_num +- 1;
>             end;
>             if found = 0 then leave;
>          end;
>       end;
>    end;
>    *h.output(dataset:'out_01'); **** Ordered Controls ;
>    *cont.output(dataset:'out_02'); **** Unique Controls with its frequency;
>    stop;
>    drop rc num tot_num random1 random2 ID CaseID selected found;
> run;
>
> proc print data = need ;
> run;
>
> Kind regards,
>
> Muthia Kachirayan
>
> On Mon, Feb 15, 2010 at 1:38 PM, df ss <tggsun@yahoo.com> wrote:
>
>> I have two datasets, one for cases and another for controls. They have
>> an identical structure, and the control data is much larger than the case
>> data. I want to randomly select 2 controls for each case based on the
>> case’s age, sex, race, etc. The problem I have is that I am using each
>> case (age, sex, race,...) to find two exactly matched (same age, sex,
>> race,...) controls in the control dataset (same structure), randomly
>> selected if there are more than two matching controls, and with no
>> duplicate controls. In my final dataset, I want both cases and controls,
>> where each observation has one unique ID, a pair ID (the same pair ID for
>> three records: one case and its two matched controls), and one more
>> variable indicating case/control status (with 2 values, such as case=1,
>> control=2).
>> Controls can only be used once in the control data set.
>>
>> Case data:
>> ID age sex race
>> 1 12 1 1
>> 2 18 2 2
>> 3 30 1 3
>> 4 56 2 1
>> …
>> 100…
>>
>> Control data:
>> ID age sex race
>> 10 12 1 1
>> 20 18 2 2
>> 30 30 1 1
>> 40 50 2 2
>> …
>> 10000…
>>
>> My logic is that for each case, I select all controls that meet the
>> selection criteria from the control data, and then from this selected
>> control pool I randomly pick two controls and assign the pair ID and
>> case/control status. Then I go on to the next case and do the same thing.
>> However, I used a fixed control data set for each loop. I have
>> difficulty making the control data change (i.e., excluding controls
>> picked in a previous loop).
>>
>> Do you have experience with a similar problem?
>>
>> Thanks,
>>
>> dd sf

