LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2005, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 17 Nov 2005 17:25:06 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: PROC SQL Help
Comments: To: BREWERDI@UCMAIL.UC.EDU
In-Reply-To:  <9871F0C990DF9B4681F1EB312A4D02F80A263C0C@ucmail8.ad.uc.edu>
Content-Type: text/plain; format=flowed

BREWERDI@UCMAIL.UC.EDU wrote back to me personally instead of to SAS-L: >David, > >Thanks for your kind suggestions. > >The one-to-one matching must be done by matching on begin_month, gender, >racegroup, and agegroup. If it were only begin_month, I wouldn't need >any elaborate coding scheme and I could use what you suggested. > >So I might have say 300 control records with month=7, but I need to >match the other three criteria as well; if I do a merge, I will get a >many-to-many or 1-to-many match, which would be erroneous. I don't >understand SQL and am learning by the seat of my pants. I tried the >merge and it was not working...too many matches, so I thought I would >look into the sql route. > >For example, I might have 8 case records that match on gender, agegroup, >and racrgroup; those eight records have one record each for >begin_month=7,8,9, and 12 and four records for month=11. That is what I >need to select from the control population (only 8 records, 1-to-1, per >the PI...I am just a grunt following orders =:) ). > >If you have any additional suggestions, they would be greatly >appreciated. > >Thanks again for your time and expertise.

Okay, now this is *exactly* why you want to muck about with PROC SURVEYSELECT. You're setting up properly to get your control for your cases. You say that you have exactly one case record for each of 8 different combinations of begin_month, gender, racegroup, and agegroup. If you don't feel comfortable doing this in PROC SQL, then do it in a data step (which just takes more steps) since you'll have a 1:many merge. If you were looking at a n:m merge, you couldn't attempt that as a straight data step merge. I'll show you the data step code. Sig or someone else will no doubt chime in with a single PROC SQL step to get the output. But since you say you don't grok SQL, I'll skip that for now.

proc sort data=cases out=cases2(keep=begin_month gender racegroup agegroup); by begin_month gender racegroup agegroup; run;

proc sort data=controls; by begin_month gender racegroup agegroup; run;

data combo; merge cases2(in=incases) controls; by begin_month gender racegroup agegroup; if incases; run;

Now all the above could be done in a single PROC SQL step. With n=4000, you probably won't notice much of a difference in time needed, either way. With a much larger data set, the PROC SL approach would transparently let you toss the problem to the SL optmizer, which would probably use a hash to pick out the relevant records before needing to do the sort at the end. That would be faster.

But now that the data are trimmed down to the relevant 8 cases by our merge, we can use PROC SURVEYSELECT to pick K controls (here we'll use 1) per cell. We'll cheat and call these cells our strata, and do what is really a stratified sample.

proc surveyselect data=combo out=YourSample sampsize=1 seed=947374; strata begin_month gender racegroup agegroup; run;

That's it. It's fairly straightforward to perform case-control sampling using this proc, but it's not a well-known technique. Normally, people do it using hideously complex macros.

HTH, David -- David L. Cassell mathematical statistician Design Pathways 3115 NW Norwood Pl. Corvallis OR 97330

_________________________________________________________________ Is your PC infected? Get a FREE online computer virus scan from McAfeeŽ Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963


Back to: Top of message | Previous page | Main SAS-L page