LISTSERV at the University of Georgia
Date:   Thu, 26 Jul 2007 14:21:47 -0700
Reply-To:   David L Cassell <davidlcassell@MSN.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   David L Cassell <davidlcassell@MSN.COM>
Subject:   Re: Modeling Question--Transforming a Variable
In-Reply-To:   <1184695961.675990.20540@g12g2000prg.googlegroups.com>
Content-Type:   text/plain; format=flowed

shiling99@YAHOO.COM sagely replied:
>
>With 10 million and 0.3% of them are events. You have 30000 events. I
>would suggest you randomly select 20000/15000 among them and left
>10000/15000 for validation purpose . And you also randomly 20000/15000
>from no events. In this way, you cut down the size a lot and make your
>developping process easier and does not loss much of estimation
>efficiency. If you still think it is too big, you may scale down
>further. Be careful in calculating the sample weight or you may use
>proc surveyselect. proc surveyselect will spit out the weight for you.
>This approach will have much higher efficiency than a simple random
>sampling in which there are less events.
>
>HTH

Good points all. Thanks for pointing this out.

Let me add a couple thoughts:

Due to the small proportion we are sampling for, we may need to stratify on additional variables so we get enough values in various categories of auxiliary variables. This is a generally-ignored problem, because it requires stopping and thinking about the data. :-)
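
As a rough illustration of stratifying on more than the event flag, the Python sketch below draws a simple random sample within each stratum defined by the event indicator plus one auxiliary variable. The function name `stratified_sample`, the `region` variable, and the toy population are all hypothetical and only there to show the mechanics; the weight attached to each sampled record is the stratum population count divided by the stratum sample count.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, n_per_stratum, seed=0):
    """Simple random sample without replacement inside each stratum.

    Returns a list of (record, weight) pairs, where weight = N_h / n_h
    for the stratum h the record belongs to.
    """
    rng = random.Random(seed)

    # Group the records by their stratum (one tuple of key values per stratum).
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in strata_keys)].append(rec)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Take the whole stratum if it is smaller than the requested size.
        n_h = min(n_per_stratum, len(members))
        weight = len(members) / n_h
        for rec in rng.sample(members, n_h):
            sample.append((rec, weight))
    return sample


# Hypothetical toy population: rare events, three auxiliary categories.
population = [
    {"event": int(i % 250 == 0), "region": "ABC"[i % 3]}
    for i in range(30000)
]

sample = stratified_sample(population, ["event", "region"], n_per_stratum=20, seed=1)

# Every event-by-region cell is represented, which a sample stratified
# on the event flag alone would not guarantee.
cells = {(rec["event"], rec["region"]) for rec, _ in sample}
```

The sampling weights sum back to the population size, which is a quick sanity check on any hand-rolled weight calculation.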

Since the poster will end up with a survey sample from a finite population (the original database), and that sample may be fairly complex (at a minimum, we will have differing sampling weights and probably stratification), we need to use PROC SURVEYLOGISTIC instead of PROC LOGISTIC to get the right variances.
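
To make the "differing sampling weights" concrete, here is a small Python sketch of the arithmetic using the figures quoted in this thread (10 million records, 0.3% events), and assuming 20000 events and 20000 non-events are drawn for development. It only shows how far apart the two weights are and how the weighted event rate recovers the population rate; it does not estimate variances, which is the part the survey procedure is needed for.

```python
# Population figures taken from the thread.
N_TOTAL = 10_000_000
N_EVENTS = N_TOTAL * 3 // 1000        # 0.3% of 10 million = 30000
N_NONEVENTS = N_TOTAL - N_EVENTS      # 9970000

# Assumed development-sample sizes (one of the options suggested above).
n_events = 20_000
n_nonevents = 20_000

# Sampling weight = stratum population size / stratum sample size.
w_event = N_EVENTS / n_events             # each sampled event stands for 1.5 records
w_nonevent = N_NONEVENTS / n_nonevents    # each sampled non-event stands for 498.5 records

# The unweighted sample is half events; the weighted estimate is not.
unweighted_rate = n_events / (n_events + n_nonevents)
weighted_rate = (n_events * w_event) / (
    n_events * w_event + n_nonevents * w_nonevent
)

print(f"event weight:     {w_event}")
print(f"non-event weight: {w_nonevent}")
print(f"unweighted event rate: {unweighted_rate:.3f}")
print(f"weighted event rate:   {weighted_rate:.3f}")
```

With weights this unequal, ignoring them treats the sample as if half the database were events.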

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330


