LISTSERV at the University of Georgia
Date:         Wed, 5 May 2010 11:35:27 -0400
Reply-To:     oloolo <dynamicpanel@YAHOO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         oloolo <dynamicpanel@YAHOO.COM>
Subject:      Re: Rare Events and SRS with or w/o replacement
Comments: To: Pam Shin <pg2003sf@YAHOO.COM>

For 1), I would not use SRSWR but SRSWOR on the non-events, and you should probably sample 50% of the events and 10% of the non-events. However, Murphy's weighting approach is better because it lets you use more of the information, or you can use your second approach, i.e., the bootstrap.
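
A minimal sketch of the sampling in 1), assuming the records are already split into events and non-events. The function name, the toy record IDs, and the inverse-sampling-fraction weights are illustrative assumptions, not something from this thread, and the weights are not necessarily the weighting approach mentioned above.

```python
import random

def srswor_by_stratum(events, non_events, event_frac=0.5, non_event_frac=0.1, seed=0):
    """Draw a simple random sample WITHOUT replacement from each stratum.

    Returns (record, weight) pairs, where weight is the inverse of the
    stratum's sampling fraction (an illustrative assumption).
    """
    rng = random.Random(seed)
    n_ev = round(len(events) * event_frac)
    n_ne = round(len(non_events) * non_event_frac)

    # random.sample never picks the same element twice, i.e. SRSWOR
    ev_sample = rng.sample(events, n_ev)
    ne_sample = rng.sample(non_events, n_ne)

    w_ev = len(events) / n_ev
    w_ne = len(non_events) / n_ne
    return [(r, w_ev) for r in ev_sample] + [(r, w_ne) for r in ne_sample]

# Toy population with a 0.1% event rate: 100 events among 100,000 records.
events = list(range(100))
non_events = list(range(100, 100_000))

sample = srswor_by_stratum(events, non_events)
n_events_sampled = sum(1 for r, _ in sample if r < 100)
total_weight = sum(w for _, w in sample)
print(len(sample), n_events_sampled, total_weight)
```

The weights sum back to the population size, which is one quick check that the two sampling fractions were applied as intended.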

For 2), yes, but only when you do SRSWR on both events and non-events, that is, you conduct simple random sampling with replacement on both subsamples. This bootstrap approach is what underlies bagging, and it will usually get you more stable estimates by averaging.
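
A rough sketch of the procedure in 2), assuming a single predictor and simulated toy data. The Newton-Raphson fitter and all names here are hypothetical stand-ins for whatever logistic regression routine you actually use; only the resampling and averaging steps are the point.

```python
import math
import random

def fit_logistic(xs, ys, iters=15):
    """Newton-Raphson fit of logit P(y=1) = b0 + b1*x (one predictor)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient, intercept
            g1 += (y - p) * x      # gradient, slope
            h00 += w               # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def bagged_coefficients(event_x, non_event_x, n_boot=50, seed=0):
    """Resample BOTH strata with replacement, refit, and average the coefficients."""
    rng = random.Random(seed)
    fits = []
    for _ in range(n_boot):
        ev = rng.choices(event_x, k=len(event_x))          # SRSWR on events
        ne = rng.choices(non_event_x, k=len(non_event_x))  # SRSWR on non-events
        xs = ev + ne
        ys = [1] * len(ev) + [0] * len(ne)
        fits.append(fit_logistic(xs, ys))
    avg = (sum(f[0] for f in fits) / n_boot, sum(f[1] for f in fits) / n_boot)
    return avg, fits

# Simulated toy data (an assumption): events have a higher mean of x.
data_rng = random.Random(1)
event_x = [data_rng.gauss(1.0, 1.0) for _ in range(300)]
non_event_x = [data_rng.gauss(0.0, 1.0) for _ in range(300)]

avg, fits = bagged_coefficients(event_x, non_event_x)
print("averaged intercept and slope:", avg)
```

The spread of the per-sample coefficients in `fits` gives a feel for how stable the estimates are before averaging.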

just my $.02

On Wed, 5 May 2010 05:44:47 -0400, Pam Shin <pg2003sf@YAHOO.COM> wrote:

>I have a rare event (0.1%) and I am building a logistic regression model.
>To oversample, we take all the events and only a sample of non-events to
>increase the event rate to say 50%. I have some questions around the
>whole oversampling process-
>
>1) Should the sampling among the non-events be SRS with replacement or
>without replacement? To me SRSWR seems correct as it will mirror the
>actual data.
>
>2) To account for model building on only a select non-events, can we use
>the following process to get stable coeffs-
>where different samples say 10000 are generated taking all events and SRS
>w/o replacement of non-events. Keeping the model fixed, then the coeffs
>are estimated over each sample and then the significance is tested over
>these models. The final model is then the average of each coeffs. Is
>this
