Date: Wed, 5 May 2010 11:35:27 -0400
Reply-To: oloolo <dynamicpanel@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: oloolo <dynamicpanel@YAHOO.COM>
Subject: Re: Rare Events and SRS with or w/o replacement
For 1), I would use SRSWOR rather than SRSWR on the non-events, and probably
you should sample 50% on events and 10% on non-events. However, Murphy's
weighting approach is better, as it lets you use more information, or use your
second approach, i.e. the bootstrap.
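
A minimal sketch of the SRSWOR idea, in Python rather than SAS purely for illustration. It assumes "50% on events and 10% on non-events" means sampling fractions within each group, and it attaches inverse-sampling-fraction weights as one common way to undo the oversampling; this is not necessarily the weighting approach mentioned above. The helper name `stratified_srswor` and the toy data are hypothetical.

```python
import random

def stratified_srswor(events, non_events, f_event=0.5, f_non_event=0.1, seed=0):
    """Simple random sampling WITHOUT replacement inside each group.

    Returns (row, y, weight) tuples, where weight is the inverse of the
    realized sampling fraction for that group (hypothetical helper).
    """
    rng = random.Random(seed)
    n1 = round(f_event * len(events))
    n0 = round(f_non_event * len(non_events))
    s1 = rng.sample(events, n1)        # random.sample never repeats a row
    s0 = rng.sample(non_events, n0)
    w1 = len(events) / n1              # each sampled event stands for w1 events
    w0 = len(non_events) / n0          # each sampled non-event stands for w0
    return [(row, 1, w1) for row in s1] + [(row, 0, w0) for row in s0]

# Toy population: 100 events and 10,000 non-events, identified by row id.
events = list(range(100))
non_events = list(range(100, 10100))

sample = stratified_srswor(events, non_events)
weighted_events = sum(w for _, y, w in sample if y == 1)
weighted_total = sum(w for _, _, w in sample)

# The weighted event rate recovers the population rate, 100 / 10100.
print(weighted_events / weighted_total)
```

The weights would then go into the model fit (for example as a weight variable in a weighted logistic regression), so the estimates refer to the original population rather than to the oversampled one.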
For 2), yes, but only when you do SRSWR on both events and non-events, that
is, you conduct simple random sampling with replacement on both
subsamples. This bootstrap approach is what underlies bagging, and it
usually gets you more stable estimates through averaging.
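
A rough sketch of that resample-and-average loop, again in Python for illustration only. The names `bagged_estimate` and `toy_fit` are hypothetical, and `toy_fit` is only a stand-in for the real model fit: it returns a one-element "coefficient" list (a difference in means) so the example runs without a logistic regression routine.

```python
import random
from statistics import mean

def resample_with_replacement(rows, rng):
    """Bootstrap draw: same size as the input, rows may repeat."""
    return [rng.choice(rows) for _ in rows]

def bagged_estimate(events, non_events, fit, n_rounds=200, seed=0):
    """Resample BOTH groups with replacement, refit, and average.

    `fit(events, non_events)` must return a list of coefficients;
    here it is a placeholder for a logistic regression fit.
    """
    rng = random.Random(seed)
    fits = []
    for _ in range(n_rounds):
        b1 = resample_with_replacement(events, rng)
        b0 = resample_with_replacement(non_events, rng)
        fits.append(fit(b1, b0))
    k = len(fits[0])
    # Average each coefficient across the bootstrap fits.
    return [mean(f[j] for f in fits) for j in range(k)]

def toy_fit(ev, non_ev):
    # Stand-in estimator, NOT a logistic regression.
    return [mean(ev) - mean(non_ev)]

# Simulated predictor values for 50 events and 500 non-events.
data_rng = random.Random(1)
events = [data_rng.gauss(1.0, 1.0) for _ in range(50)]
non_events = [data_rng.gauss(0.0, 1.0) for _ in range(500)]

avg = bagged_estimate(events, non_events, toy_fit)
print(avg)
```

The spread of the individual bootstrap fits (not shown) is what you would look at to judge how stable each coefficient is.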
just my $.02
On Wed, 5 May 2010 05:44:47 -0400, Pam Shin <pg2003sf@YAHOO.COM> wrote:
>I have a rare event (0.1%) and I am building a logistic regression model.
>To oversample, we take all the events and only a sample of non-events to
>increase the event rate to say 50%. I have some questions around the
>whole oversampling process-
>
>1) Should the sampling among the non-events be SRS with replacement or
>without replacement? To me SRSWR seems correct as it will mirror the
>actual data.
>
>2) To account for model building on only a select non-events, can we use
>the following process to get stable coeffs-
>where different samples say 10000 are generated taking all events and SRS
>w/o replacement of non-events. Keeping the model fixed, then the coeffs
>are estimated over each sample and then the significance is tested over
>these models. The final model is then the average of each coeffs. Is
>this