Date: Mon, 4 May 2009 23:20:27 +0800
Reply-To: Murphy Choy <goladin@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Murphy Choy <goladin@GMAIL.COM>
Subject: Re: Low probabilities estimates / Logistic Regression.
In-Reply-To: <FE10F31634E7F34B87AA143D59608541020CAFCD@EX-CMS01.westat.com>
Content-Type: text/plain; charset="us-ascii"
Hi Ed,
One suggestion would be to read this paper by Sigurd:
www2.sas.com/proceedings/forum2008/143-2008.pdf
--
Regards,
Murphy Choy
Certified Advanced Programmer for SAS V9
Certified Basic Programmer for SAS V9
DataShaping Certified SAS Professional
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sigurd Hermansen
Sent: Monday, May 04, 2009 10:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Low probabilities estimates / Logistic Regression.
Ed:
Other than efficiency of use of computing resources, I don't see any
compelling advantages to the sampling method that you are planning to use. A
summary of your data with counts for each class will work just as well when
you use the counts as weights in your statistical procedure. I've posted
explanations on SAS-L of how to compute counts efficiently by adapting Paul
Dorfman's hash indexing methods.
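The idea of replacing raw records with class counts can be sketched as follows. This is only an illustration in Python, not the SAS hash technique mentioned above: the records, the variable names, and the use of `Counter` (a hash-based tally) are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical records: (outcome, covariate class). The values are made up.
records = [(1, "A"), (0, "A"), (0, "A"), (0, "B"), (1, "B"), (0, "A")]

# Tally each distinct (outcome, class) combination in one pass.
counts = Counter(records)

# One summary row per combination: outcome, class, and its count.
# The count is what would be passed to the procedure as a weight.
summary = [(y, x, n) for (y, x), n in sorted(counts.items())]

# The count-weighted event rate matches the rate in the raw records.
raw_rate = sum(y for y, _ in records) / len(records)
weighted_rate = sum(y * n for y, _, n in summary) / sum(n for _, _, n in summary)

for row in summary:
    print(row)
```

The summary carries the same information as the full data for any model that uses only these variables, which is why the counts can stand in as weights.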
I do see advantages to repeated resampling from a sample, obtaining
estimates from many samples, and combining estimates. David Cassell, one of
the 'L's statistical wizards emeritus, has written several SGF papers on
resampling methods. Averaging estimates and "boosting", done properly, make
estimates from a sample more robust. The distribution of estimates under
resampling helps the analyst assess the influence of outliers and omitted
variables. Yes, sample, but do so in a way that improves the chances of
obtaining accurate predictions.
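A minimal sketch of the resample-and-combine idea, in Python, under stated assumptions: the data are made up, the estimator is a plain event rate standing in for a fitted model, and the helper name `bootstrap_estimates` is hypothetical. It shows only averaging over bootstrap resamples, not boosting.

```python
import random
import statistics

def bootstrap_estimates(values, estimator, n_resamples=200, seed=0):
    """Apply `estimator` to n_resamples samples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(values)
    estimates = []
    for _ in range(n_resamples):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return estimates

# Hypothetical sample: 5 events among 100 observations.
outcomes = [1.0] * 5 + [0.0] * 95

estimates = bootstrap_estimates(outcomes, statistics.mean)
combined = statistics.mean(estimates)   # averaged estimate
spread = statistics.stdev(estimates)    # variability across resamples

print(combined, spread)
```

The spread of the resampled estimates is the part that helps judge how sensitive a result is to a few influential observations.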
S
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of SAS User
Sent: Monday, May 04, 2009 9:40 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Low probabilities estimates / Logistic Regression.
Thanks for answering, Murphy.
The target weight is near 1 (1 to 1.005, something like that, because I
deleted some outliers). The non-target weight is close to 5. I ran the
logistic regression as I described. Is there any better code to get the ROC
curve and estimated probabilities (with oversampling)?
Maybe PROC SURVEYLOGISTIC?
Thanks,
Ed.
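The weights of 1 and about 5 described above are inverse sampling fractions (all targets kept, one in five non-targets kept). A fit that uses such weights should already predict on the population scale, so low probabilities for a rare event are expected. If a model is instead fit without weights on the oversampled data, its probabilities can be mapped back with the standard prior correction. The Python sketch below illustrates the arithmetic only; the population counts and function names are hypothetical.

```python
def inverse_sampling_weight(sampling_fraction):
    """Weight for a class that was kept with the given sampling fraction."""
    return 1.0 / sampling_fraction

def population_probability(p_sample, pop_rate, sample_rate):
    """Map a probability from the oversampled scale to the population scale.

    pop_rate is the event proportion in the population,
    sample_rate is the event proportion in the oversampled data.
    """
    num = p_sample * pop_rate / sample_rate
    den = num + (1.0 - p_sample) * (1.0 - pop_rate) / (1.0 - sample_rate)
    return num / den

# Hypothetical population counts, for illustration only.
pop_targets, pop_nontargets = 1_000, 99_000
target_fraction, nontarget_fraction = 1.0, 0.2  # all targets, 1 in 5 non-targets

w_target = inverse_sampling_weight(target_fraction)
w_nontarget = inverse_sampling_weight(nontarget_fraction)

sample_targets = pop_targets * target_fraction
sample_nontargets = pop_nontargets * nontarget_fraction
pop_rate = pop_targets / (pop_targets + pop_nontargets)
sample_rate = sample_targets / (sample_targets + sample_nontargets)

print("weights:", w_target, w_nontarget)
for p in (0.05, 0.5, 0.9):
    print(p, round(population_probability(p, pop_rate, sample_rate), 4))
```

A probability equal to the sample event rate maps back to the population event rate, which is a quick sanity check on the correction.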
2009/5/2 Murphy Choy <goladin@gmail.com>
> Hi Ed,
>
> I am not sure whether what you did is correct but below are some
> suggestions for oversampling.
>
> First, set the weight of your target as 1.
>
> Then set the weight for the non-target cases to P(Non target)/P(Target),
> where P is the proportion of that class in the population.
>
> Once the weights are set up, you can run the logistic regression.
>
> Here is an example of the method applied.
>
> Example Case:
>
> 2000 cases of target.
>
> I will randomly select 2000 cases of non target.
>
> Set weight of target as 1.
>
> Set weight of non target as (0.96)/(0.04)=24
>
> Do the logistic regression.
>
> I am using an example from a SAS book called Credit Risk Scorecards.
>
> --
> Regards,
> Murphy Choy
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> SAS User
> Sent: Saturday, May 02, 2009 9:04 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Low probabilities estimates / Logistic Regression.
>
> Murphy: I'm using oversampling.
> 1/5 for a group of the target.
> 1/1 for the other group.
> I used code like this:
> proc logistic;
> class [vars];
> model [var]=[vars];
> weight [var_weight];
> run;
> What am I doing wrong?
> Ed.
>
> 2009/4/30 Murphy Choy <goladin@gmail.com>
>
> > Hi,
> >
> > Are you using oversampling?
> >
> > --
> > Regards,
> > Murphy Choy
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > SAS User
> > Sent: Friday, May 01, 2009 7:38 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Low probabilities estimates / Logistic Regression.
> >
> > Hello:
> > I'm fitting a logistic regression (using proc logistic with weights)
> > to model a very rare event, and the predicted probabilities of that
> > event are very low. Most observations have probabilities close to 0,
> > and the few cases with the highest probabilities have P(event) near
> > 0.5. Is that right?
> > Thanks a lot,
> > Ed.
> >
> >
>
>