LISTSERV at the University of Georgia
Date:         Mon, 4 May 2009 23:20:27 +0800
Reply-To:     Murphy Choy <goladin@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Murphy Choy <goladin@GMAIL.COM>
Subject:      Re: Low probabilities estimates / Logistic Regression.
Comments: To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
In-Reply-To:  <FE10F31634E7F34B87AA143D59608541020CAFCD@EX-CMS01.westat.com>
Content-Type: text/plain; charset="us-ascii"

Hi Ed,

I suggest reading this paper by Sigurd:

www2.sas.com/proceedings/forum2008/143-2008.pdf

--
Regards,
Murphy Choy

Certified Advanced Programmer for SAS V9
Certified Basic Programmer for SAS V9
DataShaping Certified SAS Professional

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sigurd Hermansen
Sent: Monday, May 04, 2009 10:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Low probabilities estimates / Logistic Regression.

Ed: Other than efficiency of use of computing resources, I don't see any compelling advantages to the sampling method that you are planning to use. A summary of your data with counts for each class will work just as well when you use the counts as weights in your statistical procedure. I've posted explanations on SAS-L of how to compute counts efficiently by adapting Paul Dorfman's hash indexing methods.
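A minimal sketch of the "summarize, then weight by counts" idea, assuming categorical predictors so that the data collapse to a small number of covariate patterns. The dataset names (work.full, work.counts), variables (x1, x2, target), seed, and simulated effects are all hypothetical. PROC SQL stands in here for the hash-object counting Sigurd mentions; it is a swap-in for brevity, not his method.

```sas
/* Hypothetical stand-in for the full file: two categorical predictors
   and a rare 0/1 outcome. Names and effect sizes are assumed. */
data work.full;
   call streaminit(2009);
   do id = 1 to 100000;
      x1 = ceil(3 * rand('uniform'));        /* levels 1, 2, 3 */
      x2 = (rand('uniform') < 0.5);          /* levels 0, 1    */
      eta = -4 + 0.5*x1 + 0.8*x2;            /* assumed true logit */
      target = rand('bernoulli', 1 / (1 + exp(-eta)));
      output;
   end;
   drop id eta;
run;

/* Collapse to one row per covariate pattern and outcome, with a count n. */
proc sql;
   create table work.counts as
   select x1, x2, target, count(*) as n
   from work.full
   group by x1, x2, target;
quit;

/* Fit on the summary table: FREQ makes each row stand for n observations. */
proc logistic data=work.counts;
   class x1 x2 / param=ref;
   model target(event='1') = x1 x2;
   freq n;
run;
```

Fitting the same MODEL to work.full without the FREQ statement maximizes the same likelihood, so it should reproduce the coefficient estimates; a WEIGHT statement with the same counts also gives the same point estimates, while FREQ keeps the reported sample size equal to the number of original rows.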

I do see advantages to repeated resampling from a sample, obtaining estimates from many samples, and combining the estimates. David Cassell, one of SAS-L's statistical wizards emeritus, has written several SAS Global Forum (SGF) papers on resampling methods. Averaging estimates and "boosting", done properly, make estimates from a sample more robust. The distribution of estimates under resampling helps the analyst assess the influence of outliers and omitted variables. Yes, sample, but do so in a way that improves the chances of obtaining accurate predictions.

S
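A sketch of resampling and averaging estimates, assuming simple bootstrap resampling (draws with replacement, same size as the sample). This illustrates bagging-style averaging only, not boosting, and it is not taken from David Cassell's papers. The dataset names, the single predictor x, the 200 replicates, and the seeds are hypothetical.

```sas
/* Hypothetical sample with a rare event and one predictor x. */
data work.sample;
   call streaminit(42);
   do id = 1 to 5000;
      x = rand('normal');
      target = rand('bernoulli', 1 / (1 + exp(-(-3 + 1.2*x))));
      output;
   end;
   drop id;
run;

/* 200 bootstrap resamples: unrestricted random sampling (with replacement),
   each the same size as the sample; OUTHITS writes one row per draw. */
proc surveyselect data=work.sample out=work.boot
                  method=urs samprate=1 reps=200 outhits seed=2009 noprint;
run;

/* One fit per replicate; OUTEST= keeps one row of coefficients per replicate. */
proc logistic data=work.boot outest=work.est noprint;
   by replicate;
   model target(event='1') = x;
run;

/* Average the coefficients and inspect their spread across replicates. */
proc means data=work.est mean std p5 p95;
   var intercept x;
run;
```

The spread (STD, P5 to P95) across replicates is the "distribution of estimates under resampling" that Sigurd describes; a wide spread for a coefficient suggests that it is sensitive to a few influential cases.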

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of SAS User
Sent: Monday, May 04, 2009 9:40 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Low probabilities estimates / Logistic Regression.

Thanks for answering, Murphy. The target weight is near 1 (between 1 and 1.005 or so, because I deleted some outliers), and the non-target weight is close to 5. I ran the logistic regression as I described. Is there better code to get the ROC curve and the estimated probabilities (with oversampling)? Maybe PROC SURVEYLOGISTIC?
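One way to get both the ROC curve and the predicted probabilities from a weighted PROC LOGISTIC fit is sketched below. It assumes Ed's design as his weights suggest (every target case kept, about one in five non-target cases kept) and a SAS release with ODS Graphics support for PROC LOGISTIC. The simulated population, dataset names (work.pop, work.train, work.roc, work.pred), the predictor x, and the seeds are hypothetical.

```sas
/* Hypothetical population with a rare event (a few percent of 1s). */
data work.pop;
   call streaminit(7);
   do id = 1 to 50000;
      x = rand('normal');
      target = rand('bernoulli', 1 / (1 + exp(-(-3.5 + x))));
      output;
   end;
   drop id;
run;

/* Keep every event and about 1 in 5 non-events; each weight is the
   inverse of its group's sampling fraction (1 and 5). */
data work.train;
   set work.pop;
   if _n_ = 1 then call streaminit(8);
   if target = 1 then w = 1;
   else if rand('uniform') < 0.2 then w = 5;
   else delete;
run;

ods graphics on;
proc logistic data=work.train plots(only)=roc;
   model target(event='1') = x / outroc=work.roc;   /* ROC points to a dataset */
   weight w;
   /* With the weights, phat estimates the population event probability,
      so small values are expected for a rare event. */
   output out=work.pred p=phat;
run;
ods graphics off;
```

Because the weight is constant within each outcome class, it cancels out of sensitivity and 1 - specificity, so for a given set of predicted probabilities the ROC curve is the same whether or not the weights enter those counts. If you fit without the WEIGHT statement instead, the SCORE statement's PRIOREVENT= option can rescale predicted probabilities to a known population event rate.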

Thanks, Ed.

2009/5/2 Murphy Choy <goladin@gmail.com>

> Hi Ed,
>
> I am not sure whether what you did is correct, but below are some
> suggestions for oversampling.
>
> First, set the weight of your target as 1.
>
> Then set the weight for the complement of the target as
> P(Non target)/P(Target), where P refers to the proportion of each
> group in the population.
>
> Once the above is set up, you can run the logistic regression as
> below.
>
> Below is an example of the above method applied.
>
> Example Case:
>
> 2000 cases of target.
>
> I will randomly select 2000 cases of non target.
>
> Set weight of target as 1.
>
> Set weight of non target as (0.96)/(0.04) = 24.
>
> Do the logistic regression.
>
> I am using an example from a SAS book called Credit Risk Scorecards.
>
> --
> Regards,
> Murphy Choy
>
> Certified Advanced Programmer for SAS V9
> Certified Basic Programmer for SAS V9
> DataShaping Certified SAS Professional
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> SAS User
> Sent: Saturday, May 02, 2009 9:04 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Low probabilities estimates / Logistic Regression.
>
> Murphy: I'm using oversampling.
> 1/5 for one group of the target variable.
> 1/1 for the other group.
> I used code like this:
>
>    proc logistic;
>       class [vars];
>       model [var]=[vars];
>       weight [var_weight];
>    run;
>
> What am I doing wrong?
> Ed.
>
> 2009/4/30 Murphy Choy <goladin@gmail.com>
> > Hi,
> >
> > Are you using oversampling?
> >
> > --
> > Regards,
> > Murphy Choy
> >
> > Certified Advanced Programmer for SAS V9
> > Certified Basic Programmer for SAS V9
> > DataShaping Certified SAS Professional
> >
> > -----Original Message-----
> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> > SAS User
> > Sent: Friday, May 01, 2009 7:38 AM
> > To: SAS-L@LISTSERV.UGA.EDU
> > Subject: Low probabilities estimates / Logistic Regression.
> >
> > Hello:
> > I'm fitting a logistic regression (using proc logistic with weights)
> > to model a very rare event, and the predicted probabilities of that
> > event are very low. Most observations are close to 0; the few cases
> > with the highest predicted probabilities have P(event) near 0.5.
> > Is that right?
> > Thanks a lot,
> > Ed.
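The weighting rule in the quoted exchange can be written out as below. The general form with sample counts $n_1, n_0$ is a standard generalization (an addition here, not from the thread) that reduces to Murphy's ratio $\pi_0/\pi_1$ when the sample is balanced. In his example $P(\text{non target}) = 0.96$, so $P(\text{target}) = 1 - 0.96 = 0.04$ and the weight works out to 24. The last line assumes, as Ed's weights of about 1 and 5 suggest, that he kept every target case and one in five non-target cases.

```latex
% \pi_1, \pi_0: population proportions of target and non-target
% n_1, n_0:     numbers of target and non-target cases in the sample
w_{\text{target}} = 1, \qquad
w_{\text{non-target}} = \frac{\pi_0}{\pi_1}\cdot\frac{n_1}{n_0}

% Murphy's example: n_1 = n_0 = 2000, \pi_0 = 0.96, \pi_1 = 0.04
w_{\text{non-target}} = \frac{0.96}{0.04}\cdot\frac{2000}{2000} = 24,
\qquad
\frac{2000 \cdot 1}{2000 \cdot 1 + 2000 \cdot 24} = \frac{2000}{50000} = 0.04 = \pi_1

% Ed's design (assumed): all targets kept, one in five non-targets kept
n_1 = N\pi_1,\quad n_0 = \tfrac{1}{5}N\pi_0
\;\Rightarrow\;
w_{\text{non-target}} = \frac{\pi_0}{\pi_1}\cdot\frac{N\pi_1}{N\pi_0/5} = 5
```

Either way, the weighted sample reproduces the population event rate, so the fitted probabilities are on the population scale; for a rare event, most of them will be close to 0.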

