LISTSERV at the University of Georgia
Date:         Fri, 14 Jan 2005 13:28:16 -0500
Reply-To:     "Nick ." <ni14@MAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Nick ." <ni14@MAIL.COM>
Subject:      Statistical question--oversampled responders
Content-Type: text/plain; charset="iso-8859-1"

Hello Dear SAS experts,

I desperately need a solution to this statistical problem.

I am working with an outside vendor to get modeling data. The vendor gives the data to our company, then a team looks at it and sends it over to me for modeling. Today I found out about this horrible (?) thing that they had done. Here is the situation:

Someone has 7 million records of data. About 22,000 are responders and the rest are non-responders. This translates to a 0.3% response rate. So here is what they did.

They took ALL 22,000 responders out of the 7 million records, and from the remaining non-responder population they randomly selected 950,000 records. So they sent me a dataset of about 972,000 records with a response rate of about 2.3%. I built the model on that, and today I found out that they had done that to me!!!! Clearly, I cannot use the model based on the 972,000 records to score the 7 million records, because the responders are oversampled. As a side note, please keep in mind that I used a software package (KXEN) to build the model. I didn't use SAS and PROC LOGISTIC.
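
For reference, the arithmetic behind these figures can be checked with a short sketch. The counts are the approximate ones quoted above, and the variable names are illustrative only.

```python
# Approximate counts quoted in the post (variable names are illustrative).
total_records = 7_000_000
responders = 22_000
sampled_nonresponders = 950_000

# Everything that is not a responder in the full file.
nonresponders = total_records - responders

# Size of the dataset the vendor delivered: all responders plus the sampled non-responders.
sample_size = responders + sampled_nonresponders

# Response rate in the full population versus in the delivered sample.
population_rate = responders / total_records
sample_rate = responders / sample_size

# Fraction of non-responders that survived the vendor's random selection.
nonresponder_sampling_fraction = sampled_nonresponders / nonresponders

print(f"sample size: {sample_size:,}")
print(f"population response rate: {population_rate:.2%}")
print(f"sample response rate: {sample_rate:.2%}")
print(f"non-responder sampling fraction: {nonresponder_sampling_fraction:.3f}")
```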

My question is this: knowing the information above, and knowing that I don't use PROC LOGISTIC, how can I modify the left-hand side of my equation, i.e. the responders, to truly reflect the 0.3% response rate? I told them to go back and give me a RANDOM sample of 1 million records out of the 7 million (that would solve my problem), but they won't do that. They want me to build the model on the data they sent me and then somehow ***adjust*** for the oversampling of the responders.
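
To make the kind of adjustment being asked about concrete, here is a minimal sketch of one standard correction for this sampling scheme. It rescales the odds of a scored probability by the non-responder sampling fraction. It assumes that every responder was kept, that non-responders were kept purely at random, and that the modeling tool outputs a probability estimate. The function name and arguments are hypothetical and not part of KXEN or SAS.

```python
def adjust_for_oversampling(p_sample: float, nonresponder_fraction: float) -> float:
    """Map a probability scored on the oversampled data to the population scale.

    Assumption: all responders were kept and each non-responder was kept at
    random with probability `nonresponder_fraction`, so the odds seen in the
    sample are the population odds divided by that fraction.
    """
    if not 0.0 <= p_sample <= 1.0:
        raise ValueError("p_sample must be between 0 and 1")
    if not 0.0 < nonresponder_fraction <= 1.0:
        raise ValueError("nonresponder_fraction must be in (0, 1]")
    # Multiply the sample odds by the sampling fraction, then convert back
    # from odds to a probability.
    weighted = p_sample * nonresponder_fraction
    return weighted / (weighted + (1.0 - p_sample))


# Counts quoted in the post.
fraction = 950_000 / (7_000_000 - 22_000)
sample_average = 22_000 / 972_000

# The sample's overall response rate maps back to the population rate.
print(f"{adjust_for_oversampling(sample_average, fraction):.4%}")
```

Because the transformation is monotone increasing, it leaves the rank order of the scored records unchanged; only the probability scale moves.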

Help please.

Thanks.

NICK

