Date: Fri, 17 Mar 2006 17:56:46 -0500
Reply-To: Pavlo Row <pavlo@INORBIT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Pavlo Row <pavlo@INORBIT.COM>
Subject: Statistical Question--PROC LOGISTIC
Content-Type: text/plain; charset="iso-8859-1"
Hello,
I have a data set with 2 million records, of which only about 6,000
are responders, i.e. the response rate is very low, about 0.30%. I
have many fields (about 150) to model with PROC LOGISTIC.
Statistically speaking, to speed up the modeling process and quickly
find the best candidate variables, is it OK if I keep all of the
responders and only about 500K of the roughly 2 million
nonresponders? That would give me a response rate of about 1.2%
(the true response rate, as mentioned above, is about 0.30%). This
way I would be working with a data set of only about 506K records
(500K + 6K) rather than the full 2 million, which I think ought to
speed things up a bit!
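
For reference, here is a quick sketch of the arithmetic behind the proposal. It is written in Python rather than SAS purely for illustration. The counts are the approximate figures quoted above, and the variable names are made up for this example.

```python
# Counts taken from the post; all of them are approximate.
total_records = 2_000_000
responders = 6_000
nonresponders = total_records - responders   # 1,994,000
sampled_nonresponders = 500_000              # proposed subsample size

# Response rate in the full data and in the proposed modeling sample.
true_rate = responders / total_records
sampled_rate = responders / (responders + sampled_nonresponders)

# Share of nonresponders that would be kept (all responders are kept).
keep_fraction = sampled_nonresponders / nonresponders

print(f"true response rate:    {true_rate:.2%}")      # 0.30%
print(f"sampled response rate: {sampled_rate:.2%}")   # 1.19%
print(f"nonresponders kept:    {keep_fraction:.1%}")  # 25.1%
```

So the subsample would keep roughly one nonresponder in four and raise the response rate from 0.30% to a little under 1.2%.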
Thank you.
pavlo