LISTSERV at the University of Georgia
Date:         Wed, 7 Dec 2005 13:25:09 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Subject: Logist Model Build--How big a dataset to use
In-Reply-To:  <200512072026.jB7K3ur4020446@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed

ni14@MAIL.COM wrote back:
>and D.C. responded (in part)
>
>>The best way to avoid unnecessary variables creeping into the model
>>(assuming you are doing the right things in the model building process)
>>is to include as many observations as you can.
>>The possible bad effects of any bad data can be swamped if you have
>>enough good data. :-)

>D.C. wanted me to cite the book I got my question inspired from. I promised
>I would list the book reference:
>
>STATISTICAL MODELING and ANALYSIS for DATABASE MARKETING by Bruce Ratner
>
>On p.36 (under logistic Chapter 3) he says:
>
>"There is statistical factoid that states if the true model can be built
>with small data, then the model built with extra big data produces large
>prediction error variance. Data analysts are never aware of the true model,
>but are guided when building it by the principle of simplicity. Therefore,
>it is wisest to build the model with small data. If the predictions are
>good, then the model is a good approximation of the true model; if
>predictions are not acceptable, then the EDA procedure prescribes an
>increase data size (by adding predictor variables and individuals) until
>the model produces good predictions. The data size, with which the model
>produces good predictions, is big enough. If extra big data are used,
>unnecessary variables tend to creep into the model, thereby increasing the
>prediction-error variance."
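The "prediction-error variance" point in the quoted passage has a simple closed form in the linear-model case. As a hedged sketch under stated assumptions (ordinary least squares with independent, constant-variance errors and a full-rank design matrix X, not the logistic model in Ratner's chapter), let n be the number of observations, p the number of estimated coefficients, and sigma^2 the error variance. The fitted values then have covariance sigma^2 H, where the hat matrix H has trace p:

```latex
\operatorname{Var}(\hat{\mathbf{y}}) = \sigma^2 H, \qquad H = X (X^\top X)^{-1} X^\top,
\qquad
\frac{1}{n}\sum_{i=1}^{n} \operatorname{Var}(\hat{y}_i)
  = \frac{\sigma^2 \operatorname{tr}(H)}{n}
  = \frac{\sigma^2 p}{n}.
```

Under those assumptions, each unnecessary predictor raises p and so inflates the average variance of the fitted values, while each extra observation raises n and shrinks it.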

I don't find this well-written, but as far as I can tell, the author is urging you to avoid having too many *variables*, not to avoid having too many data points.
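The distinction between too many variables and too many observations can be checked with a small simulation. This is an illustrative sketch only, not Ratner's EDA procedure and not SAS: it uses ordinary least squares rather than logistic regression, a made-up true model y = 1 + 2*x1 + noise, and pure-noise extra predictors. The function names (`ols`, `make_data`, `holdout_mse`) and all settings (sample sizes, 10 noise variables, 100 repetitions) are hypothetical choices. It uses only the Python standard library so it runs anywhere.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows, ys):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def make_data(rng, n, n_noise):
    """Hypothetical true model y = 1 + 2*x1 + N(0, 1); the n_noise extra columns are pure noise."""
    rows, ys = [], []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        noise_cols = [rng.gauss(0, 1) for _ in range(n_noise)]
        rows.append([1.0, x1] + noise_cols)  # intercept, real predictor, noise predictors
        ys.append(1.0 + 2.0 * x1 + rng.gauss(0, 1))
    return rows, ys


def holdout_mse(n_train, n_noise, reps=100, n_test=100, seed=1):
    """Average squared prediction error on fresh data, over repeated fits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        train_x, train_y = make_data(rng, n_train, n_noise)
        beta = ols(train_x, train_y)
        test_x, test_y = make_data(rng, n_test, n_noise)
        for row, y in zip(test_x, test_y):
            pred = sum(b * v for b, v in zip(beta, row))
            total += (y - pred) ** 2
    return total / (reps * n_test)


small_true = holdout_mse(20, 0)    # only the real variable, small data
small_extra = holdout_mse(20, 10)  # 10 unnecessary variables, small data
big_extra = holdout_mse(200, 10)   # same 10 unnecessary variables, 10x the data
print(f"{small_true:.2f} {small_extra:.2f} {big_extra:.2f}")
```

In this setup the ten noise variables should inflate hold-out error noticeably when n = 20 and add little once n = 200, which fits the reading that the danger is unnecessary variables rather than extra observations.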

So I stand by my previous blithering. :-)

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330


