LISTSERV at the University of Georgia
Date:         Thu, 29 Jun 2006 10:29:51 -0700
Reply-To:     Daqing Zhao <dlouiszhao@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Daqing Zhao <dlouiszhao@GMAIL.COM>
Subject:      Re: Variable or model selection methods
Comments: To: Sigurd Hermansen <HERMANS1@westat.com>
In-Reply-To:  <CA8F89971ADA9F47A6C915BA23978442011A7C82@MAILBE2.westat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Thanks for the message.

I take it that there are cases where a domain expert or content specialist knows what the important drivers are. If you are trying to predict the trajectory of some planets, you don't want to include factors other than the force field, time, and initial conditions.

There are also cases where you don't know what drives the target variable, such as the stock price of some company or the cause of some cancer. You try to find predictors, and that's part of the game.

I know someone who was fixated on the Markov blanket, which to me is a definition rather than a recipe.

Regards,

Daqing

On 6/28/06, Sigurd Hermansen <HERMANS1@westat.com> wrote:
>
> Daqing:
> To paraphrase our resident scourge of all things stepwise, 'All stepwise
> methods are wrong. Other automatic model selection methods are wrong,
> too, but not as bad as stepwise'.
>
> No, that's not a paraphrase. It's a summary.
>
> Why does he disparage step-wise methods? Perhaps he believes that
> content specialists should know more than a computer chip about what
> determines what. He may also know that models selected stepwise don't
> hold up well when applied to samples other than the samples used to
> estimate them.
>
> Given the current state of the art, I prefer stochastic gradient
> boosting (say, TreeNet) as an exploratory tool, though content
> specialists should agree on almost all predictors in a model. The LASSO
> and other regularization methods may help deal with collinear
> predictors, but outliers and leverage points in a sample may still leave
> you with a bad model. For now the SAS-L Archives have enough postings on
> model selection to keep you occupied for the next few months....
> Sig
>
>
> -----Original Message-----
> From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
> On Behalf Of Daqing Zhao
> Sent: Wednesday, June 28, 2006 2:29 PM
> To: SAS-L@listserv.uga.edu
> Subject: Variable or model selection methods
>
>
> Hi All,
>
> I often need to select a limited number of important variables from a
> large set for prediction and often wonder what the best methodology is.
> Of course different people say different ones are the best. I have all
> kinds of variables, categorical, binary, numeric, ordinal and many of
> them correlated and sparse.
>
> Can someone recommend a good method for doing that?
>
> Some say proc logistic stepwise is bad. How about CART gini index
> reduction, Lasso, leave one variable out, and there are also mutual
> information, Markov blanket, and others? Comments on accuracy,
> robustness (for type of variables, missing data, outliers, etc), and
> efficiency (need googleplex) ?
>
> Thanks,
>
> Daqing
>

