Date: Thu, 3 May 2007 15:01:03 -0400
Reply-To: Steve Denham <steven.c.denham@MONSANTO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Steve Denham <steven.c.denham@MONSANTO.COM>
Subject: Re: interpreting PROC GLM output
On Thu, 3 May 2007 09:52:58 -0700, shankar.stat@GMAIL.COM wrote:
>whoa! what have I got into!!..my simple query about lsmeans
>interpretation is about to put my whole modeling activity in
>suspicion:-)..
>
>well Pete, let's start with variable selection..the variables I am
>talking about are related to customer, competition, demographic/
>lifestyle, and some other business information, nationwide..
>
>an initial analysis with VARCLUS and PRINCOMP gave some insights..and
>then I used PROC CORR to test for collinearity and selected about 200
>of a thousand variables..
>
>and then I have tried PROC GLMSELECT (lasso and lar) in addition to
>stepwise selection method (yes, I did try it with various SLSTAY and
>SLENTRY values for stepwise)..
>
>I also looked at normal plots for variables and transformed variables
>(mostly LOG or SQRT to remove skewness)..
>and I dealt with missing values using a variety of methods (mean
>substitution, regression or spline substitution using PROC MI)
>
>but honestly, I would say, at the end of all this variable selection
>activity--my manager was more helpful in understanding whether a
>particular variable really makes sense (he knows the business
>well)....
>
>I look forward to your paper on stepwise..I wonder what you are
>recommending instead of stepwise?..
>
>thanks;
>Shankar;
>
Just on a hunch, Shankar, I think Pete will recommend pretty much what you
listed there--especially about the part where you listen to the subject
matter folks about whether a particular variable really makes sense.
Just no stepwise stuff.
Or all possible subsets.
Steve Denham
Mathematical Biologist
Monsanto Co.