Date: Fri, 2 Dec 2005 05:36:20 -0800
Reply-To: Paige Miller <paige.miller@ITT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paige Miller <paige.miller@ITT.COM>
Subject: Re: Influential observations
Content-Type: text/plain; charset="iso-8859-1"
David L Cassell wrote:
> HERMANS1@WESTAT.COM wrote:
> >This sounds suspiciously like a discussion of purely statistical issues,
> >usually my cue to exit stage left, but what you are saying matches up
> >closely with what I remember from earlier experience in econometric
> >modelling. The term 'reduced form' model meant a model used in
> >prediction as opposed to explanation. It included endogenous variables
> >as well as exogenous variables in a single equation, so it had some
> >degree of functional interdependence among variables as well as
> >unexplained collinearity. While collinearity reduced efficiency of
> >parameter estimation, exclusion of relevant variables introduced bias;
> >so, depending on the degrees of loss of efficiency and biases
> >introduced, it often made sense to trade collinearity for bias.
> Which is exactly why ridge regression was invented.
And exactly why PLS was invented. A paper by Frank and Friedman in
Technometrics (I don't have the exact reference) around 1992 showed
that both PLS and Ridge Regression have dramatically lower mean square
errors than ordinary least squares regression (either using the "full"
model or using variable selection methods). Both PLS and Ridge
Regression are biased, so the dramatically lower mean square errors
come from the reduction in the variance of the parameter estimates.
While Ridge Regression "won" by generally having the lowest mean square
errors in Frank and Friedman's simulations, its errors were not
dramatically lower than PLS's (and both were dramatically lower than
any flavor of OLS). The choice between PLS and Ridge Regression depends
on the application and on the non-quantifiable aspects of the methods;
many (including myself) believe that PLS is often more interpretable
and almost always lends itself to nice graphical displays (which Ridge
Regression does not offer) that aid understanding of both the model fit
and the data.
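To make that bias-variance trade-off concrete, here is a small Python sketch (simulated data and an arbitrary ridge penalty of my own choosing, not numbers from Frank and Friedman) comparing OLS and ridge coefficients on two nearly collinear predictors:

```python
import random

random.seed(1)

# Hypothetical simulated data: x2 is x1 plus tiny noise, so the two
# predictors are nearly collinear and X'X is close to singular.
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + random.gauss(0, 0.01) for v in x1]
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]  # true betas: 1, 1

def fit(lam):
    """Solve (X'X + lam*I) beta = X'y for this two-predictor design."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    g1 = sum(a * v for a, v in zip(x1, y))
    g2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return ((s22 * g1 - s12 * g2) / det, (s11 * g2 - s12 * g1) / det)

ols = fit(0.0)     # unbiased, but wildly unstable when x1 and x2 are collinear
ridge = fit(10.0)  # biased toward zero, but far less variable

print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

The fits agree closely along the well-determined direction (the coefficient sum), while ridge shrinks the ill-determined difference between the two coefficients that OLS estimates with huge variance.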
> >And what happens when we exclude one of a pair of collinear variables
> >and the remaining variable contains errors? Wouldn't the model with
> >collinear predictors yield more accurate predictions? I know that CART
> >and related models take advantage of proxies for missing values and
> >errors. That seems true in regression models as well.
> You would think that having two highly correlated variables in the model
> would *have*
> to be better than keeping only one. But the adjusted R-squared might
> disagree with
> you. If you maintain an extra variable which causes your X'X matrix to go
> rapidly toward
> being shockingly ill-conditioned, then you need some manner of adjustment.
> Of course,
> we now have those kinds of adjustments at our fingertips. We have
> generalized sweep operators under the hood to make the computations
> fast. We can actually *do* ridge regression and other methods in
> milliseconds instead of days.
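For intuition on how fast X'X degrades, a quick back-of-the-envelope in Python (the correlation values are made up, just to show the pattern): for two standardized predictors correlated at r, the scaled X'X matrix is [[1, r], [r, 1]], so its determinant is 1 - r^2 and the variance inflation factor 1/(1 - r^2) explodes as r approaches 1.

```python
# Determinant of the scaled X'X matrix [[1, r], [r, 1]] for two
# standardized predictors with correlation r. As r -> 1 the matrix
# becomes singular and OLS variances blow up (VIF = 1/(1 - r**2)).
for r in (0.0, 0.9, 0.99, 0.999):
    det = 1.0 - r * r
    print(f"r = {r:<6} det(X'X) = {det:.6f}  VIF = {1.0 / det:,.1f}")
```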
> >Since it appears that I agree with you and David and Peter don't,
> >shouldn't you reconsider your stand on this issue? I usually find
> >statisticians on the opposite side of any stand that I take on any
> >issue!
> Funny! (Hopefully, other people will recognize that your tongue is firmly
> in your
> cheek here.) Are you going to sign Mikeeeeee's name to this one too? :-)
> I don't disagree with Paige. We're just both pushing a point of view based
> on the type of multi-collinearity problem we see the most often. Paige sees
> cases where you can have maybe *dozens* of collinear variables that have
> subject-matter properties leading you to want to keep all of them when
> possible. I'm more likely to see a set of variables for which data
> reduction would be meaningful, and useful. I think that real answers
> require actual hands-on EDA, rather than what Paige and Peter and I can
> provide over SAS-L. But the array of suggestions gives the poster more
> options. And that's good.
David and I are most likely looking at different types of data; and
were I in his shoes, I would most likely be applying different methods
and thinking about things differently as well.
But the comment that "I usually find statisticians on the opposite side
of any stand that I take on any issue!" is an interesting one. PLS has
gained slow and grudging acceptance among statisticians; many still
avoid it like the plague (including many in the group that I used to
work in). I won't bother speculating why that is. And yet one can find
zillions (well, okay, I'm exaggerating, maybe 2/3 of a zillion)
articles in refereed publications where PLS has been used successfully
on real world problems.