|
The problem, Jim John, arises not simply when the independent variables are
correlated, but when (1) they are linearly related and (2) the correlation
is nearly 1. A variable and its square are not linearly related, although
the relationship may be approximately linear (and the correlation high) over
a small range of variation. To be more precise, the real problem is that no
independent variable can be a perfect linear function of the other
independent variables. Imagine, for instance, having one variable called
TODAY, another called DATEOFBIRTH, and a third called AGETODAY.
In that hypothetical case, one of the independent variables would be
redundant, and the covariance matrix of the independent variables would be
singular (i.e. it would have a zero determinant). Since computing the
regression coefficients involves inverting that matrix, which amounts to
dividing by its determinant, it would mean dividing by zero, and no unique
solution would exist. When the determinant is NEARLY zero, such as
0.000000001, a small change in any of the variables may cause large changes
in the estimated coefficients, leading to unstable solutions.
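A rough numerical sketch of both cases, in Python with NumPy rather than
SPSS syntax. The variable names follow the example above; the dates, the
sample size, the noise levels and the helper names corr_det and coef_spread
are all invented for illustration. It uses the determinant of the
correlation matrix (my choice, so the result does not depend on units).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Invented data in decimal years (assumption, for illustration only).
date_of_birth = rng.uniform(1940.0, 1990.0, n)
today = rng.uniform(2007.0, 2008.5, n)
age_today = today - date_of_birth      # exact linear function of the other two

def corr_det(*cols):
    """Determinant of the correlation matrix of the given variables."""
    return np.linalg.det(np.corrcoef(np.column_stack(cols), rowvar=False))

# Perfect collinearity: the determinant is zero up to rounding error.
det_exact = corr_det(today, date_of_birth, age_today)

# Near collinearity: age recorded with a small error (sd 0.01 years).
age_noisy = age_today + rng.normal(0.0, 0.01, n)
det_near = corr_det(today, date_of_birth, age_noisy)

def coef_spread(cols, reps=50):
    """Std. dev. of the OLS slopes over refits with fresh noise in y."""
    X = np.column_stack(cols)
    X = np.column_stack((np.ones(n), X - X.mean(axis=0)))
    slopes = []
    for _ in range(reps):
        y = 0.5 * age_today + rng.normal(0.0, 1.0, n)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes.append(coef[1:])          # drop the intercept
    return np.std(slopes, axis=0)

spread_near = coef_spread((today, date_of_birth, age_noisy))
spread_ok = coef_spread((today, date_of_birth))   # redundant variable dropped

print("det, exact collinearity:", det_exact)
print("det, near collinearity: ", det_near)
print("slope spread, all three variables:", spread_near)
print("slope spread, AGETODAY dropped:   ", spread_ok)
```

With the nearly redundant variable in the model, the slopes should vary far
more from one refit to the next than they do once AGETODAY is dropped.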
Moderate (or even relatively high) correlations among independent variables
do not have this effect, and can be tolerated. The TOLERANCE criterion in
the REGRESSION command (used by the STEPWISE method, for instance) decides
whether or not a new variable is accepted into the equation. A variable's
tolerance is the proportion of its variance not explained by the independent
variables already in the equation (1 minus the R-squared of that
regression); like the determinant, it shrinks towards zero when the variable
is nearly redundant. The criterion sets the minimum tolerance required,
below which a new variable is not included because it would cause practical
multicollinearity, i.e. a very unstable solution.
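For the model in your question, here is a small sketch in Python with NumPy
(not SPSS syntax) of how far Factor1 and its square are from being
redundant. It computes tolerance as 1 - R^2 from regressing one predictor on
the others, which is the usual definition in regression diagnostics. The
function name tolerance and the two ranges of Factor1 are hypothetical
choices for illustration.

```python
import numpy as np

def tolerance(X, j):
    """1 - R^2 from regressing column j of X on the remaining columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack((np.ones(len(y)), others))   # add an intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return resid @ resid / np.sum((y - y.mean()) ** 2)

# Hypothetical Factor1 values on two different ranges.
x_pos = np.linspace(1.0, 10.0, 101)    # all positive
x_sym = np.linspace(-5.0, 5.0, 101)    # symmetric around zero

results = {}
for label, x in (("1 to 10", x_pos), ("-5 to 5", x_sym)):
    X = np.column_stack((x, x ** 2))
    r = np.corrcoef(x, x ** 2)[0, 1]
    results[label] = (r, tolerance(X, 1))
    print(f"Factor1 from {label}: corr = {r:.3f}, "
          f"tolerance = {results[label][1]:.3f}")
```

On the all-positive range the correlation is high but not perfect, so the
tolerance stays well above a very small cut-off such as 0.0001 (an assumed
value; check the REGRESSION documentation for the actual default). On the
symmetric range the linear correlation essentially vanishes.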
Hope this clarifies the issue.
Hector
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
jimjohn
Sent: 18 July 2008 18:33
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Multicollinearity confusion
I'm a little confused. So, multicollinearity is a problem that can affect our
regression results when the independent variables are correlated with each
other. But many times, I see regression models like this:
y = B0 + B1 *Factor1 + B2 * (Factor1)^squared
So, wouldn't Factor1 and (Factor1)^squared be highly correlated, thus
resulting in a big collinearity problem? Any ideas why it's OK here? Thanks.
--
View this message in context:
http://www.nabble.com/Multicollinearity-confusion-tp18538040p18538040.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
=====================
|