Date: Fri, 18 Jul 2008 19:15:20 -0300
From: Hector Maletta
To: jimjohn <18538040.post@talk.nabble.com>
Subject: Re: Multicollinearity confusion
List: SPSSX(r) Discussion

The problem, Jim John, arises not exactly when the independent variables are correlated, but when they are (1) linearly related, and (2) the correlation is nearly 1 in absolute value. A variable and its square are not connected by an exact linear relationship, although over a small range of variation the square is approximately linear and the two can be strongly correlated. The precise requirement is that no independent variable may be an exact linear function of the remaining independent variables. Imagine, for instance, having one variable called TODAY, another variable DATEOFBIRTH, and a third variable AGETODAY. One of them is redundant, since AGETODAY = TODAY - DATEOFBIRTH. In that hypothetical case the covariance matrix of the predictors would be singular (i.e. it would have a zero determinant). Since computing the regression coefficients involves dividing by that determinant, it would mean dividing by zero, and no unique solution would exist. When the determinant is NEARLY zero, such as 0.000000001, a small change in any of the variables may cause large changes in the estimated coefficients, leading to unstable solutions. Moderate (or even relatively high) correlations among independent variables do not have this effect, and can be tolerated.

The TOLERANCE criterion in the REGRESSION command (used, for instance, by the STEPWISE method) decides whether or not to accept a new variable into the equation. A candidate variable's tolerance is the proportion of its variance not explained by the variables already in the equation; when it falls below the threshold, the variable is not included because it would cause practical multicollinearity, i.e. a very unstable solution.

Hope this clarifies the issue.
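The TODAY/DATEOFBIRTH/AGETODAY example above can be checked numerically. The following is a minimal numpy sketch (not SPSS syntax) with made-up values: because one column is an exact linear function of the other two, the design matrix is rank-deficient and X'X is singular.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical data: interview date, date of birth, and age, all in years.
# AGETODAY is an exact linear function of the other two variables.
today = rng.uniform(2008.0, 2008.9, n)
birth = rng.uniform(1940.0, 1990.0, n)
age = today - birth

# Design matrix with an intercept and all three predictors.
X = np.column_stack([np.ones(n), today, birth, age])

# Because age = today - birth, one column is redundant: the matrix has
# rank 3 rather than 4, so X'X is singular (zero determinant) and the
# normal equations have no unique solution.
rank = np.linalg.matrix_rank(X)
print(rank)
```

Dropping any one of the three redundant predictors restores full rank and a unique least-squares solution.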

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of jimjohn
Sent: 18 July 2008 18:33
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Multicollinearity confusion

I'm a little confused. So, multicollinearity is a problem that can affect our regression results when the independent variables are correlated with each other. But many times I see regression models like this: y = B0 + B1*Factor1 + B2*(Factor1)^2

So, wouldn't Factor1 and (Factor1)^2 be highly correlated, thus creating a big collinearity problem? Any ideas why it's OK here? Thanks.
--
View this message in context: http://www.nabble.com/Multicollinearity-confusion-tp18538040p18538040.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
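The question above can be checked directly. A minimal numpy sketch with a hypothetical predictor on [0, 10]: the raw correlation between Factor1 and its square is indeed high, yet not exactly 1, so the quadratic model remains estimable; centering the predictor before squaring removes most of the correlation.

```python
import numpy as np

# Hypothetical predictor: Factor1 takes values on [0, 10].
factor1 = np.linspace(0.0, 10.0, 101)

# The raw correlation between Factor1 and its square is high but below 1.
r_raw = np.corrcoef(factor1, factor1 ** 2)[0, 1]

# Because the relationship is not exactly linear, the design matrix
# [1, x, x^2] is still full rank, so the coefficients can be estimated.
X = np.column_stack([np.ones_like(factor1), factor1, factor1 ** 2])
rank = np.linalg.matrix_rank(X)

# Centering Factor1 before squaring removes most of the correlation
# (exactly zero here, since the grid is symmetric about its mean).
c = factor1 - factor1.mean()
r_centered = np.corrcoef(c, c ** 2)[0, 1]

print(r_raw, rank, r_centered)
```

Centering is a common practical remedy for the near-collinearity of polynomial terms; it changes the coefficients' interpretation but not the fitted curve.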

===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
