Date: Thu, 1 Oct 2009 11:27:04 -0300
From: Hector Maletta (via SPSSX(r) Discussion)
To: "Whanger, J. Mr. CTR"
Subject: Re: Multiple Linear Regression vs a series of simple linear regression on the presence of multicollinearity

Not out of my head: I just remembered the piece of information but not the precise source. Unfortunately I am now travelling and have little chance to look in the books I suspect hold the answer. However, it is quite common knowledge that at least 30-50 cases are needed for a normal distribution to take shape.

Hector

-----Original Message-----
From: Whanger, J. Mr. CTR [mailto:James.Whanger@med.navy.mil]
Sent: 01 October 2009 10:53
To: Hector Maletta; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: Re: Multiple Linear Regression vs a series of simple linear regression on the presence of multicollinearity

Hector,

Is there any chance you have a citation for the Monte Carlo experiments you mentioned?

Thanks,

Jim

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Hector Maletta
Sent: Wednesday, September 30, 2009 4:25 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Multiple Linear Regression vs a series of simple linear regression on the presence of multicollinearity

In addition to Bruce's comment:

1. In multiple regression, each coefficient tells you by how much the DV changes for a unit change in one IV, keeping the other IVs constant. Since the IVs are inter-correlated, it is no surprise that once you hold 99 of them constant, an increase in the 100th actually decreases the DV.

2. Having N=100 limits the number of IVs you can use. The old rule of thumb is that you should never attempt anything with fewer than 10 cases per variable. You are above that threshold (5 predictors with 100 cases = 20 cases per predictor), but even that threshold is far too low: ten (or 20) cases per variable leave you with large margins of error. Linear regression assumes that errors are normally distributed, but Monte Carlo sampling experiments suggest that errors are likely not to be normally distributed when the sample size is less than 30-50 cases per variable. This would imply that you cannot use more than 2-3 independent variables with 100 cases.

Of course the final result's significance would also depend on the coefficient of variation of each variable (SD/mean), their inter-correlations and other things, but those figures suggest you had better get a larger sample if you are attempting such a regression exercise.
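Point 1 above can be sketched with a small simulation (not from the thread; the data, seed, and coefficients are made up for illustration). Two predictors are made nearly collinear; both correlate positively with y on their own, yet the partial coefficient of the redundant one can come out near zero or even negative once the other is held constant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two highly correlated predictors; y depends positively on x1 only.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)           # corr(x1, x2) close to 1
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)

# Simple (marginal) slopes: both positive, since x2 proxies x1.
b1_simple = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)
b2_simple = np.cov(x2, y)[0, 1] / np.var(x2, ddof=1)

# Multiple regression via least squares: holding x1 fixed, x2 adds
# mostly noise, so its partial slope can be near zero or negative.
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(b1_simple, b2_simple)   # both positive
print(beta[1], beta[2])       # partial slopes; their sum stays near 2
```

Because x2 is almost a copy of x1, only the sum of the two partial slopes is well determined; how it splits between them is driven by noise, which is exactly the instability multicollinearity produces.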

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Bruce Weaver
Sent: 30 September 2009 16:54
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Multiple Linear Regression vs a series of simple linear regression on the presence of multicollinearity

eins wrote:
>
> I am conducting a multiple linear regression with 5 predictors; all
> variables are continuous and n=100. Before doing the linear regression
> analysis, I first ran a simple correlation analysis and found that all
> the predictors have positive and significant correlations with the
> outcome variable. There are highly correlated predictors.
> Surprisingly, when I did the multiple linear regression, two of the
> predictors have negative B coefficients, Beta coefficients less than
> -1.0, VIFs greater than 10, an eigenvalue of zero, and condition
> indices greater than 30. These are indications of a multicollinearity
> problem.
>
> Is it a right alternative to do simple linear regression, one
> predictor at a time, instead of multiple regression? In case this
> alternative is wrong, what makes it wrong? What information would be
> lost in doing a series of simple regressions, rather than multiple
> regression?
>
> Thank you.
> Eins

The negative coefficients for a couple of variables suggest that you have one or more "suppressor variables". If you Google that term, you should find lots of hits, including some notes by textbook author David Howell.

Regarding your second question, if you run 5 simple linear regressions, you'll have no control for confounding. The fact that you were running a multiple regression model in the first place suggests that this is not what you want. If the excessive multicollinearity is due to one variable, I would try just removing it.
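The loss of control for confounding can be sketched with made-up data (a simulation for illustration, not the poster's data). A variable z drives both x and y; x has no direct effect on y, yet a simple regression of y on x reports a large slope, while the multiple regression that includes z pushes x's coefficient toward zero:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

# A confounder z drives both the predictor x and the outcome y;
# x itself has no direct effect on y.
z = rng.normal(size=n)
x = z + rng.normal(scale=0.5, size=n)
y = 3.0 * z + rng.normal(scale=0.5, size=n)

# Simple regression of y on x: the slope is biased, absorbing z's effect.
b_simple = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Multiple regression of y on x and z: x's partial slope shrinks toward 0.
X = np.column_stack([np.ones(n), x, z])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(b_simple)   # clearly positive, but spurious
print(beta[1])    # near zero once z is controlled
```

Running five separate simple regressions would report five slopes like `b_simple` above: each one mixes the predictor's own effect with whatever the other four predictors contribute through their correlations with it.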