Date: Mon, 19 Sep 2005 14:21:11 -0300
Reply-To: Hector Maletta <hmaletta@fibertel.com.ar>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <hmaletta@fibertel.com.ar>
Subject: Re: General questions: Linear Regression
In-Reply-To: <10357.1127148909@www49.gmx.net>
Content-Type: text/plain; charset="US-ASCII"
Karl, see responses to your questions below.
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
> On Behalf Of Karl Koch
> Sent: Monday, September 19, 2005 1:55 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: General questions: Linear Regression
>
> Hello all,
>
> I have a few questions which I would like to ask here
> regarding linear regression analysis in SPSS. I have
> performed a linear regression with three IVs and one DV.
>
> I would like to find the regression function that best models
> the data in order to make predictions. I have 3 IVs, but only
> 2 IVs contribute statistically significantly to the variation
> of the DV.
>
> I did a normal (simultaneous) linear regression. I get the
> following model with its coefficients (the ANOVA table tells
> me that this model is significant):
>
> Coefficients
> -----------
> Model              B         t    Sig.
> 1 (Constant)   4.200    58.972    .000
>   FactorA      -.779   -18.288    .000
>   FactorB      -.022     -.622    .535
>   FactorC     -1.601   -25.350    .000
>
>
> Furthermore, the model summary tells me an R square of 0.30
> which means that the model accounts for 30 % of the variance
> in the DV.
>
> Now some questions:
>
> 1) How does this translate to the regression function Y =
> alpha + beta1 * FactorA + beta2 * FactorC? I only got one R
> square value for the ENTIRE model...
Karl:
The R2 coefficient is the squared correlation coefficient between predicted
and observed values of your dependent variable. Therefore you get one R2 per
equation, regardless of the number of IVs in that equation.
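To see that identity at work, here is a minimal Python/NumPy sketch on made-up data (three simulated IVs and one DV; the numbers are hypothetical, not Karl's results). It fits an ordinary least-squares equation with an intercept, computes R2 as 1 - SS_res/SS_tot, and checks that this equals the squared correlation between observed and predicted values. The equality relies on the equation having an intercept.

```python
import numpy as np

# Hypothetical synthetic data: three IVs and one DV (NOT Karl's data).
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))  # columns play the role of FactorA, FactorB, FactorC
y = 4.2 - 0.8 * X[:, 0] - 1.6 * X[:, 2] + rng.normal(scale=3.0, size=n)

# Ordinary least squares with an intercept column.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

# R2 as the share of DV variability explained by the equation.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# R2 as the squared correlation between observed and predicted values.
r = np.corrcoef(y, y_hat)[0, 1]
print(round(r2, 4), round(r ** 2, 4))  # the two numbers coincide
```

Whatever the number of IVs, the equation produces a single column of predicted values, hence a single R2.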
At the same time, your results show that the estimate for the coefficient of
FactorB is not significantly different from zero (p = .535): if the true
population coefficient were zero, an estimate at least this far from zero
would turn up in about 53.5% of samples. By standard procedure, you should
exclude FactorB from your equation and re-estimate it with FactorA and
FactorC only. This would only marginally reduce R2, but would add to the
strength of your results and predictions.
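A quick way to convince yourself of this is a simulation. The Python/NumPy sketch below uses made-up data in which B has no true effect by construction; the data, the true coefficients and the helper fit_r2 are all hypothetical, chosen only for illustration. It fits the full equation (A, B, C) and the reduced one (A, C), then compares coefficients and R2.

```python
import numpy as np

def fit_r2(X, y):
    """Fit OLS with an intercept; return (coefficients, R2)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Hypothetical simulated predictors; B has no true effect on y.
rng = np.random.default_rng(1)
n = 1000
A, B, C = rng.normal(size=(3, n))
y = 4.2 - 0.8 * A - 1.6 * C + rng.normal(scale=2.5, size=n)

coef_full, r2_full = fit_r2(np.column_stack([A, B, C]), y)  # intercept, A, B, C
coef_red, r2_red = fit_r2(np.column_stack([A, C]), y)       # intercept, A, C
print("full:   ", np.round(coef_full, 3), round(r2_full, 4))
print("reduced:", np.round(coef_red, 3), round(r2_red, 4))
```

Because the reduced equation is nested in the full one, its R2 can never be higher; with a predictor as weak as FactorB (t = -.622) the loss is typically tiny. The reduced fit directly gives you alpha, beta1 and beta2 for Y = alpha + beta1 * FactorA + beta2 * FactorC.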
>
> 2) Where can I find out what an R square of 0.30 (30%)
> really means? Is this a strong effect? Can somebody provide
> me with some approaches to interpreting it?
R2 compares the total variability of the DV with the part of that variability
attributable to your IVs. It is perfectly possible that the DV varies due to many
other factors besides the ones you have singled out in your equation. The result
tells you that 70% of the differences in the DV cannot be attributed to Factors
A, B or C.
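The split between "attributable to the IVs" and "not attributable" comes from the sum-of-squares decomposition SS_tot = SS_reg + SS_res, which holds for a least-squares equation with an intercept. The Python/NumPy sketch below illustrates it on simulated data (hypothetical numbers, not Karl's); the explained share is R2 and the unexplained share is 1 - R2, the 70% in Karl's case.

```python
import numpy as np

# Hypothetical simulated data with three IVs.
rng = np.random.default_rng(2)
n = 800
X = rng.normal(size=(n, 3))
y = 4.2 - 0.8 * X[:, 0] - 1.6 * X[:, 2] + rng.normal(scale=2.5, size=n)

design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
y_hat = design @ coef

ss_tot = np.sum((y - y.mean()) ** 2)      # total variability of the DV
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # part attributable to the IVs
ss_res = np.sum((y - y_hat) ** 2)         # part left unexplained

explained = ss_reg / ss_tot    # this is R2
unexplained = ss_res / ss_tot  # this is 1 - R2
print(f"explained {explained:.1%}, unexplained {unexplained:.1%}")
```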
For purposes of explanation, this does not matter much: if you find an
effect that is statistically significant (i.e. an effect that most likely
represents some real effect at the level of the population), that finding is
important even if there are other sources of variability. For example,
suppose you discover a risk factor that accounts for 10% of the variability in
survival time after a heart transplant. Even if other factors are
responsible for the other 90%, your finding is nonetheless valuable.
For purposes of prediction, however, a low R2 may be bad news: if you
predict the DV based on your three factors, you will still have 70% of the
variability originally observed in the DV, i.e. large variability around
your predicted values, and therefore individual predictions may often miss by
a wide margin even if they are based on sound facts and real effects.
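A back-of-the-envelope way to translate "70% of the variability remains" into prediction error: the standard deviation of the prediction errors is roughly sqrt(1 - R2) times the standard deviation of the DV (ignoring degrees-of-freedom corrections). The small Python sketch below tabulates that ratio; the helper name residual_sd is just illustrative.

```python
import math

def residual_sd(sd_y: float, r2: float) -> float:
    """Approximate SD of prediction errors, ignoring degrees-of-freedom corrections."""
    return sd_y * math.sqrt(1.0 - r2)

# Ratio of error SD to the DV's original SD for a few R2 values.
for r2 in (0.10, 0.30, 0.50, 0.90):
    ratio = residual_sd(1.0, r2)
    print(f"R2 = {r2:.2f}: errors keep {ratio:.0%} of the DV's original SD")
```

For R2 = 0.30 the ratio is about 0.84: the typical prediction error is still roughly 84% as large as the spread of the DV itself, which is why a real and significant effect can still give imprecise individual predictions.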
>
> 3) When performing the regression analysis, SPSS offers in
> the "Save" dialog box the "Mahalanobis" distance. Does
> somebody here know more details about this option - I could
> not find much in the help. The reason I am asking is
> that one book suggests ticking this box without further
> explanation, and I usually want to know what I am messing with :-)
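The Mahalanobis distance saved by REGRESSION measures how far each case lies
from the centroid (the means) of the independent variables. Unlike Euclidean
distance, which simply adds up the squared differences between each value and
its mean, treating every variable separately, it also takes into account the
variances of the variables and the covariances between them. It is used to
determine whether an observation is a multivariate outlier on the IVs, taking
into account all the relationships among the variables involved: a case can be
unremarkable on each variable separately and still be far from the bulk of the
data in combination. Some explanations may be found at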
http://www.usq.edu.au/users/senior/Assessment/RAVLT-Mahalanobis-Analysis.htm
(and a Google search for "Mahalanobis distance" will turn up many more).
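For the computation itself, the squared distance of case i is D2 = (x_i - mean)' S^-1 (x_i - mean), where S is the sample covariance matrix of the IVs. The Python/NumPy sketch below computes it on hypothetical data with two strongly correlated predictors; case 0 is deliberately set to values that are moderate on each variable but contradict their correlation. A common rule of thumb flags cases whose D2 exceeds a chi-square critical value (e.g. at p < .001) with degrees of freedom equal to the number of IVs.

```python
import numpy as np

def mahalanobis_sq(X: np.ndarray) -> np.ndarray:
    """Squared Mahalanobis distance of each row of X from the column means,
    using the sample covariance matrix (n - 1 denominator)."""
    centered = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Row-wise quadratic form (x - mean)' S^-1 (x - mean).
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

# Hypothetical IVs: a and b strongly correlated, c independent.
rng = np.random.default_rng(3)
n = 200
a = rng.normal(size=n)
b = 0.9 * a + 0.3 * rng.normal(size=n)
c = rng.normal(size=n)
X = np.column_stack([a, b, c])
# Case 0: about 1.5 SD on a and b separately, but with opposite signs,
# which breaks the strong positive a-b relationship.
X[0] = [1.5, -1.5, 0.0]

d2 = mahalanobis_sq(X)
print("largest D2 at case", int(np.argmax(d2)), "value", round(float(d2.max()), 1))
```

Mathematically, D2 computed this way equals (n - 1) times the centered leverage (h_i - 1/n) from a regression with an intercept on the same IVs, so it is closely related to the leverage values offered in the same dialog.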
> I am somehow missing the link between the SPSS results and
> the more theoretical knowledge in the books. Perhaps somebody
> more experienced here can help me out?
You may find it useful to read some books on statistical analysis using SPSS,
such as the books by Marija Norusis.
Hector