Date:         Mon, 19 Sep 2005 14:21:11 -0300
Reply-To:     Hector Maletta <hmaletta@fibertel.com.ar>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Hector Maletta <hmaletta@fibertel.com.ar>
Subject:      Re: General questions: Linear Regression
Comments: To: Karl Koch <TheRanger@gmx.net>
In-Reply-To:  <10357.1127148909@www49.gmx.net>
Content-Type: text/plain; charset="US-ASCII"

Karl, see responses to your questions below.

> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
> On Behalf Of Karl Koch
> Sent: Monday, September 19, 2005 1:55 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: General questions: Linear Regression
>
> Hello all,
>
> I have a few questions which I would like to ask here
> regarding linear regression analysis in SPSS. I have
> performed a linear regression with three IVs and one DV.
>
> I would like to find the regression function that best models
> the data in order to make predictions. I have 3 IVs but only
> 2 IVs contribute stat. sig. to the variation of the DV.
>
> I did a normal (simultaneous) linear regression. I get the
> following model with its coefficients (the ANOVA table tells
> me that this model is significant):
>
> Coefficients
> ------------
> Model              B         t       sig.
> 1   (Constant)   4.200    58.972    .000
>     FactorA      -.779   -18.288    .000
>     FactorB      -.022     -.622    .535
>     FactorC     -1.601   -25.350    .000
>
> Furthermore, the model summary tells me an R square of 0.30
> which means that the model accounts for 30 % of the variance
> in the DV.
>
> Now some questions:
>
> 1) How does this translate to the regression function Y =
> alpha + beta1 * FactorA + beta2 * FactorC ? I only got one R
> square value for the ENTIRE model...

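Karl: The B column of your table is the regression function itself: Y = 4.200 - 0.779 * FactorA - 0.022 * FactorB - 1.601 * FactorC. The R2 coefficient is the squared correlation coefficient between predicted and observed values of your dependent variable. Therefore you get one R2 per equation, regardless of the number of IVs in that equation. At the same time, your results show that the estimate for the coefficient of FactorB is not significantly different from zero: with a significance of 0.535, an estimate at least this far from zero would turn up about 53% of the time even if the true population coefficient were zero. By standard procedure, you should exclude FactorB from your equation and re-run the regression with FactorA and FactorC only, since their coefficients will change (usually slightly) once FactorB is dropped. This would only marginally reduce R2, but add to the strength of your results and predictions.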
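As a rough illustration of how the coefficient table turns into a prediction function, here is a minimal Python sketch. The coefficients are copied from the table quoted above (the full three-predictor model); the function name `predict` and the example input values are only illustrative assumptions. If FactorB is dropped, the coefficients would have to come from a re-estimated two-predictor model, not from this table.

```python
# Coefficients copied from the B column of the quoted SPSS table
# (full model with all three predictors entered simultaneously).
COEFFS = {
    "constant": 4.200,
    "FactorA": -0.779,
    "FactorB": -0.022,
    "FactorC": -1.601,
}


def predict(factor_a, factor_b, factor_c):
    """Predicted DV = constant + B_A * FactorA + B_B * FactorB + B_C * FactorC."""
    return (
        COEFFS["constant"]
        + COEFFS["FactorA"] * factor_a
        + COEFFS["FactorB"] * factor_b
        + COEFFS["FactorC"] * factor_c
    )


if __name__ == "__main__":
    # Hypothetical case: all three factors equal to 1.
    print(round(predict(1, 1, 1), 3))
```

With all predictors at zero the function returns the constant, which is what the (Constant) row represents.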

> 2) Where can I find out how much a R square of 0.30 (30%)
> really means? Is this a strong effect? Can somebody provide
> me with some approaches of how this could be interpreted?

R2 compares the full variability of the DV with the variability attributable to your IVs. It is perfectly possible that the DV varies due to many other factors, besides the ones you have singled out in your equation. The results tell you that 70% of the variance in the DV cannot be attributed to Factors A, B or C. For purposes of explanation, this does not matter much: if you find an effect that is statistically significant (i.e. an effect that most likely represents some real effect at the level of the population), that finding is important even if there are other sources of variability. For example, suppose you discover a risk factor that accounts for 10% of the variability in survival time after a heart transplant. Even if other factors are responsible for the other 90%, your finding is nonetheless valuable.

For purposes of prediction, instead, a low R2 may be bad news: if you predict the DV based on your three factors, you will still have 70% of the variance originally observed in the DV left unexplained, i.e. large variability around your predicted values, and therefore your individual predictions may be far off even if based on sound facts and real effects.
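To put a number on "large variability around your predicted values", a small Python sketch follows. It uses the standard relation that the residual variance is (1 - R2) times the variance of the DV, so the residual standard deviation is sqrt(1 - R2) times the DV's standard deviation. The function name is an illustrative assumption, and the degrees-of-freedom adjustment that SPSS applies to the standard error of the estimate is ignored here.

```python
import math


def residual_sd_fraction(r_squared):
    """Residual SD as a fraction of the DV's SD, ignoring df adjustments."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R squared must lie between 0 and 1")
    return math.sqrt(1.0 - r_squared)


if __name__ == "__main__":
    # R squared of 0.30, as reported in the model summary quoted above.
    print(round(residual_sd_fraction(0.30), 3))
```

For R2 = 0.30 the fraction is roughly 0.84: the scatter around the predictions is only about 16% smaller than the scatter of the DV around its own mean.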

> 3) When performing the regression analysis, SPSS offers in
> the "Save" dialog box the "Mahalanobis" distance. Does
> somebody here know more details about this option - I could
> not find a lot in the help... The reason why I am asking is
> that one book suggests to tick this box without further
> explanation and I usually want to know that I am messing up :-)

The Mahalanobis distance measures how far a case's values on the independent variables lie from the mean of all cases (the centroid of the IVs). Unlike Euclidean distance, which ignores the scale of the variables and the correlations among them, it also takes into account the variances and covariances of the variables. It is used to determine whether an observation is an outlier on the predictors, taking into account all the relationships among the variables involved. Some explanations may be found at http://www.usq.edu.au/users/senior/Assessment/RAVLT-Mahalanobis-Analysis.htm (but a Google search for "Mahalanobis distance" will show many more).
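For readers who want to see the computation, here is a minimal Python sketch of the squared Mahalanobis distance for the two-predictor case, written with the standard library only. The data are made up for illustration, the function name is an assumption, and this is not a reproduction of what SPSS does internally.

```python
from statistics import mean


def mahalanobis_squared(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


if __name__ == "__main__":
    # Hypothetical data: five cases follow a rising trend, the last one does not.
    cases = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8), (2, 4.5)]
    for case, d2 in zip(cases, mahalanobis_squared(cases)):
        print(case, round(d2, 2))
```

In this made-up data the last case is closer to the centroid in plain Euclidean terms than the first one, yet it gets the largest Mahalanobis distance because it runs against the correlation between the two variables.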

> I am somehow missing the link between the SPSS results and
> the more theoretical knowledge in the books. Perhaps somebody
> more experienced here can help me out?

You may find it useful to read some books on statistical analysis using SPSS, like the books by Marija Norusis.

Hector

