Date: Wed, 21 Sep 2011 14:49:50 +0100
Reply-To: Muir Houston <Muir.Houston@GLASGOW.AC.UK>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Muir Houston <Muir.Houston@GLASGOW.AC.UK>
Subject: Re: output from linear regression
In-Reply-To: A<0LRV009PUI2RIK70@relay-auth-2.ms.rz.rwth-aachen.de>
The first problem is your reliance on an automated entry method. Stepwise
procedures, backward elimination included, are frowned upon for a number
of reasons, mainly that they are divorced from theory and existing
research.
Copied from http://www.stata.com/support/faqs/stat/stepwise.html
Here are some of the problems with stepwise variable selection.
1. It yields R-squared values that are badly biased to be high.
2. The F and chi-squared tests quoted next to each variable on the
printout do not have the claimed distribution.
3. The method yields confidence intervals for effects and predicted
values that are falsely narrow; see Altman and Andersen (1989).
4. It yields p-values that do not have the proper meaning, and the
proper correction for them is a difficult problem.
5. It gives biased regression coefficients that need shrinkage (the
coefficients for remaining variables are too large; see Tibshirani
[1996]).
6. It has severe problems in the presence of collinearity.
7. It is based on methods (e.g., F tests for nested models) that
were intended to be used to test prespecified hypotheses.
8. Increasing the sample size does not help very much; see Derksen
and Keselman (1992).
9. It allows us to not think about the problem.
10. It uses a lot of paper.
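Points 1 and 4 can be checked with a small simulation. The sketch below
is in Python with NumPy rather than SPSS, and everything in it is an
assumption made for illustration: 25 pure-noise predictors, 100 cases, an
outcome unrelated to all of them, and a removal rule of |t| < 2 as a
rough stand-in for p > .05. The function names (ols, backward_eliminate,
simulate) are hypothetical.

```python
import numpy as np

def ols(design, y):
    """Fit y ~ design (intercept column included); return beta, t statistics, R-squared."""
    n, k = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # standard errors of the coefficients
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, beta / se, r2

def backward_eliminate(X, y, t_crit=2.0):
    """Drop the predictor with the smallest |t| until every remaining |t| >= t_crit.

    t_crit=2.0 is an assumed approximation to a two-sided p < .05 rule.
    """
    keep = list(range(X.shape[1]))
    while keep:
        design = np.column_stack([np.ones(len(y)), X[:, keep]])
        _, t, r2 = ols(design, y)
        abs_t = np.abs(t[1:])                 # ignore the intercept
        worst = int(np.argmin(abs_t))
        if abs_t[worst] >= t_crit:
            return keep, r2                   # everything left looks "significant"
        del keep[worst]
    return keep, 0.0                          # intercept-only model

def simulate(n_obs=100, n_pred=25, n_runs=200, seed=0):
    """Run backward elimination repeatedly on data with no real effects."""
    rng = np.random.default_rng(seed)
    n_kept, r2_final = [], []
    for _ in range(n_runs):
        X = rng.standard_normal((n_obs, n_pred))
        y = rng.standard_normal(n_obs)        # outcome is unrelated to X
        keep, r2 = backward_eliminate(X, y)
        n_kept.append(len(keep))
        r2_final.append(r2)
    return np.array(n_kept), np.array(r2_final)

n_kept, r2_final = simulate()
print("mean number of noise predictors kept:", n_kept.mean())
print("share of runs keeping at least one:  ", (n_kept > 0).mean())
print("mean R-squared of the final model:   ", r2_final.mean())
```

If selection were harmless, the final model would usually be empty and
R-squared would sit near zero. In runs like this, some noise predictors
typically survive with |t| of 2 or more, which is the sense in which the
printed tests and R-squared cannot be taken at face value.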
"All possible subsets" regression solves none of these problems.
Hope this helps
Muir
Muir Houston, HNC, BA (Hons), M.Phil., PhD, FHEA
Social Justice, Place and Lifelong Education Research
School of Education
University of Glasgow
0044+141-330-4699
R3L+ Project - Adult education in the light of the European Quality
Strategy
http://www.learning-regions.net/
GINCO Project - Grundtvig International Network of Course Organisers
http://www.ginconet.eu/
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Leon Galushko
Sent: 21 September 2011 13:31
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: output from linear regression
Hi,
I am having trouble understanding the output from a multiple linear
regression.
There are some 25 predictors, and the dependent variable (from medical
research) is 'quality of life' (between 0 and 100 points; more points
mean better quality of life after the operation).
I chose the backward procedure, so after n steps only some medical
predictors with significant influence remained.
Now the problem: some predictors that should obviously have a negative
influence on my dependent variable, 'quality of life', for example
'surgery complication' (0: no, 1: yes) or 'tumor length', in fact have a
significant positive one, e.g. Beta = .253, p < .001 for 'tumor length'.
It cannot be logical that people with big tumors, or with more surgical
complications, have significantly better 'quality of life' after the
operation (one would expect Beta = -.253, p < .001 instead).
On the other hand, other predictors show significant negative or
positive influences that can be explained logically.
Could it be that the backward procedure in linear regression involves
some internal steps that make the signs of the Beta values in the output
(blank for plus, '-' for minus) irrelevant?
How else could this be explained?
Thanks,
Leon