LISTSERV at the University of Georgia
=========================================================================
Date:         Mon, 31 Jul 2006 17:08:44 +0200
Reply-To:     Marta García-Granero
              <biostatistics@terra.es>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Marta García-Granero
              <biostatistics@terra.es>
Organization: Asesoría Bioestadística
Subject:      Re: Addition of covariates in forward regression analyses
In-Reply-To:  <329A68716B57D54E8D39FD3F8A4A84DF06E0770B@um-mail0136.unimaas.nl>
Content-Type: text/plain; charset=ISO-8859-15

Hi Gjalt-Jorn

PGP> [if this question is inappropriate (as it discusses a topic not limited
PGP> to SPSS) please tell me; I could not find rules prohibiting this online]

As I told you in my private message: theoretical questions are neither inappropriate nor prohibited on this list.

OK. I have given you several days to "digest" the PDF file with chapter 7 of Rawlings' book, concerning multiple regression models. Now we can start discussing your questions (this thread is of course open to everyone who wants to add to or correct anything I say here).

First of all:

Why do you split your dataset into several subgroups according to a categorical variable? You lose power by working with smaller sample sizes. You could add the categorical variable to your model as an extra covariate, and check whether the models differ between groups by adding interaction terms between that variable and the predictors of interest (see below for a very simple example).

What I would do instead is select a random sample of cases (around 10%), keep it aside and develop my model with the other 90%. This "small" sample could be used later to cross-validate the model (evaluating shrinkage).
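A minimal sketch of this hold-out idea in SPSS syntax. The variable names y, x1, x2, holdout and pred_y and the seed value are placeholders, not taken from any real dataset; adapt them to your own file.

```spss
* Sketch only: flag roughly 10% of the cases at random as a hold-out sample.
SET SEED=20060731.
COMPUTE holdout = (RV.UNIFORM(0,1) < 0.10).

* Develop the model on the remaining cases (holdout = 0) and save predictions.
REGRESSION
  /SELECT=holdout EQ 0
  /STATISTICS COEFF R ANOVA
  /DEPENDENT y
  /METHOD=ENTER x1 x2
  /SAVE PRED(pred_y).

* Cross-validate: correlate observed and predicted values in the hold-out cases.
TEMPORARY.
SELECT IF (holdout = 1).
CORRELATIONS /VARIABLES=y pred_y.
```

Comparing the squared correlation in the hold-out cases with the R square from the development sample gives a rough idea of the shrinkage. REGRESSION should compute the saved predicted values for the unselected cases as well, but it is worth checking that pred_y is not missing for them before trusting the last step.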

Now, if you are interested in adding "distal covariates" only if they explain an important (don't use the word "significant" here) portion of the variance, then you should consider examining the change in adjusted R-square, or the decrease in residual variance, instead of simply the significance of the variable in the full model.
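One way to look at this in SPSS, as a sketch: enter the proximal block first and the distal block second, and request the CHANGE statistics. The variable names y, prox1, prox2, dist1 and dist2 are hypothetical placeholders.

```spss
* Sketch only: two blocks entered in a fixed order.
* CHANGE adds R square change and F change for each block to the Model Summary.
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT y
  /METHOD=ENTER prox1 prox2
  /METHOD=ENTER dist1 dist2.
```

The Model Summary table then lists adjusted R square and the standard error of the estimate for each model, so the gain from the distal block can be judged by its size and not only by its p-value.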

Now, the example I mentioned:

DATA LIST LIST/ deadspac height age group (4 F4).
BEGIN DATA
44 110 5 1
31 116 5 0
43 124 6 1
45 129 7 1
56 131 7 1
79 138 6 0
57 142 6 1
56 150 8 1
58 153 8 1
92 155 9 0
78 156 7 0
64 159 8 1
88 164 10 0
112 168 11 0
101 174 14 0
END DATA.

VAR LABEL deadspac 'Pulmonary anatomical deadspace (ml)'.
VAR LABEL height 'Height (cm)'.
VAR LABEL age 'Age (years)'.
VAR LABEL group 'Status'.
VAL LABEL group 0 'Normal' 1 'Asthma'.

* Two independent models: one for normal children and another one for asthmatic *.
SORT CASES BY group .
SPLIT FILE SEPARATE BY group .

REGRESSION
  /STATISTICS COEFF OUTS CI R ANOVA
  /NOORIGIN
  /DEPENDENT deadspac
  /METHOD=ENTER height age .

* As you can see, it looks like, in normal children, neither height nor age is significant (although the model is significant) *.

SPLIT FILE OFF.

REGRESSION
  /STATISTICS COEFF OUTS CI R ANOVA
  /NOORIGIN
  /DEPENDENT deadspac
  /METHOD=ENTER height age group .

COMPUTE grphgt=group*height.

REGRESSION
  /STATISTICS COEFF OUTS CI R ANOVA
  /NOORIGIN
  /DEPENDENT deadspac
  /METHOD=ENTER height age group grphgt.
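Written out, the last model is the following. The coefficient symbols b_0 to b_4 and the error term e are notation introduced here for illustration; the variable names are the ones used in the syntax.

```latex
\text{deadspac} = b_0 + b_1\,\text{height} + b_2\,\text{age} + b_3\,\text{group}
                  + b_4\,(\text{group}\times\text{height}) + e
% group = 0 (Normal): slope for height is b_1
% group = 1 (Asthma): slope for height is b_1 + b_4
```

The t test for the grphgt coefficient (b_4) is therefore a test of whether the height slope differs between normal and asthmatic children, which is the question the two separate SPLIT FILE models could only answer informally.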

PGP> In forward selection multiple linear regression, which of these factors
PGP> influence whether a covariate is added to the model?

PGP> - the size of the regression weight the covariate would get
PGP> - the standard error of that regression weight
PGP> - the complete sample size

PGP> I suspect that both the size & standard error of the regression weight
PGP> are of influence, and that the sample size influences the standard error
PGP> of the regression weight.
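The relation between these three factors can be written down as follows. This is a sketch in notation introduced here: b_j is the weight of the candidate covariate, k the number of predictors in the model after it enters, n the number of cases, and r_p the partial correlation between the criterion and the candidate given the predictors already in the model.

```latex
t_j = \frac{b_j}{\mathrm{SE}(b_j)}, \qquad
F_{\text{enter}} = t_j^{2} = \frac{r_p^{2}\,(n-k-1)}{1-r_p^{2}}
```

F-to-enter has 1 and n-k-1 degrees of freedom, so its p-value is the same as that of the t test for the covariate's weight at the step where it would enter. For a fixed partial correlation, F grows roughly in proportion to n. As an invented illustration with r_p = 0.12 and k = 5: with n = 500, F is about 7.2 (critical value at .05 about 3.86), while with n = 200, F is about 2.8 (critical value about 3.89), so the same covariate would enter in the larger sample only.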

PGP> If you don't want to know why I'm asking this, you can stop reading now
PGP> :-)
PGP> In any case thanks in advance :-)

PGP> Why I want to know this:

PGP> I am conducting several very exploratory regression analyses, regressing
PGP> the same covariates on the same criterion in a number of different
PGP> subsamples (persons with a different value on a certain variable; in
PGP> this case for example ecstasy use status (non-users, users & ex-users)).
PGP> I use the forward method to probe which covariates yield a significant
PGP> addition to the model. The covariates are placed in six blocks (on the
PGP> basis of theoretical proximity to the criterion; the idea is that more
PGP> distal covariates only enter the model if they explain a significant
PGP> portion of the criterion variance over and above the more proximal
PGP> covariates already in the model). P to enter is .05. (peripheral
PGP> question: am I correct in assuming that this is the p-value associated
PGP> with the t-value of the beta of the relevant covariate?)
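For readers following along, the setup described in this paragraph would look roughly like the sketch below. Only two of the six blocks are shown, and the variable names y, prox1, prox2, dist1 and dist2 are hypothetical placeholders, not the poster's actual variables.

```spss
* Sketch only: blockwise forward selection with p-to-enter set explicitly.
* PIN is the probability of F-to-enter used by the FORWARD method.
REGRESSION
  /STATISTICS COEFF R ANOVA CHANGE
  /CRITERIA=PIN(.05)
  /DEPENDENT y
  /METHOD=FORWARD prox1 prox2
  /METHOD=FORWARD dist1 dist2.
```

Because F-to-enter for a single covariate equals the square of its t value, PIN(.05) applies the same p-value as the t test of that covariate's weight at the step where it is considered.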

PGP> The sample sizes of the samples are unequal (e.g., ranging from 200 to
PGP> 500). I get the strong impression that the number of covariates in the
PGP> final model depends on the sample size. This would imply that covariates
PGP> with less 'impact' would be added to the model when the model is
PGP> developed with a larger sample (e.g., with equal standard errors of the
PGP> parameter weight, when a covariate increases 1 standard deviation, an
PGP> increase of the criterion of 0.2 * Y's standard deviation could suffice
PGP> (lead to inclusion) with n=500, but not with n=200).

PGP> Is this correct? And if so, is there a way to 'correct' the p-to-enter
PGP> for sample size, so that all final models comprise covariates with
PGP> roughly equal relevance? (except for selecting sub-subsamples from all
PGP> subsamples of the size of the smallest subsample)

PGP> My goal in the end is to cursorily compare the models in the different
PGP> subsamples (no, sorry, I'm not going to use SEM; given the amount of
PGP> potential predictors, the sample sizes are too small). This is not very
PGP> 'fair' if the model in one subsample has lower thresholds for
PGP> 'inclusion' than the model in another.

PGP> If what I'm trying is completely insane/stupid/otherwise unadvisable,
PGP> I'm of course eager to learn :-)

--
Regards,
Dr. Marta García-Granero, PhD
mailto:biostatistics@terra.es
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results".

