**Date:** Thu, 24 Jul 2008 10:00:22 +0200
**Reply-To:** Marta García-Granero <mgarciagranero@gmail.com>
**Sender:** "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
**From:** Marta García-Granero <mgarciagranero@gmail.com>
**Subject:** Re: Multiple Regression with Continuous and Categorical Variables
**In-Reply-To:** <4887E308.80802@sfos.uaf.edu>
Briana H. Witteveen wrote:
> I know that to use categorical independent variables in multiple
> regression you must create dummy variables. How do you include the dummy
> variables in a multiple regression model that also includes several
> continuous independent variables? Is it possible to use dummy variables
> and continuous in a stepwise regression?
>
Briana:

Scott Millis has already given you a decalogue of reasons for avoiding
stepwise regression. I could add one more to his collection:
stepwise regression doesn't properly handle dummy-coded categorical
variables. Each dummy variable is entered or removed on its own rather
than as a set, so the final model might lack one of the dummy variables,
rendering the effect of the categorical variable uninterpretable.
Stepwise regression has been ironically called "unwise" regression
(Leamer, 1985). Avoid it. Period.
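
As for how the dummy variables go into the model: they are simply extra
columns next to the continuous predictors, one 0/1 column per category
except the reference one, and they enter or leave the model together.
A minimal sketch in Python (not SPSS syntax); the variable names, the
data and the `dummy_code` helper are made up for illustration:

```python
def dummy_code(values, reference):
    """Return (column_names, rows): one 0/1 indicator per non-reference level."""
    levels = sorted(set(values) - {reference})
    names = [f"is_{level}" for level in levels]
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return names, rows


# Hypothetical data: two continuous predictors and one 3-level categorical one.
age = [34, 51, 46, 29]
weight = [70.5, 82.0, 64.3, 90.1]
region = ["north", "south", "east", "north"]

# "north" is the reference category, so it gets no column of its own.
dummy_names, dummy_rows = dummy_code(region, reference="north")

# Design matrix: intercept, continuous predictors, then the k-1 dummies.
columns = ["intercept", "age", "weight"] + dummy_names
design = [[1, a, w] + d for a, w, d in zip(age, weight, dummy_rows)]
```

Whatever regression routine you then feed this matrix to, treat the
`is_*` columns as one block: keep them all or drop them all.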

Take a look at chapter 4 of the book "Applied Logistic Regression" by
Hosmer & Lemeshow (1989). They give excellent guidelines for model
development. Basically:

1) Univariate analysis

2) Select the variables to carry forward to the next step:
- Those that showed interesting results in univariate analysis (this
doesn't necessarily mean "significant")
- Those that your experience tells you might play an important
role (as confounders and/or effect modifiers). In epidemiology/medical
research, gender and age are typical examples.

3) Build a model with all the variables you selected in the previous
step. Examine their adjusted effects and carefully remove those that look
unimportant. Check the effect of removing each variable on the
slopes of the rest. Important changes (above 10% is a good reference)
show that the variable you removed plays a role in the model
and should stay in it. If you suspect a variable is involved in
interactions (see next step), it should never be removed (hierarchical
rule). The final model is called the "main effects model".
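
The 10% check in step 3 is easy to script once you have the slopes of
the model with and without the candidate variable. A rough sketch in
Python; the coefficients below are invented, and expressing the change
relative to the slope in the larger model is my assumption about how to
compute it:

```python
def coefficient_changes(full, reduced, threshold=10.0):
    """Percent change in each remaining slope after dropping a variable.

    `full` and `reduced` map variable names to estimated slopes.
    Returns {name: (percent_change, flagged)}, where flagged is True
    when the absolute change exceeds `threshold` percent.
    Assumes no slope in the full model is exactly zero.
    """
    changes = {}
    for name, b_reduced in reduced.items():
        b_full = full[name]
        pct = 100.0 * (b_reduced - b_full) / b_full
        changes[name] = (pct, abs(pct) > threshold)
    return changes


# Hypothetical slopes before and after removing "smoker" from the model.
full = {"age": 0.50, "weight": 1.20, "smoker": 2.00}
reduced = {"age": 0.62, "weight": 1.23}

result = coefficient_changes(full, reduced)
for name, (pct, flagged) in result.items():
    print(f"{name}: {pct:+.1f}%  {'keep the removed variable' if flagged else 'ok'}")
```

Here the slope for age moves by about 24%, which suggests that the
removed variable was acting as a confounder and should go back in.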

4) Check for interactions between variables. Keep only the
interaction terms that meet these conditions:
- Statistically significant
- Meaningful: if you can't explain from a solid theoretical point of
view the presence of the interaction, then discard it
- Hierarchical rule: if an interaction term is present in a model, then
both main effects should be too. Stepwise regression tends to mess with
the rule, BTW
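
The hierarchical rule can be checked mechanically on any list of model
terms. A small Python sketch; writing an interaction as "a:b" is just a
naming convention I am assuming here, and the term names are examples:

```python
def missing_main_effects(terms):
    """Return main effects required by interaction terms but absent from the model.

    `terms` is a list of strings; an interaction is written "a:b".
    """
    present = {term for term in terms if ":" not in term}
    missing = set()
    for term in terms:
        if ":" in term:
            missing.update(set(term.split(":")) - present)
    return sorted(missing)


ok_model = ["age", "sex", "age:sex"]   # both main effects present
bad_model = ["age", "age:sex"]         # "sex" was dropped, rule violated

print(missing_main_effects(ok_model))
print(missing_main_effects(bad_model))
```

An empty result means the model respects the rule; anything listed has
to be put back before the interaction can be interpreted.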

Your final model should then be validated (using an independent dataset).

Quoting Campbell (Statistics at Square Two, 2001): 'Do not forget that
models are simply an approximation to reality. "All models are wrong,
but some are useful" '

HTH,
Marta García-Granero

--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
