Date: Tue, 23 Dec 2003 09:20:40 -0800
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Stepwise Poisson Regression
Content-type: text/plain; charset=US-ASCII
"DePuy, Venita" <depuy001@DCRI.DUKE.EDU> helpfully replied:
> I would say you could do it manually (and painfully).
>
> Stepwise, in a nutshell, involves putting all the variables in the model.
> Then removing the one which decreases the R2 by the smallest amount. Then,
> after each removal, consider adding a previously deleted variable if it is
> significant. (you can specify p value levels for removal and addition).
But the rules for stepwise selection are based on i.i.d. normal errors under
the assumptions of linear regression. And Poisson regression doesn't quite
have that, so the standard values for removal and addition wouldn't apply.
> A slightly revised version, which I would do in your shoes (if anyone on the
> list sees something wrong with this, please tell me?), is to remove the
> variable with the highest p value (assuming it's non significant). Then re
> run, remove the variable with the highest p value in the revised model.
> Then, try adding the first variable to the model to see if it's significant
> - probably not, but that double check is what makes it stepwise instead of
> backwards regression.
> Now you have a model with 2 variables removed; run the model, remove the 3rd
> variable as above; then check to see if one of the first two variables
> removed should be re-added.
> Continue along those lines until you have every variable in the model
> significant.
>
> Note the difference between stepwise (as I was taught it in school) and the
> method above: stepwise removes variables based on the smallest drop in R^2
> value. "My" method removes based on highest p value. The variable
> selections should be pretty close, but may not be exactly the same.
Venita's method might take some time to evaluate. You would need to prove
for the purposes of your research that it would yield an acceptable result,
in the same way that people have established the rules for stepwise selection.
I don't have the time now to fully analyze it, so I can't say whether it is
workable. But you can't assume that it is equivalent to stepwise selection,
or backward selection.
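For anyone who wants to experiment with it, here is a minimal Python sketch of
the control flow Venita describes (drop the highest-p variable, then re-check
the earlier drops). Everything named here is hypothetical: `fit` stands in for
whatever routine fits your Poisson model and returns a p-value per variable,
and the 0.05 thresholds are placeholders, not recommended values. It only
illustrates the mechanics; it says nothing about whether the procedure is
statistically valid.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: fit(variables) returns {variable: p_value} for the
# model containing exactly those variables. Supply your own fitting routine.
PValueFit = Callable[[Sequence[str]], Dict[str, float]]


def eliminate_with_recheck(
    candidates: Sequence[str],
    fit: PValueFit,
    alpha_remove: float = 0.05,  # placeholder threshold (assumption)
    alpha_enter: float = 0.05,   # placeholder threshold (assumption)
    max_steps: int = 100,        # guard against remove/re-add cycles
) -> List[str]:
    model = list(candidates)
    removed: List[str] = []
    for _ in range(max_steps):
        if not model:
            break
        pvals = fit(model)
        worst = max(model, key=lambda v: pvals[v])
        if pvals[worst] <= alpha_remove:
            break  # every remaining variable is significant
        model.remove(worst)
        removed.append(worst)
        # Re-check step: try re-adding each variable dropped earlier
        # (not the one just dropped) and keep the first significant one.
        for v in [r for r in removed if r != worst]:
            if fit(model + [v])[v] < alpha_enter:
                model.append(v)
                removed.remove(v)
                break
    return model
```

The re-check loop is what separates this from plain backward elimination;
without it the function reduces to dropping the largest p-value until
everything left passes the removal threshold.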
> Another option is to do a macro to run all combinations of all numbers of
> variables that you like, and pick the one with the highest adjusted R^2
> value; this selection method is available in Proc Reg etc., but I assume not
> poisson.
This also has the same problems as I have discussed previously when working with
stepwise selection methods, even if it too is popular. Kruskal has written about
the problems of coming up with 'relative importance' in regressions, and I have
cited his papers in some of my diatribes on this subject in SAS-L. But note that
the highest-R^2 method requires running all 2^k - 1 possible subset regressions
on k regressors, and Kruskal's method averages over all k! orderings of them, so
the amount of time goes up at least exponentially as the number of regressors
goes up.
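To put numbers on that growth, here is a small Python sketch. It assumes the
all-subsets search fits every non-empty subset of the k regressors (2^k - 1
models) and, as I understand Kruskal's approach, that averaging over orderings
involves k! orderings. The function name is mine, and nothing is fitted here;
it only counts.

```python
from itertools import combinations
from math import factorial


def count_subset_models(k: int) -> int:
    """Count the non-empty subsets of k regressors by enumerating them."""
    return sum(
        1
        for size in range(1, k + 1)
        for _ in combinations(range(k), size)
    )


# Compare the enumerated count with the closed form 2^k - 1 and with k!.
for k in (5, 10, 15):
    print(k, count_subset_models(k), 2**k - 1, factorial(k))
```

With only 10 regressors that is already 1023 subset models and 3628800
orderings, before you multiply by the cost of one iterative Poisson fit.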
My recommendation: determine why you think you need 'stepwise Poisson regression'
and re-think your problem. If you are after 'relative importance', then you are
opening a vast can of worms. If you have been ordered to do this on pain
of death, then try to find some papers or documents your boss/prof/ruler is using
as his/her basis for decision, and evaluate them.
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician