Date: Tue, 23 Dec 2003 09:20:40 -0800
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Stepwise Poisson Regression
"DePuy, Venita" <depuy001@DCRI.DUKE.EDU> helpfully replied:
> I would say you could do it manually (and painfully).
> Stepwise, in a nutshell, involves putting all the variables in the
> model, then removing the one which decreases the R2 by the smallest
> amount. After each removal, consider adding a previously deleted
> variable if it is significant. (You can specify p value levels for
> removal and addition.)
But the rules for stepwise selection are based on i.i.d. normal errors and
the assumptions of linear regression. And Poisson regression doesn't
have that, so the standard values for removal and addition wouldn't apply.
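
To make the contrast concrete, here is a minimal Python sketch (not part of the original thread; the function name poisson_mean_var is made up for illustration). It sums the Poisson probabilities directly and shows that the variance of a Poisson count equals its mean, so observations with different fitted means cannot share one constant error variance the way the linear-regression rules assume.

```python
import math

def poisson_mean_var(lam, tol=1e-12):
    """Mean and variance of Poisson(lam), summed directly from the pmf."""
    mean = 0.0
    second_moment = 0.0
    k = 0
    p = math.exp(-lam)  # P(Y = 0)
    # Keep summing past the mode until the remaining terms are negligible.
    while k <= lam or p > tol:
        mean += k * p
        second_moment += k * k * p
        k += 1
        p *= lam / k  # P(Y = k) from P(Y = k - 1)
    return mean, second_moment - mean ** 2

for lam in (0.5, 2.0, 20.0):
    m, v = poisson_mean_var(lam)
    print(f"lambda={lam:5.1f}  mean={m:.4f}  variance={v:.4f}")
```

The variance changes whenever the mean does, which is one reason thresholds tuned for i.i.d. normal errors should not be carried over without checking.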
> A slightly revised version, which I would do in your shoes (if anyone on
> the list sees something wrong with this, please tell me?), is to remove
> the variable with the highest p value (assuming it's non-significant).
> Re-run, remove the variable with the highest p value in the revised
> model. Then, try adding the first variable to the model to see if it's
> significant
> - probably not, but that double check is what makes it stepwise
> backwards regression.
> Now you have a model with 2 variables removed; run the model, remove
> another variable as above; then check to see if one of the first two
> variables removed should be re-added.
> Continue along those lines until you have every variable in the model
> significant.
> Note the difference between stepwise (as I was taught it in school) and
> the method above: stepwise removes variables based on the smallest drop
> in R2 value. "My" method removes based on highest p value. The variable
> selections should be pretty close, but may not be exactly the same.
Venita's method might take some time to evaluate. You would need to
establish for the purposes of your research that it would yield an
acceptable result, in the same way that people have established the rules
for stepwise selection. I don't have the time now to fully analyze it, so
I can't say whether it is workable. But you can't assume that it is
equivalent to stepwise or backward selection.
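
For anyone who wants to evaluate the p-value-based procedure quoted above, here is a minimal Python sketch of its control flow, under stated assumptions. It is not an endorsement and it is not equivalent to standard stepwise selection. The names backward_with_reentry, fit_pvalues and toy_fit are hypothetical: fit_pvalues stands for whatever routine you supply to fit the Poisson model on a given list of variables and return a p value for each one. The 0.05 thresholds and the max_steps guard against cycling are assumptions of this sketch, not part of the quoted description.

```python
def backward_with_reentry(candidates, fit_pvalues,
                          alpha_remove=0.05, alpha_enter=0.05,
                          max_steps=100):
    """Return (kept, removed) following the quoted p-value procedure.

    fit_pvalues(variables) is a hypothetical callable: it fits the model
    on the given variables and returns {variable: p_value}.
    """
    kept = list(candidates)
    removed = []
    for _ in range(max_steps):  # guard against cycling (an assumption)
        if not kept:
            break
        pvals = fit_pvalues(kept)
        worst = max(kept, key=lambda v: pvals[v])
        if pvals[worst] <= alpha_remove:
            break  # every remaining variable is significant
        kept.remove(worst)
        # Re-entry check: is a previously removed variable significant now?
        for var in list(removed):
            if fit_pvalues(kept + [var])[var] < alpha_enter:
                kept.append(var)
                removed.remove(var)
                break  # re-add at most one variable per step
        removed.append(worst)
    return kept, removed


# Toy stand-in for a real fit: made-up p values that ignore the model.
TOY_P = {"x1": 0.001, "x2": 0.60, "x3": 0.20, "x4": 0.01}

def toy_fit(variables):
    return {v: TOY_P[v] for v in variables}

print(backward_with_reentry(["x1", "x2", "x3", "x4"], toy_fit))
```

With the toy p values, x2 and then x3 are dropped and x1 and x4 are kept. In real use the p values change every time the model changes, which is exactly why the behavior of the procedure needs to be studied before it is trusted.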
> Another option is to do a macro to run all combinations of all numbers
> of variables that you like, and pick the one with the highest adjusted
> R2
> value; this selection method is available in Proc Reg etc., but I
This also has the same problems as I have discussed previously when
discussing stepwise selection methods, even if it too is popular. Kruskal
has addressed the problems of coming up with 'relative importance' in
regressions, and I have cited his papers in some of my diatribes on this
subject in SAS-L. But
the highest-R^2 method requires fitting all 2^k - 1 subsets of k
regressors, and Kruskal's method averages over all k! orderings of them,
and so the amount of time goes up rapidly as the number of regressors
goes up.
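
To give a feel for how fast the work grows, here is a small Python sketch (the function name model_counts is made up for illustration). An all-subsets search fits one model per non-empty subset of the k regressors, which is 2^k - 1 models, while averaging over orderings considers k! orderings.

```python
import math

def model_counts(k):
    """Return (non-empty subsets, orderings) for k candidate regressors."""
    return 2 ** k - 1, math.factorial(k)

for k in (5, 10, 20):
    subsets, orderings = model_counts(k)
    print(f"k={k:2d}: {subsets:,} subsets, {orderings:,} orderings")
```

Even at k = 10 that is 1,023 subsets and 3,628,800 orderings, so either approach becomes impractical quickly.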
My recommendation: determine why you think you need 'stepwise poisson
regression' and re-think your problem. If you are after 'relative
importance', then that is a vast can of worms you are opening. If you
have been ordered to do this on pain of death, then try to find some
papers or documents your boss/prof/ruler uses as his/her basis for
decision, and evaluate them.
David Cassell, CSC
Senior computing specialist