```
Date:         Tue, 23 Dec 2003 09:20:40 -0800
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion"
From:         "David L. Cassell"
Subject:      Re: Stepwise Poisson Regression
Content-type: text/plain; charset=US-ASCII

"DePuy, Venita" helpfully replied:
> I would say you could do it manually (and painfully).
>
> Stepwise, in a nutshell, involves putting all the variables in the model.
> Then removing the one which decreases the R2 by the smallest amount. Then,
> after each removal, consider adding a previously deleted variable if it is
> significant. (you can specify p value levels for removal and addition).

But the rules for stepwise selection are based on i.i.d. normal errors
under the assumptions of linear regression. And Poisson regression
doesn't quite have that, so the standard values for removal and addition
wouldn't apply.

> A slightly revised version, which I would do in your shoes (if anyone on the
> list sees something wrong with this, please tell me?), is to remove the
> variable with the highest p value (assuming it's non significant). Then re
> run, remove the variable with the highest p value in the revised model.
> Then, try adding the first variable to the model to see if it's significant
> - probably not, but that double check is what makes it stepwise instead of
> backwards regression.
> Now you have a model with 2 variables removed; run the model, remove the 3rd
> variable as above; then check to see if one of the first two variables
> removed should be re-added.
> Continue along those lines until you have every variable in the model
> significant.
>
> Note the difference between stepwise (as I was taught it in school) and the
> method above: stepwise removes variables based on the smallest drop in R^2
> value. "My" method removes based on highest p value. The variable
> selections should be pretty close, but may not be exactly the same.

Venita's method might take some time to evaluate.
You would need to prove for the purposes of your research that it would
yield an acceptable result, in the same way that people have established
the rules for stepwise selection. I don't have the time now to fully
analyze it, so I can't say whether it is workable. But you can't assume
that it is equivalent to stepwise selection, or backward selection.

> Another option is to do a macro to run all combinations of all numbers of
> variables that you like, and pick the one with the highest adjusted R^2
> value; this selection method is available in Proc Reg etc., but I assume not
> poisson.

This also has the same problems as I have discussed previously when
working with stepwise selection methods, even if it too is popular.
Kruskal has written about the problems of coming up with 'relative
importance' in regressions, and I have cited his papers in some of my
diatribes on this subject in SAS-L. But note that the highest-R^2 method
and Kruskal's method both require running a very large number of
regressions on k regressors (2^k subsets for all-subsets selection, and
averaging over up to k! orderings of the regressors for Kruskal's
method), so the amount of time goes up at least exponentially as the
number of regressors goes up.

My recommendation: determine why you think you need 'stepwise Poisson
regression' and re-think your problem. If you are after 'relative
importance', then you have a vast can of worms you are opening. If you
have been ordered to do this on pain of death, then try to find some
papers or documents your boss/prof/ruler is using as his/her basis for
decision, and evaluate them.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
```
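
To put rough numbers on the growth in model count mentioned in the message, here is a minimal Python sketch (not SAS, and not from the original post). It only counts candidate models and fits nothing. The function names `count_subset_models` and `count_orderings` are made up for illustration. It assumes an all-subsets search considers every subset of the k regressors, including the intercept-only model, and that an ordering-based importance measure would consider every permutation of the regressors.

```python
from itertools import combinations
from math import factorial


def count_subset_models(k: int) -> int:
    """Count all subsets of k regressors by enumerating them.

    Includes the empty subset (intercept-only model), so the result is 2**k.
    """
    return sum(
        1
        for size in range(0, k + 1)
        for _ in combinations(range(k), size)
    )


def count_orderings(k: int) -> int:
    """Count the orderings (permutations) of k regressors, i.e. k!."""
    return factorial(k)


if __name__ == "__main__":
    # Print k, number of subset models, number of orderings.
    for k in (3, 5, 10):
        print(k, count_subset_models(k), count_orderings(k))
```

With 10 regressors this already gives 1024 subsets and 3628800 orderings, which is why a macro that loops over every combination becomes impractical quickly.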
