LISTSERV at the University of Georgia
Date:         Tue, 30 Sep 2003 16:58:54 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: Proportion of Variance Explained by Each Predictor in
              Logistic Regression
Content-type: text/plain; charset=UTF-8

Laurel Copeland <laurel.copeland@MED.VA.GOV> wrote:
> I would like to calculate the proportion of the variance explained by each
> predictor in a multiple logistic regression (see for example Lai et al.
> (2002) Urology 56:108-115, Radical prostatectomy: geographic and
> demographic variation).
>
> Lai et al. wrote their own program to do this (I don't know in what – maybe S-PLUS).
>
> I have received a macro that calculates some R2 values by pulling out of
> PROC LOGISTIC the INTERCEPT (OUT=...) and XBETA and P (OUTEST=...) and
> going on from there, but it does not calculate the contributions to PVE
> from each predictor.

I hate to say it, but... Well, I *don't* hate to say it. Because I've said it dozens, maybe hundreds, of times. Computing PVE is completely misleading. It essentially says, "Here is the first most important predictor, and the second, and..." And you can't do that.

The classic work on relative importance is Kruskal. I have cited some of his pubs on this multiple times in SAS-L, so you can search the archives to get the references. He has talked about trying to get around this in multiple linear regression, and even in ideal circumstances you have a very messy problem requiring an exponential number of regression fits and computations from the fits.

In *normal* circumstances, you can't do it at all, because even a method like Kruskal's assumes that you have no measurement error (or measurement error scaled so carefully across all variables that it cancels out and can therefore be ignored). This never happens. Even if you don't have any measurement error among any of your predictor variables, you still run into problems with multicollinearity, suppressor variables, etc. Multicollinear data can give you meaningless numbers when you look at PVE, since how do you separate out the contribution of X1, X2, and X3 when all three of them tell you the same thing?
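
To make the X1, X2, X3 point concrete, here is a minimal sketch in Python (not SAS, and not Kruskal's method or whatever Lai et al. did). It assumes simulated data, ordinary linear regression rather than logistic, and a plain sequential R-square decomposition; the names r_squared and incremental_r2 and the noise levels are made up for illustration. Three predictors all measure the same underlying signal, and the "share" of R-square credited to each one is computed for every order of entry.

```python
import itertools
import random


def solve(A, c):
    """Solve the linear system A b = c by Gaussian elimination with partial pivoting."""
    k = len(c)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for s in range(col, k + 1):
                M[r][s] -= f * M[col][s]
    b = [0.0] * k
    for r in range(k - 1, -1, -1):
        b[r] = (M[r][k] - sum(M[r][s] * b[s] for s in range(r + 1, k))) / M[r][r]
    return b


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)] for r in range(k)]
    c = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    b = solve(A, c)
    fitted = [sum(X[i][r] * b[r] for r in range(k)) for i in range(n)]
    mean_y = sum(y) / n
    sse = sum((y[i] - fitted[i]) ** 2 for i in range(n))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def incremental_r2(order, predictors, y):
    """Credit each predictor with the gain in R^2 when it enters, in the given order."""
    contributions, entered, previous = {}, [], 0.0
    for name in order:
        entered.append(predictors[name])
        current = r_squared(entered, y)
        contributions[name] = current - previous
        previous = current
    return contributions


# Assumed toy data: x1, x2, x3 are noisy copies of one signal z, and so is y.
random.seed(0)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]
predictors = {
    name: [zi + random.gauss(0, 0.5) for zi in z] for name in ("x1", "x2", "x3")
}
y = [zi + random.gauss(0, 0.5) for zi in z]

# One decomposition per order of entry: 3! = 6 of them for three predictors.
results = {
    order: incremental_r2(order, predictors, y)
    for order in itertools.permutations(predictors)
}
for order, contrib in results.items():
    shares = ", ".join(f"{name}={contrib[name]:.3f}" for name in order)
    print(" -> ".join(order), ":", shares)
```

Every ordering adds up to the same total R-square, but whichever variable enters first takes most of the credit and whichever enters last gets next to none, even though the three predictors are interchangeable by construction. With p predictors there are p! orderings and 2^p subsets of predictors to fit, which is where the computational mess comes from once you try to average the order dependence away.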

I haven't read Lai et al. I stopped getting free copies of medical journals mailed to my home when my father (the pediatrician) passed on. But I doubt that Lai and his co-authors did any substantial statistical theory in building their numbers. So I am skeptical that this is an approach that you really want to take.

Have I been pessimistic enough yet?

HTH,
David
The Curmudgeon
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

