Date: Tue, 30 Sep 2003 16:58:54 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Proportion of Variance Explained by Each Predictor in Logistic Regression
Content-type: text/plain; charset=UTF-8
Laurel Copeland <laurel.copeland@MED.VA.GOV> wrote:
> I would like to calculate the proportion of the variance explained by each
> predictor in a multiple logistic regression (see for example Lai et al.
> (2002) Urology 56:108-115, Radical prostatectomy: geographic and
> demographic variation).
>
> Lai et al. wrote their own program to do this (I don't know in what –
> maybe S-PLUS).
>
> I have received a macro that calculates some R2 values by pulling out of
> PROC LOGISTIC the INTERCEPT (OUT=...) and XBETA and P (OUTEST=...) and
> going on from there, but it does not calculate the contributions to PVE
> from each predictor.
I hate to say it, but... Well, I *don't* hate to say it. Because I've
said it dozens, maybe hundreds, of times. Computing PVE is completely
misleading. It essentially says, "Here is the most important predictor,
and the second, and..." And you can't do that. The classic work on
relative importance is Kruskal's. I have cited some of his pubs on this
multiple times in SAS-L, so you can search the archives to get the
references.
He has talked about trying to get around this in multiple linear
regression, and even in ideal circumstances you have a very messy problem
requiring an exponential number of regression fits and computations from
the fits. In *normal* circumstances, you can't do it at all, because even
a method like Kruskal's assumes that you have no measurement error (or
measurement error scaled so carefully across all variables that it cancels
out and can therefore be ignored). This never happens. Even if you don't
have any measurement error among any of your predictor variables, you
still run into problems with multicollinearity, suppressor variables, etc.
Multicollinear data can give you meaningless numbers when you look at
PVE, since how do you separate out the contributions of X1, X2, and X3
when all three of them tell you the same thing?
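To make the multicollinearity point concrete, here is a minimal sketch in
Python (NumPy only). It is an illustration under stated assumptions, not
Lai et al.'s method and not necessarily Kruskal's exact procedure: it uses
ordinary linear regression R^2 rather than anything from a logistic model,
the data are simulated, and the function names (r_squared,
sequential_shares, averaged_shares) are made up for this example. Three
predictors all measure the same underlying signal, and the "share" of R^2
credited to X1 depends almost entirely on whether it is entered first or
last. Averaging the shares over every order of entry removes the
dependence on order, but needs p! orderings for p predictors.

```python
import itertools
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on an intercept plus the listed columns of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - resid @ resid / tss

def sequential_shares(X, y, order):
    """Increase in R^2 credited to each predictor when entered in `order`."""
    shares, entered, prev = {}, [], 0.0
    for j in order:
        entered.append(j)
        cur = r_squared(X, y, entered)
        shares[j] = cur - prev
        prev = cur
    return shares

def averaged_shares(X, y):
    """Average each predictor's share over every possible order of entry."""
    p = X.shape[1]
    orders = list(itertools.permutations(range(p)))
    total = np.zeros(p)
    for order in orders:
        for j, share in sequential_shares(X, y, order).items():
            total[j] += share
    return total / len(orders), len(orders)

# Simulated data (an assumption for illustration): X1, X2, X3 are all
# noisy copies of the same signal z, so they "tell you the same thing".
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n) for _ in range(3)])
y = z + rng.normal(size=n)

first = sequential_shares(X, y, (0, 1, 2))   # X1 entered first
last = sequential_shares(X, y, (2, 1, 0))    # X1 entered last
avg, n_orders = averaged_shares(X, y)
full_r2 = r_squared(X, y, [0, 1, 2])

print("X1 share when entered first:", first[0])
print("X1 share when entered last: ", last[0])
print("shares averaged over", n_orders, "orderings:", avg)
print("full-model R^2:", full_r2)
```

The shares always add up to the full-model R^2, which is what makes them
look like a tidy decomposition. The split among the three predictors is an
artifact of the order of entry, though, and the averaged version does
nothing about measurement error.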
I haven't read Lai et al. I stopped getting free copies of medical
journals mailed to my home when my father (the pediatrician) passed on.
But I doubt that Lai and his co-authors did any substantial statistical
theory in building their numbers. So I am skeptical that this is an
approach that you really want to take.
Have I been pessimistic enough yet?
HTH,
David The Curmudgeon
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician