Whatever you use, please do not use the percentage of correctly predicted
individual cases. In my opinion it means little or nothing (even if you
choose a "correct" cutoff point, which is itself a difficult and from some
viewpoints an unsolvable problem). Probability prediction by logistic
regression is not predicated of individuals but of populations or groups of
similar individuals: any individual outcome is compatible with the
prediction. For instance, if you predict a 90% probability that I (or more
exactly, people with my values in the chosen predictors) would die within a
year, my eventual survival for another 45 years is perfectly compatible with
that prediction. What you were actually predicting is that out of a large
number of people like me, 90 out of every 100 die within one year. I just
happened to be in the lucky 10% living longer. Even the estimated 90%
probability is itself subject to estimation error: the "true" population
probability might be higher or lower, with a certain probability of error
(you may have, say, a 95% chance that the true probability is between 0.85
and 0.95, and a 5% chance that it is either lower or higher). In other
words, the true probability might be much lower.
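As a rough illustration of that estimation error: an interval for a predicted probability is commonly built on the logit scale and then transformed back. The sketch below is only that, a sketch. The function name and the standard error of 0.4 are made-up assumptions, not values from any real model.

```python
import math

def probability_interval(p_hat, se_logit, z=1.96):
    """Approximate 95% interval for a predicted probability.

    p_hat    : predicted probability from the fitted model
    se_logit : standard error of the linear predictor (logit scale);
               the value passed below is a hypothetical assumption
    """
    logit = math.log(p_hat / (1.0 - p_hat))
    lower = logit - z * se_logit
    upper = logit + z * se_logit

    # Back-transform both limits with the inverse logit.
    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(lower), expit(upper)

low, high = probability_interval(0.90, se_logit=0.4)
print(f"predicted 0.90, approximate 95% interval: {low:.3f} to {high:.3f}")
```

Because the interval is symmetric on the logit scale, it comes out asymmetric around 0.90 on the probability scale.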
The probability (observed or predicted), whatever its value happens to be,
is an attribute of the group, not an attribute of each subject. This is, of
course, the frequentist interpretation of probability, but it is arguably
the only consistent one. Individual outcomes of random variables are
strictly indeterminate: it is the group aggregate outcome which is subject
to the prediction.
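A small simulation may make the point concrete. It assumes a hypothetical group of 10,000 similar individuals, each with a predicted one-year mortality of 0.90; the group size and the seed are arbitrary choices for illustration.

```python
import random

random.seed(2008)  # fixed seed so the run is repeatable

predicted_risk = 0.90   # predicted one-year mortality for the group
group_size = 10_000     # hypothetical number of similar individuals

# Each individual outcome is a single Bernoulli draw: 1 = died, 0 = survived.
outcomes = [1 if random.random() < predicted_risk else 0
            for _ in range(group_size)]

observed_rate = sum(outcomes) / group_size
survivors = group_size - sum(outcomes)

print(f"observed death rate in the group: {observed_rate:.3f}")
print(f"individuals who survived anyway:  {survivors}")
```

The group rate lands close to the prediction, while roughly a tenth of the individuals survive. None of those survivors contradicts the prediction.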
With these caveats in mind, you may turn, for example, to Hosmer and
Lemeshow's book, Applied Logistic Regression, for detailed information about
significance tests and goodness-of-fit tests for logistic regression, and
about applying a logistic regression solution (obtained from one dataset) to
a second dataset for validation purposes.
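To give a flavour of what applying a solution to a second dataset can look like, here is a minimal sketch in Python rather than SPSS. Everything specific in it is an assumption: the coefficients are hypothetical stand-ins for a model fitted on the first dataset, and the second dataset is simulated. It groups the validation cases by predicted probability and compares observed with expected events, in the spirit of the Hosmer-Lemeshow grouping. Please consult the book for the reference distribution that applies in the validation setting.

```python
import math
import random

random.seed(1)

# Hypothetical coefficients, standing in for a model fitted on the first
# dataset: logit(p) = INTERCEPT + SLOPE * x
INTERCEPT, SLOPE = -1.0, 0.8

def predict(x):
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * x)))

# Simulated second dataset (an assumption: real validation data go here).
# Outcomes are drawn from the same model, so calibration should look good.
xs = [random.gauss(0.0, 1.5) for _ in range(2000)]
validation = [(x, 1 if random.random() < predict(x) else 0) for x in xs]

def calibration_table(data, n_groups=10):
    """Group cases by predicted probability; compare observed and expected."""
    scored = sorted((predict(x), y) for x, y in data)
    size = len(scored) // n_groups
    rows = []
    for g in range(n_groups):
        start = g * size
        # The last group absorbs any remainder.
        end = len(scored) if g == n_groups - 1 else start + size
        chunk = scored[start:end]
        n = len(chunk)
        expected = sum(p for p, _ in chunk)   # sum of predicted probabilities
        observed = sum(y for _, y in chunk)   # number of events that occurred
        rows.append((n, observed, expected))
    return rows

rows = calibration_table(validation)

# Hosmer-Lemeshow-type statistic: sum of (O - E)^2 / (n * pbar * (1 - pbar)),
# where pbar = E / n is the mean predicted probability in the group.
hl = sum((o - e) ** 2 / (e * (1.0 - e / n)) for n, o, e in rows)

for n, o, e in rows:
    print(f"n={n:4d}  observed={o:4d}  expected={e:7.1f}")
print(f"Hosmer-Lemeshow-type statistic: {hl:.2f}")
```

Note that the comparison is made group by group, never case by case, which is consistent with the caveats above.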
Hope this helps.
Hector
Original Message
From: SPSSX(r) Discussion [mailto:SPSSXL@LISTSERV.UGA.EDU] On Behalf Of J P
Sent: 10 December 2008 18:33
To: SPSSXL@LISTSERV.UGA.EDU
Subject: validating a logistic regression model
Dear Colleagues,
I am attempting to learn how to validate a logistic regression model. I've
been reading about bootstrapping, cross-validation, etc., but have found
no instruction on how to actually conduct the analysis and interpret the
results. Any references on this subject or advice on how to perform this
with SPSS would be greatly appreciated.
Here is an example of what I am talking about:
http://symptomresearch.nih.gov/chapter_8/sec7/cess7pg14.htm
Thanks!
John
To manage your subscription to SPSSXL, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSXL), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSXL
For a list of commands to manage subscriptions, send the command
INFO REFCARD
