| Date: | Mon, 4 Jul 2011 06:47:02 +0000 |
| Reply-To: | DorraJ Oet <love_u_endlessly@hotmail.com> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | DorraJ Oet <love_u_endlessly@hotmail.com> |
| Subject: | Re: Hosmer-Lemeshow Goodness of Fit (Logistic Regression) |
| In-Reply-To: | <017FB41275AE7A46988755E60E32F4010574F65E71@UTHCMS3.uthouston.edu> |
Hi Dr Paul,
Thanks for the simulation. This is what I suspected as well: as the sample gets larger and larger, HL becomes significant. However, can I boldly say that with a large data file, data mining models (maybe CHAID or C5 for profiling) could be a better choice than statistical models?
By the way, thank you Hector and Rich. That was an interesting discussion.
Warmest regards,
Dorraj Oet
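Paul's simulation (quoted below) can be approximated with the pure-Python sketch that follows. It is not his code: the coefficients (theta = x1 + x2 + e, all standard normal), the split into 10 equal-sized groups, the 8 degrees of freedom and alpha = 0.05 are assumptions, and the logistic model is fitted by plain Newton-Raphson. Because theta is normal, the true link is probit, so the logistic fit is slightly misspecified; presumably that small misfit is what the H-L test flags more often as n grows. The sketch is not claimed to reproduce Paul's percentages.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n, rng):
    # ASSUMED design: theta = x1 + x2 + e (all standard normal); y = 1 if theta > 1.
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if a + b + rng.gauss(0.0, 1.0) > 1.0 else 0 for a, b in zip(x1, x2)]
    return x1, x2, y

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system a x = b.
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(x1, x2, y, max_iter=25, tol=1e-10):
    # Newton-Raphson for logit P(y=1) = b0 + b1*x1 + b2*x2.
    beta = [0.0, 0.0, 0.0]
    for _ in range(max_iter):
        grad = [0.0] * 3                     # score vector X'(y - p)
        info = [[0.0] * 3 for _ in range(3)]  # information matrix X'WX
        for a, b, t in zip(x1, x2, y):
            row = (1.0, a, b)
            p = sigmoid(beta[0] + beta[1] * a + beta[2] * b)
            w = p * (1.0 - p)
            for i in range(3):
                grad[i] += (t - p) * row[i]
                for j in range(3):
                    info[i][j] += w * row[i] * row[j]
        step = solve(info, grad)
        beta = [bi + si for bi, si in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def hl_statistic(y, p, groups=10):
    # Sort by predicted probability, cut into equal-sized groups, sum Pearson terms.
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        obs = sum(y[i] for i in idx)
        exp_ = sum(p[i] for i in idx)
        pbar = exp_ / n_g
        stat += (obs - exp_) ** 2 / (n_g * pbar * (1.0 - pbar))
    return stat

def chi2_sf_even(x, df):
    # Upper-tail chi-square probability; this closed form holds only for even df.
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def rejection_rate(n, reps, seed=1, alpha=0.05):
    # Share of replications in which the H-L test is significant at `alpha`.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1, x2, y = simulate(n, rng)
        b = fit_logistic(x1, x2, y)
        p = [sigmoid(b[0] + b[1] * u + b[2] * v) for u, v in zip(x1, x2)]
        if chi2_sf_even(hl_statistic(y, p), 8) < alpha:  # 10 groups -> 8 df
            hits += 1
    return hits / reps
```

For example, comparing `rejection_rate(500, 200)` with `rejection_rate(50000, 200)` contrasts the small-n and large-n regimes. Pure Python is slow at n = 100,000, so a vectorized numerical library would be the practical choice for 1000 replications.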
Date: Fri, 1 Jul 2011 15:25:37 -0500
From: Paul.R.Swank@uth.tmc.edu
Subject: Re: Hosmer-Lemeshow Goodness of Fit (Logistic Regression)
To: SPSSX-L@LISTSERV.UGA.EDU
I did a little simulation. I generated a normally distributed variable (theta) as a function of two independent normally distributed IVs. I then created a new dichotomous variable by selecting values of theta greater than one. I did this 1000 times with samples of 500, 50,000, and 100,000. The Hosmer-Lemeshow test was significant 4.1% of the time when n was 500, but 33.5% of the time when n was 50,000 and 66.6% of the time when n was 100,000. So it does appear that the test becomes more sensitive when n is large. Thus, very small deviations from the model may result in significant lack of fit when the sample size is large. This is similar to the problem in SEM, where large samples are much more likely to show lack of fit than small samples. Yes, there is a significant lack of fit. But is it substantial enough to invalidate the findings of the logistic model?

Dr. Paul R. Swank, Professor
Children's Learning Institute
University of Texas Health Science Center-Houston

From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Hector Maletta
Sent: Friday, July 01, 2011 12:19 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Hosmer-Lemeshow Goodness of Fit (Logistic Regression)

Of course, Rich, you did not say anything contrary to my comments. You just happened to be the one commenting before me in this thread.

Hector

From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Rich Ulrich
Sent: Thursday, June 30, 2011 23:11
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Hosmer-Lemeshow Goodness of Fit (Logistic Regression)

Hector,
The H-L test is a Goodness of Fit test, as you describe. I don't think
that I said anything contrary to that, and I don't think I said anything
that is irrelevant when it comes to the effects of huge N.
I grant that for ordinary, small N, there is little power for discerning whether
the Logistic model is the "correct" one, contrasted to other ways of modeling
the approach to a dichotomy. But Probit models, based on the normal, were
a frequent alternative to the logistic before about 1988, when better
computerization of the logistic arrived. Fortunately for us all, the cases that
truly deserve Probit are rarer than the other cases.
Using the wrong model (Logistic versus other) *ought* to show up as a
poorer fit in the tails -- with excessive deviations that will be captured
by a goodness-of-fit test. I grant that this is subtle; I expect that other
problems are more likely to be detected in the usual run of things.
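To see why a wrong link should surface mainly in the tails, one can compare the normal CDF (the probit link) with a logistic CDF rescaled by the conventional factor 1.702. This is only an illustration, not a test of any model: the absolute gap between the curves stays below about 0.01 everywhere, yet at z = -3 the logistic probability is roughly 4.5 times the normal one, so any relative misfit would be expected to concentrate in the extreme groups.

```python
import math

def normal_cdf(z):
    # Standard normal CDF (the probit link).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_cdf(z, scale=1.702):
    # Logistic CDF rescaled by the conventional 1.702 so it tracks the normal CDF.
    return 1.0 / (1.0 + math.exp(-scale * z))

if __name__ == "__main__":
    for z in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
        pn, pl = normal_cdf(z), logistic_cdf(z)
        print(f"z={z:+.1f}  normal={pn:.5f}  logistic={pl:.5f}  ratio={pl / pn:.2f}")
```

Near the middle the ratio stays close to 1; it grows quickly as z moves into the lower tail.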
By coincidence, there was a similar problem posted today in the Usenet
group, sci.stat.consult. I hope that Brendan Halpin won't mind my
re-posting his reply here.
***from s.s.c.
Newsgroups: sci.stat.consult
Subject: Re: Hosmer-Lemeshow Test and Large Samples
Date: Thu, 30 Jun 2011 15:26:32 +0100
Lines: 17
Message-ID: <8762nnpf8n.fsf@wivenhoe.ul.ie>
I thought the H-L test was out of favour these days, even with H & L.
See
http://www.biostat.wustl.edu/archives/html/s-news/1999-04/msg00147.html
for an explanation, and note that Harrell has implemented this new test
for R in his rms package.
Mind you, other comments suggest the problems with the H-L test are in
the direction of failing to detect lack of fit, so this may not help
you!
Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
***end cite.
Date: Thu, 30 Jun 2011 18:54:49 -0300
From: hmaletta@fibertel.com.ar
Subject: Re: Hosmer-Lemeshow Goodness of Fit (Logistic Regression)
To: SPSSX-L@LISTSERV.UGA.EDU

Rich,

I think you got it wrong, but I could be wrong myself.

In my view, the Hosmer-Lemeshow test is NOT a statistical significance test like the ordinary significance tests you run to ascertain whether a value is "significantly different from zero". The H-L test is applied in logistic regression, whatever the sample size, to ascertain one and only one thing: whether the observed proportions of events are similar to the predicted probabilities of occurrence. The test starts by sorting the cases by predicted probability and splitting them into deciles, i.e. subgroups comprising 10 percent of total cases each. The first decile, for instance, may group the lowest ten percent of cases, with predicted probabilities ranging from zero to, say, 0.24 and an average of 0.14; the second decile may comprise another ten percent of cases, with predicted probabilities above 0.24 and up to 0.29 and an average predicted probability of 0.27; and so on. Hosmer and Lemeshow compare these average probabilities (0.14, 0.27 and so on) to the actual proportion of events that occurred within each group. The index is a sum of terms of the form (O-E)^2/E, computed for events and non-events in each decile. If the sum happens to be zero, the predicted probabilities and observed relative frequencies coincide perfectly. If the sum is larger, there are discrepancies between observed relative frequencies and predicted probabilities (the discrepancies may occur in any of the deciles). One wants the discrepancies to be small, i.e. one wants the sum to be as low as possible. In my case, besides treating the sum as a chi-square and applying the chi-square distribution to find out whether the value of the H-L statistic is "significant" (which is of course a function of the number of cases in the sample), I prefer to look at a graph showing observed proportions and predicted probabilities in the ten deciles.
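The decile procedure described above can be sketched as follows, assuming 0/1 outcomes and already-fitted probabilities. The function names are hypothetical; the cases are cut into equal-sized groups after sorting, which may differ from how SPSS handles ties; and each group contributes (O-E)^2/E for events plus (O-E)^2/E for non-events, written as a single term. The p-value uses a closed form of the chi-square tail that is valid only for even degrees of freedom (the default 10 groups give 8 df). The returned table holds the mean predicted probability and the observed proportion for each group, i.e. the two series for the observed-versus-predicted graph.

```python
import math

def chi2_sf_even(x, df):
    # Upper-tail chi-square probability; this closed form holds only for even df >= 2.
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, p_value, table) for 0/1 outcomes y and predicted probabilities p.

    Table rows are (group, n, mean predicted probability, observed proportion).
    p_value is None when groups - 2 is not a positive even number.
    """
    if len(y) != len(p) or len(p) < groups:
        raise ValueError("y and p must have equal length and at least `groups` cases")
    order = sorted(range(len(p)), key=lambda i: p[i])  # sort cases by predicted p
    n = len(order)
    stat, table = 0.0, []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]  # roughly equal-sized groups
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / n_g
        # (O-E)^2/E over events plus (O-E)^2/E over non-events, combined into one term
        stat += (observed - expected) ** 2 / (n_g * pbar * (1.0 - pbar))
        table.append((g + 1, n_g, pbar, observed / n_g))
    df = groups - 2
    p_value = chi2_sf_even(stat, df) if df >= 2 and df % 2 == 0 else None
    return stat, p_value, table
```

Plotting the third column of each table row against the fourth gives the kind of graph that can otherwise be built by hand from SPSS's observed and predicted proportions.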
I recently completed a study based on more than one million households (Census data from Bolivia). Of course, with so many cases even small values of HL were sometimes "statistically significant", in the sense of being "too large, given the size of the sample, to have arisen by mere chance", though this did not happen very frequently. However, I preferred to look at the graph to see where the (usually small) differences were more noticeable: in the lower, middle or higher deciles, and whether one or two deciles concentrated the differences or the differences were similar across deciles. (By the way, SPSS does not produce that graph, only the observed and predicted proportions, from which one can build the graph in Excel.) These tests do not test whether a logistic model is appropriate, or the "goodness of fit" of the model to the data. But a model predicting different probabilities should be able to produce predicted probabilities (for groupings of people) that somehow match the observed proportions. In my case they matched quite well. Of course, this does not enable you to guess which particular individuals will suffer the event: probabilities (at least in this context) are an attribute of the group.

In HL, the groupings are simply constructed by ordering the predicted probabilities from low to high. But one could use the same approach for different groupings of the predictor variables. Suppose the predictors are gender, age and education level (with several age groups and several levels of education); this could generate a number of groups, each with individuals of the same sex, same age group and same education level. Within those groups, predicted probabilities would be equal or very similar, and one can assess whether the observed proportions of events within those groups are close to the predicted probabilities.
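The grouping by predictor values can be sketched the same way. This assumes each record carries a hashable pattern key (the (sex, age_group, education) tuples in the test are hypothetical), the observed 0/1 outcome and the model's predicted probability. For each pattern the sketch reports the observed proportion next to the average predicted probability, plus a standardized difference (O-E)/sqrt(n p(1-p)) as one possible way to spot the patterns that deviate most.

```python
import math
from collections import defaultdict

def compare_by_pattern(records):
    """records: iterable of (pattern, y, p); pattern is any hashable key,
    y is the observed 0/1 outcome, p is the model's predicted probability.

    Returns {pattern: (n, observed proportion, mean predicted p, z)} where
    z = (O - E) / sqrt(n * pbar * (1 - pbar)), or NaN if pbar is 0 or 1.
    """
    acc = defaultdict(lambda: [0, 0, 0.0])  # n, events, sum of predicted probabilities
    for pattern, y, p in records:
        s = acc[pattern]
        s[0] += 1
        s[1] += y
        s[2] += p
    out = {}
    for pattern, (n, events, psum) in acc.items():
        pbar = psum / n  # AVERAGE predicted probability within the group
        var = n * pbar * (1.0 - pbar)
        z = (events - psum) / math.sqrt(var) if var > 0 else float("nan")
        out[pattern] = (n, events / n, pbar, z)
    return out
```

If the patterns use all the predictors, every case in a group shares one predicted probability and the average is just that value; if some predictor is left out, the average summarizes the spread within the group.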
If the groupings are based on ALL the predictors, the predicted probabilities within each group will be uniform; if some predictor is left out, there might be some variability in predicted probabilities within each group (as within the deciles in the H-L test). In either case one works with the AVERAGE predicted probability within each group and compares those averages with the actual proportion of events. Individual prediction is not possible: if everyone in a group has a predicted probability of, say, about 0.75, you may expect one quarter of them not to get the event and three quarters to get it; there is no way to identify in advance which individuals will suffer the event, just as knowing you are in a group with a 75% risk of lung cancer does not tell you whether you or your neighbour will actually get lung cancer. Winston Churchill (fat, a heavy drinker and a chain smoker) was at high risk of early death throughout his long life, until he died of old age shortly after turning 90. He had the same risk as plenty of other people in the same risk groups throughout his life, but it was others who died while he was among the lucky few survivors.

Hector
[snip, previous]