Date: Wed, 6 Jun 2007 12:13:05 -0300
Reply-To: Hector Maletta <hmaletta@fibertel.com.ar>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <hmaletta@fibertel.com.ar>
Subject: Re: Log-it regression
In-Reply-To: <200706061411.l56AkDdN014466@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Alina,
I don't quite understand what your problem really is.
Logit or logistic regression estimates the probability or the odds
of an event as a function of one or more predictors, and not the actual
occurrence of the event in individual cases. As such, it should be used as
an indicator of odds or probabilities for populations, not occurrences for
individuals. Nonetheless, it is customarily used to predict the outcome for
individuals by means of some cut-off point, and this often leads to confusion
and debate (not least about what the cut-off point should be).
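A minimal Python sketch of that distinction, assuming a single predictor x
with made-up coefficients B0 and B1 (hypothetical, not taken from any fitted
model). The model itself only yields a probability, or equivalently the odds,
for each value of x. A 0/1 prediction for an individual appears only after an
arbitrary cut-off is imposed.

```python
import math

# Hypothetical coefficients for one predictor x (illustration only).
B0, B1 = -2.0, 0.5


def logistic(eta):
    """Convert a log odds value (the linear predictor) into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds of the event, given its probability."""
    return p / (1.0 - p)


def predicted_probability(x):
    """Model-based probability of the event at predictor value x."""
    return logistic(B0 + B1 * x)


def classify(x, cutoff=0.5):
    """Turn the probability into a 0/1 individual prediction via a cut-off."""
    return 1 if predicted_probability(x) >= cutoff else 0


if __name__ == "__main__":
    for x in (0, 4, 8):
        p = predicted_probability(x)
        print(x, round(p, 3), round(odds(p), 3), classify(x))
```

Moving the cut-off from 0.5 to 0.6 flips the prediction at x = 4 (where the
probability is exactly 0.5) from 1 to 0, although the estimated probability
itself has not changed. That is why the choice of cut-off is debated.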
        As the values of the predictors change, the predicted probability (or
log odds) goes up or down, depending on the sign of the coefficients, and the
actual percentage of people with the outcome is of course expected to move
with it. At any level of predicted probability, some people will not have the
event, i.e. they have a value of zero, which is at or below the predicted or
observed probability of the outcome, and some will have it, i.e. a value of
one, which is at or above that probability. The individual "residuals" of the
logit are in fact the actual outcome for each individual (0 or 1) minus the
predicted value (the probability of the event for that individual, as a
function of the predictors).
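The residual definition above can be sketched in a few lines of Python. The
outcomes and predicted probabilities below are invented for illustration.
The sketch shows that every case without the event (0) gets a residual at or
below zero, and every case with it (1) gets a residual at or above zero.

```python
# Hypothetical outcomes (1 = event, 0 = no event) and predicted probabilities
# for five individuals; the numbers are illustrative only.
outcomes = [0, 0, 1, 0, 1]
predicted = [0.10, 0.35, 0.40, 0.70, 0.90]


def raw_residuals(y, p):
    """Raw residual per case: observed outcome (0 or 1) minus predicted probability."""
    return [yi - pi for yi, pi in zip(y, p)]


res = raw_residuals(outcomes, predicted)
print([round(r, 2) for r in res])
```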
What you are encountering, apparently, is that your cases come in
triads: as the log odds go up (or the probability of the event goes up) you
find three cases without the event, then three with it, then another three
without it, and so on. There is no theoretical reason for that, and it is
probably a fluke or some quirk in the data. On the other hand, if that
pattern held all along, the odds would not vary as a function of the
predictors: 0s and 1s would alternate in equal numbers (three of each in
turn), and the odds curve would be flat, since the positives would equal the
negatives all along the range of the logit function, except perhaps for a
slight imbalance between the first three and the last three cases if the
number of triads is even.
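A toy Python check of the flat-curve argument, assuming 12 hypothetical cases
already sorted by predicted probability whose outcomes alternate in triads
(three 0s, three 1s, and so on). The share of events is the same in the lower
and upper halves of the range, so the outcome shows no trend along the
predictor.

```python
def alternating_triads(n_triads):
    """Outcomes ordered by predicted probability: three 0s, three 1s, three 0s, ..."""
    return [t % 2 for t in range(n_triads) for _ in range(3)]


def block_proportions(y, size):
    """Share of events (1s) in consecutive blocks of `size` ordered cases."""
    return [sum(y[i:i + size]) / size for i in range(0, len(y), size)]


y = alternating_triads(4)       # 12 cases, as in the original question
print(y)                        # [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
print(block_proportions(y, 6))  # [0.5, 0.5]
```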
Perhaps I am dumber than usual today and am missing something else
you are trying to say.
Hector
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Alina Sheyman
Sent: 06 June 2007 11:11
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Log-it regression
Hi all,
I have a quick question about a logit regression. I've built a model that
uses the log of the odds ratio (probability of staying in school vs. dropping
out) as my dependent variable. It looks like a decent model (good R sq), but
what worries me is that there seems to be a slight pattern to the regression.
For the 12 data points I am using, I get about three residuals with a
positive sign, three with a negative sign, then three more with a positive
sign, etc. Does anyone know if this is a typical occurrence with a logit
model, or if there's a better model I should use to avoid seeing this pattern
in the residuals?
thank you,
Alina Sheyman