Let me give this a try.
I'll start with a situation where you have 'complete separation'. Consider
the following data set with 6 observations, where y is the response and x is
the independent variable (e.g. exposure level):
Obs y x
1 1 15
2 1 10
3 1 11
4 0 21
5 0 23
6 0 28
These data are completely separated because the x values of the two response
groups do not overlap. Note that we can easily come up with a rule: if
x < 20 then y=1, and if x > 20 then y=0. With this rule, we get a perfect
separation of observations into their response classes. Now in logistic
regression, during the iterations to find the maximum likelihood estimates,
-2 log likelihood keeps decreasing toward 0 as the slope estimate grows
without bound, because the fit is perfect, so the procedure can never
converge to a finite estimate. You get a message in the log to the effect
that 'there is complete separation of data points, the maximum likelihood
estimate does not exist, the validity of the model fit is questionable'.
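To see the effect numerically, here is a small sketch in Python (not SAS, and not what PROC LOGISTIC does internally). It evaluates the log likelihood of the six observations for steeper and steeper logistic curves centred on the x = 20 rule. The function name log_likelihood and the parameters slope and cut are made up for this illustration; the model is written as P(y=1) = 1 / (1 + exp(-slope * (cut - x))).

```python
import math

# The six observations from the post, as (y, x) pairs.
data = [(1, 15), (1, 10), (1, 11), (0, 21), (0, 23), (0, 28)]

def log_likelihood(slope, cut=20.0):
    """Log likelihood for P(y=1) = 1 / (1 + exp(-slope * (cut - x))).

    'slope' and 'cut' are illustrative names; cut=20 mirrors the rule
    in the post, but any value between 15 and 21 behaves the same way.
    """
    total = 0.0
    for y, x in data:
        z = slope * (cut - x)  # positive when x is below the cut
        # log P(y=1) = -log(1 + exp(-z)); log P(y=0) = -log(1 + exp(z))
        total -= math.log1p(math.exp(-z if y == 1 else z))
    return total

for slope in (0.1, 0.5, 1, 2, 5, 10):
    ll = log_likelihood(slope)
    print(f"slope={slope:>4}  logL={ll:.6f}  -2logL={-2 * ll:.6f}")
```

Each step up in slope moves the log likelihood closer to 0 without ever reaching it, so there is no finite slope at which the likelihood is maximized.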
Now to quasicomplete separation:
If we add an observation 7 to the above data with y=0 and x=15, we still do
not have an overlap, but we have a tie with observation #1 (x=15 now occurs
with both y=1 and y=0). In this case the maximum likelihood estimate again
does not exist. Technically, the dispersion matrix becomes unbounded and you
get very large estimates and variances, which is typically why an odds ratio
prints as >999.999. You can easily find the culprit variable by running a
proc freq; tables x*y; and looking for a lack of overlap.
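The same overlap check can be sketched in a few lines of Python for a single numeric predictor. This is only an illustration of the idea, not a SAS feature: the function name separation_type is hypothetical, it assumes both response levels are present, and it looks at one variable at a time, so it will not catch separation that only shows up in a combination of several variables.

```python
def separation_type(pairs):
    """Classify one predictor as 'complete', 'quasicomplete' or 'overlap'.

    pairs is a list of (y, x) tuples with y in {0, 1}; both levels of y
    are assumed to be present.
    """
    x1 = [x for y, x in pairs if y == 1]
    x0 = [x for y, x in pairs if y == 0]
    # Try both orientations: y=1 group below the y=0 group, and the reverse.
    for low, high in ((x1, x0), (x0, x1)):
        if max(low) < min(high):
            return "complete"       # a gap between the two groups
        if max(low) == min(high):
            return "quasicomplete"  # groups only touch at one x value
    return "overlap"

# The six observations from the post, as (y, x) pairs.
six_obs = [(1, 15), (1, 10), (1, 11), (0, 21), (0, 23), (0, 28)]

print(separation_type(six_obs))              # the original data
print(separation_type(six_obs + [(0, 15)]))  # with observation 7 added
print(separation_type(six_obs + [(0, 12)]))  # a made-up point that creates overlap
```

The third call uses an invented observation (y=0, x=12) purely to show what a non-separated variable looks like.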
Hope this gives an idea of what's happening.... It's a Monday morning and I
can't think of good references at this point....maybe someone else will.
Anthony
Original Message
From: Alison Young [mailto:alisyoung@HOTMAIL.COM]
Sent: Sunday, May 06, 2001 9:50 PM
To: SASL@LISTSERV.UGA.EDU
Subject: what does "quasicomplete separation of data" mean?
Hi,
I got the following warning message while I was running logistic regression:
"WARNING: There is possibly a quasicomplete separation of data points. The
         maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning.
         Results shown are based on the last maximum likelihood iteration.
         Validity of the model fit is questionable."
One of the odds ratios (there were several exposure levels) came out as
>999.999 in the output.
Could anybody tell me what these all mean? Thanks.
