Date: Mon, 7 May 2001 08:33:34 -0500
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Anthony Kilili <Anthony.Kilili@BMGDIRECT.COM>
Subject: Re: what does "quasi-complete separation of data" mean?
Content-Type: text/plain; charset="iso-8859-1"
let me give this a try...
I'll start with a situation where you have 'complete separation'. Consider
the following data set with 6 observations, where y is the response and x is
the independent variable (e.g. exposure level):
Obs   y    x
  1   1   15
  2   1   10
  3   1   11
  4   0   21
  5   0   23
  6   0   28
These data are completely separated because there is no overlap: every x
value with y=1 is smaller than every x value with y=0. Note that we can
easily come up with a rule that if x < 20 then y=1, and if x > 20 then y=0.
With this rule, we get a perfect separation of observations into their
response classes. Now in logistic regression, during the iterations to find
the maximum likelihood estimates, the -2 log likelihood decreases toward 0
because it's a perfect fit, the slope estimate keeps growing without bound,
and the procedure freaks out. You get a
message in the log that 'there is complete separation of data points, the
maximum likelihood does not exist, the validity of the model is
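
To see why the iterations never settle down, here is a rough sketch in plain
Python (not SAS, and not what PROC LOGISTIC does internally). It uses the six
observations from the table above and a hypothetical model
logit P(y=1) = slope * (x - cut), where cut=18 is just an assumed threshold
between the two groups. The function names are made up for illustration.

```python
import math

# The six completely separated observations from the table above: (y, x).
DATA = [(1, 15), (1, 10), (1, 11), (0, 21), (0, 23), (0, 28)]


def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(slope, cut=18.0, data=DATA):
    """Log likelihood of the model logit P(y=1) = slope * (x - cut).

    cut=18 is an assumed threshold lying between the two groups.
    """
    total = 0.0
    for y, x in data:
        eta = slope * (x - cut)
        # log P(y=1) = log_sigmoid(eta); log P(y=0) = log_sigmoid(-eta)
        total += log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    return total


# A steeper (more negative) slope always fits better, so there is no
# finite maximum: -2 log L just keeps shrinking toward 0.
for slope in (-0.5, -1.0, -5.0, -20.0):
    print(slope, -2.0 * log_likelihood(slope))
```

Each steeper slope gives a smaller -2 log L, so the fitting algorithm keeps
pushing the estimate toward infinity until it hits its iteration limit or
detects the separation.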
Now to quasi-complete separation:
If we add an observation 7 to the above data with y=0 and x=15, we still do
not have an overlap, but we have a tie with observation # 1 (y=1, x=15). In
this case the maximum likelihood estimate still does not exist. Technically,
the dispersion matrix becomes unbounded and you get very large variances of
the estimates. You can easily tell the culprit variable by running a
proc freq; tables x*y; and looking for lack of overlaps.
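
The same "look for lack of overlaps" check can be sketched in a few lines of
Python for a single numeric predictor. This is only an illustration under
that assumption (one x, a 0/1 response); separation_type is a made-up helper
name, not a SAS or library function.

```python
def separation_type(ys, xs):
    """Classify one predictor x against a 0/1 response y.

    Returns "complete", "quasi-complete", or "overlap".
    """
    x1 = [x for y, x in zip(ys, xs) if y == 1]
    x0 = [x for y, x in zip(ys, xs) if y == 0]
    if not x1 or not x0:
        raise ValueError("need at least one observation in each response class")
    # Strictly disjoint ranges: one cut point classifies every observation.
    if max(x1) < min(x0) or max(x0) < min(x1):
        return "complete"
    # Ranges touch only at a shared boundary value (a tie between classes).
    if max(x1) == min(x0) or max(x0) == min(x1):
        return "quasi-complete"
    return "overlap"


ys = [1, 1, 1, 0, 0, 0]
xs = [15, 10, 11, 21, 23, 28]
print(separation_type(ys, xs))               # the six original observations
print(separation_type(ys + [0], xs + [15]))  # after adding observation 7
```

With several predictors the separation can hide in a linear combination of
them, so a one-variable check like this only catches the simple cases.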
Hope this gives an idea of what's happening.... It's a Monday morning and I
can't think of good references at this point....maybe someone else will.
From: Alison Young [mailto:alisyoung@HOTMAIL.COM]
Sent: Sunday, May 06, 2001 9:50 PM
Subject: what does "quasi-complete separation of
I got the following warning message while I was running
"WARNING: There is possibly a quasi-complete separation of
points. The maximum likelihood estimate may not
WARNING: The LOGISTIC procedure continues in spite of the
warning. Results shown are based on the last
likelihood iteration. Validity of the model fit is
One of the odds ratios (there were several exposure levels) came out as
>999.999 in the output.
Could anybody tell me what these all mean? Thanks.