```
Date:         Mon, 7 May 2001 08:33:34 -0500
Reply-To:     Anthony.Kilili@BMGDIRECT.COM
Sender:       "SAS(r) Discussion"
From:         Anthony Kilili
Subject:      Re: what does "quasi-complete separation of data" mean?
Content-Type: text/plain; charset="iso-8859-1"

Let me give this a try...

I'll start with a situation where you have 'complete separation'. Consider
the following data set with 6 observations, where y is the response and x is
the independent variable, e.g. exposure level:

Obs   y    x
 1    1   15
 2    1   10
 3    1   11
 4    0   21
 5    0   23
 6    0   28

These data are completely separated because there is no overlap. Note that
we can easily come up with a rule that if x < 20 then y=1, and if x > 20
then y=0. With this rule, we get a perfect separation of observations into
their response classes.

Now in logistic regression, during the iterations to find the maximum
likelihood estimates, the -2 log likelihood decreases toward 0 because the
fit is perfect, the slope estimate keeps growing without bound, and the
procedure freaks out. You get a message in the log saying that there is a
complete separation of data points, that the maximum likelihood estimate
does not exist, and that the validity of the model fit is questionable.

Now to quasi-complete separation: if we add an observation 7 to the above
data with y=0 and x=15, the two response groups still do not overlap, but
they now share a value: observation 7 ties with observation 1 at x=15. In
this case the maximum likelihood estimate still does not exist. Technically,
the dispersion matrix becomes unbounded and you get very large variances
for the estimates.

You can easily find the culprit variable by running a proc freq; tables x*y;
and looking for a lack of overlap.

Hope this gives an idea of what's happening... It's a Monday morning and I
can't think of good references at this point... maybe someone else will.

Anthony

-----Original Message-----
From: Alison Young [mailto:alisyoung@HOTMAIL.COM]
Sent: Sunday, May 06, 2001 9:50 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: what does "quasi-complete separation of data" mean?

Hi,

I got the following warning message while I was running logistic regression:

"WARNING: There is possibly a quasi-complete separation of data points. The
maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning.
Results shown are based on the last maximum likelihood iteration. Validity
of the model fit is questionable."

One of the odds ratios (there were several exposure levels) came out as
>999.999 in the output.

Could anybody tell me what these all mean?

Thanks.
```
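The two cases described in the reply can be checked numerically. The sketch below is not from the thread and is not SAS: it is a small Python illustration, assuming a single numeric predictor and a 0/1 response with both response values present. The function names `separation_type` and `neg2_loglik` are made up for this example. The first function classifies the data as completely separated, quasi-completely separated, or overlapping. The second evaluates -2 log L for a logistic model whose slope is `-k` and whose decision boundary sits at a chosen cut point, so you can watch what happens as `k` grows.

```python
import math

# The six observations from the reply, plus the optional seventh.
x = [15, 10, 11, 21, 23, 28]
y = [1, 1, 1, 0, 0, 0]
x_quasi = x + [15]
y_quasi = y + [0]


def separation_type(xs, ys):
    """Classify one numeric predictor against a 0/1 response.

    Assumes both response values occur at least once.
    """
    x1 = [xv for xv, yv in zip(xs, ys) if yv == 1]
    x0 = [xv for xv, yv in zip(xs, ys) if yv == 0]
    # Try both orientations: the y=1 group below the y=0 group, or above it.
    for low, high in ((x1, x0), (x0, x1)):
        if max(low) < min(high):
            return "complete"        # a gap between the groups
        if max(low) == min(high):
            return "quasi-complete"  # groups touch at a tied value
    return "none"


def neg2_loglik(xs, ys, cut, k):
    """-2 log L for the model logit P(y=1) = k * (cut - x)."""
    total = 0.0
    for xv, yv in zip(xs, ys):
        eta = k * (cut - xv)
        # Margin is positive when the model favours the observed response.
        m = eta if yv == 1 else -eta
        # Numerically stable log(1 + exp(-m)).
        total += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))
    return 2.0 * total


if __name__ == "__main__":
    print(separation_type(x, y))
    print(separation_type(x_quasi, y_quasi))
    for k in (0.5, 1, 2, 5, 10):
        print(k, neg2_loglik(x, y, 18, k), neg2_loglik(x_quasi, y_quasi, 15, k))
```

For the six-observation data, -2 log L keeps shrinking toward 0 as `k` increases, so no finite slope maximizes the likelihood. With observation 7 added and the cut at the tied value x=15, it also keeps shrinking but levels off at 4 log 2 (about 2.77), because the two tied observations each sit at probability 0.5 no matter how steep the slope is. Either way the iterations never settle on a finite estimate, which is what the warnings in the log are reporting.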
