LISTSERV at the University of Georgia
Date:         Sun, 22 Jun 1997 17:00:23 -0700
Reply-To:     Donald Peter Cram <doncram@LELAND.STANFORD.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Donald Peter Cram <doncram@LELAND.STANFORD.EDU>
Organization: Stanford University, CA 94305, USA
Subject:      Re: Logist Regression and Dummy Categorical Vars: HELP

Raymond explains the condition of separation well, and that may well explain the problem for Timothy as he is using many dummy variables. Briefly, there may exist a linear combination of the independent variables that completely separates the data according to Timothy's dependent measure.
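The cross-tabulation check Raymond suggests (quoted below) can be sketched in a few lines. This is Python rather than SAS and is only an illustration: the counts are the ones from his example table, and the function name `constant_outcome_cells` is hypothetical.

```python
# Counts copied from Raymond's example table (quoted below):
# (FEMALE, COLLEGE) -> (PROFESSIONAL yes, PROFESSIONAL no)
cells = {
    (1, 1): (23, 14),
    (1, 0): (0, 45),
    (0, 1): (56, 23),
    (0, 0): (6, 38),
}


def constant_outcome_cells(table):
    """Return the covariate patterns in which the outcome never varies.

    Hypothetical helper: a pattern is flagged when it has at least one
    observation and all of them fall in a single outcome category.
    """
    flagged = []
    for pattern, (n_yes, n_no) in table.items():
        if (n_yes + n_no) > 0 and (n_yes == 0 or n_no == 0):
            flagged.append(pattern)
    return flagged


flagged = constant_outcome_cells(cells)
print(flagged)  # [(1, 0)]  i.e. FEMALE=1, COLLEGE=0
```

A flagged cell is a candidate to examine, not proof of a problem. Whether it actually produces separation depends on the terms in the model, for instance whether that cell gets its own dummy or interaction term.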

However, SAS version 6.11 can give this warning message even when separation has not occurred. Note that the message says "possible" and "may". Some mysterious criteria, related to some measure of model fit, trigger this message.

I have a data set which I have run in SAS versions 6.09 and 6.11 as well as in Splus and TSP. In this data set, the binary outcome is only poorly explained by the independent variables, but maximum likelihood estimates do exist. There is _not_ separation. It is perfectly okay for me to apply this model. SAS version 6.09 did not give the warning message; version 6.11 does, and in my opinion that does a disservice to the user.

So I think the SAS/STAT developers should at least add more explanation to the warning note: the warning needs a warning that it may not be valid at all. And the criteria for the message should be explained in the documentation. I can't find anything on it right now in the SAS/STAT Changes and Enhancements Through Release 6.11 manual.
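To make the difference between separation and a merely poor fit concrete, here is a minimal sketch for the simplest case: one continuous predictor plus an intercept. It is Python rather than SAS, the function name `separation_type` and the data values are made up for illustration, and it says nothing about the criteria SAS itself uses to issue the warning.

```python
def separation_type(x, y):
    """Classify a single-predictor binary data set.

    Returns "complete", "quasi-complete", or "overlap".
    Assumes y is coded 0/1. With overlap (and x taking at least two
    distinct values) the maximum likelihood estimates exist; with
    either kind of separation they do not.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome values must be present")
    # Try both orientations: y=0 below y=1, and y=1 below y=0.
    for lo, hi in ((x0, x1), (x1, x0)):
        if max(lo) < min(hi):
            # A cut point strictly between the two groups exists.
            return "complete"
        if max(lo) == min(hi) and min(lo) < max(hi):
            # The groups only touch at one boundary value.
            return "quasi-complete"
    return "overlap"


# Made-up illustrative data, not from any real study.
print(separation_type([1, 2, 3, 4], [0, 0, 1, 1]))  # complete
print(separation_type([1, 2, 2, 3], [0, 0, 1, 1]))  # quasi-complete
print(separation_type([1, 2, 3, 4], [0, 1, 0, 1]))  # overlap
```

The third case is the kind of situation I describe above: the predictor explains the outcome poorly, the two groups overlap, and the estimates exist. With several predictors the check is harder, because the separating direction can be any linear combination of them.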

regards Don

In article <>, Raymond V. Liedka <liedka@UNM.EDU> wrote:
>On Thu, 19 Jun 1997, Timothy S. Killian wrote:
>
>> I am a new stat/sas person so please forgive my elementary question. I
>> have a binary dependent variable and independent variables that are both
>> continuos and categorical. My categorical variable has 7 levels. I
>> created 6 (k-1) dummy variables. When I run the program, the listing
>> file gives me the following warning
>>
>> WARNING: There is a possible quasi-complete separation in the sample
>> points. The maximum likelihood estimate may not exist.
>> WARNING: The LOGISTIC procedure continues in spite of the above
>> warning. Results shown are based on the last maximum likelihood
>> iteration. Validity of the model fit is questionable.
>>
>> I think I have followed the directions exactly and regardless of what I
>> do, I get this result.
>>
>> If I have not provided enough info., let me know and I will be more
>> specific.
>>
>> What am I doing wrong?????
>>
>
>Nothing. You, Tim, are doing isn't your program code. This
>problem is that one or more combinations of your independent variables
>does not make any distinction between the two categories of your dependent
>variable.
>
>For example, suppose you have two variables, FEMALE (=1 if yes, =0 if not)
>and COLLEGE (=1 if college degree, =0 if not). The response variable is
>PROFESSIONAL (=1 if a professional occupation, =0 if not). Suppose the
>list-style cross classification of FEMALE*COLLEGE*PROFESSIONAL is:
>
>                        PROFESSIONAL
>   FEMALE   COLLEGE     YES     NO
>     1         1         23     14
>     1         0          0     45
>     0         1         56     23
>     0         0          6     38
>
>What you see is that for the combination (Female, No college Degree) ALL
>values of the response are the same...PROFESSIONAL=0...and there is no
>ability/contribution of that combination to distinguishing whether someone
>is a professional or not. It means that the likelihood function is
>infinite at this point and the coefficient cannot be estimated.
>
>This CAN occur with continuous variables, but means that there is some way
>of cutting the continuous variable such that its combination (cut into
>two) with the other variables is producing a perfect discrimination of the
>response variable (all values are in one or the other category).
>
>You need to identify the combination, and likely drop a variable to
>eliminate the problem. One simple way is to cross all the dummy and
>discrete variables in a frequency distribution like above, and look to see
>if there is a combination with perfect discrimination.
>
>It is clearly more difficult to identify whether a continuous variable is
>causing this behavior. One way is to take turns deleting a variable and
>seeing if the problem goes away.
>
>There is a SAS Technical Report by Ying So on the SAS Web Site. It's
>title is "A Tutorial of Logistic Regression".
>
>ray
>
>
>
>Raymond V. Liedka
>Department of Sociology
>University of New Mexico

-- doncram at gsb dot stanford dot edu all lowercase http colon slash slash www hyphen leland dot stanford dot edu slash tilde doncram
