Date: Sun, 22 Jun 1997 17:00:23 -0700
Reply-To: Donald Peter Cram <doncram@LELAND.STANFORD.EDU>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Donald Peter Cram <doncram@LELAND.STANFORD.EDU>
Organization: Stanford University, CA 94305, USA
Subject: Re: Logist Regression and Dummy Categorical Vars: HELP
Raymond explains the condition of separation well, and that may well
explain the problem for Timothy as he is using many dummy variables.
Briefly, there may exist a linear combination of the independent
variables that completely separates the data according to Timothy's
dependent measure.
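
To make that definition concrete, here is a minimal sketch in Python (not SAS). It checks one candidate linear combination against the data. The function name `separation_along`, the coefficient vector `b`, and the toy data are all made up for illustration. It only classifies the direction you hand it; it does not search for a separating combination, so "none" for one direction says nothing about other directions.

```python
def separation_along(b, rows, y):
    """Classify how the linear combination b splits the data.

    b    : candidate coefficients (hypothetical, supplied by the caller)
    rows : predictor tuples, with a leading 1 for the intercept
    y    : 0/1 responses, one per row
    """
    scores = [sum(bj * xj for bj, xj in zip(b, x)) for x in rows]
    # Flip the sign for y=0 so that "on the correct side" is always positive.
    signed = [s if yi == 1 else -s for s, yi in zip(scores, y)]
    if all(s > 0 for s in signed):
        return "complete"        # every point strictly on its own side
    if all(s >= 0 for s in signed) and any(s > 0 for s in signed):
        return "quasi-complete"  # no point on the wrong side, some on the boundary
    return "none"                # this direction does not separate the data


rows = [(1, 1), (1, 2), (1, 3), (1, 4)]   # intercept, x
print(separation_along((-2.5, 1), rows, [0, 0, 1, 1]))
print(separation_along((-2.5, 1), rows, [0, 1, 0, 1]))
```

With many dummy variables there are many more directions along which this can happen, which is why Timothy's setup is a plausible candidate.
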
However, the SAS software in version 6.11 can give this warning
message even when separation has not occurred. Note the message
says "possible" and "may". Some mysterious criterion, related
to some measure of model fit, triggers this message. I have a
data set which I have run in SAS versions 6.09 and 6.11 as well as in
Splus and TSP. In this data set, the binary outcome data is only
poorly explained by the independent data, but maximum likelihood
estimates do exist. There is _not_ separation. It is perfectly okay
for me to apply this model. SAS version 6.09 did not give the warning
message; version 6.11 does, which in my opinion is a disservice to the
user. So I think the SAS/STAT developers should at least add more
explanation to the warning note: the warning needs a warning that it
may not be valid at all. And the criteria for the message should be
explained in documentation. I can't find anything on it right now in
the SAS/STAT Changes and Enhancements Through Release 6.11 manual.
regards
Don
In article <Pine.A41.3.96.970621175133.17086A100000@pegasus.unm.edu>,
Raymond V. Liedka <liedka@UNM.EDU> wrote:
>On Thu, 19 Jun 1997, Timothy S. Killian wrote:
>
>> I am a new stat/sas person so please forgive my elementary question. I
>> have a binary dependent variable and independent variables that are both
>> continuous and categorical. My categorical variable has 7 levels. I
>> created 6 (k-1) dummy variables. When I run the program, the listing
>> file gives me the following warning
>>
>> WARNING: There is a possible quasicomplete separation in the sample
>> points. The maximum likelihood estimate may not exist.
>> WARNING: The LOGISTIC procedure continues in spite of the above
>> warning. Results shown are based on the last maximum likelihood
>> iteration. Validity of the model fit is questionable.
>>
>> I think I have followed the directions exactly and regardless of what I
>> do, I get this result.
>>
>> If I have not provided enough info., let me know and I will be more
>> specific.
>>
>> What am I doing wrong?????
>>
>
>Nothing. You, Tim, are doing nothing wrong...it isn't your program code. The
>problem is that for one or more combinations of your independent variables,
>all observations fall in a single category of your dependent variable.
>
>For example, suppose you have two variables, FEMALE (=1 if yes, =0 if not)
>and COLLEGE (=1 if college degree, =0 if not). The response variable is
>PROFESSIONAL (=1 if a professional occupation, =0 if not). Suppose the
>list-style cross-classification of FEMALE*COLLEGE*PROFESSIONAL is:
>
>                         PROFESSIONAL
>     FEMALE   COLLEGE     YES    NO
>        1        1         23    14
>        1        0          0    45
>        0        1         56    23
>        0        0          6    38
>
>What you see is that for the combination (Female, No college Degree) ALL
>values of the response are the same...PROFESSIONAL=0...and there is no
>ability/contribution of that combination to distinguishing whether someone
>is a professional or not. It means that the likelihood function has no
>finite maximum in the corresponding coefficient (the estimate drifts toward
>minus infinity), so the coefficient cannot be estimated.
>This CAN occur with continuous variables, but means that there is some way
>of cutting the continuous variable such that its combination (cut into
>two) with the other variables is producing a perfect discrimination of the
>response variable (all values are in one or the other category).
>
>You need to identify the combination, and likely drop a variable to
>eliminate the problem. One simple way is to cross all the dummy and
>discrete variables in a frequency distribution like above, and look to see
>if there is a combination with perfect discrimination.
>
>It is clearly more difficult to identify whether a continuous variable is
>causing this behavior. One way is to take turns deleting a variable and
>seeing if the problem goes away.
>
>There is a SAS Technical Report by Ying So on the SAS Web Site. Its
>title is "A Tutorial on Logistic Regression".
>
>ray
>
>
>
>Raymond V. Liedka
>Department of Sociology
>University of New Mexico
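
Raymond's suggestion of crossing the discrete variables against the response and looking for one-sided cells can be sketched as follows. This is a minimal illustration in Python (not SAS), using the counts from his FEMALE*COLLEGE*PROFESSIONAL table; the name `one_sided_cells` and the dictionary layout are assumptions made for the example.

```python
# Counts copied from the FEMALE*COLLEGE*PROFESSIONAL table quoted above:
# (FEMALE, COLLEGE) -> (PROFESSIONAL yes, PROFESSIONAL no)
table = {
    (1, 1): (23, 14),
    (1, 0): (0, 45),
    (0, 1): (56, 23),
    (0, 0): (6, 38),
}


def one_sided_cells(counts):
    """Return the non-empty cells in which every observation has the same response."""
    flagged = []
    for cell, (yes, no) in counts.items():
        if yes + no > 0 and (yes == 0 or no == 0):
            flagged.append(cell)
    return flagged


print(one_sided_cells(table))  # prints [(1, 0)]
```

A flagged cell is a place to look rather than proof of a problem: whether a given model is affected depends on which terms (for example, interactions) it actually includes.
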

doncram at gsb dot stanford dot edu
all lowercase http colon slash slash www hyphen leland dot stanford dot edu
slash tilde doncram
