Date: Mon, 3 Oct 2005 13:47:15 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: complete separation in logistic regression
Robin:
Thanks for the suggestions. Matthew Zack also suggested testing how the
FL Macro compares with PROC LOGISTIC exact.
I asked about the FL Macro because it accommodates continuous covariates
as predictors. I'll take a look at how it works with the valuable test
data that you supplied. I also have a reduced version of the model that
includes binary predictors only (although it generates enough
misclassifications to fall outside the 'almost complete separation'
class of models). I'll try the exact logistic procedure on it.
While the FL Macro shrinks the parameter estimates to something closer to
those in a typical model, its predicted outcomes have a higher
misclassification rate on test data than those of the model that, when
fitted to my data, suffers from complete separation. I don't know how to
interpret that finding.
I appreciate your help with this statistical modelling problem. Glad to
know that someone else has looked at the results it produces.
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Robin High
Sent: Monday, October 03, 2005 11:56 AM
To: Sigurd Hermansen
Cc: SAS-L@LISTSERV.UGA.EDU
Subject: Re: complete separation in logistic regression
> Does anyone have any experience with logistic regression under
> conditions of complete separation? Heinze and Schemper have
> publications and a SAS macro available at
> http://www.meduniwien.ac.at/msi/biometrie/programme/fl/
>
> I'd have a special interest in hearing from anyone who has used the
> algorithm and would know something about its performance
> characteristics. I would also like to hear whether SAS statistical
> PROCs handle complete or almost complete separation, or whether
> someone has adapted NLMIXED or other procedures to the problem.
>
Hi Sig,
One way to examine how it works is to run a few test programs comparing
its output with that of PROC LOGISTIC with the EXACT statement, as in
the example below. To get equivalent results, both must use the same
input data coding scheme; hence the "descending" option and the
reference category treatment for time in the CLASS statement of
LOGISTIC.
The odds ratios estimated by the two methods tend to differ widely when
there is a '0' in one of the cells. They are much closer when at least 1
observation is present in every cell (to see this, add a new observation
1 0 1, that is group=1, time=0, rsp=1, to the test data).
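A minimal sketch of that check, assuming the data set one from the test program in this message has already been created. The name one_plus is made up for this illustration:

```sas
/* Assumption: data set ONE (group time rsp) already exists.      */
/* ONE_PLUS is a hypothetical name used only for this sketch.     */
data one_plus;
  set one end=last;
  output;                      /* keep every original observation */
  if last then do;
    /* extra observation that fills the empty cell time=0, rsp=1  */
    group = 1;
    time  = 0;
    rsp   = 1;
    output;
  end;
run;

/* Confirm that every time-by-rsp cell now has at least one count */
proc freq data=one_plus;
  tables time*rsp / nopercent norow nocol;
run;
```

Pointing data= at one_plus in both the PROC LOGISTIC step and the %fl call should then allow the same side-by-side comparison on the modified data.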
The Firth macro merges files without BY statements, so be aware of that
if you have invoked:
OPTIONS mergeNoBy = error ;
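One possible workaround, sketched under the assumption that relaxing the option only around the macro call is acceptable in your session. The macro variable name saved_mergenoby is made up for this illustration, and the %fl call simply repeats the one in the test program:

```sas
/* Save the current setting as text, e.g. MERGENOBY=ERROR */
%let saved_mergenoby = %sysfunc(getoption(mergenoby, keyword));

/* Relax the option so the macro's MERGE without BY is not stopped */
options mergenoby=nowarn;

%fl(data=one, y=rsp,
    varlist= time ,
    maxit=50, epsilon=0.0001, noint=0, outest=_est,
    print=1, pl=1, plint=0, alpha=0.05, odds=1,
    notes=0, standard=1);

/* Restore whatever setting was in effect before the call */
options &saved_mergenoby;
```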
I applied this procedure recently to a set of data with complete
separation and found the results helpful. However, with a confidence
interval on the odds ratio whose upper bound is close to "infinity", I
am not sure what technique, if any, would produce anything better.
I don't believe NLMIXED would be relevant in this case, though I
continually learn new things about it and am continually amazed at what
it can do.
Robin High
Univ. of Oregon
TITLE1 'Compare Exact and Firth Logistic Regression';
Data one;
Input group time rsp @@;
Cards;
1 1 1 1 1 0 1 1 1 1 1 0 1 1 1
1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
1 0 0 1 0 0 1 0 0 1 0 0 1 0 0
;
PROC TABULATE data=one NOseps ;
class time rsp;
table time, (rsp all='Tot')*n=' '*f=5.0 / rts=10 misstext='0';
TITLE2 'Data Summary';
run;
/*
----------------------------
|        |    rsp    |     |
|        |-----------|     |
|        |  0  |  1  | Tot |
|--------+-----+-----+-----|
|time    |     |     |     |
|0       |   10|    0|   10|
|1       |    3|    7|   10|
----------------------------
*/
* MLE and exact logistic regression, for comparison with the Firth macro;
proc logistic data=one order=data descending;
CLASS time(ref=last) / param=ref;
MODEL rsp = time / risklimits expb ;
EXACT time / estimate=both;
TITLE1 'LOGISTIC: Compare MLE and Exact Calculations';
run;
* read in the firth macro;
%INCLUDE 'c:\sas\logistic\fl.sas';
* Other options are available when you call the macro.
  The macro assumes the binomial response is dummy coded (rsp=0/1)
  and that classification data (e.g., gender), if present, are dummy
  coded as well. Apply PROC GLMMOD first if you need to recode data ;
%fl(data=one, y=rsp,
varlist= time ,
maxit=50, epsilon=0.0001, noint=0, outest=_est,
print=1, pl=1, plint=0, alpha=0.05, odds=1,
notes=0, standard=1);
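
For readers on SAS 9.2 or later, PROC LOGISTIC itself accepts a FIRTH option on the MODEL statement, which gives a third set of estimates to compare. This step is a sketch and not part of the original test program. It assumes the same data set one and models rsp=1 directly through the EVENT= response option instead of the DESCENDING option used above:

```sas
/* Assumption: SAS 9.2 or later, where MODEL ... / FIRTH is available. */
/* time is already coded 0/1, so no CLASS statement is needed here.    */
proc logistic data=one;
  model rsp(event='1') = time / firth clparm=pl clodds=pl;
  title1 'LOGISTIC: Firth penalized likelihood (FIRTH option)';
run;
```

With a single binary predictor the model is saturated, so the Firth estimate should match the one obtained by adding 0.5 to each cell of the 2x2 table, an odds ratio of about (7.5 x 10.5)/(3.5 x 0.5) = 45. That number makes a convenient check on both this step and the macro output.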