Date: Thu, 10 Jan 2002 17:14:08 -0500
Reply-To: john.hixon@KODAK.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: john.hixon@KODAK.COM
Subject: Re: Logistic regression in PROC LOGISTIC vs GENMOD
Content-type: text/plain; charset=us-ascii
From: John Hixon
/*
Jennifer asked:
>I've come across an inconsistency in the results output from LOGISTIC and
>GENMOD for a logistic regression. The inconsistency appears with
>categorical predictor variables, the coefficient and standard error
>estimates from GENMOD are exactly double the LOGISTIC estimates for
>dichotomous variables, and for the x5 (values 1,2,3) variable they don't
>seem to have an exact relationship. This occurs for the dichotomous
>variables when they are included in the CLASS statement.
>The estimates for the continuous variables agree regardless of what other
>variables are in the model. I believe that the LOGISTIC and GENMOD code
>should be providing the same models. Can someone help explain this? I hope
>there is just something simple I'm missing. [Example dataset and code below]
>I've got v8.2 TS02M0 on Win98.
>Thanks!
If you run the code below you will see that LOGISTIC uses a different
parameterization for x1 than GENMOD does. When you use the
   Class x1
statement in Proc LOGISTIC, it codes the two levels of x1 as -1 and +1.
Proc GENMOD codes x1 as 0 and 1.
You have already pointed out that if you do not use a Class
statement in LOGISTIC, then it matches GENMOD.
*/
dm 'clear log';
dm 'clear output';
goptions reset=all;
options ls=100 ps=60;
goptions ftitle=swiss ftext=swiss htitle=4 pct hby=2.75 pct cby=red;
goptions vsize=6.1 hsize=6.75;
option nodate nonumber;
data test;
input y x1 x2 x3 x4 x5;
cards;
1 1 0 2.45 16.12 1
1 0 1 3.45 13.18 2
1 1 1 2.34 14.27 3
1 1 1 3.12 24.23 3
1 1 1 2.34 16.56 3
1 1 0 3.89 14.34 2
1 0 0 1.34 20.56 2
0 0 0 1.56 18.45 1
0 1 0 1.34 15.45 1
0 0 1 2.14 20.34 2
0 0 1 2.56 19.53 3
0 0 0 2.32 18.45 3
0 1 0 1.89 19.98 2
0 0 0 2.68 18.45 2
0 0 0 2.98 16.12 1
0 0 0 2.57 12.34 2
;
run;
/*MODEL 1*/
proc logistic data=test descending;
class x1 / order=formatted;
model y=x1;
run;
/*
Note that when you use "Class x1" in Proc LOGISTIC, it builds the
design column for x1 with effect coding: x1=0 is coded +1 and x1=1 is
coded -1, as the Class Level Information table below shows.

              The LOGISTIC Procedure

                Model Information
   Data Set                      WORK.TEST
   Response Variable             y
   Number of Response Levels     2
   Number of Observations        16
   Model                         binary logit
   Optimization Technique        Fisher's scoring

                Response Profile
   Ordered                    Total
     Value      y         Frequency
         1      1                 7
         2      0                 9

   Probability modeled is y=1.

             Class Level Information
                            Design
                           Variables
   Class      Value            1
   x1         0                1
              1               -1
*/
proc genmod data=test descending;
class x1;
model y=x1 /dist=bin link=logit ;
run; quit;
* But the GENMOD procedure uses a different parameterization.
It codes the levels of x1 as 0 and 1. So it makes sense that the
GENMOD estimate is twice as large: moving between the two levels
of x1 is a change of 1 in the GENMOD design column, but a change
of 2 (from -1 to +1) in the LOGISTIC design column. This is
admittedly confusing.
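
A quick hand check of that factor of two. This is my own arithmetic
from the 16 rows above, not procedure output, and it assumes GENMOD
treats the last level (x1=1) as the reference level:
   x1=1 has 5 events in 7 rows, so its logit is log(5/2) =  0.9163
   x1=0 has 2 events in 9 rows, so its logit is log(2/7) = -1.2528
With 0/1 coding the x1=0 effect is the difference of the two logits,
   -1.2528 - 0.9163 = -2.1691
With +1/-1 coding the x1=0 effect is half of that difference,
   -2.1691 / 2 = -1.0845
Either way the two fitted logits, and so the fitted probabilities,
come out the same.
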

              The GENMOD Procedure

               Model Information
   Data Set                    WORK.TEST
   Distribution                Binomial
   Link Function               Logit
   Dependent Variable          y
   Observations Used           16

            Class Level Information
   Class      Levels      Values
   x1              2      0 1

               Response Profile
   Ordered                    Total
     Value      y         Frequency
         1      1                 7
         2      0                 9

You have already pointed out that if you do not use a class
statement in LOGISTIC, then it matches GENMOD.
Interesting.
HTH
John Hixon
Eastman Kodak Co
Rochester, NY USA
;
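/*
A possible follow-up, offered as a sketch and not as tested output.
The CLASS statement in Proc LOGISTIC accepts a PARAM= option, and
PARAM=REF asks for 0/1 reference coding in place of the default
effect (-1/+1) coding. I am assuming the option behaves this way in
your release and that the last level (x1=1) is taken as the reference.
If so, the Class Level Information table should show x1=0 coded 1 and
x1=1 coded 0, and the x1 estimate and standard error should line up
with the GENMOD results.
*/
proc logistic data=test descending;
class x1 / order=formatted param=ref;
model y=x1;
run;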