Date: Wed, 3 Aug 2005 17:39:07 +0200
Reply-To: Marta García-Granero <biostatistics@terra.es>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Marta García-Granero <biostatistics@terra.es>
Organization: Asesoría Bioestadística
Subject: Re: zeros in predictors and categorical regression
In-Reply-To: <2081161075.1123067625@SSW0243.ssw.buffalo.edu>
Content-Type: text/plain; charset=ISO-8859-15
Hi Gene
eabe> I found something unexpected and don't understand the underlying math of
eabe> it. I was doing a logistic regression with a single categorical predictor
eabe> (IV) with 8 values. The frequencies on the IV show no zeros (i.e., no
eabe> values with zero frequency). A crosstabs of the IV with the DV shows two
eabe> cells with zeros. When I run a logistic regression the contrasts
eabe> representing those two cells have B coefficients of about -19 and standard
eabe> errors of 12,000 to 13,000. In two words, extremely large. I figure that
eabe> if I combine several cells, I can get rid of the huge SEs. What I don't
eabe> understand is the arithmetic that yields the huge SEs. Is it related to
eabe> the solution of the underlying equation for the logistic regression? Or,
eabe> could it be due to a collinearity problem? Perhaps a hugely technical
eabe> question, but I'm confident at least a few understand intimately the
eabe> algebra of logistic regression.
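
Logistic regression works with odds ratios (OR): each B coefficient is
the natural log of an OR. In a 2x2 contingency table:

         Outcome+   Outcome-
  RF+       a          b
  RF-       c          d

the OR is (a·d)/(b·c).

If b or c is zero, the OR can't be computed (division by zero). If a or
d is zero, the OR is 0 and its log is minus infinity. Either way there
is no finite estimate for B: the iterative fitting simply stops at some
large value (like your -19), and the SE, which depends on the
reciprocals of the cell counts, becomes huge.
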
You must collapse categories to avoid empty cells. There is no other
way around that problem.
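
To see the arithmetic, here is a minimal sketch in Python (not SPSS
syntax). The counts are made up for illustration, and the SE uses the
usual large-sample formula for the log odds ratio,
sqrt(1/a + 1/b + 1/c + 1/d), which is my addition and not something
taken from your output. The function name is hypothetical too. It lets
cell a shrink towards zero while the other three cells stay fixed:

```python
import math


def log_odds_ratio(a, b, c, d):
    """Return (ln OR, SE of ln OR) for a 2x2 table with strictly positive cells.

    Uses OR = (a*d)/(b*c) and the large-sample SE
    sqrt(1/a + 1/b + 1/c + 1/d).
    """
    log_or = math.log(a * d) - math.log(b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


# Hypothetical counts: b, c, d are fixed, cell a shrinks towards zero.
# Fractional values of a stand in for the near-zero fitted count that
# the iterative algorithm reaches before it stops.
for a in (5, 0.5, 1e-4, 1e-8):
    log_or, se = log_odds_ratio(a, 20, 15, 25)
    print(f"a = {a:<8g} ln(OR) = {log_or:8.2f}   SE = {se:10.1f}")
```

As a approaches zero, ln(OR) heads towards minus infinity only slowly
(it grows like ln(a)), while the SE grows like sqrt(1/a). With these
made-up counts, a = 1e-8 already gives a coefficient around -21 with an
SE of about 10,000, the same pattern you describe.
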
Regards,
Marta mailto:biostatistics@terra.es