Date: Wed, 31 Jul 2002 16:34:47 +0100
Reply-To: Peter Watson <peter.watson@mrc-cbu.cam.ac.uk>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Peter Watson <peter.watson@mrc-cbu.cam.ac.uk>
Subject: Re: question regarding PIN & POUT criteria in logistic regression
In-Reply-To: <5.1.0.14.0.20020730095909.00a0cec0@smtp.brisnet.org.au>
Content-Type: TEXT/PLAIN; charset=US-ASCII
Hi Bob,
One thought occurs to me: The large odds ratios and the large standard
errors (>1) giving rise to the large confidence intervals you describe
are sometimes symptomatic of what is termed infinite maximum likelihood
estimates.
This happens when the groups are close to non-overlapping with respect
to one or more predictors, as evidenced by zero cells in the
cross-tabulation. Quasi-complete separation is, I think, the term for it.
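To make the "infinite estimate" idea concrete, here is a minimal Python
sketch (Python is just my choice for illustration, it is not SPSS output).
It uses the counts from the 2 x 2 example given further down in this
message and assumes the predictor is treated as a two-level factor, so that
each level has its own logit for P(group = 2). The function name and the
grid of logit values are illustrative only.

```python
import math

# Counts from the 2 x 2 example in this message:
# predictor = 1: 1 case in group 1, 6 cases in group 2
# predictor = 2: 8 cases in group 1, 0 cases in group 2  (the zero cell)

def log_likelihood(logit_1, logit_2):
    """Log-likelihood when P(group = 2) has logit_1 at predictor = 1
    and logit_2 at predictor = 2."""
    def log_p(logit):   # log P(group = 2)
        return -math.log1p(math.exp(-logit))
    def log_q(logit):   # log P(group = 1)
        return -math.log1p(math.exp(logit))
    return 6 * log_p(logit_1) + 1 * log_q(logit_1) + 8 * log_q(logit_2)

logit_1 = math.log(6 / 1)   # the estimate at predictor = 1 is finite
# Upper bound of the log-likelihood, approached only as logit_2 -> -infinity
supremum = 6 * math.log(6 / 7) + 1 * math.log(1 / 7)

grid = (1, 5, 10, 15, 20)
values = [log_likelihood(logit_1, -t) for t in grid]
for t, ll in zip(grid, values):
    print(f"logit at predictor=2 = {-t:4d}   log-likelihood = {ll:.8f}")
print(f"upper bound (never attained)    = {supremum:.8f}")
```

The log-likelihood keeps creeping upwards as the logit at predictor = 2
becomes more negative, so no finite estimate maximises it, and the
likelihood surface becomes very flat, which is what inflates the
standard error.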
My understanding is that one has to be careful interpreting statistical
tests that rely on these inflated standard errors of the estimates.
The likelihood chi-square statistic seems more reliable, e.g.:
               predictor
                 1    2
   group   1     1    8
           2     6    0
This has a highly significant likelihood ratio chi-square of 14.45, p<0.001,
using crosstabs or the logistic regression "omnibus tests", as one might
have expected.
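For anyone wanting to check the likelihood chi-square by hand, a rough
Python sketch follows. It is not what SPSS runs internally; the function
names are my own, and the p-value uses the fact that the upper tail of a
chi-square on 1 df equals erfc(sqrt(x/2)).

```python
import math

def likelihood_ratio_chi2(table):
    """Likelihood ratio chi-square (G^2) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:   # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x)."""
    return math.erfc(math.sqrt(x / 2.0))

table = [[1, 8],   # group 1: predictor = 1, predictor = 2
         [6, 0]]   # group 2: predictor = 1, predictor = 2
g2 = likelihood_ratio_chi2(table)
p_value = chi2_upper_tail_1df(g2)
print(f"G^2 = {g2:.2f}, p = {p_value:.5f}")
```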
However, using the "variables in the equation" segment of the logistic
regression output, similar to yours, I get a huge 95% CI of 0 to 440045.2
and a Wald chi-square statistic, analogous to a squared t-ratio, based on
B/s.e.(B) = 12.995/95.755. This says there is NO relation between the
predictor and group, thus contradicting the usual chi-square statistic
and one's a priori belief.
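The arithmetic behind that Wald test can be sketched in a few lines of
Python. This assumes the usual definition of the Wald chi-square as
(B/s.e.(B)) squared on 1 df; the variable names are mine.

```python
import math

b = 12.995     # coefficient B from the output described above
se = 95.755    # its standard error

z = b / se                                # the ratio B / s.e.(B)
wald = z ** 2                             # Wald chi-square on 1 df
p_value = math.erfc(abs(z) / math.sqrt(2.0))   # P(chi-square, 1 df > wald)
print(f"z = {z:.4f}, Wald chi-square = {wald:.4f}, p = {p_value:.3f}")
```

The ratio is tiny only because the standard error is huge, not because
the predictor is unrelated to group.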
So, in summary, it seems to me that the Wald part of the output should
be handled with care, and a more robust way of looking at predictor effects
may be to look at the change in chi-square between models containing
and not containing the predictors of interest. I know some of these
variable selection procedures are based on this Wald statistic.
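As a rough illustration of the model-comparison idea, the Python sketch
below computes the change in -2 log-likelihood between an intercept-only
model and a model containing the predictor, for the 2 x 2 example in this
message. It assumes a single two-level predictor, so the fitted
probabilities are just the observed proportions; with the zero cell the
"full" log-likelihood is the limiting value the fit approaches. The helper
name is my own.

```python
import math

def binomial_log_lik(in_group_2, in_group_1):
    """Log-likelihood at the observed proportion; 0*log(0) is taken as 0."""
    n = in_group_2 + in_group_1
    ll = 0.0
    for count in (in_group_2, in_group_1):
        if count > 0:
            ll += count * math.log(count / n)
    return ll

# Intercept-only model: 6 of 15 cases are in group 2
ll_null = binomial_log_lik(6, 9)
# Model with the predictor: one proportion per predictor level
ll_full = binomial_log_lik(6, 1) + binomial_log_lik(0, 8)

change = (-2.0 * ll_null) - (-2.0 * ll_full)    # change in -2LL, 1 df
p_value = math.erfc(math.sqrt(change / 2.0))    # chi-square upper tail, 1 df
print(f"change in -2LL = {change:.2f}, p = {p_value:.5f}")
```

This change stays well behaved even though the coefficient itself has no
finite estimate.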
best wishes
Peter
On Tue, 30 Jul 2002, Bob Green wrote:
> I am hoping for some advice regarding PIN & POUT criteria in logistic
> regression.
>
> Initially I used the following syntax (this is excerpt from the default
> syntax)
> /METHOD=ENTER
> /CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .
>
> This resulted in some extremely large OR (the largest being 8048; CI: 0 -
> 6400000000000000)
>
> When I used the same variables, but substituted
> /METHOD=BSTEP(LR)
> /CRITERIA PIN(1) POUT(1) ITERATE(20) CUT(.5)
> the results were more meaningful (though some of the CI were large), i.e.
> the largest OR was 59; CI 5.4-665, and the variables with the largest OR
> were associated with the variables which seemed most notable when the raw
> data was examined.
>
> My understanding is that the larger probabilities, using the second syntax
> make it easier for variables to enter and remain in the model. Is this a
> problem?
>
> Any assistance is appreciated,
>
> Bob Green
>
--------------------------------------------------------------------------
"A pinch of probability is worth a pound of perhaps"
   James Thurber
Peter Watson
MRC Cognition and Brain Sciences Unit
15 Chaucer Road
Cambridge
CB2 2EF
Tel: +44 01223 355294 Ext. 801
Fax: +44 01223 359062
Email: peter.watson@mrc-cbu.cam.ac.uk