Date: Tue, 2 Nov 2010 14:42:33 -0400
Reply-To: peterflomconsulting@mindspring.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject: Re: huge (>999.99) odds ratios: cause?
In-Reply-To: <AANLkTikrK25XqCVWCzO3WVO786YiARrhGy83pizDYdFa@mail.gmail.com>
Jordan H wrote
<<<
Hello, all.
First, a little background. I've been asked to help with a project in which
the goal is to develop a model that predicts high cost pharmacy expenditures
based on a variety of variables, such as co-morbidities, demographics, etc. To
do this, a multivariate regression model was used. My client is also
interested in trying to model poor prediction within the multiple regression
model. To do this, they saved the residuals from PROC REG, made an
indicator variable for those observations with residuals greater than 1.75,
and ran a PROC LOGISTIC with the new indicator variable as the response
variable and the original independent variables, plus additional cost
variables, as predictors.
The model converges and most coefficients/odds ratios look reasonable but
some appear to be errors (odds ratios of >999.99, confidence intervals of
<0.001 to >999.99). We've checked things like multicollinearity but that
doesn't seem to be an issue.
Does anyone have an idea as to what could be going on?
Thank you for your consideration!
>>>
First, thanks for providing context.
Second, I don't think that's the right way to look at poor prediction.
Instead, I would look at the particular cases that have very high residuals
and see what they have in common, if anything. The residuals should not be
related to the IVs. If they are, something is wrong with the model. If you
wanted to model the residuals on the IVs, I would do it in OLS regression,
one variable at a time, and look at lots of plots. In fact, I might ONLY
look at plots. A residual of 1.75 isn't some magic value.
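As a rough illustration of checking residuals against one IV at a time, here is a minimal sketch in Python rather than SAS. The data are made up (a deliberately curved relationship fitted with a straight line), and the helper names fit_line, residuals and binned_means are hypothetical. Binned means of the residuals stand in for the plot: a linear fit leaves residuals with no linear association with the IV, yet the bin means can still show a clear pattern.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Observed minus fitted values from the straight-line fit."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def binned_means(x, resid, n_bins=3):
    """Mean residual within equal-sized bins of x (a text stand-in for a plot)."""
    pairs = sorted(zip(x, resid))
    size = len(pairs) // n_bins
    return [mean([r for _, r in pairs[i * size:(i + 1) * size]])
            for i in range(n_bins)]

# Made-up data: the true relationship is curved, but we fit a line.
x = list(range(1, 31))
y = [xi ** 2 for xi in x]

resid = residuals(x, y)
low, middle, high = binned_means(x, resid)
print("mean residual by third of x:", low, middle, high)
```

With these made-up data the low and high thirds have positive mean residuals and the middle third a negative one, the kind of pattern that suggests the model is missing something (here, curvature) for that IV.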
Third, if you do decide to go this way, crazy ORs and CIs are usually the
result of zero cells or near zero cells in the crosstabs. So, I'd look
variable by variable at crosstabs (if the IV is categorical), or at parallel
box plots (if the IV is continuous).
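To make the zero-cell point concrete, here is a minimal sketch in Python rather than SAS. The data are made up, and the names rare_flag, high_resid, crosstab_2x2, zero_cells and odds_ratio_2x2 are hypothetical, not anything from the original analysis. When one cell of the 2x2 table is empty, the sample odds ratio is infinite (or zero), so a reported value such as >999.99 with an enormous CI should not be a surprise.

```python
import math
from collections import Counter

def crosstab_2x2(predictor, outcome):
    """Cell counts for a 0/1 predictor crossed with a 0/1 outcome."""
    counts = Counter(zip(predictor, outcome))
    return {(p, o): counts.get((p, o), 0) for p in (0, 1) for o in (0, 1)}

def zero_cells(table):
    """List the (predictor, outcome) cells that contain no observations."""
    return [cell for cell, n in sorted(table.items()) if n == 0]

def odds_ratio_2x2(table):
    """Sample odds ratio; infinite or undefined when a zero cell is involved."""
    num = table[(1, 1)] * table[(0, 0)]
    den = table[(1, 0)] * table[(0, 1)]
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den

# Made-up data: 5 cases have the rare flag, and every one of them has a
# large residual, so the (flag=1, high_resid=0) cell is empty.
rare_flag = [1] * 5 + [0] * 95
high_resid = [1] * 5 + [1] * 10 + [0] * 85

table = crosstab_2x2(rare_flag, high_resid)
print("cell counts:", table)
print("empty cells:", zero_cells(table))
print("sample odds ratio:", odds_ratio_2x2(table))
```

Running the same check variable by variable is one way to find which categorical IVs are behind the extreme estimates.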
HTH
Peter
Peter Flom PhD.
Peter Flom Consulting LLC
5 Penn Plaza, Ste 2342
NY NY 10001
www.statisticalanalysisconsulting.com
www.IAmLearningDisabled.com