Date: Fri, 2 Sep 2005 09:12:43 -0400
Reply-To: Peter Flom <flom@NDRI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <flom@NDRI.ORG>
Subject: Re: On Hosmer-Lemeshow, etc. and Model Selection
Content-Type: text/plain; charset=US-ASCII
Bora et al.
I've snipped and answered what I can... lots of e-mails on this. I
can't wait to see what the stats gurus make of this (I know David and
Dale are both on the West Coast).
Me, I am no guru.
[BY: Yes, we categorise continuous variables too. What are the
drawbacks?]
Increased type 2 error (you lose power, because the information within
each category is thrown away), sometimes also increased type 1 error,
and less sensible models.
Think of it this way. Suppose you are trying to predict heart attacks.
One of your IVs is going to be age.
If you categorize it into, say,
< 18, 18-25, 26-35, ..., 75+
then you are saying that the risk of heart attack for a 55-year-old is
the same as for a 64-year-old, but that this risk changes at age 65 and
then stays constant to age 74.
SOMETIMES categorizing makes sense - but rarely.
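To make the age example concrete, here is a minimal Python sketch of what
binning does to the predictor. The cut points and labels are hypothetical
(the list above elides the middle categories), chosen only so that 55-64
and 65-74 come out as bins, and age_category is an illustrative helper,
not anything from SAS.

```python
from bisect import bisect_right

# Hypothetical lower edges of each bin; the original post elides the
# middle categories, so these are assumptions for illustration only.
CUTS = [18, 25, 35, 45, 55, 65, 75]
LABELS = ["<18", "18-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"]


def age_category(age):
    """Map a continuous age to its (hypothetical) category label."""
    return LABELS[bisect_right(CUTS, age)]


# Once age is replaced by its category, the model cannot tell these apart:
print(age_category(55), age_category(64))   # same bin, so same fitted risk
# ...while a one-year difference across a cut point forces a jump:
print(age_category(64), age_category(65))   # different bins
```

Any model that sees only the category has to give a 55-year-old and a
64-year-old the same predicted risk, and can only change the risk at the
cut points.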
[BY: I've used "bagging" (bootstrap aggregating) at times and made use
of information criteria (AIC, etc.) for model selection. But,
apparently, there does not exist a coherent methodology for selecting
the "best" model, and a wide range of conflicting practices and
approaches exists.]
True. But, from what I can see, they are debating over the minutiae,
and any of the approaches may yield some valuable insight.
Peter
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)