Date: Thu, 3 Feb 2005 17:27:40 -0500
Reply-To: "Luo, Peter" <pluo@draftnet.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Luo, Peter" <pluo@draftnet.com>
Subject: Overfitting
Content-Type: text/plain; charset="iso-8859-1"
Hi list,

I have a continuous predictor in my multivariate model. To make it work
better, I could bin the variable in either of two ways:
categorize it into 5 or 10 groups of equal size,
or
categorize it in a way that maximizes its effect on the dependent variable
(for example, using CHAID to determine the best split point),
and then test the categorized variable in the model.
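As a rough illustration of the two binning strategies above, here is a minimal Python sketch (Python rather than SPSS syntax, purely for illustration; the function names are hypothetical). The second function is a simplified stand-in for CHAID, not CHAID itself: it searches for a single cut point that maximizes the difference in mean y, whereas CHAID uses chi-square or F tests and can produce several groups. The contrast that matters is that the first function never looks at y, while the second one does.

```python
import numpy as np

def equal_frequency_bins(x, k):
    """Approach 1: assign each value to one of k bins of (roughly) equal size.
    Cut points come from quantiles of x alone; y is never consulted."""
    x = np.asarray(x, dtype=float)
    # Interior cut points at the 1/k, 2/k, ..., (k-1)/k quantiles of x.
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    # side="right" puts a value equal to a cut point into the upper bin.
    return np.searchsorted(cuts, x, side="right")

def best_split(x, y, min_size=10):
    """Approach 2 (hypothetical, simplified stand-in for CHAID): pick the single
    threshold that maximizes the difference in mean y between the two groups.
    The cut point depends on y, which is what invites overfitting."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_cut, best_gap = None, -np.inf
    # Split between sorted positions i-1 and i; each side keeps >= min_size cases.
    for i in range(min_size, len(xs) - min_size + 1):
        if xs[i - 1] == xs[i]:
            continue  # cannot split between tied values
        gap = abs(ys[:i].mean() - ys[i:].mean())
        if gap > best_gap:
            best_gap = gap
            best_cut = (xs[i - 1] + xs[i]) / 2
    return best_cut, best_gap
```

For example, `equal_frequency_bins(x, 5)` on 500 distinct values gives five bins of 100 cases each, whatever y looks like, while `best_split(x, y)` returns whichever cut happens to separate the y means most in this particular sample.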
Well, I was 'criticized' on the grounds that the second approach capitalizes
on chance, so the relationship found with the variable thus transformed may
not hold in reality. I also remember reading a professor's warning that you
can transform the predictors in whatever way you like, so long as the
transformation does not depend on the dependent variable.
I guess my question is this: I understand that the second categorization
approach carries a danger of overfitting the model, but isn't that true of
every other type of transformation, too? In other words, if the first
approach turns out a significant predictor, does that mean I was not banking
on chance?
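A small simulation sketch can make the "capitalizing on chance" point concrete. It assumes pure noise (x and y are independent standard normals, so the true effect is zero) and uses a simplified best-split search as a stand-in for CHAID, then counts how often each approach looks "significant" at the nominal 5% level. All names and settings here (n, reps, min_size, a z test with the known sd of 1 and a 1.96 cutoff) are illustrative assumptions, not a standard procedure.

```python
import numpy as np

def split_z(x, y, cut):
    """z statistic for the difference in mean y between x <= cut and x > cut.
    Assumes sd(y) = 1, which holds by construction in the simulation below."""
    lo, hi = y[x <= cut], y[x > cut]
    if lo.size == 0 or hi.size == 0:
        return np.nan
    return abs(lo.mean() - hi.mean()) / np.sqrt(1 / lo.size + 1 / hi.size)

def best_cut(x, y, min_size=20):
    """Cut point chosen by looking at y: the one with the largest in-sample z."""
    xs = np.sort(x)
    n = len(xs)
    mids = (xs[:-1] + xs[1:]) / 2                    # midpoints between neighbours
    candidates = mids[min_size - 1 : n - min_size]   # each side keeps >= min_size
    zs = [split_z(x, y, c) for c in candidates]
    return candidates[int(np.argmax(zs))]

def share_significant(zs, crit=1.96):
    """Fraction of runs whose |z| exceeds the nominal 5% two-sided cutoff."""
    arr = np.asarray(zs, dtype=float)
    arr = arr[~np.isnan(arr)]
    return float(np.mean(arr > crit))

def simulate(n=200, reps=300, seed=1):
    rng = np.random.default_rng(seed)
    fixed, searched_in, searched_out = [], [], []
    for _ in range(reps):
        # Pure noise: y has no relationship with x in either sample.
        x, y = rng.normal(size=n), rng.normal(size=n)
        x_new, y_new = rng.normal(size=n), rng.normal(size=n)
        fixed.append(split_z(x, y, np.median(x)))      # cut ignores y
        c = best_cut(x, y)                             # cut depends on y
        searched_in.append(split_z(x, y, c))           # judged on the same data
        searched_out.append(split_z(x_new, y_new, c))  # same cut, fresh data
    return {"median split": share_significant(fixed),
            "best split, same data": share_significant(searched_in),
            "best split, new data": share_significant(searched_out)}

if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name:>22}: {rate:.2f} of runs 'significant' at the 5% level")
```

On noise, the median split (fixed without looking at y) should come out "significant" in roughly 5% of runs, as the nominal level promises. The searched cut point should do so far more often on the data used to find it, and only about 5% of the time when the same cut is applied to a fresh sample.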
