LISTSERV at the University of Georgia
Date:         Thu, 3 Feb 2005 17:27:40 -0500
Reply-To:     "Luo, Peter" <pluo@draftnet.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Luo, Peter" <pluo@draftnet.com>
Subject:      Over-fitting
Content-Type: text/plain; charset="iso-8859-1"

Hi list, I have a continuous predictor in my multivariate model. To make it work better, I could bin the variable in one of two ways:

categorize it into 5 or 10 groups of equal size

or

categorize it in a way that maximizes its effect on the dependent variable (for example, use CHAID to determine the best split points)

and then test the categorized variable in the model.

Well, I was 'criticized' on the grounds that the second approach capitalizes on chance, and that a variable transformed this way may not hold up in reality. I also remember reading somewhere a professor's warning that you can transform the predictors in whatever way you like, so long as the transformation is not related to the dependent variable.

I guess my question is: I understand that the second categorization approach runs the risk of overfitting the model, but won't that be the case for every other type of transformation, too? In other words, if the first approach turns out a significant predictor, does that mean I was not banking on chance?
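
To make the difference between the two approaches concrete, here is a rough simulation sketch in Python (not SPSS syntax). It is only an illustration under assumptions of my own: the predictor and the outcome are generated independently, so any "significant" result is a false positive; the first approach is simplified to a single split at the median; the second is a stand-in for a CHAID-style search that simply picks the cut with the largest t statistic; and 1.96 is used as an approximate 5% cutoff. The function names and the grid of candidate cuts are hypothetical.

```python
import math
import random


def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    def mean_var(v):
        m = sum(v) / len(v)
        return m, sum((u - m) ** 2 for u in v) / (len(v) - 1)

    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))


def simulate(n_trials=500, n=100, seed=0):
    """Return (fixed_rate, tuned_rate): false-positive rates when x and y are unrelated."""
    rng = random.Random(seed)
    # Candidate split positions (hypothetical grid); each side keeps >= 10 cases.
    cuts = range(10, n - 9, 5)
    fixed_hits = tuned_hits = 0
    for _ in range(n_trials):
        x = [rng.random() for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # y is independent of x
        y_by_x = [yi for _, yi in sorted(zip(x, y))]  # y values ordered by x
        # Approach 1: split fixed in advance at the median of x.
        fixed = abs(welch_t(y_by_x[: n // 2], y_by_x[n // 2:]))
        # Approach 2: choose the split that maximizes the apparent effect on y.
        tuned = max(abs(welch_t(y_by_x[:c], y_by_x[c:])) for c in cuts)
        fixed_hits += fixed > 1.96  # 1.96: approximate two-sided 5% cutoff
        tuned_hits += tuned > 1.96
    return fixed_hits / n_trials, tuned_hits / n_trials


if __name__ == "__main__":
    fixed_rate, tuned_rate = simulate()
    print(f"pre-specified split, false-positive rate: {fixed_rate:.3f}")
    print(f"outcome-tuned split, false-positive rate: {tuned_rate:.3f}")
```

The point of the sketch is that a split chosen without looking at the dependent variable should reject at roughly the nominal rate, while a split chosen to maximize the effect gets many tries at the same data and so rejects more often, even though nothing is there. That is the sense in which the second approach, but not the first, banks on chance.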

