LISTSERV at the University of Georgia
Date:         Tue, 19 Aug 2008 13:44:16 -0400
Reply-To:     Peter Flom <>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject:      Re: Proc logistic - Significant categorical variable with
Comments: To: Vadim Pliner <Vadim.Pliner@VERIZONWIRELESS.COM>
Content-Type: text/plain; charset=UTF-8

Vadim Pliner <Vadim.Pliner@VERIZONWIRELESS.COM> wrote

>Peter,
>
>we may be talking about different things. Let me give you an example.
>Let's say one of our predictors (independent variables) is age
>categorized into several groups. Now, suppose you are finding that if
>you collapse the 'youngest' group with the 'oldest' one you will get a
>better model (say, predicting better on a validation data set). Do
>those two categories make sense together? I'd say I don't care as long
>as I am getting a better model and the two groups being together
>doesn't contradict any prior knowledge. Well, I imagine an
>epidemiologist or a sociologist would start thinking: "how do I
>interpret this? what does this mean?", something along those lines. In
>predictive modeling I cannot see a problem here. If you can come up
>with a better example clarifying your point of view, I'd be glad to
>discuss....
>And I agree with you that there are ways to check for spurious
>relationships.
>
>2 more cents from me :-)

Interesting discussion!

At this point, I've forgotten the original situation.

I should also note that I come from the world where explanation is important, rather than just prediction. I've worked in a bunch of fields, but never in, say, credit risk, where prediction is by far the most important thing.

Let's take credit risk, and your example of combining 'youngest' and 'oldest' and then checking on a validation set. I have to admit, if you did this, and it worked well on the validation set, I'd go with it. For credit risk, I could even come up with an explanation: People in the youngest and oldest groups are probably also the most likely not to be working (although for different reasons) and to be subject to sudden changes in income.
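
As a rough illustration of the check described above (collapse 'youngest' with 'oldest', then compare on a validation set), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the counts are made-up toy numbers, not data from this thread, and the names `make_rows`, `group_rates`, and `log_loss` are hypothetical helpers. It relies on the fact that a logistic model whose only predictor is one categorical variable reproduces each category's observed event rate, so the per-group rate on the training data stands in for the fitted model.

```python
import math

def make_rows(counts):
    """Expand {group: (n, events)} into a list of (group, outcome) rows."""
    rows = []
    for group, (n, events) in counts.items():
        rows += [(group, 1)] * events + [(group, 0)] * (n - events)
    return rows

def group_rates(rows, mapping):
    """Training event rate per group, after applying the collapse mapping."""
    tally = {}
    for group, outcome in rows:
        g = mapping.get(group, group)  # groups not in mapping are kept as-is
        n, k = tally.get(g, (0, 0))
        tally[g] = (n + 1, k + outcome)
    return {g: k / n for g, (n, k) in tally.items()}

def log_loss(rows, mapping, rates, eps=1e-6):
    """Mean negative log-likelihood of the rows under the given rates."""
    total = 0.0
    for group, outcome in rows:
        p = rates[mapping.get(group, group)]
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total -= outcome * math.log(p) + (1 - outcome) * math.log(1 - p)
    return total / len(rows)

# Made-up toy counts: (number of accounts, number of defaults) per age group.
train = make_rows({"young": (10, 4), "middle": (10, 1), "old": (10, 2)})
valid = make_rows({"young": (10, 3), "middle": (10, 1), "old": (10, 3)})

separate = {}  # keep all three age groups
collapse = {"young": "young_or_old", "old": "young_or_old"}

loss_separate = log_loss(valid, separate, group_rates(train, separate))
loss_collapsed = log_loss(valid, collapse, group_rates(train, collapse))

print("validation log loss, separate groups: ", round(loss_separate, 4))
print("validation log loss, collapsed groups:", round(loss_collapsed, 4))
```

With these invented counts the collapsed grouping scores the lower validation log loss, which is the situation described above. Real data could just as easily go the other way, and a single validation split says nothing about whether the grouping makes substantive sense.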

I guess my training just makes me very suspicious of models that don't make sense, even if they work :-)


Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com
