Date: Wed, 26 Mar 2008 13:50:49 -0400
Reply-To: Chang Chung <chang_y_chung@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Chang Chung <chang_y_chung@HOTMAIL.COM>
Subject: Re: PROC LOGISTIC MODEL--Standardize vars?
Given the background, I can tell you right away what the most
significant predictor of all would be: whether or not the doctor was
fraudulent on his/her last claim.
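
To make the "last claim" predictor concrete, here is a minimal sketch in Python (not SAS) of how such a lagged flag could be built. The record layout and the names doctor_id, claim_date, is_fraud and add_prev_fraud are assumptions for illustration, not anything from the original data.

```python
from collections import namedtuple

# Hypothetical claim record; the field names are assumptions for illustration.
Claim = namedtuple("Claim", ["doctor_id", "claim_date", "is_fraud"])

def add_prev_fraud(claims):
    """Pair each claim with the fraud flag of the same doctor's previous claim.

    Returns a list of (claim, prev_fraud) tuples ordered by doctor and date.
    prev_fraud is None for a doctor's first claim.
    """
    last_flag = {}  # doctor_id -> fraud flag of that doctor's latest claim so far
    result = []
    # Sort by doctor, then date, so "previous claim" is well defined.
    # ISO-formatted date strings sort correctly as plain text.
    for claim in sorted(claims, key=lambda c: (c.doctor_id, c.claim_date)):
        result.append((claim, last_flag.get(claim.doctor_id)))
        last_flag[claim.doctor_id] = claim.is_fraud
    return result

# Made-up example data
claims = [
    Claim("A", "2007-01-10", 0),
    Claim("B", "2007-01-12", 1),
    Claim("A", "2007-02-03", 1),
    Claim("A", "2007-03-15", 0),
    Claim("B", "2007-04-01", 0),
]
for claim, prev in add_prev_fraud(claims):
    print(claim.doctor_id, claim.claim_date, claim.is_fraud, prev)
```

The first claim of each doctor has no history, so the flag is left missing rather than set to 0.
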
Standardization can help, and it will not hurt, model estimation. But I
don't think it is one of your biggest problems given the background. Your
observations are not independent, because the same doctor can submit
multiple claims over time.
I don't think it matters much for estimation that you have a 1% fraud rate
(it does in our lives; that is way too high!), and I don't think
over-sampling would do any good in building a predictive model. I don't know
how you would weight anything down after you are done estimating a model,
either.
If you put aside some data for validation or evaluation of the model, then
do it with a random sample. Putting aside a year's worth of data is
definitely not a good idea.
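
A minimal sketch of a random holdout, in Python rather than SAS. The function name random_holdout, the 30% holdout fraction and the seed are arbitrary assumptions for illustration.

```python
import random

def random_holdout(records, holdout_fraction=0.3, seed=2008):
    """Split records into (train, holdout) by simple random sampling.

    holdout_fraction and seed are illustrative defaults, not recommendations.
    """
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(records)    # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Made-up example: 1,000 claim ids
claim_ids = list(range(1000))
train, holdout = random_holdout(claim_ids)
print(len(train), len(holdout))
```

Every claim lands in exactly one of the two sets, and claims from all time periods can end up in either.
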
In evaluating the model, the associated costs have to be considered. You can
have a model that catches most frauds but also falsely accuses many
innocent doctors and incurs a lot of investigation cost.
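
The cost trade-off can be sketched with simple arithmetic. Everything below is hypothetical: the function total_cost, the dollar amounts and the counts for the two models are made up for illustration. Only the 1% fraud rate comes from the discussion above.

```python
def total_cost(n_true_pos, n_false_pos, n_false_neg,
               cost_per_investigation, loss_per_missed_fraud):
    """Cost of acting on a model's flags.

    Every flagged claim (true or false positive) is investigated, and
    every fraud the model misses (false negative) incurs a loss.
    """
    n_flagged = n_true_pos + n_false_pos
    return (n_flagged * cost_per_investigation
            + n_false_neg * loss_per_missed_fraud)

# Hypothetical setting: 100,000 claims at a 1% fraud rate = 1,000 frauds.
# Assumed costs: 500 per investigation, 5,000 lost per missed fraud.

# Model A catches 900 of the 1,000 frauds but flags 20,000 innocent claims.
cost_a = total_cost(900, 20000, 100, 500, 5000)

# Model B catches only 600 frauds but flags just 2,000 innocent claims.
cost_b = total_cost(600, 2000, 400, 500, 5000)

print(cost_a, cost_b)
```

With these made-up numbers the model that catches fewer frauds is the cheaper one overall, because the investigation cost of the false accusations dominates. Different cost assumptions can reverse the ranking.
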