Date: Wed, 26 Mar 2008 15:17:03 -0500
Reply-To: Tom White <tw2@MAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Tom White <tw2@MAIL.COM>
Subject: Re: PROC LOGISTIC MODEL--Standardize vars?
Content-Type: text/plain; charset="iso-8859-1"
Chang writes below:
"Forgoing a lot of data by aggregating observations into doctor (or provider) level --
this will surely eliminate biases and minimize information loss if you aggregate sensibly"
Chang, do you mean aggregate
all the claims from a provider for a specific patient?
(So if I have been a patient of Dr. XYZ for the past 5 years and I have about 10 claims in my file,
you mean aggregate these claims?)
or do you mean
aggregate all the claims from a provider?
(So if provider ABC had 20 patients over the past 5 years who have generated 50 claims,
do you mean aggregate these 50 claims?)
If I manage to do one or the other, then is it OK to use LOGISTIC?
But then the problem is that we don't have a FRAUD or NO_FRAUD claim status, since the claims have been aggregated.
That's why I wrote to Sig, because he was talking about providers whereas I was talking about claims.
I have historical claims data and the fraud status on those claims.
I don't have fraud status on a provider, since a provider will not always be fraudulent.
Therefore, at this point, someone please help me understand how I can aggregate the claims (and their associated
fraud or no_fraud status) and still be able to predict the probability that the next CLAIM (not PROVIDER)
that comes through the door is fraudulent.
Right now, I just don't see this aggregation concept.
I think we are mixing provider fraud (the prob. a provider is fraudulent) and claims fraud (the prob. a claim
coming from a provider is fraudulent). I am interested in the latter, not the former (which doesn't make sense
to me right at this moment).
Thanks.
Tom
But if I do this Chang, i.e. somehow take all the claims coming from a provider and aggregate them
so that I have one piece of information
----- Original Message -----
From: "Chang Chung"
To: SAS-L@LISTSERV.UGA.EDU, "Tom White"
Subject: Re: PROC LOGISTIC MODEL--Standardize vars?
Date: Wed, 26 Mar 2008 15:38:26 -0400
On Wed, 26 Mar 2008 13:42:43 -0500, Tom White wrote:
> Chang writes below:
>
> "you have data that are not independent observations because
> a same doctor can submit multiple claims over time."
>
> Chang is absolutely correct. That's the nature of claim analysis.
> Doctors do submit multiple claims on the same patients over time.
>
> Therefore, what do I do now?
>
> Do I toss out PROC LOGISTIC?
> What do I replace it with given what I am trying to do here?
hi, Tom,
I am not sure what the best way to approach this is. You may have better
luck consulting a qualified statistician or a fraud detection expert. You
can probably start with a logistic regression model and see how it
performs, then try other ways and see if any of them improves the prediction
performance.
Some of the other ways are: including lagged dependent variable(s) as
predictors, which will help; forgoing a lot of data by aggregating observations
into doctor (or provider) level -- this will surely eliminate biases and
minimize information loss if you aggregate sensibly; utilizing adjustments
provided by "robust" estimators; modeling the clustering directly with mixed
or hierarchical models; and so on.
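As a rough illustration of the lagged-dependent-variable idea, here is a
minimal sketch in Python (not SAS) that keeps the claim as the unit of
analysis and attaches each provider's fraud history up to that point as
predictors. The field names (provider, date, fraud) and the function name
are hypothetical, and dates are assumed to be ISO strings so that they sort
correctly. The resulting columns could then go into a claim-level model such
as PROC LOGISTIC; this alone does not deal with the non-independence of
claims from the same provider.

```python
from collections import defaultdict


def add_prior_fraud_rate(claims):
    """Return a copy of each claim with provider-history predictors added.

    For every claim, only claims by the same provider that come earlier
    in date order are counted, so a claim's own fraud status never leaks
    into its predictors. Claims with equal dates keep their input order.
    """
    history = defaultdict(lambda: [0, 0])  # provider -> [n_claims, n_fraud]
    rows = []
    for claim in sorted(claims, key=lambda c: c["date"]):
        n_claims, n_fraud = history[claim["provider"]]
        row = dict(claim)
        row["prior_claims"] = n_claims
        # No history yet: leave the rate missing rather than guess a value.
        row["prior_fraud_rate"] = n_fraud / n_claims if n_claims else None
        rows.append(row)
        # Update the provider's history only after the row has been built.
        history[claim["provider"]][0] += 1
        history[claim["provider"]][1] += claim["fraud"]
    return rows


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = [
        {"provider": "ABC", "date": "2007-01-05", "fraud": 0},
        {"provider": "ABC", "date": "2007-03-10", "fraud": 1},
        {"provider": "XYZ", "date": "2007-02-01", "fraud": 0},
        {"provider": "ABC", "date": "2007-06-20", "fraud": 0},
    ]
    for row in add_prior_fraud_rate(example):
        print(row)
```

Each output row is still one claim with its own fraud status, so the target
remains the probability that a given claim is fraudulent.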
On the other hand, you can go a totally different way. See if the SAS® Fraud
Management solution (http://www.sas.com/industry/fsi/fraud/index.html) can
help; it seems to train a neural network to make predictions. Hope
this helps a bit.
cheers,
chang