Date: Tue, 27 Sep 2005 12:28:38 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: False Positive and False Negative
Content-Type: text/plain; charset="us-ascii"
I wouldn't sweat the details of WEKA diagnostic statistics for this
model. As I read the contents of the table,
=== Confusion Matrix ===
     a     b   <-- classified as
 19710    19 |     a = 0
   251    20 |     b = 1
it says that non-fraud instances classified as fraud (19) roughly equal
the fraud instances that the model classifies correctly (20); further,
the model misclassifies over 92% of fraud instances (251 of 271) as
non-fraud. Having seen similar results when attempting to predict cancer
causes of death that occur in very small proportions using weak
predictors, I am not surprised. Weak predictors correctly identify only
a fraction of a small number of 'positives', and even a very small false
positive rate, applied to a much larger pool of 'negatives', generates a
large number of false positive predictions relative to true positive
predictions. We also see that in screening tests for rare virus
infections.
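A minimal Python sketch of that arithmetic, assuming fraud is the
'positive' class and using the counts from the confusion matrix above
(the function name `expected_counts` is hypothetical, not part of WEKA
or SAS):

```python
def expected_counts(n, prevalence, sensitivity, fp_rate):
    """Expected true and false positives among n scored instances."""
    positives = n * prevalence          # actual fraud cases
    negatives = n - positives           # actual non-fraud cases
    tp = positives * sensitivity        # fraud flagged as fraud
    fp = negatives * fp_rate            # non-fraud flagged as fraud
    return tp, fp

# Rates read off the confusion matrix (assumption: class 1 is 'positive').
n = 20000
prevalence = 271 / n        # 251 + 20 actual fraud cases
sensitivity = 20 / 271      # share of fraud the model catches
fp_rate = 19 / 19729        # share of non-fraud flagged as fraud

tp, fp = expected_counts(n, prevalence, sensitivity, fp_rate)
precision = tp / (tp + fp)
print(f"TP = {tp:.0f}, FP = {fp:.0f}, precision = {precision:.3f}")
```

A false positive rate of roughly 0.1% still produces about as many false
positives as true positives, because it applies to 19729 negatives while
the 7.4% hit rate applies to only 271 positives.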
Fraud detection typically requires that a model err on the side of
precision. A false prediction rate near 50% among the cases flagged as
fraud (19 of 39 here) would keep a legal department busy for years.
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of pa pa
Sent: Tuesday, September 27, 2005 12:25 AM
To: SAS
Subject: False Positive and False Negative
Hi there,
I am using WEKA to detect fraud. I personally prefer SAS, but I could not
find a way to generate useful statistics from the trained models.
I have a few questions regarding the following output from WEKA, which I
suspect is wrong (1 is Fraud and 0 is non-Fraud):
Correctly Classified Instances       19730      98.65 %
Incorrectly Classified Instances       270       1.35 %
Kappa statistic                          0.1261
Mean absolute error                      0.0184
Root mean squared error                  0.1128
Relative absolute error                 85.1049 %
Root relative squared error             97.4474 %
Total Number of Instances            20000
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
  0.999     0.926       0.987    0.999       0.993   0
  0.074     0.001       0.513    0.074       0.129   1
=== Confusion Matrix ===
     a     b   <-- classified as
 19710    19 |     a = 0
   251    20 |     b = 1
=> From this output of WEKA, I found that FP(0) = 251/(251+20) = 0.926
and FP(1) = 19/(19+19710) = 0.001
The concept of FP for class 0 and 1 in WEKA is quite strange.
By applying the definition:
"FP is class 1 which was wrongly classified as class 0"
=> FP = "class 1 but classified as class 0" / total number of class 1
= 251/(251+20) = 0.926
"FN is class 0 which was wrongly classified as class 1"
=> FN = "class 0 but classified as class 1" / total number of class 0
= 19/(19+19710) = 0.001
So, my FP is actually WEKA's FP(0) and my FN is WEKA's FP(1). Could you
please confirm which approach is appropriate? And do I understand the
definition correctly?
Thanks. Have a nice day.
PatrickTran
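For reference, the per-class figures in the "Detailed Accuracy By Class"
table can be reproduced from the confusion matrix by treating each class
in turn as the 'positive' class. A minimal Python sketch, assuming rows
are actual classes and columns are predicted classes (the function name
`per_class_rates` is hypothetical, and the one-vs-rest reading is an
assumption that happens to match the reported numbers):

```python
# Rows: actual class; columns: predicted class (as in the WEKA printout).
matrix = [[19710, 19],
          [251, 20]]

def per_class_rates(matrix, k):
    """Rates for class k, with class k treated as the 'positive' class."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]                          # class k predicted as k
    fn = sum(matrix[k]) - tp                   # class k predicted as other
    fp = sum(row[k] for row in matrix) - tp    # other predicted as k
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "tp_rate": recall,
        "fp_rate": fp / (fp + tn),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }

for k in (0, 1):
    rates = per_class_rates(matrix, k)
    print(k, {name: round(value, 3) for name, value in rates.items()})
```

Read this way, 0.926 is the share of class 1 instances predicted as class
0, 0.001 is the share of class 0 instances predicted as class 1, and
0.513 is the precision for class 1, 20/(20+19).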