Date: Mon, 26 Sep 2005 21:25:28 -0700
Reply-To: pa pa <ctll04@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: pa pa <ctll04@YAHOO.COM>
Subject: False Positive and False Negative
Hi there,
I am using WEKA to detect fraud. I personally prefer SAS, but I could not find a way to generate useful statistics from the trained models.
I have a few questions about the following WEKA output, which I suspect is wrong (1 is Fraud and 0 is non-Fraud):
Correctly Classified Instances       19730       98.65   %
Incorrectly Classified Instances       270        1.35   %
Kappa statistic                          0.1261
Mean absolute error                      0.0184
Root mean squared error                  0.1128
Relative absolute error                 85.1049 %
Root relative squared error             97.4474 %
Total Number of Instances            20000
=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
  0.999     0.926       0.987    0.999       0.993   0
  0.074     0.001       0.513    0.074       0.129   1
=== Confusion Matrix ===
     a      b   <-- classified as
 19710     19 |   a = 0
   251     20 |   b = 1
=> From this WEKA output, I found that FP(0) = 251/(251+20) = 0.926 and FP(1) = 19/(19+19710) = 0.001
The concept of a separate FP rate for class 0 and for class 1 in WEKA seems quite strange to me.
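As far as I can tell, WEKA treats each class in turn as the "positive" one and reports the rates for it. Here is a minimal sketch in plain Python (not WEKA code; the function name per_class_rates is made up for this mail) that recomputes the per-class figures from the confusion matrix. It assumes rows are the actual classes and columns are the predicted classes, as the "classified as" header suggests.

```python
def per_class_rates(matrix, cls):
    """Return TP rate, FP rate and precision, treating `cls` as the positive class.

    Assumption: matrix[i][j] = instances of actual class i predicted as class j.
    """
    n = len(matrix)
    tp = matrix[cls][cls]
    # Actual `cls` instances predicted as some other class.
    fn = sum(matrix[cls][j] for j in range(n) if j != cls)
    # Instances of other classes predicted as `cls`.
    fp = sum(matrix[i][cls] for i in range(n) if i != cls)
    # Everything that neither is `cls` nor was predicted as `cls`.
    tn = sum(matrix[i][j] for i in range(n) for j in range(n)
             if i != cls and j != cls)
    return {
        "tp_rate": tp / (tp + fn),
        "fp_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


# Confusion matrix copied from the WEKA output above.
confusion = [[19710, 19],
             [251, 20]]

for cls in (0, 1):
    rates = per_class_rates(confusion, cls)
    print(cls, {name: round(value, 3) for name, value in rates.items()})
```

Running it for class 0 and class 1 gives one row of rates per class, which can be compared line by line with the "Detailed Accuracy By Class" table.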
Applying the definitions as I understand them:
"FP is class 1 wrongly classified as class 0"
=> FP = "class 1 but classified as class 0" / total number of class 1
= 251/(251+20) = 0.926
"FN is class 0 wrongly classified as class 1"
=> FN = "class 0 but classified as class 1" / total number of class 0
= 19/(19+19710) = 0.001
So my FP is actually WEKA's FP(0), and my FN is WEKA's FP(1).
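To make the two readings concrete, here is a small sketch (plain Python again; binary_rates is a made-up name, and rows are assumed to be actual classes, columns predicted classes). It uses the usual convention that a false positive is an actual negative predicted as positive, and computes the FP and FN rates once with class 0 as the positive class, which is what my definitions amount to, and once with class 1 (fraud) as the positive class.

```python
def binary_rates(matrix, positive):
    """Return (FP rate, FN rate) for a 2x2 matrix, given the positive class index.

    Assumption: matrix[i][j] = instances of actual class i predicted as class j.
    """
    negative = 1 - positive
    fp = matrix[negative][positive]  # actual negative, predicted positive
    fn = matrix[positive][negative]  # actual positive, predicted negative
    fp_rate = fp / sum(matrix[negative])  # share of all actual negatives
    fn_rate = fn / sum(matrix[positive])  # share of all actual positives
    return fp_rate, fn_rate


# Confusion matrix copied from the WEKA output above.
confusion = [[19710, 19],
             [251, 20]]

for positive in (0, 1):
    fp_rate, fn_rate = binary_rates(confusion, positive)
    print(f"positive={positive}: FP rate={fp_rate:.3f}, FN rate={fn_rate:.3f}")
```

The two ratios simply swap roles depending on which class is taken as positive, so the question is really which class should be the positive one.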
Could you please confirm which approach is appropriate, and whether I understand the definitions correctly?
Thanks
Have a nice day
PatrickTran