Date:         Tue, 27 Sep 2005 12:28:38 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: False Positive and False Negative
Comments: To: pa pa <>
Content-Type: text/plain; charset="us-ascii"

I wouldn't sweat the details of WEKA diagnostic statistics for this model. As I read the contents of the table,

=== Confusion Matrix ===

     a     b   <-- classified as
 19710    19 |     a = 0
   251    20 |     b = 1

it says that the number of non-fraud instances classified as fraud (19) is about the same as the number of instances of fraud that the model classifies correctly (20); further, the model misclassifies over 92% of instances of fraud (251 of 271) as non-fraud. Having seen similar results when attempting to predict very small proportions of cancer causes of death using weak predictors, I am not surprised by these results. Weak predictors predict correctly only a fraction of a small number of 'positives', and even a very small false positive rate generates a large number of false positive predictions relative to true positive predictions. We also see that in screening tests for rare virus infections.
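To make the arithmetic concrete, here is a minimal sketch in Python (not SAS or WEKA code; the function and variable names are my own, chosen for illustration). It treats each class in turn as the "positive" class, which appears to be how WEKA arrives at its per-class rates, and then shows how precision follows from prevalence and the two rates. Note that 19/(19+19710) comes to about 0.001, the FP rate for class 1, while the 0.513 in the table is the precision for class 1, 20/(20+19).

```python
# Confusion matrix from the WEKA output: rows are actual classes,
# columns are predicted classes (0 = non-fraud, 1 = fraud).
confusion = [
    [19710, 19],  # actual 0: 19710 predicted 0, 19 predicted 1
    [251, 20],    # actual 1: 251 predicted 0, 20 predicted 1
]


def per_class_metrics(matrix, cls):
    """Treat `cls` as the positive class and every other class as negative."""
    n = len(matrix)
    tp = matrix[cls][cls]
    fn = sum(matrix[cls][j] for j in range(n) if j != cls)
    fp = sum(matrix[i][cls] for i in range(n) if i != cls)
    tn = sum(matrix[i][j] for i in range(n) for j in range(n)
             if i != cls and j != cls)
    return {
        "tp_rate": tp / (tp + fn),    # same quantity as recall
        "fp_rate": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


def precision_from_rates(prevalence, tp_rate, fp_rate):
    """Share of positive predictions that are correct (Bayes' rule)."""
    true_pos = tp_rate * prevalence
    false_pos = fp_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


metrics = {cls: per_class_metrics(confusion, cls) for cls in (0, 1)}
for cls, m in metrics.items():
    print(cls, {name: round(value, 3) for name, value in m.items()})

# Precision for fraud at the prevalence seen in this sample (271 of 20000).
prevalence = 271 / 20000
observed = precision_from_rates(
    prevalence, metrics[1]["tp_rate"], metrics[1]["fp_rate"])

# Hypothetical scenario: same rates, but fraud ten times rarer.
rarer = precision_from_rates(
    prevalence / 10, metrics[1]["tp_rate"], metrics[1]["fp_rate"])

print("precision at observed prevalence:", round(observed, 3))
print("precision if fraud were 10x rarer:", round(rarer, 3))
```

With class 1 (fraud) as the positive class, the rounded rates match the second row of WEKA's "Detailed Accuracy By Class" table; with class 0 as positive, they match the first row. The last two lines illustrate the rare-event point: holding the TP and FP rates fixed, precision falls as the positive class becomes rarer.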

Fraud detection typically has a requirement that a model err on the side of precision. A 50% false prediction rate would keep a legal department busy for years.

Sig

-----Original Message-----
From: [] On Behalf Of pa pa
Sent: Tuesday, September 27, 2005 12:25 AM
To: SAS
Subject: False Positive and False Negative

Hi there,

I am using WEKA to detect fraud. I personally prefer SAS, but I could not find a way to generate useful statistics from the trained models.

I have a few questions regarding the following output from WEKA, which I suspect is wrong (1 is Fraud and 0 is non-Fraud):

Correctly Classified Instances       19730               98.65   %
Incorrectly Classified Instances       270                1.35   %
Kappa statistic                          0.1261
Mean absolute error                      0.0184
Root mean squared error                  0.1128
Relative absolute error                 85.1049 %
Root relative squared error             97.4474 %
Total Number of Instances            20000

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
  0.999     0.926       0.987    0.999       0.993   0
  0.074     0.001       0.513    0.074       0.129   1

=== Confusion Matrix ===

     a     b   <-- classified as
 19710    19 |     a = 0
   251    20 |     b = 1

=> From this WEKA output, I found that FP(0) = 251/(251+20) = 0.926 and FP(1) = 19/(19+19710) = 0.001

The concept of FP for class 0 and 1 in WEKA is quite strange.

By applying the definitions:

"FP is class 1 that was wrongly classified as class 0"
=> FP = "class 1 but classified as class 0" / total number of class 1 = 251/(251+20) = 0.926

"FN is class 0 that was wrongly classified as class 1"
=> FN = "class 0 but classified as class 1" / total number of class 0 = 19/(19+19710) = 0.001

So, my FP is actually WEKA's FP(0) and my FN is WEKA's FP(1). Could you please confirm which approach is appropriate, and whether I understand the definitions correctly?

Thanks
Have a nice day
PatrickTran

