LISTSERV at the University of Georgia
Date:         Tue, 27 Sep 2005 12:28:38 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: False Positive and False Negative
Comments: To: pa pa <ctll04@yahoo.com>
Content-Type: text/plain; charset="us-ascii"

I wouldn't sweat the details of WEKA diagnostic statistics for this model. As I read the contents of the table,

=== Confusion Matrix ===
     a     b   <-- classified as
 19710    19 | a = 0
   251    20 | b = 1

it says that the number of non-fraud instances classified as fraud (19) is about the same as the number of fraud instances that the model classifies correctly (20); further, the model misclassifies over 92% of instances of fraud as non-fraud (251 of 271). Having seen similar results when attempting to predict very small proportions of cancer causes of death using weak predictors, I am not surprised by the results. Weak predictors predict correctly only a fraction of a small number of 'positives', and even a very small false positive rate generates a large number of false positive predictions relative to true positive predictions. We also see that in screening tests for rare virus infections.
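The base-rate arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration using the counts from the confusion matrix quoted above; the function name precision_from_rates is hypothetical and is not part of WEKA or SAS.

```python
def precision_from_rates(prevalence, tp_rate, fp_rate):
    """Precision implied by a base rate, a true positive rate and a false positive rate."""
    true_pos = prevalence * tp_rate            # share of all instances that are caught fraud
    false_pos = (1.0 - prevalence) * fp_rate   # share of all instances that are false alarms
    return true_pos / (true_pos + false_pos)

# Counts taken from the confusion matrix quoted above.
n_fraud = 251 + 20
n_nonfraud = 19710 + 19
prevalence = n_fraud / (n_fraud + n_nonfraud)  # fraud is roughly 1.4% of instances
tp_rate = 20 / n_fraud                         # fraction of fraud the model catches
fp_rate = 19 / n_nonfraud                      # fraction of non-fraud flagged as fraud

precision = precision_from_rates(prevalence, tp_rate, fp_rate)
miss_rate = 1.0 - tp_rate                      # fraction of fraud classified as non-fraud
print(f"precision = {precision:.3f}, miss rate = {miss_rate:.3f}")
```

Because non-fraud instances outnumber fraud instances by roughly 70 to 1, a false positive rate near 0.1% still produces about as many false alarms as a true positive rate near 7% produces hits.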

Fraud detection typically has a requirement that a model err on the side of precision. A 50% false prediction rate would keep a legal department busy for years.

Sig

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of pa pa
Sent: Tuesday, September 27, 2005 12:25 AM
To: SAS
Subject: False Positive and False Negative

Hi there,

I am using WEKA to detect fraud. I personally prefer SAS, but I could not find a way to generate useful statistics from the trained models.

I have a few questions about the following WEKA output, which I suspect is wrong (1 is Fraud and 0 is non-Fraud):

Correctly Classified Instances       19730      98.65 %
Incorrectly Classified Instances       270       1.35 %
Kappa statistic                          0.1261
Mean absolute error                      0.0184
Root mean squared error                  0.1128
Relative absolute error                 85.1049 %
Root relative squared error             97.4474 %
Total Number of Instances            20000

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.999     0.926     0.987       0.999    0.993       0
0.074     0.001     0.513       0.074    0.129       1

=== Confusion Matrix ===

     a     b   <-- classified as
 19710    19 | a = 0
   251    20 | b = 1

=> From this output of WEKA, I found that FP(0) = 251/(251+20) = 0.926 and FP(1) = 19/(19+19710) = 0.001

The concept of FP for class 0 and 1 in WEKA is quite strange.

By applying the definitions:

"FP is class 1 which was wrongly classified as class 0"
=> FP = "class 1 but classified as class 0" / total number of class 1 = 251/(251+20) = 0.926

"FN is class 0 which was wrongly classified as class 1"
=> FN = "class 0 but classified as class 1" / total number of class 0 = 19/(19+19710) = 0.001
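One way to check these figures is to recompute the per-class rates directly from the confusion matrix. The Python sketch below assumes that WEKA treats each class in turn as the "positive" class and lumps everything else together as "negative". That assumption is not confirmed anywhere in this thread, but it does reproduce the Detailed Accuracy By Class table. The function name per_class_rates is hypothetical.

```python
def per_class_rates(matrix):
    """matrix[i][j] = number of instances of actual class i classified as class j."""
    total = sum(sum(row) for row in matrix)
    rates = {}
    for k in range(len(matrix)):
        tp = matrix[k][k]                        # class k classified as k
        fn = sum(matrix[k]) - tp                 # class k classified as something else
        fp = sum(row[k] for row in matrix) - tp  # other classes classified as k
        tn = total - tp - fn - fp                # everything else
        rates[k] = {
            "tp_rate": tp / (tp + fn),           # same quantity as recall
            "fp_rate": fp / (fp + tn),
            "precision": tp / (tp + fp),
            "f_measure": 2 * tp / (2 * tp + fp + fn),
        }
    return rates

# Rows are actual classes (0 = non-fraud, 1 = fraud), columns are predicted classes.
weka_matrix = [[19710, 19], [251, 20]]
for label, r in per_class_rates(weka_matrix).items():
    print(label, {name: round(value, 3) for name, value in r.items()})
```

Under this reading, the FP rate for class 0 is 251/271 and the FP rate for class 1 is 19/19729, while the 0.513 in the table is the precision for class 1, that is 20/(20+19).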

So, my FP is actually WEKA's FP(0) and my FN is WEKA's FP(1). Could you please confirm which approach is appropriate, and whether I understand the definitions correctly?

Thanks
Have a nice day
PatrickTran
