Date: Wed, 21 Oct 2009 11:48:17 -0400
Reply-To: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Fehd, Ronald J. (CDC/CCHIS/NCPHI)" <rjf2@CDC.GOV>
Subject: Re: Bootstrap to find outliers
In-Reply-To: <886fb756-e2d1-4895-9703-4ec5635b5cce@b18g2000vbl.googlegroups.com>
Content-Type: text/plain; charset=us-ascii
> From: Bminer
> Sent: Wednesday, October 21, 2009 11:02 AM
> Subject: Bootstrap to find outliers
>
> I wanted to toss this out to the group for comment. What does
> everyone think about using a bootstrapped confidence interval to
> identify outliers in a data set that will be used for predictive
> modeling?
>
> For simplicity's sake, this looks at a single variable.
>
> Basically, I am thinking of taking the sample (it is large),
> resampling it, building a distribution of the bootstrap mean or median,
> and then building a confidence interval (using the percentile method,
> BCa, whatever). Values outside, say, a 99% CI would be considered
> outliers.
>
> Is there any fatal flaw in this approach?
>
> Thanks!
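The procedure described in the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ChkOut program referenced below: it is written in Python rather than SAS, uses only the percentile method (not BCa), and the function names `bootstrap_ci` and `flag_outside`, the 2000 resamples, and the synthetic normal data are all hypothetical choices made for the example.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, level=0.99, seed=0):
    """Percentile-method bootstrap CI for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    n = len(data)
    # Resample with replacement n_boot times, recording the statistic each time.
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    # Approximate lower/upper percentiles by indexing into the sorted replicates.
    lo = boot[int(alpha * (n_boot - 1))]
    hi = boot[int((1.0 - alpha) * (n_boot - 1))]
    return lo, hi


def flag_outside(data, lo, hi):
    """Return values outside [lo, hi], i.e. the rule proposed in the question."""
    return [x for x in data if x < lo or x > hi]


if __name__ == "__main__":
    # Synthetic example data (assumption): 1000 draws from N(50, 10).
    rng = random.Random(42)
    sample = [rng.gauss(50.0, 10.0) for _ in range(1000)]
    lo, hi = bootstrap_ci(sample, stat=statistics.median)
    flagged = flag_outside(sample, lo, hi)
    print(f"99% CI for the median: ({lo:.2f}, {hi:.2f})")
    print(f"{len(flagged)} of {len(sample)} values fall outside it")
```

When reading the output, keep in mind that the interval describes the sampling variability of the mean or median, not the spread of individual values, so it narrows as the sample grows; in a run like the one above, most observations land outside it.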
see this paper and program
ChekOut: A Program to screen for outliers
http://www2.sas.com/proceedings/sugi23/Posters/p197.pdf
http://www.sascommunity.org/mwiki/images/d/d3/ChkOut.sas
Ron Fehd the macro maven CDC Atlanta GA USA RJF2 at cdc dot gov