LISTSERV at the University of Georgia
Date:         Wed, 4 Jan 2006 15:46:37 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: MCD Outlier Determination
In-Reply-To:  <200601042038.k04K1uTK010570@malibu.cc.uga.edu>
Content-Type: text/plain; format=flowed

topkatz@MSN.COM wrote:
>Happy New Year, and so forth. I'm returning to a topic I asked about
>recently. I want to try using Minimum Covariance Determinant estimation
>to locate outliers. SAS/IML has a function called MCD to implement this
>methodology. Essentially, as I understand it, the algorithm finds
>the "best" half of the data by minimizing the determinant of the
>covariance matrix of a large number of subsamples, and then computing
>robust Mahalanobis-type distances based on this "best" half. The
>distances are then compared with a cutoff, and any distances above the
>cutoff are considered outliers. The MCD function returns the set of
>distance values, as well as a vector of zeroes and ones, where the zeroes
>denote outliers, i.e., values of the robust distances above the cutoff
>point. From what I can tell, the cutoff for the MCD function is ALWAYS
>fixed as the square root of the .975 quantile of the chi-square
>distribution with n degrees of freedom, where n is the number of
>covariates. My question is: is there a way to vary the cutoff in the MCD
>call, or do you have to do it by hand with the set of distances it returns?

You're sort of close on the method. You take a subset of h of the N observations, not literally half of the data. You can't do exactly half, but you can go as low as 1 + N/2. There's an upper limit as well. The default is something like (N+n+1)/2, where N and n are as you discussed: the number of obs and the number of regressors (including an intercept if you have one). Then you do a lot of sampling (hey, did someone say 'sampling'?) from the original data to get that robust estimate, which is based on an objective function F_sub_MCD.
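As a rough illustration of the subset-size arithmetic above, here is a minimal Python sketch. The function name is hypothetical, and rounding down with integer division is an assumption on my part; the upper limit is left out because it isn't spelled out here.

```python
def mcd_subset_sizes(num_obs, n):
    """Return (smallest h, default h) for the subset used by MCD.

    num_obs is N, the number of observations; n is the second count
    described in the post. Integer (floor) division is an assumption.
    """
    smallest_h = 1 + num_obs // 2          # "1 + N/2"
    default_h = (num_obs + n + 1) // 2     # "something like (N+n+1)/2"
    return smallest_h, default_h


smallest_h, default_h = mcd_subset_sizes(100, 3)
print(smallest_h, default_h)
```

So with 100 obs the subset is only a little over half of the data, whichever of the two values you use.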

As I understand the SAS set-up, you cannot vary the cutoff in the MCD call. The cutoff only comes in afterwards, once the robust distances have been computed. You'll have to take the set of distances it returns and apply the cutoff you prefer.
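Doing the cutoff by hand is only a comparison against a number you choose. Here is a minimal Python sketch, not SAS/IML code: the distances are made-up values standing in for what the MCD call returns, the function names are hypothetical, and the closed-form chi-square quantile used here is exact only for 2 degrees of freedom (two covariates). For other dimensions you would take the quantile from a statistics library instead.

```python
import math


def chi2_quantile_2df(p):
    """Chi-square quantile for exactly 2 degrees of freedom.

    With 2 df the chi-square distribution is exponential with mean 2,
    so the quantile has the closed form -2*ln(1-p). Not valid for other df.
    """
    return -2.0 * math.log(1.0 - p)


def flag_by_cutoff(distances, cutoff):
    """Return 1 for points kept and 0 for outliers (distance above the cutoff)."""
    return [0 if d > cutoff else 1 for d in distances]


# Hypothetical robust distances for a two-covariate problem.
robust_dist = [0.8, 1.9, 2.5, 2.9, 3.4, 6.1]

default_cutoff = math.sqrt(chi2_quantile_2df(0.975))  # the fixed .975 cutoff
wider_cutoff = math.sqrt(chi2_quantile_2df(0.999))    # a cutoff of your own choosing

default_flags = flag_by_cutoff(robust_dist, default_cutoff)
wider_flags = flag_by_cutoff(robust_dist, wider_cutoff)
print(default_flags)
print(wider_flags)
```

The zeroes-for-outliers convention matches the vector described in the question; moving the quantile from .975 to .999 simply flags fewer points.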

There is one alternative you could try. Rather than doing this via SAS/IML, you could try using PROC ROBUSTREG and letting it do the fit. You'd still be assigning your own cutoff and stuff. But it might be easier, and it would make it simpler to use ODS Statistical Graphics to plot the results.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
