Date: Fri, 20 Jun 2008 17:49:51 -0400
Reply-To: Peter Flom <peterflomconsulting@mindspring.com>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <peterflomconsulting@MINDSPRING.COM>
Subject: Re: Invitation for outlier tracking schemes
Content-Type: text/plain; charset=UTF-8
Well, all these are nice.... but I thought Mark was asking for *statistical* procedures to identify outliers through mechanisms other than just getting the (say) largest and smallest values.
Even for univariate data, simply knowing the highest and lowest values doesn't always tell you which points are outliers ("outlier" is an ill-defined term in any case, usually defined in very vague ways).
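To illustrate the univariate point, here is a minimal Python sketch using Tukey's fences (values beyond 1.5 times the interquartile range from the quartiles). This is one common convention, not a method named in this thread; the function name `tukey_outliers`, the multiplier `k`, and both sample lists are made up for illustration.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data: both lists have a largest value, but only one is flagged.
evenly_spread = [10, 12, 14, 16, 18, 20, 22]
with_stray = [10, 12, 14, 16, 18, 20, 95]

print(tukey_outliers(evenly_spread))  # nothing flagged: 22 is merely the maximum
print(tukey_outliers(with_stray))     # 95 lies far outside the upper fence
```

The maximum of the first list is just the largest value; nothing about it is unusual relative to the spread of the rest.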
When you are looking bivariately, things are trickier. One census report, in the early days of punch cards, apparently said there were 20,000 12-year-old widows in the USA. Hmmmm. 12-year-olds aren't unusual, widows aren't unusual, but 12-year-old widows?
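The widow example can be sketched as a joint consistency check: every record below passes the checks on each variable separately, and only a rule that looks at both variables together catches the odd one. The field names, the category list, and the cutoff `MIN_AGE_EVER_MARRIED` are hypothetical choices for illustration, not anything taken from the census report.

```python
# Hypothetical records; field names and categories are assumptions.
records = [
    {"age": 12, "marital_status": "never married"},
    {"age": 47, "marital_status": "widowed"},
    {"age": 12, "marital_status": "widowed"},
    {"age": 83, "marital_status": "widowed"},
]

VALID_AGES = range(0, 121)
VALID_STATUSES = {"never married", "married", "widowed", "divorced"}
MIN_AGE_EVER_MARRIED = 15  # assumed cutoff, chosen only for this sketch

def passes_univariate(rec):
    """Each variable, taken alone, holds a legitimate value."""
    return rec["age"] in VALID_AGES and rec["marital_status"] in VALID_STATUSES

def passes_bivariate(rec):
    """The combination of age and marital status is plausible."""
    ever_married = rec["marital_status"] != "never married"
    return not (ever_married and rec["age"] < MIN_AGE_EVER_MARRIED)

# Records that look fine one variable at a time but fail the joint check.
suspicious = [r for r in records if passes_univariate(r) and not passes_bivariate(r)]
print(suspicious)
```

Only the 12-year-old widow is flagged, even though age 12 and "widowed" each appear in other records that pass.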
Then, when you get to multivariate data, it gets much more complex, although there are some graphical procedures that work well when the number of variables isn't TOO big (say, 10 or so). But *I* was dealing with a data set that had 524 variables.... some correlated highly. This was fun!
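One standard numerical screen for multivariate outliers is the Mahalanobis distance, which measures how far a point is from the centre of the data after allowing for the correlations between variables. The sketch below is not the procedure used on the 524-variable data set (the post does not say what that was); it is a two-variable, standard-library illustration with made-up points, and `mahalanobis_2d` is a hypothetical helper name.

```python
import math
import statistics

def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample mean."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(points)
    # Sample variances and covariance (divisor n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse covariance matrix, written out for 2x2.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical data: x and y move together, except for the last point,
# whose x (3) and y (7) are each well inside the observed ranges.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
          (6, 6.0), (7, 6.9), (8, 8.2), (9, 8.8), (3, 7)]

distances = mahalanobis_2d(points)
farthest = max(range(len(points)), key=distances.__getitem__)
print(points[farthest])  # the point that breaks the joint pattern
```

With hundreds of variables, some of them highly correlated, the covariance matrix can be close to singular, so this plain version becomes numerically unstable and some form of dimension reduction or a robust covariance estimate is usually needed first.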
Peter
Peter L. Flom, PhD
Statistical Consultant
www DOT peterflom DOT com