```
Date:         Thu, 23 Oct 2003 16:50:22 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion"
From:         "David L. Cassell"
Subject:      Re: chi square analysis to identify the outliers
Content-type: text/plain; charset=US-ASCII

Karriere Sucher kindly replied:
> No, this is not a homework. I am not a statistician and I briefly remember
> that there is a so called chi-square test for outliers. I may be wrong and
> that is why I am asking. But this is not a homework problem.

That's quite reassuring. I was somewhat concerned.

There isn't a simple chi-squared test for univariate outliers. And if
there were, it would probably be assuming normally-distributed data,
which is always a problem with really small data sets.

I don't think that you can reasonably rule out your high value when you
can't really assume normality in your data. (You have already thought
about the fact that your data are restricted between 0 and 100, and a
true normal distribution wouldn't have that restriction.) With a standard
deviation of 30.4, all your points are within two standard deviations of
your sample mean.

I recommend that you try plotting the data. Try something simple, like:

proc univariate data=yourdatasetname plot normal;
   var yourvariable;
run;

You'll see that with this small a data set and this much variability,
there are no serious outliers showing up on the boxplot. The q-q plot
doesn't look that bad for 10 points. None of the normality tests will
reject the assumption of normality with this few points being shaped in
a nice mound-shape.

I would certainly say that 8 out of 100 is a bad grade. But it doesn't
look like an outlier given the rest of the data. Sorry.

>> [1] You computed the CI incorrectly. 30.4 is *NOT* the standard error
>> of the mean that you need to use in your CI. The correct CI does not
>> get anywhere near 0 or 100.
>
> Can you enlighten on what the correct standard error is in this case and
> how to calculate it? Thanks a lot!

Okay, since this isn't a homework problem, I will. You forgot to divide
your standard deviation by the square root of n to get the standard error
of the mean. So the real CI will be less than one-third the width of the
one you came up with.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
```
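For readers following along, the standard-error correction described in the message can be written out as a rough worked calculation. This is only a sketch under assumptions: n = 10 is inferred from the "10 points" mentioned in the message, the t-based 95% interval is one common choice and not necessarily what the original poster used, and the sample mean is not given in this message, so it stays as \bar{x}.

```latex
% Assumption: n = 10 (inferred from "10 points"); s = 30.4 is from the message.
\[
\mathrm{SE}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{30.4}{\sqrt{10}} \approx 9.61
\]
% Assumption: a two-sided 95% t interval with n - 1 = 9 degrees of freedom,
% for which t_{0.975, 9} is approximately 2.262.
\[
\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}
  \approx \bar{x} \pm 2.262 \times 9.61
  \approx \bar{x} \pm 21.7
\]
```

Because 1/\sqrt{10} is about 0.316, replacing s with s/\sqrt{n} under the same multiplier shrinks the interval to just under one-third of its width, which agrees with the remark in the message.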
