```
Date: Thu, 23 Oct 2003 13:57:41 -0700
Reply-To: Karriere Sucher
Sender: "SAS(r) Discussion"
From: Karriere Sucher
Subject: Re: chi square analysis to identify the outliers
Comments: To: cassell.david@EPAMAIL.EPA.GOV
In-Reply-To:
Content-Type: text/plain; charset=us-ascii

>> Perhaps you could clarify why you need a chi-square test, and
>> what KIND of chi-square test you are talking about? This isn't
>> a homework problem, is it?

No, this is not homework. I am not a statistician, and I vaguely
remember that there is a so-called chi-square test for outliers. I may
be wrong, and that is why I am asking. But this is not a homework
problem.

>> [1] You computed the CI incorrectly. 30.4 is *NOT* the standard error
>> of the mean that you need to use in your CI. The correct CI does not
>> get anywhere near 0 or 100.

Can you enlighten me on what the correct standard error is in this case
and how to calculate it?

Thanks a lot!

"David L. Cassell" wrote:

Karriere Sucher wrote:
> I have a sample of math exam scores for 10 students. They are 8, 25, 35, 41, 50,
> 75, 75, 79, 92, 99. How do I use SAS to conduct a chi-square analysis to find out
> whether the student with the score 99 and the student with the score 8 are two
> outliers?

A chi-square to look for outliers in small samples? That doesn't seem
to be reasonable. Grubbs' test (which uses an F distribution under the
hood) is a good example of why you shouldn't be using a parametric test
on teeny samples: when n < 6, it is notorious for leading people to
reject *most* of the data, whether there are real outliers or not.

Perhaps you could clarify why you need a chi-square test, and what KIND
of chi-square test you are talking about? This isn't a homework problem,
is it?

What underlying assumptions are you willing to make about the
distribution of grades? If you assume the grades should be distributed
normally, then you have some parametric tests available to you. There
are also some non-parametric tests available.

> My second question: I calculated the mean to be 57.9 and the standard
> deviation to be 30.4. Then the 95% CI will roughly be the range of -3 ~ 119 (mean plus /
> minus two standard deviations). But in reality the score can only be in the range
> of 0 ~ 100. How do I interpret the negative score and the score that is greater than 100?

[1] You computed the CI incorrectly. 30.4 is *NOT* the standard error
of the mean that you need to use in your CI. The correct CI does not
get anywhere near 0 or 100.

[2] If you do have a confidence interval that exceeds the reasonable
interval, then you view the part above 100 (or the part below 0) as
extraneous.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
```
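On the standard error question in the thread: the following is only a sketch of the usual textbook calculation, not something taken from David's reply. It assumes the ten scores are a simple random sample and that a t-based interval is appropriate. The standard error of the mean is the sample standard deviation divided by sqrt(n), and the 95% interval is the mean plus or minus a Student's t critical value times that standard error. The critical value 2.262 (9 degrees of freedom) is hard-coded from a standard t table, and the variable names are illustrative only.

```python
import math
import statistics

# Exam scores quoted in the original question.
scores = [8, 25, 35, 41, 50, 75, 75, 79, 92, 99]

n = len(scores)
mean = statistics.mean(scores)   # sample mean
sd = statistics.stdev(scores)    # sample standard deviation (n - 1 denominator)

# Standard error of the mean: the SD shrinks by sqrt(n).
se = sd / math.sqrt(n)

# Assumption: two-sided 95% critical value of Student's t with
# n - 1 = 9 degrees of freedom, taken from a standard t table.
T_CRIT_95_DF9 = 2.262

half_width = T_CRIT_95_DF9 * se
ci_low = mean - half_width
ci_high = mean + half_width

print(f"mean = {mean:.1f}, sd = {sd:.1f}, se = {se:.2f}")
print(f"95% CI for the mean: ({ci_low:.1f}, {ci_high:.1f})")
```

Under those assumptions the interval runs from roughly 36 to 80, which stays well inside the 0 to 100 score range. In SAS itself, PROC MEANS with the STDERR and CLM keywords should report the same quantities.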
