Date: Fri, 11 Feb 2005 14:14:26 -0800
Reply-To: Markus Kemmelmeier <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Markus Kemmelmeier <firstname.lastname@example.org>
Subject: Re: Statistics Question
Content-Type: text/plain; charset="iso-8859-1"
To follow up on David Hitchin's comparison between ANOVA performed on continuous data and on dichotomous data: Consistent with his own results, there is a paper in the literature suggesting that the two converge rather nicely "where cell frequencies are equal under the following conditions: (a) the proportion of responses in the smaller response category is equal to or greater than .2 and there are at least 20 degrees of freedom for error, or (b) the proportion of responses in the smaller response category is less than .2 and there are at least 40 degrees of freedom for error" (Lunney, G. H. (1970). Using analysis of variance with a dichotomous dependent variable: An empirical study. Journal of Educational Measurement, 7, 263-269).
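Lunney's rule of thumb is mechanical enough to write down as a quick check. A minimal sketch in Python - the function name and interface are my own invention, not anything from the paper or from SPSS:

```python
def lunney_ok(p_smaller, df_error):
    """Check Lunney's (1970) rule of thumb for running ANOVA on a
    dichotomous dependent variable (equal cell frequencies assumed).

    p_smaller: proportion of responses in the smaller response category
    df_error:  degrees of freedom for error
    """
    if p_smaller >= 0.2:
        return df_error >= 20   # condition (a)
    return df_error >= 40       # condition (b)

# .3 of responses in the smaller category with 25 error df is fine;
# .1 in the smaller category needs at least 40 error df.
print(lunney_ok(0.30, 25))  # True
print(lunney_ok(0.10, 25))  # False
print(lunney_ok(0.10, 40))  # True
```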
From: SPSSX(r) Discussion on behalf of David Hitchin
Sent: Fri 2/11/2005 4:31 AM
Subject: Re: Statistics Question
Quoting Marta García-Granero <email@example.com>:
> MK> Yours is a good example of the fact that ANOVA is not nearly
> MK> as robust against violations of normality as is often believed,
> MK> e.g., in my own field of social psychology. ANOVA is fairly
> MK> robust against violations of kurtosis, but is much more sensitive
> MK> toward violations of symmetry.
> I have read just the opposite. ANOVA is considered to be quite
> robust against violations of symmetry
I had a colleague who produced results from a rather large repeated
measures design in which the dependent variable took only the values
zero and one. He had analysed it using conventional ANOVA, but was
doubtful about the calculated p-value.
I set this up for a randomisation test, and gave it more than an hour's
worth of CPU time on a big machine. The p-value came out identical to
the third decimal place.
Now of course, while a zero-one variable cannot produce normally
distributed residuals, neither can the residuals be extremely
asymmetrical, and there cannot be large outliers - and there is no
guarantee that other similar experiments would produce ANOVA p-values
anywhere near as close to randomisation p-values.
I always begin by looking at the data, of course, and then a quick
ANOVA may be sufficient - if the p-value is 0.0001 or 0.888 then there
is little doubt about whether the results are significant at the 5%
level. If they are hovering in the 2%-10% range, then it's worth
thinking much more carefully about the analysis.
As Marta wrote, it's important to look for normality WITHIN each of the
subgroups - you don't need normality in the sample as a whole.
In my view Kolmogorov-Smirnov, Shapiro-Wilk and Homogeneity of Variance
with Levene Statistic don't tell you much that you can't see far more
clearly by plotting the data, where box-plots give you nearly all that
you need to know. The p-values from these tests are driven as much by
sample size as by how non-normal the residuals are, if not more. You can
get highly significant results from large samples in which the non-
normality is so slight as to be no problem for conventional ANOVA tests.
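That sample-size point is easy to demonstrate. A rough sketch in Python with SciPy (not SPSS; it uses the D'Agostino-Pearson normality test rather than Kolmogorov-Smirnov or Shapiro-Wilk, and the distribution and sample sizes are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Mildly non-normal data: a t distribution with 10 df looks almost
# normal in a box-plot and is no real problem for conventional ANOVA.
small = rng.standard_t(df=10, size=30)
large = rng.standard_t(df=10, size=50000)

# D'Agostino-Pearson normality test (based on skewness and kurtosis).
p_small = stats.normaltest(small).pvalue
p_large = stats.normaltest(large).pvalue

print(f"n = 30:    p = {p_small:.3f}")
print(f"n = 50000: p = {p_large:.2e}")
```

The identical slight non-normality that the test usually waves through at n = 30 becomes "highly significant" at n = 50000 - the test is telling you about the sample size, not about a problem with the analysis.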