Date: Wed, 28 Sep 2005 21:58:47 +0100
Reply-To: Kathryn Gardner <firstname.lastname@example.org>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Kathryn Gardner <email@example.com>
Subject: Re: data screening help
Content-Type: text/plain; format=flowed
Thanks once again for taking the time to reply to my e-mail, Richard; your
help is much appreciated.
I actually have skew, kurtosis, outliers etc. on about 8 DVs and 3 IVs, but I
was under the impression that distributional requirements applied to IVs as
well.
By multivariate outliers I mean a case with a combination of extreme values
on two or more variables.
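
One common way to screen for such cases is the squared Mahalanobis distance of each case from the centroid. The following is only a minimal sketch for two variables, with made-up data and a hypothetical function name; it also shows that a case can be unremarkable on each variable alone and still be unusual as a combination.

```python
from statistics import mean

def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Hypothetical toy data: x and y rise together except for the last case,
# whose x and y values are each well inside the observed range.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 2]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 7]
d2 = mahalanobis_sq_2d(xs, ys)
worst = max(range(len(d2)), key=d2.__getitem__)
print(worst, round(d2[worst], 2))
```

Here the last case (index 8) gets the largest distance, although neither of its values is extreme on its own.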
When I said non-linearity I was referring to the idea that variables that
are part of one scale should usually (but not always) have linear
relationships (or at least this is what I can gather from my reading). I
thought that linearity was an assumption of parametric analyses.
You said that "the assumption is normality of the residuals, not of the body
of the data." I'm not too familiar with residuals (not yet anyway), but
aren't residuals usually inspected post-analysis via multiple regression? If
so, this suggests that I can start my main analyses now and then screen for
normality later. But if I did this, what would the implications be of
finding residuals that suggested non-normality?
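
To make the order of steps concrete: residuals only exist once a model has been fitted, so they can only be screened after the fit. Below is a rough sketch assuming a single predictor and invented data; the helper names are hypothetical, and moment-based skewness and excess kurtosis are just one simple summary of residual shape.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both near 0 for normal data)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# Hypothetical data: a linear trend plus one large positive disturbance.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 18.0, 30.0]

a, b = fit_line(xs, ys)                                  # 1) fit the model
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]    # 2) observed minus fitted
skew, kurt = skew_and_excess_kurtosis(residuals)         # 3) screen the residuals
print(round(skew, 2), round(kurt, 2))
```

With these made-up numbers both summaries come out clearly positive, which is the kind of result the question above is asking about.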
I understand what you're saying about the problem with transforming: "if
you think transforming is all right, you think that your scale of
measurement doesn't mean anything, as a scale". I am happy not to transform
or remove outliers, provided this still means that I can run analyses such as
ANOVA and MR.
>From: Richard Ristow <firstname.lastname@example.org>
>To: Kathryn Gardner <email@example.com>,SPSSX-L@LISTSERV.UGA.EDU
>Subject: Re: data screening help
>Date: Wed, 28 Sep 2005 15:35:50 -0400
>At 01:25 PM 9/28/2005, Kathryn Gardner wrote:
>>Sorry, you're right in that I meant - among continuous variables screening
>>for outliers depends on whether data are grouped. So my question is "I am
>>using analyses that will involve both the use of grouped (ANOVA) and
>>ungrouped (Regression) data, so in light of this, how should I screen my
>>data?" According to Tabachnick and Fidell, grouped data means screening
>>separately within each group, while ungrouped means screening among all
>>cases at once.
>Certainly, grouped rather than ungrouped. Among other things, if the
>grouping variable is an independent variable for your analysis (so, an
>ANOVA), if you look at the individual groups you're looking at the
>residuals. And, as I (and many others) have stated, the assumption is
>normality of the residuals, not of the body of the data.
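
A small sketch of the point about groups and residuals, assuming a one-way ANOVA and invented scores (the group labels and the function name are hypothetical): each residual is a score minus its own group mean, so screening within groups amounts to screening the residuals.

```python
from statistics import mean

def anova_residuals(groups):
    """Residuals of a one-way ANOVA: each score minus its own group mean."""
    residuals = {}
    for name, values in groups.items():
        group_mean = mean(values)
        residuals[name] = [v - group_mean for v in values]
    return residuals

# Hypothetical scores for two groups with very different means.
groups = {"control": [10, 11, 12, 13, 14], "treated": [30, 31, 32, 33, 34]}

pooled = [v for values in groups.values() for v in values]
resid = anova_residuals(groups)
pooled_resid = [r for rs in resid.values() for r in rs]

# The pooled scores form two separate clumps, while the residuals
# form a single tight cluster around zero.
print(max(pooled) - min(pooled), max(pooled_resid) - min(pooled_resid))
```

Pooled across groups, these scores look far from normal simply because the group means differ; the residuals do not have that problem.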
>>I am aware of the debate surrounding transforming data and deleting
>>outliers, and do actually agree that variables should not be transformed
>>if they are real unusual values…
>>5)… but I thought that a normal distribution is one requirement for using
>>tests such as ANOVA, MR, FA and correlation. I have also been following
>>the book “Using Multivariate Statistics” by Tabachnick and Fidell, where
>>the advice is to deal with outliers by transforming or deleting them, and
>>to transform data to address skew and kurtosis. So…in a nutshell you are
>>suggesting that it’s best not to transform at all.
>I'm afraid I am suggesting that. Another way: as I said in one of the
>postings I quoted, if you think transforming is all right, you think that
>your scale of measurement doesn't mean anything, as a scale. That is, you
>think the order of values is meaningful, but the difference of values on
>your numerical scale is not, since you're willing to change that to make
>data "work better." I would argue that, if you believe that is right, you
>also believe your data is of ordinal level only, and should do
>non-parametric analysis. That, incidentally, eliminates particular
>sensitivity to outliers.
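
As a rough illustration of why a rank-based analysis is not especially sensitive to outliers, the sketch below compares Pearson's r with Spearman's rank correlation on invented data. The helper names are hypothetical and the ranking function assumes there are no ties.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson product-moment correlation."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def ranks(values):
    """Ranks 1..n; assumes no tied values, to keep the sketch short."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 7, 8, 10]
ys_extreme = [2, 4, 5, 7, 8, 1000]  # same ordering, last value made extreme

print(round(pearson(xs, ys), 3), round(pearson(xs, ys_extreme), 3))
print(spearman(xs, ys), spearman(xs, ys_extreme))
```

Only the order of the values enters the rank-based statistic, so the extreme value changes Pearson's r but leaves Spearman's unchanged.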
>>If I don’t transform my data or delete outliers etc, this means that I
>>have about 8 variables with skewness and kurtosis, univariate and
>>multivariate outliers and non-linearity etc.
>First, I assume you mean 8 dependent variables. I'm not aware of any
>distributional requirements for independent variables.
>Excuse me if I'm out of my depth: could you say what a multivariate outlier
>is?
>And when you say non-linearity, in what sense do you mean it? If you do
>think that a quantity affects the outcome, or is affected, non-linearly,
>and you have an argument what the shape of the non-linear effect is, by all
>means transform accordingly. (See the example of income.)
>>My data (outliers) are actually genuine unusual scores. So in light of our
>>discussion so far then, my other questions are: a) if my data is skewed
>>with kurtosis and outliers etc, am I best to simply leave this as it is?
>>If so… b) …am I OK to perform analyses such as ANOVA, MR, FA and Pearson
>>correlation on this data? c) should I not at least deal with “really
>As a maybe naive way of looking at it, see the discussion of "cloud and
>outlier" distributions that was part of the last post.
>In essence: what mechanism seems to be underlying a distribution that
>generates the majority of values within a limited range, and a small
>minority far outside that range?
>I can't solve the problem. Given what you're seeing, one might hypothesize
>two underlying mechanisms: a 'normal' one that generates variation within
>the range where most of your observations lie, and a 'special' one that
>operates in only a small minority of cases, but generates very large
>values. You might, then, trim your outliers, and say explicitly that you're
>looking only for the 'normal' mechanism. But in most real situations,
>extreme values matter. It's good if you can get some idea under what
>circumstance the 'special' mechanism is invoked.
>And, I'm afraid, that's as far as I can go with my statistical knowledge,
>and without subject-specific information.
>Again, good luck,