LISTSERV at the University of Georgia
Date:         Wed, 28 Sep 2005 21:58:47 +0100
Reply-To:     Kathryn Gardner <kjgardner10@hotmail.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Kathryn Gardner <kjgardner10@hotmail.com>
Subject:      Re: data screening help
Comments: To: wrristow@mindspring.com
In-Reply-To:  <5.1.0.14.2.20050928145955.02f49dd0@pop.mindspring.com>
Content-Type: text/plain; format=flowed

Thanks once again for taking the time to reply to my e-mail, Richard. Your help is much appreciated.

I actually have skew, kurtosis, outliers etc. on about 8 DVs and 3 IVs. I was under the impression that distributional requirements applied to IVs as well.

By multivariate outliers I mean a case with a combination of extreme values on two or more variables.
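A minimal sketch of that definition in Python (standard library only), using the squared Mahalanobis distance, a common screening statistic for multivariate outliers. The data and the function name `mahalanobis_sq_2d` are made up for illustration and are not from the study discussed here. The last case is unremarkable on each variable taken alone but breaks the pattern the two variables share.

```python
from statistics import mean


def mahalanobis_sq_2d(xs, ys):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical scores: x and y rise together, except for the last case.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 2]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1, 7.5]

d2 = mahalanobis_sq_2d(xs, ys)
for case, value in enumerate(d2, start=1):
    print(f"case {case}: d^2 = {value:.2f}")
```

Under multivariate normality these squared distances are commonly compared with a chi-square critical value whose degrees of freedom equal the number of variables; with a sample this small that comparison is only a rough guide.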

When I said non-linearity I was referring to the idea that variables that are part of one scale should usually (but not always) have linear relationships (or at least this is what I can gather from my reading). I thought that linearity was an assumption of parametric analyses.

You said that "the assumption is normality of the residuals, not of the body of the data." I'm not too familiar with residuals (not yet anyway), but aren't residuals usually inspected post-analysis via multiple regression? If so, this suggests that I can start my main analyses now and then screen for normality later. But if I did this, what would the implications be of finding residuals that suggested non-normality?
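A minimal sketch of what inspecting residuals involves, assuming a simple one-predictor regression and made-up data: fit the line, subtract fitted from observed values, and describe the shape of what is left. The helper names `ols_residuals` and `skew_and_excess_kurtosis` are hypothetical. Here the predictor is strongly skewed, so the outcome is too, yet the residuals are not, which is the sense in which the assumption concerns the residuals rather than the body of the data.

```python
from statistics import mean


def ols_residuals(x, y):
    """Residuals from a least-squares line fitted to y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal curve)."""
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical data: a skewed predictor plus small, roughly symmetric noise.
x = [1, 1, 2, 2, 3, 3, 4, 5, 8, 20]
noise = [0.3, -0.3, 0.1, -0.1, 0.2, -0.2, 0.0, 0.1, -0.1, 0.0]
y = [2 + 0.5 * a + e for a, e in zip(x, noise)]

res = ols_residuals(x, y)
y_skew, y_kurt = skew_and_excess_kurtosis(y)
res_skew, res_kurt = skew_and_excess_kurtosis(res)
print(f"outcome:   skew = {y_skew:.2f}")
print(f"residuals: skew = {res_skew:.2f}")
```

Because the residuals only exist once a model has been fitted, this kind of check necessarily comes after the analysis has been run at least once.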

I understand what you're saying about the problem with transforming: "if you think transforming is all right, you think that your scale of measurement doesn't mean anything, as a scale". I am happy not to transform or remove outliers, provided this still means that I can run analyses such as ANOVA and MR.

Thanks

Kathryn

>From: Richard Ristow <wrristow@mindspring.com>
>To: Kathryn Gardner <kjgardner10@hotmail.com>,SPSSX-L@LISTSERV.UGA.EDU
>Subject: Re: data screening help
>Date: Wed, 28 Sep 2005 15:35:50 -0400
>
>At 01:25 PM 9/28/2005, Kathryn Gardner wrote:
>
>>Sorry, you're right in that I meant - among continuous variables screening
>>for outliers depends on whether data are grouped. So my question is "I am
>>using analyses that will involve both the use of grouped (ANOVA) and
>>ungrouped (Regression) data, so in light of this, how should I screen my
>>data? According to Tabachnick and Fidell, grouped data means screening
>>separately within each group, while ungrouped means screening among all
>>cases at once.
>
>Certainly, grouped rather than ungrouped. Among other things, if the
>grouping variable is an independent variable for your analysis (so, an
>ANOVA), if you look at the individual groups you're looking at the
>residuals. And, as I (and many others) have stated, the assumption is
>normality of the residuals, not of the body of the data.
>
>>I am aware of the debate surrounding transforming data and deleting
>>outliers, and do actually agree that variables should not be transformed
>>if they are real unusual values…
>>
>>5)… but I thought that a normal distribution is one requirement for using
>>tests such as ANOVA, MR, FA and correlation. I have also been following
>>the book “Using Multivariate Statistics” by Tabachnick and Fidell, where
>>the advice is to deal with outliers by transforming or deleting them, and
>>to transform data to address skew and kurtosis. So…in a nutshell you are
>>suggesting that it’s best not to transform at all.
>
>I'm afraid I am suggesting that. Another way: as I said in one of the
>postings I quoted, if you think transforming is all right, you think that
>your scale of measurement doesn't mean anything, as a scale. That is, you
>think the order of values is meaningful, but the difference of values on
>your numerical scale is not, since you're willing to change that to make
>data "work better." I would argue that, if you believe that is right, you
>also believe your data is of ordinal level only, and should do
>non-parametric analysis. That, incidentally, eliminates particular
>sensitivity to outliers.
>
>>If I don’t transform my data or delete outliers etc, this means that I
>>have about 8 variables with skewness and kurtosis, univariate and
>>multivariate outliers and non-linearity etc.
>
>First, I assume you mean 8 dependent variables. I'm not aware of any
>distributional requirements for independent variables.
>
>Excuse me if I'm out of my depth: could you say what a multivariate outlier
>is?
>
>And when you say non-linearity, in what sense do you mean it? If you do
>think that a quantity affects the outcome, or is affected, non-linearly,
>and you have an argument what the shape of the non-linear effect is, by all
>means transform accordingly. (See the example of income.)
>
>>My data (outliers) are actually genuine unusual scores. So in light of our
>>discussion so far then, my other questions are: a) if my data is skewed
>>with kurtosis and outliers etc, am I best to simply leave this as it is?
>>If so… b) …am I OK to perform analyses such as ANOVA, MR, FA and Pearson
>>correlation on this data? c) should I not at least deal with “really
>>extreme” outliers?
>
>As a maybe naive way of looking at it, see the discussion of "cloud and
>outlier" distributions that was part of the last post.
>
>In essence: what mechanism seems to be underlying a distribution that
>generates the majority of values within a limited range, and a small
>minority far outside that range?
>
>I can't solve the problem. Given what you're seeing, one might hypothesize
>two underlying mechanisms: a 'normal' one that generates variation within
>the range where most of your observations lie, and a 'special' one that
>operates in only a small minority of cases, but generates very large
>values. You might, then, trim your outliers, and say explicitly that you're
>looking only for the 'normal' mechanism. But in most real situations,
>extreme values matter. It's good if you can get some idea under what
>circumstance the 'special' mechanism is invoked.
>
>And, I'm afraid, that's as far as I can go with my statistical knowledge,
>and without subject-specific information.
>
>Again, good luck,
>Richard Ristow
>
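
The point in the quoted reply that screening within groups amounts to looking at ANOVA residuals can be sketched as follows, assuming a one-way design: each residual is the score minus its own group mean. The group names, scores, and the function name `within_group_residuals` are hypothetical. The score of 24 sits close to the grand mean, so pooled screening would pass it over, yet it is far from the rest of its own group.

```python
from statistics import mean

# Hypothetical one-way design with two groups of five cases each.
groups = {
    "control": [10, 11, 12, 13, 24],
    "treated": [30, 31, 32, 33, 34],
}


def within_group_residuals(groups):
    """Return (group, score, residual) rows; residual = score - group mean."""
    rows = []
    for name, scores in groups.items():
        centre = mean(scores)
        rows.extend((name, s, s - centre) for s in scores)
    return rows


rows = within_group_residuals(groups)
grand_mean = mean(s for scores in groups.values() for s in scores)

# The case that stands out most once group membership is taken into account.
worst = max(rows, key=lambda r: abs(r[2]))
print("largest within-group residual:", worst)
print("its distance from the grand mean:", abs(worst[1] - grand_mean))
```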


