Date:         Wed, 28 Sep 2005 20:17:13 -0400
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Re: data screening help
Comments: To: Kathryn Gardner <kjgardner10@hotmail.com>
In-Reply-To:  <BAY101-F41114E78C965382388CAB2A68D0@phx.gbl>
Content-Type: text/plain; charset="us-ascii"; format=flowed

Dear Kathryn,

At 04:58 PM 9/28/2005, Kathryn Gardner wrote:

>Thanks once again for taking the time to reply to my e-mail, Richard; your help is much appreciated.

You're most welcome. No miracles, I'm afraid.

>I actually have skew, kurtosis, outliers etc. on about 8 DVs and 3 IVs, but I was actually under the impression that distributional requirements applied to IVs as well.

I have always understood that they do not. However, outlier cases, on the IVs as on the DVs, can have greatly disproportionate 'leverage' on the results.
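To make 'leverage' concrete, here is a minimal sketch - in Python rather than SPSS syntax, and with made-up file and variable names - of how one might compute leverage and Cook's distance for one DV regressed on the three IVs:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("study.csv")                    # hypothetical file name
X = sm.add_constant(df[["iv1", "iv2", "iv3"]])   # hypothetical IV names
model = sm.OLS(df["dv1"], X).fit()               # hypothetical DV name

influence = model.get_influence()
leverage = influence.hat_matrix_diag             # leverage (hat values) per case
cooks_d = influence.cooks_distance[0]            # Cook's distance per case

# Cases with unusually high leverage or Cook's distance deserve a close look.
print(df.assign(leverage=leverage, cooks_d=cooks_d).nlargest(10, "cooks_d"))

Cases that stand out on these measures are the ones pulling disproportionately on the fitted results.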

>By multivariate outliers I mean a case with a combination of extreme values on two or more variables.

If you have 8 DVs, there are a few things you can look for. One is whether extreme values cluster among the DVs; that is, is there evidence of an underlying mechanism that produced outliers in several DVs at once?

By the way, how far out are they lying? Do they look like a low-frequency extension of the variables' general distributions? If so, that's a case for retaining them. Or are they many SDs from the mean, looking completely isolated from what you'd consider the main body of the data? That could be a case for postulating a separate, low-frequency mechanism by which they arise.
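If it helps to see those two questions as a computation, here is a minimal sketch (Python, with made-up file and variable names) that reports how many SDs out each case lies on each DV, how many DVs it is extreme on, and its Mahalanobis distance from the DV centroid - one common way of flagging the kind of multivariate outlier you describe:

import numpy as np
import pandas as pd
from scipy.stats import chi2

df = pd.read_csv("study.csv")            # hypothetical file name
dvs = df[["dv1", "dv2", "dv3"]]          # hypothetical DV names

# Univariate: z-scores, and how many DVs each case is extreme on (|z| > 3).
z = (dvs - dvs.mean()) / dvs.std()
n_extreme = (z.abs() > 3).sum(axis=1)

# Multivariate: squared Mahalanobis distance from the DV centroid,
# compared against a chi-square cutoff with df = number of DVs.
diff = (dvs - dvs.mean()).values
inv_cov = np.linalg.inv(np.cov(dvs.values, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
cutoff = chi2.ppf(0.999, df=dvs.shape[1])

flagged = df.assign(n_extreme=n_extreme, mahal_d2=d2)
print(flagged.query("n_extreme > 1 or mahal_d2 > @cutoff"))

The numbers don't answer anything by themselves, but they show you which cases to look at.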

In other words: what can you say about the structure of the outliers? And, what can you say might possibly account for them? I can't solve this, but you may be able to, knowing your study. It's certainly a question. The one thing you can't do is ignore them. Think of your report this way:

"This model explains xx% of the variance in the data, except on the y% of the cases where very large values are observed, which have been excluded from this analysis. No information is available on the mechanism causing these large values. If they are included, the model described explains zz% [probably much smaller] of the variance.

"Accordingly, we have fitted an alternative model, as described above but with all data included. It explains ww% of the overall variance; with the very large values exclude, it explains tt% of the variance in the remaining values." (And you'd discuss the differences in the models.)

This isn't a template. It's an illustration of what you get if you simply trim 'outliers'.
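If you want to see what the two sets of numbers in that illustration look like in practice, here is a minimal sketch (Python, made-up names, and an arbitrary flagging rule used purely for illustration) that fits the same model with and without the flagged cases and reports both R-squared values:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study.csv")                                 # hypothetical file name
flagged = df["dv1"] > df["dv1"].mean() + 3 * df["dv1"].std()  # illustrative rule only

full = smf.ols("dv1 ~ iv1 + iv2 + iv3", data=df).fit()
trimmed = smf.ols("dv1 ~ iv1 + iv2 + iv3", data=df[~flagged]).fit()

print(f"R2, all cases:        {full.rsquared:.3f}")
print(f"R2, outliers dropped: {trimmed.rsquared:.3f} "
      f"({flagged.sum()} of {len(df)} cases excluded)")

The point of the exercise is that both numbers, and the gap between them, belong in the report.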

These are, by the way, mainly questions to guide you and your colleagues and advisors. For us on the list to go much further with them would be to go beyond general advice and start moving in on your study.

>When I said non-linearity I was referring to the idea that variables that are part of one scale should usually (but not always) have linear relationships (or at least this is what I can gather from my reading). I thought that linearity was an assumption of parametric analyses.

It is, for scale-level independent variables. (For categorical independents, as in ANOVA, the question doesn't arise.) A non-linear relationship, if known, needs to be dealt with. (You were very non-specific in stating 'non-linearity'; it read as if you thought it was a property of single variables.)

Here, again, are questions for you: What is the reason for thinking the relationship is non-linear? If it's based on theory, the theory may well suggest an appropriate transformation. If the evidence for non-linearity is from observation, it's standard to add non-linear terms, the square and perhaps the cube of your independent variable, to your model. Be careful! In many cases a variable, its square, and its cube are very highly correlated. Get advice on the ways to work around this.
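As one example of the kind of workaround to ask about - centering the variable before squaring and cubing it is a commonly used one, though not the only one - here is a minimal sketch (Python, made-up names):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study.csv")      # hypothetical file name
x = df["iv1"]                      # hypothetical IV name

raw = pd.DataFrame({"x": x, "x2": x**2, "x3": x**3})
centered = pd.DataFrame({"xc": x - x.mean(),
                         "xc2": (x - x.mean())**2,
                         "xc3": (x - x.mean())**3})
print(raw.corr())        # the raw terms are typically very highly correlated
print(centered.corr())   # centering usually reduces those correlations a good deal

df["xc"] = x - x.mean()
model = smf.ols("dv1 ~ xc + I(xc**2) + I(xc**3)", data=df).fit()
print(model.summary())   # look at the tests on the squared and cubed terms

Centering is only one device - orthogonal polynomials are another - and this is exactly where getting advice matters.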

If tests on your model indicate that the non-linear terms should be included, you need to consider the implications in your discussion.

>You said that "the assumption is normality of the residuals, not of the body of the data." I'm not too familiar with residuals (not yet anyway), but aren't residuals usually inspected post-analysis via multiple regression? If so, this suggests that I can start my main analyses now and then screen for normality later. But if I did this, what would the implications be of finding residuals that suggested non-normality?

I'm getting out of my depth here. Try, say, Hector Maletta directly. Briefly, as I wrote, the methods are mostly pretty robust against modest deviations from normality. I certainly wouldn't worry simply because the skewness or kurtosis statistics can be shown to be non-zero. Do worry about long 'tails' away from the center of the distribution - "outliers."
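For when you do get to the residuals, here is a minimal sketch (Python, made-up names) of the post-analysis check you describe: fit the regression, pull out the residuals, and look at their skewness, kurtosis, and tails:

import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import skew, kurtosis

df = pd.read_csv("study.csv")      # hypothetical file name
model = smf.ols("dv1 ~ iv1 + iv2 + iv3", data=df).fit()
resid = model.resid

print("skewness:       ", skew(resid))
print("excess kurtosis:", kurtosis(resid))   # both are 0 for a normal distribution

# The tails matter more than the statistics: standardized residuals beyond +/- 3 SD.
std_resid = (resid - resid.mean()) / resid.std()
print(df[std_resid.abs() > 3])

Modestly non-zero skewness or kurtosis in the residuals is usually tolerable; a handful of residuals far out in the tails brings you back to the outlier questions above.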

>I am happy not to transform or remove outliers, providing this still means that I can run analyses such as ANOVA and MR.

In brief: you can. Special characteristics of the data, notably outliers, will give you difficulties in interpretation, and you'll have to address those. But if you eliminate them by transforming your data to make it look 'normal', or by arbitrarily removing large values before analysis, you'll give yourself interpretation problems that are just as bad, or worse.

There's no magic. If your data isn't simple, it isn't. I've given you questions for investigation, not answers that will solve it.

