LISTSERV at the University of Georgia
Date:         Wed, 4 Apr 2007 20:43:06 -0400
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Re: Chi-squared and Chi-squared test for trend comparison
Comments: To: "Burleson,Joseph A." <burleson@up.uchc.edu>
In-Reply-To:  <F7558203F6DCE54D87688CE56F289A780DC3DD@itexcn01.uchc.net>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 03:18 PM 4/4/2007, Burleson,Joseph A. wrote:

>The last 6 large (n = 400 to 6,000) clinical trials I analyzed all had
>age perfectly normally distributed (i.e., skewness between -.20 and
>+.20).

Well... The skewness measure may not be conclusive. The skewness is zero for any symmetric distribution. That includes uniform distributions, or choosing each one of two values with probability 0.5, or any number of easy to construct long-tailed distributions.
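To make that concrete, here is a minimal sketch using only the Python standard library. The age range (30 to 60), the sample size, and the seed are illustrative assumptions, not figures from any of the studies discussed. It computes the moment-based sample skewness for a uniform sample, a two-point sample, and a normal sample; all three should come out close to zero, so a skewness inside -.20 to +.20 cannot tell them apart.

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(2007)  # arbitrary seed, for repeatability only
n = 20_000                 # illustrative sample size

# Three symmetric distributions (all parameters are assumptions for illustration).
uniform_ages = [rng.uniform(30, 60) for _ in range(n)]
two_point = [rng.choice([30, 60]) for _ in range(n)]
normal_ages = [rng.gauss(45, 8) for _ in range(n)]

for name, xs in [("uniform", uniform_ages),
                 ("two-point", two_point),
                 ("normal", normal_ages)]:
    print(f"{name:10s} skewness = {sample_skewness(xs):+.3f}")
```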

I've no idea of the design of the studies you were on, but most clinical trials select an age range, explicitly or by implication. The population pyramid being fairly flat over much of its range, that tends toward a uniform, or nearly uniform, age distribution. If it's a very wide age range in an adult population, you'll probably see some upward skewing.
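A rough sketch of that selection effect, under a loudly hypothetical population pyramid: relative density flat up to age 60, then falling linearly to zero at 100. That shape, the two age windows, and the seed are my assumptions for illustration, not census data. A narrow window (30 to 50) falls entirely on the flat part and should give skewness near zero; a very wide adult window (18 to 95) picks up the thinning upper ages and should show positive skew.

```python
import random
import statistics


def sample_skewness(xs):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def pyramid_density(age):
    """HYPOTHETICAL relative density: flat to 60, linear decline to 0 at 100."""
    if age <= 60:
        return 1.0
    return max(0.0, (100 - age) / 40)


def draw_ages(rng, lo, hi, n):
    """Rejection-sample n ages from the pyramid, restricted to [lo, hi]."""
    ages = []
    while len(ages) < n:
        age = rng.uniform(lo, hi)
        if rng.random() < pyramid_density(age):
            ages.append(age)
    return ages


rng = random.Random(20)  # arbitrary seed
narrow = draw_ages(rng, 30, 50, 20_000)  # window inside the flat part
wide = draw_ages(rng, 18, 95, 20_000)    # very wide adult window

print(f"ages 30-50: skewness = {sample_skewness(narrow):+.3f}")
print(f"ages 18-95: skewness = {sample_skewness(wide):+.3f}")
```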

So, I might collect after all. Did you run a Kolmogorov-Smirnov test, or some other specific test, for normality?

(I might add that an approximately uniform age distribution will be just fine for analysis, and there was no need to go beyond the skewness check, for your purposes. The worst problem would be age outliers; people near the end of the observed age range have very different medical problems, of course. But the selection procedures surely excluded those.)

>I, too, have seen age not be normal (e.g., Poisson distributions,
>U-shaped distributions, etc.). One cannot assume that it is one way or
>the other for no specific reason.

No. On the other hand, the age distribution, whatever it is, is usually that way because of a selection criterion applied to an overall population pyramid, and a clear grasp of the explicit or implicit selection rule is crucial.

(Stories: like the study of at-risk - premature - neonates that showed a strong negative correlation between birth weight and gestational age at birth.)

>Sorry to be so nit-picky, but the Central Limit Theorem has nothing at
>all to do with whether a population OR a single sample is normal or
>non-normal.

Actually, I've often seen it argued that it does. Admittedly the argument is a little hand-wavy, as it deals with effects that can only be hypothesized to exist.

>The CLT has to do with "sampling" distributions.

First, no; the CLT has to do with distributions of the sums (or means) of random variables; sampling distributions are one instance.

Now, bear with me, and I'm taking a point of view standard among probability theorists, but that often seems strange to statisticians: the observations are not selected from a 'population', considered as a finite, potentially identifiable set of subjects; but are drawn, generated, according to distribution and dependency rules.

Consider residuals, then - 'random variation' added to an underlying value that we actually want. (This is the standard premise of linear models.) Why would we remotely expect these to be normally distributed?

Here's the hand-wavy part: If there are actually many unobserved factors whose effects add to form the residuals, they are statistically independent, and their variances are comparable ("uniformly bounded" is the correct notion), then the hypotheses of the CLT apply, and we may with some justice expect approximately normal residuals.
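A small simulation of that argument, as a sketch only: the number of factors (30), their shape (uniform on -1 to 1), the sample size, and the seed are all assumptions chosen for illustration. Each simulated residual is the sum of independent effects with equal variance. A single effect is far from normal (excess kurtosis about -1.2), while the sum should have skewness and excess kurtosis both near zero, as a normal distribution does.

```python
import random
import statistics


def standardized_moments(xs):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


rng = random.Random(44)  # arbitrary seed
n_obs = 20_000           # illustrative number of observations
n_factors = 30           # assumed number of unobserved factors

# One unobserved effect alone: uniform, clearly non-normal.
single_effect = [rng.uniform(-1, 1) for _ in range(n_obs)]

# Residual = sum of many independent effects of comparable variance.
residuals = [sum(rng.uniform(-1, 1) for _ in range(n_factors))
             for _ in range(n_obs)]

single_skew, single_kurt = standardized_moments(single_effect)
resid_skew, resid_kurt = standardized_moments(residuals)

print(f"single effect: skew = {single_skew:+.3f}, excess kurtosis = {single_kurt:+.3f}")
print(f"sum of {n_factors}:     skew = {resid_skew:+.3f}, excess kurtosis = {resid_kurt:+.3f}")
```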

This model, of residuals that are the sum of many small random effects, suggests a likely problem: what if they aren't all of comparable size? Indeed, one of the more common observed deviations from normal residuals is long 'tails' - probability of very large residuals much greater than given by the normal distribution. That is what you get if you have one, or a few, influences that occur rarely but have high variance when they do occur.
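The long-tail case can be sketched the same way. Here the residual is the sum of 30 small uniform effects plus one rare influence that occurs 2% of the time and, when it does, has a standard deviation of 15. Every one of those numbers is an assumption picked for illustration. The excess kurtosis should come out well above zero, and the share of residuals beyond three standard deviations should exceed the roughly 0.27% a normal distribution gives.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Fourth standardized central moment minus 3 (zero for a normal)."""
    n = len(xs)
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0


def draw_residual(rng):
    """Many small effects plus one rare, high-variance influence (all assumed)."""
    small = sum(rng.uniform(-1, 1) for _ in range(30))
    rare = rng.gauss(0, 15) if rng.random() < 0.02 else 0.0
    return small + rare


rng = random.Random(46)  # arbitrary seed
residuals = [draw_residual(rng) for _ in range(20_000)]

mean = statistics.fmean(residuals)
sd = statistics.pstdev(residuals)
kurt = excess_kurtosis(residuals)

# Share of residuals more than 3 SD from the mean, versus the normal value.
tail_share = sum(abs(x - mean) > 3 * sd for x in residuals) / len(residuals)
normal_tail = 2 * (1 - statistics.NormalDist().cdf(3))

print(f"excess kurtosis     = {kurt:+.2f}")
print(f"share beyond 3 SD   = {tail_share:.4f}")
print(f"normal would give   = {normal_tail:.4f}")
```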

This model also suggests circumstances where it's unwise to expect normal residuals. For example, you've good hope that a scale made by summing Likert-scale responses will be something like normally distributed around its mean; but there's little chance that's true for a single Likert scale.
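A sketch of the Likert comparison, under strong simplifying assumptions: 20 items, each a 5-point response drawn uniformly and independently (real items on one scale are usually correlated, which slows the approach to normality). The helper below is my own illustration; it computes a Kolmogorov-Smirnov-style distance between the empirical distribution and a normal fitted to the same data. It is only a distance, not a test: with mean and SD estimated from the sample, the usual K-S p-values do not apply.

```python
import random
from statistics import NormalDist, fmean, pstdev


def ks_distance_to_normal(xs):
    """Largest gap between the empirical CDF and a normal CDF fitted to xs."""
    fitted = NormalDist(fmean(xs), pstdev(xs))
    ordered = sorted(xs)
    n = len(ordered)
    worst = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # Compare against the ECDF just below and just at this point.
        worst = max(worst, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return worst


rng = random.Random(48)  # arbitrary seed
n_subjects = 20_000      # illustrative sample size
n_items = 20             # assumed number of items in the summed scale

single_item = [rng.randint(1, 5) for _ in range(n_subjects)]
summed_scale = [sum(rng.randint(1, 5) for _ in range(n_items))
                for _ in range(n_subjects)]

d_single = ks_distance_to_normal(single_item)
d_sum = ks_distance_to_normal(summed_scale)

print(f"single item:  distance from fitted normal = {d_single:.3f}")
print(f"sum of {n_items}:    distance from fitted normal = {d_sum:.3f}")
```

The summed scale never reaches a distance of zero, since it is still discrete, but it should sit several times closer to the fitted normal than the single item does.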

Which brings us back to age. Subject ages aren't 'generated'; subjects really are selected from a population with a known, usually nowhere-near-normal, distribution of ages. Further, the selection is almost always for a sub-range of the distribution.

It's hard to argue that the resulting distribution should be normal. Hard enough, that if I saw a normal distribution of ages in a study, I'd look skeptically at the selection criterion.

Now, an unskewed distribution, that I can readily believe. But I think it'll usually look much more like uniform than like normal.

I'm interested in your comments, and anybody's, on what age distributions are common in real studies.

