Date: Fri, 12 Jan 2007 18:46:00 -0500
Reply-To: Peter Flom <flom@NDRI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <flom@NDRI.ORG>
Subject: Re: normality of residuals: opinions?
Content-Type: text/plain; charset=US-ASCII
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
http://cduhr.ndri.org
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)
>>> Kevin Roland Viel <kviel@EMORY.EDU> 01/12/07 4:38 PM >>> wrote
<<<
Too right. I should not have blind-sided the list like that. We
measured the activity level of a plasma protein. The independent
variable of interest is a score from an instrument. I expect that with a
moderate sample size (200-500) that the activity level would be suitably
normally distributed. As David points out, though, it is the
distribution of the residuals and not of the DP that is important
(e~N(0,sigma)).
But your point brings up another question. What IF I know that my
residuals *are* normally distributed from many other investigations, but
for my current sample, this was not the case. Obviously, failure to meet
the assumptions could foul the model. Besides thoroughly investigating
potential violations, what might one do?
BTW, most of the IV's are quantitative (age, BMI, another protein level)
so any clustering is surprising, not that I conclude that it happened.
>>>
Kevin
First, although it may seem like picking nits, you cannot know that the
residuals in YOUR data are normally distributed from the results of runs
on OTHER data. I actually don't think it's picking nits at all. If there
is something in your data that isn't present in other, similar samples,
then either
1) You got unlucky. Hey, it happens. Once in a while, a random sample
will include some strange data.
or
2) You've discovered something really interesting. This is a
'gee... that's funny' moment, and that is the sort of moment that starts
big discoveries.
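To make the first point concrete: the check has to be run on the residuals from the sample in hand. Here is a rough sketch in plain Python, offered only as an illustration. The data are simulated, the names `score` and `activity` are my own stand-ins for the instrument score and the protein activity, and a single-predictor least-squares line is assumed in place of whatever model is actually being fit. Skewness and excess kurtosis near zero are consistent with normal residuals; they are a quick screen, not a formal test.

```python
import random


def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def skew_and_excess_kurtosis(res):
    """Moment-based skewness and excess kurtosis (both 0 for a normal)."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((r - mean) ** 2 for r in res) / n
    m3 = sum((r - mean) ** 3 for r in res) / n
    m4 = sum((r - mean) ** 4 for r in res) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Simulated data only (an assumption for illustration, not real measurements).
rng = random.Random(2007)
score = [rng.uniform(0, 100) for _ in range(300)]
activity = [50 + 0.4 * s + rng.gauss(0, 5) for s in score]

res = ols_residuals(score, activity)
skew, kurt = skew_and_excess_kurtosis(res)
print(f"skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

If the values for your current sample are far from zero while earlier samples looked fine, that is exactly the situation in 1) or 2) above, and a residual plot is the next thing to look at.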
Second, clustering is always possible with numerical data. How did you
get BMI? If you asked people to tell their weight and height, then I
would bet dollars to donuts that the data ARE clustered. A lot more
people report (say) 180 pounds than 179 or 181.
How were protein levels recorded? (I have no idea how this is done, but
if a human has to read some instrument, I bet there's clumping.)
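One quick way to look for this kind of clumping is to tabulate the final digit of the recorded values. The sketch below is only an illustration in plain Python. The weights are made up, the function names are my own, and comparing the digit counts to equal frequencies with a Pearson chi-square is one simple choice among several. With a sample this small the chi-square approximation is rough, so treat the cutoff as a guide.

```python
from collections import Counter


def last_digit_counts(values):
    """Count how often each final digit (0-9) occurs in whole-number values."""
    counts = Counter(abs(int(round(v))) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal frequencies for all digits."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical self-reported weights in pounds (invented for illustration).
reported = [180, 150, 165, 170, 180, 200, 175, 160, 180, 155,
            190, 172, 185, 140, 180, 210, 165, 150, 168, 180]

counts = last_digit_counts(reported)
stat = chi_square_uniform(counts)

# With 10 digits there are 9 degrees of freedom; the 5% critical value
# of the chi-square distribution is about 16.92.
heaped = stat > 16.92
print(counts, stat, heaped)
```

A pile-up on 0 and 5 is the usual signature of rounded self-reports. The same tabulation works for any instrument reading recorded by hand.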
HTH
Peter