Date: Fri, 12 Jan 2007 16:38:48 -0500
Reply-To: Kevin Roland Viel <kviel@EMORY.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Kevin Roland Viel <kviel@EMORY.EDU>
Subject: Re: normality of residuals: opinions?
In-Reply-To: <Pine.A41.4.02.10701121500300.16604100000@unlunix.unl.edu>
Content-Type: TEXT/PLAIN; charset=US-ASCII
On Fri, 12 Jan 2007, Robin High wrote:
> "To transform or not to transform" has many implications - esp. not
> knowing the data or the objectives - interpretation and how to
> back-transform, among them.
>
> A LOG seems a bit extreme here; perhaps a square root would be another
> choice. Of concern are the values of e around 70-80; perhaps they are
> outliers that ROBUSTREG could be an alternative. And also the spike for e
> around 10 - is there a clustering of values say at a boundary point?
Robin,
Too right. I should not have blindsided the list like that. We
measured the activity level of a plasma protein. The independent
variable of interest is a score from an instrument. I expect that with a
moderate sample size (200-500) the activity level would be suitably
normally distributed. As David points out, though, it is the
distribution of the residuals, not of the DP, that is important
(e ~ N(0, sigma)).
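To illustrate that distinction, here is a minimal Python sketch on simulated
data (not the plasma protein measurements). Everything in it is an assumption
made for illustration: the clustered predictor, the coefficients 2 and 3, the
sample size of 400, and the use of the Jarque-Bera statistic as a rough
normality check, which is not something proposed in this thread. The response
is clearly bimodal, yet the residuals from the fitted line should look
approximately normal.

```python
import random
import statistics


def ols_fit(x, y):
    """Least-squares line y = a + b*x for one predictor; returns (a, b)."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess_kurtosis^2 / 4)."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)


rng = random.Random(2007)  # fixed seed so the run is repeatable
n = 400                    # hypothetical sample size in the 200-500 range

# Hypothetical predictor that clusters around two values (0 and 10).
x = [rng.gauss(0.0 if i < n // 2 else 10.0, 1.0) for i in range(n)]
# Response = linear signal plus normal error, so e ~ N(0, 1) by construction.
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

jb_y = jarque_bera(y)              # large: the response itself is bimodal
jb_resid = jarque_bera(residuals)  # small if the residuals look normal

print(f"slope={slope:.3f}  JB(response)={jb_y:.1f}  JB(residuals)={jb_resid:.1f}")
```

Under normality the Jarque-Bera statistic is approximately chi-square with 2
degrees of freedom, so values far above about 6 suggest non-normality. In SAS
the analogous step would be to output the residuals from the regression
procedure and examine those, rather than the raw response.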
But your point brings up another question. What IF I know from many other
investigations that my residuals *are* normally distributed, but in my
current sample this is not the case? Obviously, failure to meet
the assumptions could foul the model. Besides thoroughly investigating
potential violations, what might one do?
BTW, most of the IVs are quantitative (age, BMI, another protein level),
so any clustering would be surprising; I am not concluding that it happened.
Thank you,
Kevin
Kevin Viel
PhD Candidate
Department of Epidemiology
Rollins School of Public Health
Emory University
Atlanta, GA 30322
