```
Date: Thu, 26 Jul 2007 09:47:33 -0700
Reply-To: David L Cassell
Sender: "SAS(r) Discussion"
From: David L Cassell
Subject: Re: Regression Skewed data!
In-Reply-To: <1184599235.452054.54970@57g2000hsv.googlegroups.com>
Content-Type: text/plain; format=flowed

shawn.haskell@TTU.EDU wrote:
>
> It is true that normality of residuals is probably not an issue if
>you have a large sample. However, to reduce influence of outliers and
>help satisfy the assumption of homoscedasticity (consistent error
>variance is pretty important for precision estimates) you may consider
>a natural log transformation. Weighted regression is another
>alternative. There is 1 more assumption not mentioned previously, and
>that is independent errors or observations - that's what mixed models
>are for. happy trails. SH

There are several *different* components involved here. We assume things about outliers, leverage points, heteroskedasticity, etc. in order to get 'good' (or best) estimates of the parameters and their variances. We don't need the normality yet, except for assessing that 'best linear unbiased estimator' stuff. So we have to get that stuff straightened out first. [Actually we have to get things like multi-collinearity done first, so that we have stable point estimates.]

When we want to do the hypothesis testing and confidence intervals, *then* we are down to normality of residuals. And I really do want (approximate) normality of residuals for the hypothesis tests, since we're making strong assumptions for some of those tests. If I don't have normality, I may need to perform the analyses using bootstrapping or randomization tests or something similar.

But one of the annoying problems with 'skewed' data is that a skewed Y may not mean anything about the distribution of the residuals. It may also mean that we have outlier problems rather than 'non-normal residuals', or the problem might be contaminating distributions, or lots of other things.

I find that I *rarely* can find a transform that solves all problems simultaneously for me. People say "I learned to take logs in grad school", but if taking a log fixes your behavior for Y, it may also mess up everything else we have to assume!

Now if taking logs gives you a meaningful linear model, then that is entirely different. Subject-matter issues should take precedence. We can always handle stat details later. But we have to have meaningful interpretations when we are done, or what's the point of starting with the stats anyway?

HTCT,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
```
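
The point in the post that a skewed Y need not say anything about the residuals can be illustrated with a small simulation. This is only a sketch and not part of the original message: the thread concerns SAS, while the example uses plain Python, and the simulated model (an exponential predictor with normal errors), the sample size, the seed, and the helper names `skewness`, `fit_line`, and `simulate` are all assumptions chosen for illustration.

```python
import random


def skewness(values):
    """Sample skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=5000, seed=2007):
    """Hypothetical data: right-skewed X, symmetric (normal) errors."""
    rng = random.Random(seed)
    x = [rng.expovariate(1.0) for _ in range(n)]          # skewed predictor
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]  # normal errors
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return skewness(y), skewness(residuals)


if __name__ == "__main__":
    y_skew, resid_skew = simulate()
    print(f"skewness of Y:         {y_skew:.2f}")
    print(f"skewness of residuals: {resid_skew:.2f}")
```

Under these assumptions Y inherits its skew from the predictor, so its skewness should come out clearly positive while the residual skewness stays near zero. A log transform of Y here would be a fix for a problem the residuals do not have.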
