Date: Wed, 28 Sep 2005 08:31:07 0400
Reply-To: Art@DrKendall.org
Sender: "SPSSX(r) Discussion" <SPSSXL@LISTSERV.UGA.EDU>
From: Art Kendall <Art@DrKendall.org>
Organization: Social Research Consultants
Subject: Re: Logarithmic transformation of not normal data
In-Reply-To: <S114778AbVIOLde/20050915113337Z+11165@avasmr07.fibertel.com.ar>
Content-Type: text/plain; charset=us-ascii; format=flowed
Depending on the number of cases you have and the subject matter area, a
multiple correlation of .55 (r**2= .3) could be suspiciously high.
What are your variables? How are they measured?
How many cases do you have? How were they selected?
Art
Art@DrKendall.org
Social Research Consultants
University Park, MD USA Inside the Washington, DC beltway.
(301) 8645570
Hector Maletta wrote:
>Razan,
>see my comments below.
>Hector
>
>
> _____
>
>From: Razan Mikwar [mailto:razan_mikwar@yahoo.com]
>Sent: Thursday, September 15, 2005 2:30 AM
>To: Hector Maletta
>Subject: RE: Logarithmic transformation of not normal data
>
>
>Hi Mr.Hector,
>
>First of all thank you very much for your quick response.
>Secondly:
>1. I don't want a high correlation coefficient; what I need to make higher is
>the coefficient of determination (R square). As for the residuals, I've already
>tested their normality and they are normal.
>
>R2 is the squared correlation coefficient, so both are essentially the same.
>If the residuals are already normal, no further step, such as a log
>transformation, is needed to make them more normal.
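The link between R and R2 can be checked numerically. The following is a minimal sketch on made-up data (the numbers and the helper names `mean`, `pearson_r` and `fit_line` are illustrative assumptions, not anything from the original analysis). It shows that R2 computed from sums of squares matches the squared correlation between observed and predicted values, and that an R2 of about .3 corresponds to a multiple correlation of about .55.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Made-up illustrative data (assumption).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 3.2, 5.8, 4.9, 7.4, 6.1, 9.0]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]

# R2 from sums of squares: 1 - SS_residual / SS_total.
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

# R as the correlation between observed and predicted values.
multiple_r = pearson_r(y, predicted)

print("R2 from sums of squares:", r_squared)
print("squared corr(observed, predicted):", multiple_r ** 2)
print("R implied by R2 = .30:", math.sqrt(0.30))
```

Because one is a fixed function of the other, anything that raises R2 raises R as well.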
>
>2. I don't know what you mean by the 2nd point, but I've tested that there
>is no correlation between the independent variables, i.e. there is no
>multicollinearity, and the scatter between the DV and each IV is not
>U-shaped.
>
>What I mean in my second point is that a low R or R2 may be due to either:
>the absence of any relationship between your DV and the set of IV, or the
>presence of a relationship that is not linear. This can be ascertained by
>plotting predicted and observed values. A formless cloud is the first case,
>a regular but nonlinear shape, e.g. a cloud in the shape of a U, is the
>second case. In the latter situation you may transform some of the variables
>to get a linear, instead of nonlinear relationship, or you may try
>nonlinear regression or curve fitting.
>
>3 & 4. I'm trying hard not to use a model other than the linear one, so as not
>to have to test other assumptions; that's why I'm trying to find a way to solve
>the problem. Moreover, I don't know how to detect which model would fit.
>
>Models are based on theory. Trying blindly anything that fits is not good
>advice.
>
>5. As I mentioned before, I've tested collinearity, but there is one
>assumption that I wasn't able to test: that the residuals and the independent
>variables are independent of each other, because I don't have the residuals
>as a separate variable.
>
>Collinearity might have been one problem, but you evidently do not have it.
>Perhaps it is simply that your IV do not predict the DV well. That happens.
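Once the residuals exist as a variable of their own, they can be compared with each predictor. The sketch below uses made-up data and a single IV for brevity (both are assumptions; the original model has two IVs). It also illustrates a caveat: when the model includes an intercept, least squares residuals are uncorrelated with the predictors by construction, so their linear correlation comes out at about zero whatever the data. A plot of the residuals against each IV, inspected for curvature or a funnel shape, is therefore more informative than the correlation.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Made-up illustrative data (assumption): one IV, one DV.
iv = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
dv = [3.2, 2.8, 5.1, 4.0, 6.9, 5.5, 8.3, 7.1, 9.8, 8.6]

intercept, slope = fit_line(iv, dv)

# Residual = observed value minus predicted value, one per case.
residuals = [y - (intercept + slope * x) for x, y in zip(iv, dv)]

resid_mean = mean(residuals)
resid_iv_r = pearson_r(residuals, iv)

print("mean of residuals:", resid_mean)      # about zero by construction
print("corr(residuals, IV):", resid_iv_r)    # about zero by construction

# The (IV, residual) pairs are what one would plot and inspect.
for x, e in zip(iv, residuals):
    print(x, round(e, 3))
```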
>
>
>Razan
>
>
>Razan,
>
>1. Your variables do not need to be normally distributed in order to use
>regression, and even less so in order to get a high correlation coefficient.
>You are confused by the fact that linear regression requires that residuals,
>i.e. random errors of prediction (differences between predicted and observed
>values), have a normal distribution on both sides of the regression line.
>
>2. A low or near zero linear [multiple] correlation coefficient may be due
>to (a) the absence of any systematic relationship between your IV and DV, or
>(b) the existence of a relationship which is nonlinear. As an example of
>(b), if your scatterplot shows a cloud of points with the shape of a U,
>there may well be a quadratic relationship even though the linear
>correlation coefficient is zero.
>
>3. The usual least squares method for estimating regression functions
>assumes a linear relationship between the variables involved. When
>the relationship is not linear there are two ways to go: (i) identify the
>nonlinear function linking the variables, and transform it in some way that
>yields a linear function, then apply least squares linear regression; or (ii)
>approximate the nonlinear function by means of nonlinear regression or
>curve fitting, which rely on iterative estimation rather than the closed-form
>linear least squares solution. Some nonlinear functions are amenable to
>linearization, some are not. For instance, a
>quadratic equation like y=a+bX+cX^2 can be linearized if you define a new
>variable Z=X^2, and use the linear equation y=a+bX+cZ; likewise the equation
>y=aX^b can be linearized by taking logarithms as log y=log a + b(log X).
>
>4. The fact that a certain mathematical function fits your data is no great
>achievement. You can always find some function that does that. The trick is
>finding a function for which you have a theoretical explanation. So it is not
>advisable to go around blindly trying different mathematical functions until
>one of them "fits". In fact, you may find several, perhaps an infinite
>number of functions that reasonably fit the data, and that is arguably worse
>than not having any.
>
>5. If no reasonable function fits the shape of the data, perhaps your data
>just show little relationship at all between the variables...
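Points 2 and 3 above can be illustrated with a small sketch on made-up, noise-free data (all values and helper names below are assumptions chosen for illustration). The first part builds a U-shaped relationship y=a+cX^2 (the quadratic example with b=0, so that a single predictor suffices) and shows that its linear correlation with X is zero while its correlation with Z=X^2 is perfect. The second part linearizes y=aX^b by taking logarithms and recovers a and b from an ordinary straight-line fit.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

# Part 1: U-shaped data, y = 3 + 2*X^2, with X symmetric around zero.
x_u = list(range(-5, 6))
y_u = [3 + 2 * x ** 2 for x in x_u]
z_u = [x ** 2 for x in x_u]            # new variable Z = X^2

r_linear = pearson_r(x_u, y_u)         # linear correlation with X
r_with_z = pearson_r(z_u, y_u)         # correlation with Z

print("corr(X, y):", r_linear)
print("corr(Z, y):", r_with_z)

# Part 2: power law y = a*X^b with a = 2, b = 1.5, linearized as
# log y = log a + b*(log X).
x_pow = [1, 2, 3, 4, 5, 6, 7, 8]
y_pow = [2.0 * x ** 1.5 for x in x_pow]

log_x = [math.log(x) for x in x_pow]
log_y = [math.log(y) for y in y_pow]

log_a, b_hat = fit_line(log_x, log_y)  # straight line on the log scale
a_hat = math.exp(log_a)                # back-transform the intercept

print("estimated b:", b_hat)
print("estimated a:", a_hat)
```

With real, noisy data the recovered values would only approximate the true ones, and the log transformation requires strictly positive X and y.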
>
>Hector
>
>>Original Message
>>From: SPSSX(r) Discussion [mailto:SPSSXL@LISTSERV.UGA.EDU] On Behalf Of Razan
>>Sent: Monday, September 12, 2005 11:04 PM
>>To: SPSSXL@LISTSERV.UGA.EDU
>>Subject: Logarithmic transformation of not normal data
>>
>>Hi,
>>
>>I've run a multiple linear regression in SPSS with one dependent
>>variable and two independent variables, and all assumptions were satisfied,
>>but R square is very low, about 0.3. I think that is because my
>>variables are not normally distributed, so I was thinking about
>>transforming my data to a normal distribution using a logarithmic
>>transformation and repeating the regression, but I don't know how to
>>transform them.
>>Do I also have to test any other assumptions after applying the
>>transformation?
>>
>>Thanks
