Date: Mon, 12 Sep 2005 14:59:38 -0400
Reply-To: Derek Wilkinson <dwilkinson@laurentian.ca>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Derek Wilkinson <dwilkinson@laurentian.ca>
Subject: Re: data transformation bibliografical sources
In-Reply-To: <S163921AbVIKSae/20050911183044Z+165597@avasmr07.fibertel.com.ar>
Content-Type: text/plain; charset="iso-8859-1"
Jorge and Hector:
The most original and pervasive account is John Tukey's Exploratory Data
Analysis. The three-volume prepublication version had a lot more in it than
the final version published under that title. Some of that material was
published in Mosteller & Tukey, Data Analysis and Regression: A Second
Course in Statistics. EDA is a very quirky book, but brilliant, and if you
are experienced you will find real gems therein.
Of the general stats books, John Fox has a very good treatment of
transformations, but his is pretty mathematical. Bonnie Erickson has a more
introductory version.
I need to disagree with two comments from Hector. First, for much of social
science there is no a priori meaningful scale, so transformed variables (if
the transformations are increasing) often have as much legitimacy as the
originals, or more. This is particularly true of income. How could Jorge
have the same error in his calculated income as Bill Gates does in his?
Errors and misestimates are obviously related to size, hence the need to
take logarithms.
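A minimal sketch of the point about size-related errors, in Python using only
the standard library. The two income levels, the 10% relative error, and the
function name reported_incomes are illustrative assumptions, not figures from
this thread. It simulates reports of a modest and a very large income with
the same proportional error and compares the spread on the raw and log scales.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def reported_incomes(true_income, n=5000, rel_error=0.10):
    """Simulate n reports of one true income with a multiplicative error.

    The 10% relative error is an assumption made for illustration.
    """
    return [true_income * math.exp(random.gauss(0.0, rel_error))
            for _ in range(n)]


modest = reported_incomes(30_000)            # hypothetical modest income
huge = reported_incomes(3_000_000_000)       # hypothetical very large income

# Raw scale: the spread of the errors grows with the size of the income.
sd_raw_modest = statistics.stdev(modest)
sd_raw_huge = statistics.stdev(huge)

# Log scale: the spread is roughly the same for both incomes.
sd_log_modest = statistics.stdev([math.log(v) for v in modest])
sd_log_huge = statistics.stdev([math.log(v) for v in huge])

print("raw SDs:", sd_raw_modest, sd_raw_huge)
print("log SDs:", sd_log_modest, sd_log_huge)
```

If errors really are proportional to size, as assumed here, the log scale is
the one on which they have a common variance.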
Second, there isn't always the possibility of finding an abstruse
mathematical formula (unless it's stochastic) to create normality. I have
had students (albeit without much background in math) try to transform
gender (M or F) into a normally distributed and symmetric variable. Square
roots and logarithms didn't work! Neither did anything else.
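A small sketch of why the students' attempt could not succeed, in Python. The
sample and the coding 1 = M, 2 = F are made up for illustration. Any
deterministic function applied to a two-valued variable yields at most two
distinct values, so the result can never be normally distributed.

```python
import math

# Hypothetical sample: gender coded 1 = M, 2 = F (the codes are arbitrary).
gender = [1, 2, 2, 1, 2, 1, 1, 2, 2, 2]

# A few candidate transformations; any other deterministic one behaves alike.
transforms = {
    "identity": lambda v: v,
    "square root": math.sqrt,
    "logarithm": math.log,
    "cube": lambda v: v ** 3,
}

distinct_counts = {}
for name, func in transforms.items():
    values = [func(v) for v in gender]
    distinct_counts[name] = len(set(values))
    print(name, sorted(set(values)))
```

Every transformed variable still takes exactly two values; only the labels on
the two categories change.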
Cheers.
Derek
PS Samuel Leinhardt gave a didactic workshop at an American Sociological
Association meeting twenty-some years ago and lucidly introduced me, and all
the others who attended, to the virtues of Exploratory Data Analysis and the
insights of John Tukey.
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Hector Maletta
Sent: Sunday, September 11, 2005 2:30 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: data transformation bibliografical sources
Jorge,
Normality and homogeneous variance are possible attributes of your data, and
they may or may not have them. No data transformation by itself will give
them what they do not have.
You can of course transform your variables into something else that is more
similar to what you desire (e.g. the logarithm of a variable may have a
distribution that looks more "normal" than the original variable), and there
is always the possibility of finding a mathematical formula, however
abstruse, able to achieve that. But in scientific terms this would be
meaningless unless you have a theory whereby your variable behaves in ways
related to that particular mathematical function. For instance, if people
react more to the PROPORTION by which their incomes grow than to the AMOUNT
of the increase, so that an additional $1000 means different things to a
billionaire than to you or me, then the logarithm of income may find a place
in your analysis, because a given difference in logarithms corresponds to a
given proportional difference in the original variable. If you do not have
theory or evidence of this kind, using logarithms makes as much sense as
using, say, the cosine or the cube root or a 17th-degree polynomial of your
variable.
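The property of logarithms invoked above can be checked with a few lines of
Python. The income figures and the helper name log_gain are hypothetical,
chosen only for illustration. Equal proportional raises give equal
differences in log income, whereas the same $1000 gives very different ones.

```python
import math


def log_gain(income, raise_amount):
    """Change in log income produced by a raise (illustrative helper)."""
    return math.log(income + raise_amount) - math.log(income)


# The same $1000 on a modest income and on a billionaire's income.
small_income_gain = log_gain(30_000, 1_000)
billionaire_gain = log_gain(1_000_000_000, 1_000)

# The same 10% raise on both incomes.
ten_pct_modest = log_gain(30_000, 3_000)
ten_pct_billionaire = log_gain(1_000_000_000, 100_000_000)

print("same amount:    ", small_income_gain, billionaire_gain)
print("same proportion:", ten_pct_modest, ten_pct_billionaire)
```

Both 10% raises come out as log(1.1) on the log scale, whatever the starting
income.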
Besides, remember previous caveats in this forum to the effect that it is
not variables, but errors of estimation, that have to be normal, with
homogeneous variances, for standard statistical models (like regression) to
apply.
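A rough illustration of that caveat, in Python with the standard library
only. The data are simulated under assumed values (an exponential predictor,
intercept 2, slope 3, normal errors with SD 1), none of which come from this
thread. The predictor and the response are both strongly skewed, yet the
residuals of the fitted line are close to symmetric.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable


def skewness(values):
    """Moment-based sample skewness (about 0 for a symmetric variable)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


n = 5000
# Strongly right-skewed predictor and a response with normal errors (assumed).
x = [random.expovariate(1.0) for _ in range(n)]
y = [2.0 + 3.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for a single predictor, computed by hand.
mx, my = statistics.mean(x), statistics.mean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("skewness of x:        ", skewness(x))
print("skewness of y:        ", skewness(y))
print("skewness of residuals:", skewness(residuals))
```

Checking the variables themselves for normality would be misleading here; it
is the residuals that the model's assumptions concern.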
Hector
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
> On Behalf Of Jorge Camacho
> Sent: Sunday, September 11, 2005 2:03 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: data transformation bibliografical sources
>
> Dear All:
>
> I am looking for a good review or bibliographical source (in
> electronic format if possible) about data transformation in
> order to reach normality, homogeneous variances, etc. Most
> textbooks have very few pages on this. I would appreciate
> any support on this.
>
> Thanks in advance.
>
> Jorge
>
> 
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@
> Jorge Camacho Sandoval, Ph. D.
> Biostatistics, Animal Genetic Improvement
> P. O. Box 1960  4050, Alajuela, Costa Rica Tel. (506)4410487
> Fax. (506)4400575
> email: jcamacho@ice.co.cr or jorge.camacho.s@gmail.com
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@
