Date: Wed, 13 Jan 2010 20:47:21 -0300
Reply-To: Hector Maletta <hmaletta@fibertel.com.ar>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <hmaletta@fibertel.com.ar>
Subject: Re: Factor Analysis on dichotomous variables
In-Reply-To: <6F692813E2221C42B90DA62B4021FBD007DF7FB6@EXVS1.utm.edu>
Content-Type: text/plain; charset="us-ascii"
Angelina
The number of factors (or components) worth retaining depends largely on the
degree of linear correlation or association between the observed variables,
whether dichotomous or otherwise. If all the variables are highly correlated
with one another, one (or possibly two) factors would explain most of the
total or common variance, regardless of the type of variable involved.
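As a rough illustration of this point, assume the idealised case (not
Angelina's actual data) in which all k variables share the same pairwise
correlation r. The correlation matrix then has one eigenvalue equal to
1 + (k-1)r and k-1 eigenvalues equal to 1 - r, so the share of total variance
carried by the first component follows directly. The function name and the
value r = 0.6 are hypothetical, chosen only for the sketch:

```python
def first_component_share(k, r):
    """Share of total variance explained by the first principal component
    when all k variables have the same pairwise correlation r (0 <= r < 1)."""
    largest_eigenvalue = 1 + (k - 1) * r   # the other k-1 eigenvalues equal 1 - r
    return largest_eigenvalue / k          # k standardized variables have total variance k

# 50 items, as in the original question; r = 0.6 is an invented value.
share = first_component_share(50, 0.6)
print(round(share, 3))
```

With these assumed values a single component would carry roughly 61% of the
total variance; with r = 0 it would carry only 1/k of it.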
Besides, there is no single unequivocal criterion for deciding the number
of factors worth retaining, and much depends on the purpose of the analysis.
Sometimes you are after one factor only (which should explain a large
fraction of total variance); sometimes you look for several underlying
dimensions, either orthogonal to each other or correlated with one another
(the latter case is obtained through oblique rotation).
The common criteria of keeping only factors with an eigenvalue above 1, or
of using the scree curve to identify the cutoff factor, are only rules of
thumb that are not always useful.
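For what it is worth, the eigenvalue-above-1 rule can be sketched in a few
lines. The correlation matrix below is invented (six items in two blocks of
three, with correlation 0.5 inside a block and 0.1 across blocks), and the
eigenvalues are obtained with a plain Jacobi rotation routine written here
only to avoid depending on an external library; it is not what SPSS does
internally:

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]           # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:                        # off-diagonal part is negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):           # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):           # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix: two blocks of three items each.
R = [[1.0 if i == j else (0.5 if i // 3 == j // 3 else 0.1) for j in range(6)]
     for i in range(6)]

eigenvalues = jacobi_eigenvalues(R)
retained = sum(1 for ev in eigenvalues if ev > 1)   # eigenvalue-above-1 rule

print([round(ev, 3) for ev in eigenvalues])         # values to plot for a scree curve
print("components with eigenvalue above 1:", retained)
```

In this made-up case the rule keeps two components, which matches the two
blocks built into the matrix. With real data the cutoff is rarely so clean,
which is why it remains a rule of thumb.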
One must also understand that factors are mathematical constructs,
not real objects, and therefore one can heuristically select the most useful
variant. I am of course speaking of exploratory factor analysis. What is
called confirmatory factor analysis should more properly be treated as
structural equation models with latent variables. However, in my humble
opinion, these "confirmatory" analyses cannot "confirm" that the model is
right, nor "prove" causal links between variables. Factor analysis simply
replaces observed variables with a (possibly smaller) number of underlying
scales, all of which are linear functions of the observed variables.
Hector
-----Original Message-----
From: Angelina S. MacKewn [mailto:amackewn@utm.edu]
Sent: 13 January 2010 20:32
To: Hector Maletta
Subject: RE: Factor Analysis on dichotomous variables
Hector,
I have read the argument that dichotomous variables in a PCA produce too
many components. Do you think this is something that one would get nailed on
when we go to publish this?
Thanks for an answer I could understand. I am not a statistician, just a
researcher trying to write a paper.
Cheers,
Angie
-----Original Message-----
From: Hector Maletta [mailto:hmaletta@fibertel.com.ar]
Sent: Wed 1/13/2010 5:29 PM
To: Angelina S. MacKewn; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: Factor Analysis on dichotomous variables
Any factor analysis can be run on dichotomous variables, because these
variables can legitimately be considered as interval measures. As only one
interval is involved (from 0 to 1), there is no question of comparing
unequal intervals. Their mean is the proportion (p) of cases with the value
1, and their variance is p(1-p).
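A quick check of the mean and variance statement, using an invented yes/no
item coded 1/0 for ten respondents (the data are hypothetical; the variance
is the population variance, dividing by n):

```python
from statistics import fmean, pvariance

# Hypothetical yes/no item coded 1/0 for ten respondents.
item = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

p = fmean(item)                # mean of a 0/1 variable = proportion of 1s
var_direct = pvariance(item)   # population variance computed from the data
var_formula = p * (1 - p)      # the p(1-p) shortcut

print(p, var_direct, var_formula)
```

Both variance figures agree up to floating-point rounding, since for a 0/1
variable the squared deviations depend only on p.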
There is a specific SPSS procedure, CATPCA, for principal component analysis
of categorical variables (ordinal or nominal, any number of categories).
However, for dichotomous variables CATPCA gives the same solution as
classical Principal Components Analysis of interval variables (PCA is one of
the variants of factor analysis).
Purists insist that dichotomous variables cannot be used in anything related
to regression, because their residuals are not normally distributed. To see
this, note that the predicted value for a dichotomous variable either lies
between 0 and 1 or falls outside that interval. In the first case the actual
values are still either 1 or 0, so the residuals pile up at the two ends of
the 0,1 interval rather than clustering around the predicted value. In the
second case, the residuals all fall on one side of the predicted value. In
either case, their distribution cannot be normal.
However, dummy variables (i.e. variables with value 0 or 1) are routinely
used in regression. Factor analysis is a variant of linear regression (or,
more widely, a variant of the Generalized Linear Model) and therefore this
habitual use applies also to it.
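The residual argument above can be seen in a small sketch: an ordinary
least-squares line fitted to an invented 0/1 outcome. The data and variable
names are hypothetical and serve only to show where the residuals fall:

```python
from statistics import fmean

# Hypothetical data: one predictor x and a yes/no outcome y coded 1/0.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 1, 1]

# Ordinary least squares with a single predictor.
mx, my = fmean(x), fmean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"x={xi}  y={yi}  fitted={fi:6.3f}  residual={ri:6.3f}")
```

At any given fitted value the residual can take only two values, minus the
fitted value (when y is 0) or one minus the fitted value (when y is 1), so
it cannot be spread normally around zero.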
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Angelina S. MacKewn
Sent: 13 January 2010 19:41
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Factor Analysis on dichotomous variables
What is the factor analysis (PCA) equivalent that can be run on dichotomous
variables? I have 50 exhibited behaviours (yes/no) that I want to factor
together. I have a sample size of about 500. I would be using SPSS and could
use syntax if it is available.
Thanks,
Angie
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
=====================