Date: Wed, 27 Jul 2005 19:43:53 -0300
Reply-To: Hector Maletta <firstname.lastname@example.org>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <email@example.com>
Subject: Re: Factor analysis of binary variables
Content-Type: text/plain; charset="us-ascii"
Throughout this exchange you keep insisting on one idea that I think is not
applicable to my case, and not valid in general, but only under a particular
(though quite common) implicit causal model. The idea I allude to is this: "If
you find that the items ... group together onto more than one factor, then
using them as a single score is obviously invalid."
This idea stems from a definite causal model, in which several imperfect but
observable indicators are used to estimate the value of a central but
unobservable factor. The underlying causal model is that every observable
variable or item is the effect of the underlying factor plus an error or
unique factor, and therefore the only reason why items correlate with one
another is because all reflect the same underlying factor. This factor is
modeled as a [linear] function of observable items, as in intelligence tests
where the unobservable Intelligence is a function of the number of items
answered correctly. Every IQ test problem is an imperfect indicator of that
underlying factor.
This is quite a useful approach, and it is widely used to grasp unobservable
entities or constructs in psychology and other fields. But that is not the
only situation in which one tries to construct a single scale. To give an
oversimplified example: suppose people can get income from several sources:
salaries, share dividends, property rentals, transfers from rich relatives,
bank account interest, and so on. Total income is, of course, the sum of all
these components giving all of them a weight of 1, because one dollar out of
salaries is worth as much as a dollar coming from rentals or dividends.
However, this does not mean that all these sources are correlated or "load
on the same factor". It may be that people earning salaries do not tend to
earn rentals or interest, and earning more in salaries does not imply
earning more on any other kind of income, but nonetheless (even in the
absence of any correlation, or with perfectly negative correlations, or
whatever) total income is still the sum of all these incomes. Total income
is still a unique scale measuring the total flow of income accruing to a
person from all sources, and it is obtained as a linear combination of all
partial incomes from the various sources, but this does not imply they load
on the same factor or are necessarily correlated.
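
To make the income example concrete, here is a minimal sketch in Python. The
figures are invented for illustration only, and the helper name pearson is my
own. It shows two income sources that are perfectly negatively correlated,
while total income remains a well-defined unit-weight sum.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical incomes for five people (illustrative numbers, not real data).
salaries = [50, 40, 30, 20, 10]
rentals = [0, 5, 10, 15, 20]  # rentals rise exactly as salaries fall

# Total income is the unit-weight sum, whatever the correlation is.
total_income = [s + r for s, r in zip(salaries, rentals)]
r = pearson(salaries, rentals)

print("correlation between sources:", r)
print("total income per person:", total_income)
```

The two sources would never "load on the same factor" in the usual sense, yet
the total is still a meaningful single scale.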
Likewise, the standard of living may be conceived of as a combination or sum
total of many sources of well-being (housing, sanitation, education, health
care, and so on) irrespective of the correlations among these dimensions.
If such a scale of well-being or the standard of living can be somehow
constructed, it will represent the total well being accruing to a household
or person, from many sources, irrespective of correlations among these
sources. The sources will usually be correlated, admittedly, but that is not
essential to the problem.
If the various sources of income are all in US dollars, their aggregation
might be straightforward, dollar by dollar. But perhaps some weights are
needed. For instance, the various forms of income can have different degrees
of liquidity, or different degrees of certainty, or come denominated in
various currencies, or with different tax rates applicable to each, or
whatever other circumstances may mandate applying different weights to
dollars coming from the various sources. The same holds for the well-being
coming from different sources, all the more so because well-being is not a
measurable and observable quantity like income but a non-observable entity
that must be measured in some ad hoc manner. Perhaps this clarifies the
issue and allows us to move on to my real question: how to establish the
weight of the various variables that I wish to use as indicators of
well-being or the standard of living.
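
As a rough sketch of what a weighted aggregate looks like once weights are
available, consider the Python below. The weights are arbitrary placeholders,
not a proposal. Choosing them is precisely the open question, and the names
weighted_total, placeholder_weights and person are assumptions made for the
example.

```python
def weighted_total(components, weights):
    """Linear combination of components using the given per-source weights."""
    return sum(weights[name] * amount for name, amount in components.items())

# Hypothetical income components for one person (illustrative only).
person = {"salary": 1000, "dividends": 200, "rent": 400}

# Unit weights: one dollar counts the same whatever its source.
unit_weights = {name: 1.0 for name in person}

# Placeholder weights, e.g. discounting less liquid or less certain sources.
placeholder_weights = {"salary": 1.0, "dividends": 0.5, "rent": 0.75}

unit_total = weighted_total(person, unit_weights)
adjusted_total = weighted_total(person, placeholder_weights)

print(unit_total, adjusted_total)
```

Either way the result is a single scale. The weights change its values, not
its nature as a linear combination of the components.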
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]
> On Behalf Of Art Kendall
> Sent: Wednesday, July 27, 2005 6:55 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: Factor analysis of binary variables
> In the approach I am suggesting you include among those you try,
> the decision about which items go on a factor is based on two
> things. 1) A factor analysis (PA2 or Reliability) says that
> this set of items has something in common (redundant) that
> it is measuring that is distinct from what that set of
> items is measuring. 2) Using the cleanly loading items, this
> set of items can be attributed a meaning that is distinct
> from the meaning attributed to that set of items.
> If I factor liberalism, equalitarianism, sexual freedom, etc.
> items, it can assure me that items that I thought go together
> conceptually go together in the real world. It might point
> out to me that certain items split onto more than one factor.
> E.g., endorsing equal rights for gay people would contain
> elements of equalitarianism and sexual freedom and therefore
> would not be a clean item and would lower divergent validity.
> Your application is more exploratory in that you would first
> see which items go together and then see if their going
> together makes sense. (On the other hand, the items have more
> concrete meanings, and your data set is not a sample but a
> census.) What I am suggesting is seeing if the manner of
> creating factor scores makes a lot of difference. I am
> not suggesting ignoring the grouping of items found via factoring.
> If you find that the items on the UN list, or in your set of
> items, group together onto more than one factor, then using
> them as a single score is obviously invalid.
> If you remove splitter items, and redo the factor analysis,
> you can get any of the more complicated forms of factor
> scores (even all 3 from SPSS).
> The importance of the items within a factor may contain a lot
> of noise.
> However, I would be surprised if the 5 kinds of scores
> behaved very differently. By that I mean something like
> this: Say that you have a thousand local jurisdictions. What
> is the maximum difference between the percentiles or ranks of
> any jurisdiction based on the different methods of deriving
> scores? If I break the scores into quintiles, and create a
> dot map, how different will my conclusions about where need
> is concentrated be?
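
The comparison described just above can be sketched in Python as follows. This
is only an illustration under stated assumptions: the two score lists are
invented, they stand in for any two of the scoring methods, and the helper
names ranks and quintile are my own.

```python
def ranks(scores):
    """Rank 1 = lowest score; tied scores share the average of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

def quintile(rank, n):
    """Quintile (1 to 5) of a rank between 1 and n."""
    return min(5, int((rank - 1) * 5 // n) + 1)

# Hypothetical scores for ten jurisdictions under two scoring methods.
method_a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
method_b = [0.1, 2.2, 0.9, 3.0, 4.5, 5.1, 7.3, 6.8, 8.0, 9.9]

ranks_a, ranks_b = ranks(method_a), ranks(method_b)
n = len(ranks_a)

# Largest shift in rank for any single jurisdiction.
max_rank_diff = max(abs(a - b) for a, b in zip(ranks_a, ranks_b))

# Number of jurisdictions that would change colour on a quintile dot map.
moved = sum(1 for a, b in zip(ranks_a, ranks_b)
            if quintile(a, n) != quintile(b, n))

print(max_rank_diff, moved)
```

With real data, method_a and method_b would be replaced by each pair of the
five candidate scores in turn.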
> Aristotle, in Ethics I.3, says, "we must be satisfied to
> indicate the truth with a rough and general sketch: when the
> subject and the basis of a discussion consist of matters
> which hold good only as a general rule, but not always, the
> conclusions reached must be of the same order. . . .
> For a well-schooled man is one who searches for that degree
> of precision in each kind of study which the nature of the
> subject at hand admits".
> Informal logic in the Aristotelian tradition calls the use of
> extra decimal places the "fallacy of precision". Many of us
> stat teachers call it the "sig-fig" or significant figures
> problem. In a way the tradition of unit weight for items in
> creating scales based on factor analysis was based on similar
> considerations. Which items loaded together on a factor was
> fairly stable across groups of subjects, but loadings could vary
> across studies. Another way to look at unit weights for variables
> in a score is that weights are rounded to the nearest integer.
> Social Research Consultants
> University Park, MD USA
> (301) 864-5570
> Hector Maletta wrote:
> > Art, one of the key issues is weighting. Are they all equally
> > important? Is a shower as important as a reliable safe water supply?
> > Your approach is quite common, but makes a number of people unhappy.
> > The United Nations has been using throughout the developing world an
> > indicator of "unmet basic needs" based on a list of essential
> > conditions to be met. A household is found lacking if it fails to meet
> > all of the chosen conditions, but there have been complaints for a long
> > time about the indicator being very rough and lacking
> > power, using arbitrary cutpoints (for instance, depending on your
> > definition, a household with 2.99 people per room is OK, but another
> > with 3.01 people per room has an unmet basic need for dwelling space)
> > and giving some indicators a disproportionate weight relative to
> > others. Some countries have tried to use arbitrary weights, but they
> > are, well, arbitrary and thus not quite satisfactory either. An index
> > based on some more sophisticated procedures such as FA or CFA would
> > assign different weights to the various indicators of the standard of
> > living. It would also allow for more flexibility regarding
> > and other matters.
> > Hector
> > From: Art Kendall [mailto:Art@DrKendall.org]
> > Sent: Wednesday, July 27, 2005 4:03 PM
> > To: Hector Maletta
> > Cc: SPSSX-L@LISTSERV.UGA.EDU
> > Subject: Re: Factor analysis of binary variables
> > Isn't that what you get when you simply sum across items? I.e.,
> > use the old-fashioned unit weights in creating scales based on a
> > factor analysis?
> > sanitation (water supply, sewage, WC facilities, etc.) each is
> > there or it isn't.
> > compute sanitation = water + sewage + WC + shower_tub + hot_water.
> > Families that have zero of these have less sanitation than
> > families that have 1 of these and those that have one of
> > these have less sanitation than those families that have two
> > of these . . .
> > this is analogous to forming a score on a test for spelling
> > where spelling one word correctly indicates less achievement
> > in spelling than getting two words right.
> > To see what I am talking about, you might want to create 5
> > variables. One by using unit weights on the raw items. One
> > using unit weights and standardized items (Zs), and three
> > using the three kinds of factor scores that SPSS creates.
> > What do the correlations of those 5 variables look like?
> > Social Research Consultants
> > University Park, MD USA
> > (301) 864-5570
> > Hector Maletta wrote:
> >> In response to Art Kendall.
> >> You wrote:
> >> Wrt your very first post. I'm still not clear on why you
> >> would want to deal with predicting an item within a measure
> >> and need to limit its prediction to zero-one (logistic)
> >> rather than having the constructed predicted variable have a
> >> wider range with two modes like in discriminant function
> >> analysis.
> >> Art,
> >> what I want is to pass from my index as a function of factor
> >> scores to the index as a function of observed variables.
> >> This is because for my purposes the extraction of factors and the
> >> estimation of factor scores are mere by-products.
> >> My analysis centers on the estimation of eigenvalues and thus of
> >> the relative importance of variables to explain total variance.
> >> What I obtain from FA is the analysis of total variance into
> >> underlying factors' contributions, given by eigenvalues, but my
> >> real interest lies in seeing total variance explained by observed
> >> variables and their intercorrelations.
> >> So I need to pass from an index formed by adding up factor scores
> >> to one expressed as a function of observed variables alone.
> >> This can be done with factor scores coming from regular factor
> >> analysis, but I realize they depend on the notion that the
> >> variable value is a linear function of factors, which makes
> >> little sense with dichotomous items.
> >> Hector
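
The step from an index built on factor scores to one expressed in observed
variables can be sketched as below, under one assumption that should be read
loudly: each factor score is a linear combination of the standardized items,
given by a score coefficient matrix. In that case the sum of factor scores is
itself a linear function of the items, with weights equal to the row sums of
the coefficient matrix. The matrix, the case values and all names here are
hypothetical, and the sketch does not address whether such linear scores are
sensible for dichotomous items.

```python
# Hypothetical score coefficients: rows = observed items, columns = factors.
score_coefficients = [
    [0.5, 0.25],
    [0.25, -0.5],
    [0.125, 0.75],
]

# Hypothetical standardized item values for one case.
z = [1.0, -2.0, 0.5]

n_items = len(score_coefficients)
n_factors = len(score_coefficients[0])

# Route 1: compute each factor score, then add the factor scores up.
factor_scores = [
    sum(score_coefficients[i][k] * z[i] for i in range(n_items))
    for k in range(n_factors)
]
index_from_factors = sum(factor_scores)

# Route 2: collapse the coefficients into one weight per observed item
# (the row sums), then apply those weights to the items directly.
item_weights = [sum(row) for row in score_coefficients]
index_from_items = sum(w * value for w, value in zip(item_weights, z))

print(item_weights)
print(index_from_factors, index_from_items)
```

If the factor scores were to be weighted before summing (for instance by
eigenvalues), each column would be multiplied by its weight before taking
the row sums.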