Date: Thu, 10 Feb 2005 10:56:23 +0000
From: David Hitchin
Sender: "SPSSX(r) Discussion"
Subject: Re: factor analysis and your experience
To: Mpundu Mukanga
In-Reply-To: <20050210095944.75445.qmail@web53501.mail.yahoo.com>
Content-Type: text/plain

Quoting Mpundu Mukanga <engineeringresearch3000@yahoo.com>:

> I don't know if my questions make sense to you. Questions: Based on
> your experience and knowledge of statistics, would you perform factor
> analysis on non-normal data such as Likert-type data (ordinal data)?

Yes, I would, and I often have for the sort of purpose that you have.

> My goal, as mentioned before, is to enable me to look at the data
> critically and identify factors which I can use to create subscales
> from a pool of items I have (Likert instrument).

There are two ways of looking at factor analysis. The first is as a data reduction method, and for this the data distributions don't matter much. You will want to extract as many factors as make practical sense (i.e. not too many), and the ones that you extract should cover a reasonable proportion of the total variance.
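To make "a reasonable proportion of the total variance" concrete, here is a small pure-Python sketch with a made-up correlation matrix (SPSS does all of this for you in the FACTOR procedure). Each eigenvalue of the correlation matrix is the variance carried by one component, and the eigenvalues sum to the number of variables, so eigenvalue/p is the proportion of total variance that component explains. For three items that all intercorrelate at 0.5, the first component carries two-thirds of the variance:

```python
# Made-up correlation matrix: three items, all intercorrelated at r = 0.5.
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dominant_eigenvalue(A, iters=200):
    """Largest eigenvalue of a symmetric matrix, by power iteration."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        s = max(abs(x) for x in w)      # normalise to avoid overflow
        v = [x / s for x in w]
    # Rayleigh quotient gives the eigenvalue estimate
    Av = matvec(A, v)
    return sum(x * y for x, y in zip(v, Av)) / sum(x * x for x in v)

lam1 = dominant_eigenvalue(R)
p = len(R)
print(round(lam1, 3))        # 2.0  (first eigenvalue)
print(round(lam1 / p, 3))    # 0.667 -> first component explains 2/3 of variance
```

The numbers match the analytic result for this pattern of correlations: the first eigenvalue is 1 + (p - 1)r = 2.0.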

This will enable you to identify and throw away some of your original variables, while not throwing away too much of the total information. There are other ways of doing this, such as repeated selection of subsets using reliability analysis, and this usually results in pretty much the same subsets that factor analysis can find for you with far less work on your part.
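The reliability-analysis route mentioned above usually means Cronbach's alpha (RELIABILITY in SPSS): drop items one at a time and keep the subset whose alpha holds up. A minimal sketch of the alpha computation itself, on invented Likert responses:

```python
def cronbach_alpha(items):
    """Cronbach's alpha.  items: one list of scores per item,
    all items answered by the same respondents, in the same order."""
    k = len(items)
    n = len(items[0])
    def var(xs):                       # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

# Made-up responses from five people to three similar Likert items.
items = [[1, 2, 3, 4, 5],
         [2, 2, 3, 4, 5],
         [1, 3, 3, 4, 4]]

alpha = cronbach_alpha(items)
print(round(alpha, 3))   # 0.955 on this made-up data
```

Because the three invented items move together, alpha is high; replacing one with an unrelated item would pull it down, which is exactly the signal used when selecting subsets by repeated reliability analysis.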

What SPSS calls the "principal component" method (statisticians also have another definition of "principal components") may be the best method, but note that the importance it assigns to a factor depends on how many variables load on it: you can create a large factor simply by putting many very similar questions into your test battery. An alternative is the maximum likelihood method, which is not so much influenced by the number of variables linked with a particular factor. One of the odd things about ML is that although its definition makes a lot of use of the idea of normality, SPSS produces an ML solution regardless of your data distribution.
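The point about inflating a factor with similar questions can be shown directly. For k items that all intercorrelate at r, the first eigenvalue of the correlation matrix is 1 + (k - 1)r, so it grows in proportion to the number of near-duplicate items you add (sketch with invented numbers; the power-iteration helper is just a stand-in for what SPSS computes internally):

```python
def first_eigenvalue(k, r, iters=100):
    """Dominant eigenvalue of a k x k correlation matrix whose
    off-diagonal entries all equal r, by power iteration."""
    R = [[1.0 if i == j else r for j in range(k)] for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in R]
        s = max(abs(x) for x in w)
        v = [x / s for x in w]
    Rv = [sum(a * x for a, x in zip(row, v)) for row in R]
    return sum(x * y for x, y in zip(v, Rv)) / sum(x * x for x in v)

for k in (3, 6, 12):
    print(k, round(first_eigenvalue(k, 0.4), 2))
# 3 items  -> 1.8
# 6 items  -> 3.0
# 12 items -> 5.4
```

Doubling the number of interchangeable items roughly doubles the "size" of the factor, without any new information entering the battery.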

The second way of looking at factor analysis, using it as a statistical technique, requires some assumptions about the data distributions. If you want statistical measures and tests of the "true number" of factors, or you want to know whether a particular factor loading is significant, then there are methods FOR NORMALLY DISTRIBUTED DATA which will provide you with standard errors of estimates.

Many people use rules of thumb, such as keeping factors with eigenvalues greater than one, and treating loadings of, for example, 0.4 and above as significant. If you have normally distributed data and you do a proper STATISTICAL factor analysis, you will find that these rules of thumb are very rough approximations.

If you really only want to find subscales, then rotation of the factors may not be relevant. Rotations are only useful if you want to give some kind of description to the factors which you have found. The original unrotated solution is the neatest one in mathematical terms. If you like to think in terms of your original variables rather than the neat mathematical factors, then rotation may help - and varimax is as good as any, for all but the expert.
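One way to see why rotation is cosmetic rather than substantive: an orthogonal rotation (varimax included) does not change how much of each item's variance the factors account for, it only redistributes that variance between the factors. A sketch with invented two-factor loadings, rotating by an arbitrary angle:

```python
import math

# Made-up unrotated loadings: four items on two factors.
L = [[0.7,  0.5],
     [0.6,  0.6],
     [0.8, -0.4],
     [0.5, -0.6]]

def rotate(loadings, theta):
    """Orthogonally rotate a two-factor loading matrix by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Per-item variance explained: sum of squared loadings in each row."""
    return [round(sum(x * x for x in row), 6) for row in loadings]

rotated = rotate(L, math.pi / 6)
print(communalities(L) == communalities(rotated))   # True
```

The individual loadings change under rotation, but each row's sum of squares (the communality) is untouched, so if subscale selection is all you want, the unrotated solution already contains everything you need.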

I would advise the selection of your subsets of variables from an unrotated solution. Having discarded some variables you might want to repeat the factor analysis on the remaining variables, and then rotation may help you to find names for your subscales, by seeing how the factors are related to the original variables.

David Hitchin
