Quoting Mpundu Mukanga <email@example.com>:
> I don't know if my questions make sense to you. Questions: Based on
> your experience and knowledge of statistics, would you perform factor
> analysis on non-normal data such as Likert-type data (ordinal data)?
Yes, I would, and I often have for the sort of purpose that you have.
> My goal as mentioned before is to
> enable me to look at the data critically and identify factors which I
> can use to create subscales from a pool of items I have (Likert
There are two ways of looking at factor analysis. The first is as a
data reduction method, and for this data distributions don't matter
much. You will want to extract as many factors as make practical sense,
i.e. not too many, and the ones that you extract should cover a
reasonable amount of the total variance.
This will enable you to identify and throw away some of your original
variables, while not throwing away too much of the total information.
There are other ways of doing this, such as repeated selection of
subsets using reliability analysis, and this usually results in pretty
much the same subsets that factor analysis can find for you with far
less work on your part.
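To make the data-reduction view concrete, here is a rough sketch in Python rather than SPSS; the respondents, items, and numbers are all invented for illustration. The eigenvalues of the correlation matrix tell you how much of the total variance each extracted factor would cover:

```python
import numpy as np

# Hypothetical example: 200 respondents answering 8 Likert items (1-5),
# driven by two underlying traits plus noise.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(200, 2)).astype(float)
items = np.column_stack([base[:, i % 2] + rng.normal(0, 0.5, 200)
                         for i in range(8)])

# Eigenvalues of the correlation matrix, largest first.
R = np.corrcoef(items, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Cumulative proportion of total variance covered by the first k factors:
# extract enough factors to cover a "reasonable amount", and no more.
coverage = np.cumsum(eigvals) / eigvals.sum()
print(coverage[:3])
```

With two underlying traits, the first two factors cover most of the variance, and the remaining items add little information.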
What SPSS calls the "principal component" method (statisticians also
have another definition of "principal components") may be the best
method, but it has a quirk: the more variables that are associated with
a particular factor, the more important that factor appears. You can
create a large factor simply by putting many very similar questions
into your test battery. An alternative is the maximum likelihood
method, which is not so strongly influenced by the number of variables
linked with a particular factor.
One of the odd things about ML is that although the definitions make a
lot of use of the idea of normality, SPSS produces a ML solution
regardless of your data distribution.
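You can see that quirk directly: pad a battery with near-duplicate items for one trait and the first eigenvalue grows, even though nothing about the underlying traits has changed. A small sketch (made-up data again):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
trait_a = rng.normal(size=n)
trait_b = rng.normal(size=n)

def first_eigenvalue(X):
    # Largest eigenvalue of the correlation matrix = size of the first factor.
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)).max()

# Balanced battery: 2 items per trait.
balanced = np.column_stack([trait_a + rng.normal(0, 0.5, n),
                            trait_a + rng.normal(0, 0.5, n),
                            trait_b + rng.normal(0, 0.5, n),
                            trait_b + rng.normal(0, 0.5, n)])

# Padded battery: 6 near-duplicate items for trait A, still 2 for trait B.
padded = np.column_stack([trait_a + rng.normal(0, 0.5, n) for _ in range(6)] +
                         [trait_b + rng.normal(0, 0.5, n) for _ in range(2)])

# The first factor is "bigger" simply because more similar items were asked.
print(first_eigenvalue(balanced), first_eigenvalue(padded))
```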
The second way of looking at factor analysis, using it as a statistical
technique, requires some assumptions about the data distributions. If
you want statistical measures and tests of the "true number" of
factors, or you want to know whether a particular factor loading is
significant, then there are methods FOR NORMALLY DISTRIBUTED DATA which
will provide you with standard errors of estimates.
Many people use rules of thumb, such as keeping factors with
eigenvalues greater than one, and treating loadings of, for example,
0.4 and above as significant. If you have normally distributed data and
you do a proper STATISTICAL factor analysis, you will find that these
rules of thumb are very rough approximations.
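For what the rules of thumb look like in practice, here is a sketch (again invented data, not a proper statistical test): keep components with eigenvalues above one, then flag items whose absolute loading on the retained factor reaches 0.4.

```python
import numpy as np

# Hypothetical battery: 4 items driven by one trait, plus 2 pure-noise items.
rng = np.random.default_rng(2)
trait = rng.normal(size=250)
items = np.column_stack([trait + rng.normal(0, 1.0, 250) for _ in range(4)] +
                        [rng.normal(size=250) for _ in range(2)])

R = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rule of thumb 1: keep factors with eigenvalue greater than one.
n_keep = int(np.sum(eigvals > 1))

# Rule of thumb 2: loadings of |0.4| or more count as "significant".
loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])
flagged = np.abs(loadings[:, 0]) >= 0.4
print(n_keep, flagged)
```

The four trait-driven items are flagged on the first factor and the noise items are not, but nothing here is a test with a standard error; these cutoffs are conventions, not statistics.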
If you really only want to find subscales, then rotation of the factors
may not be relevant. Rotations are only useful if you want to give some
kind of description to the factors which you have found. The original
unrotated solution is the neatest one in mathematical terms. If you
like to think in terms of your original variables rather than the neat
mathematical factors, then rotation may help - and varimax is as good
as any, for all but the expert.
I would advise the selection of your subsets of variables from an
unrotated solution. Having discarded some variables you might want to
repeat the factor analysis on the remaining variables, and then
rotation may help you to find names for your subscales, by seeing how
the factors are related to the original variables.
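To illustrate what rotation does and does not change, here is a minimal varimax implementation in Python (the standard SVD-based algorithm, applied to a made-up loading matrix). Rotation leaves the fit untouched, because the communalities, the row sums of squared loadings, are preserved; only the description of the factors changes.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a p-by-k loading matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # Gradient of the varimax criterion, rotated via its SVD.
        grad = loadings.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < total * (1 + tol):
            break
        total = s.sum()
    return loadings @ R

# Hypothetical unrotated loadings for 4 items on 2 factors: every item
# loads on factor 1, which is hard to name.
L0 = np.array([[0.7, 0.5], [0.6, 0.6], [0.6, -0.5], [0.7, -0.6]])
L1 = varimax(L0)
# After rotation each item loads mainly on one factor, so the factors
# map onto recognisable subsets of the original variables.
print(np.round(L1, 2))
```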