Date: Fri, 21 Nov 2003 10:35:14 -0600
Reply-To: Anthony Babinec <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Anthony Babinec <firstname.lastname@example.org>
Subject: Re: Trouble with discriminant analysis
Content-Type: text/plain; charset="us-ascii"
I did not see any replies to your note.
Is there a precedent for analyzing your problem this way? I don't
know the literature in your field.
It sounds like you have 75 classifying variables, 5 groups, and
a sample size of 80. Discriminant analysis makes an assumption
of within-groups multivariate normality and homogeneity of
covariance matrices across groups. Discriminant analysis must
estimate 75*76/2 = 2850 distinct variances and covariances (roughly
(1/2)*75**2) in a common covariance matrix, and 5 times that for the
separate within-groups covariance matrices that Box's M compares.
A group covariance matrix has rank at most the group size minus 1,
so if any group has fewer than 76 cases, the covariance matrix for
that group is necessarily singular. That would account for the
messages being reported with the Box's M statistic. It sounds
like you have a situation where the ratio of sample size to number
of variables is not favorable, and therefore you risk overfitting
the training data in a way that does not validate.
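To make the arithmetic concrete, here is a minimal sketch in plain Python (not SPSS syntax). The function names and the small 4-case, 6-variable toy data set are illustrative assumptions, not part of the original analysis. It counts the distinct entries of a covariance matrix and checks, with exact fractions, that a sample covariance matrix computed from n cases cannot have rank above n - 1.

```python
from fractions import Fraction
import random


def covariance_parameter_count(p):
    """Distinct variances and covariances in a p x p covariance matrix."""
    return p * (p + 1) // 2


def sample_covariance(rows):
    """Sample covariance matrix (divisor n - 1), computed with exact fractions."""
    n, p = len(rows), len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def matrix_rank(matrix):
    """Rank via Gaussian elimination; exact because entries are Fractions."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Figures from the message: 75 species, 5 groups, 80 sites.
per_matrix = covariance_parameter_count(75)
print(per_matrix)        # 2850 entries in one covariance matrix
print(5 * per_matrix)    # 14250 entries across five group matrices
print(80 * 75)           # 6000 observed data values in total

# Toy illustration (hypothetical data): 4 cases measured on 6 variables.
random.seed(0)
toy_rows = [[random.randint(0, 20) for _ in range(6)] for _ in range(4)]
toy_rank = matrix_rank(sample_covariance(toy_rows))
print(toy_rank, "<=", len(toy_rows) - 1)  # rank cannot exceed n - 1
```

The same bound applies to each disturbance category: a group with n sites yields a 75 x 75 covariance matrix of rank at most n - 1, so its determinant is zero unless n is at least 76.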
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Kristian K McIntyre
Sent: Wednesday, November 19, 2003 3:45 PM
Subject: Trouble with discriminant analysis
Perhaps someone can help with this problem I'm having in discriminant
analysis. My data consist of bird species abundance values (75
species) collected at 80 sites, and each site was classified into one of 5
disturbance categories. Bird abundance was log(x+1) transformed. There are
unequal sample sizes of sites across the disturbance categories. I am using
discriminant analysis in SPSS to see which species of birds best
discriminate between the 5 disturbance categories (groups). I entered
the disturbance category as the grouping variable, and all species of birds
as 75 independent variables. I entered all independents together, requested
descriptives, Box's M, and unstandardized function coefficients and used a
within-groups covariance matrix. I also requested a leave-one-out
classification.
1. I'm getting several error (?) messages in my Box's M test. Under the
rank column, it states that for each grouping, the rank is less than
whatever the sample size is for each group, and under the log determinant
column it states that "there were too few cases to be non-singular". I
did, however, get a value for "pooled within groups" for each column. Under
the Box's M test results it states "no test can be performed with fewer than
2 nonsingular group covariance matrices". What exactly does this mean and
how can I fix this? I've tried everything I could think of to correct this
with no luck. Apparently, I'm not understanding the problem.
2. The rest of the output looked somewhat reasonable until I got to the
classification results and the probabilities of group membership. I had
100% of original grouped cases correctly classified, and all probabilities
of group membership were 1.0. The cross-validation, however, classified
only 20% of cases correctly, which is no better than chance given that I
have 5 categories. What's going on here? This is not right!
Any advice you can provide will be greatly appreciated. Many thank yous in
advance. Cheers, Kris
Kristian K. McIntyre