```
Date: Fri, 21 Nov 2003 10:35:14 -0600
Reply-To: Anthony Babinec
Sender: "SPSSX(r) Discussion"
From: Anthony Babinec
Subject: Re: Trouble with discriminant analysis
Comments: To: Kristian K McIntyre
In-Reply-To:
Content-Type: text/plain; charset="us-ascii"

I did not see any replies to your note.

Is there a precedent for analyzing your problem this way? I don't know the literature in your field.

It sounds like you have 75 classifying variables, 5 groups, and a sample size of 80. Discriminant analysis makes an assumption of within-groups multivariate normality and homogeneity of covariance matrices across groups. Discriminant analysis must compute roughly (1/2)*75**2 covariances in a common covariance matrix, and 5 times that for within-groups covariance matrices. If any of your group sizes are less than 75, then the covariance matrix for that group is necessarily singular. That would account for the messages being reported with the Box M statistic.

It sounds like you have a situation where the ratio of sample size to number of variables is not favorable, and therefore you risk overfitting the training data in a way that does not validate.

Anthony Babinec

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Kristian K McIntyre
Sent: Wednesday, November 19, 2003 3:45 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Trouble with discriminant analysis

Fellow SPSSers,

Perhaps, someone can help with this problem I'm having in discriminant analysis.

My data is comprised of bird species abundance values (75 species) collected at 80 sites, and each site was classified into one of 5 disturbance categories. Bird abundance was log(x+1) transformed. There are unequal sample sizes of sites across the disturbance categories. I am using discriminant analysis in SPSS to see which species of birds best discriminate between the 5 disturbance categories (groups).
I entered the disturbance category as the grouping variable, and all species of birds as 75 independent variables. I entered all independents together, requested descriptives, Box's M, and unstandardized function coefficients, and used a within-groups covariance matrix. I also requested a leave-one-out classification.

My problems:

1. I'm getting several error (?) messages in my Box's M test. Under the rank column, it states that for each grouping, the rank is less than whatever the sample size is for each group, and under the log determinant column it states that "there were too few cases to be non-singular". I did, however, get a value for "pooled within groups" for each column. Under the Box's M test results it states "no test can be performed with fewer than 2 nonsingular group covariance matrices". What exactly does this mean and how can I fix this? I've tried everything I could think of to correct this with no luck. Apparently, I'm not understanding the problem.

2. The rest of the output looked somewhat reasonable until I got to the classification results and the probabilities of group membership. I had 100% of original grouped cases correctly classified, and all probabilities of group membership were 1.0. The cross-validation classified only 20% of cases correctly, which is no better than the 20% expected by chance, since I have 5 categories. What's going on here? This is not right!

Any advice you can provide will be greatly appreciated. Many thank yous in advance.

Cheers,
Kris

>>>>>>>>>>>>>>>>>
Kristian K. McIntyre
Wildlife Biologist
kkmcintyre@fs.fed.us
<<<<<<<<<<<<<<<<<<
```
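
The counting argument in the reply can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the group sizes are hypothetical, since the thread says only that the 80 sites are split unequally across the 5 categories. It relies on two standard facts: a p x p covariance matrix has p(p+1)/2 distinct entries, and a sample covariance matrix estimated from n cases has rank at most n - 1.

```python
# Illustrative sketch only. The group sizes are HYPOTHETICAL: the thread
# gives 75 species, 80 sites, and 5 unequal groups, but not the sizes.

def covariance_entry_count(p):
    """Distinct entries (variances plus covariances) in a p x p covariance matrix."""
    return p * (p + 1) // 2


def max_group_rank(n_cases, p):
    """Upper bound on the rank of a sample covariance matrix from n_cases cases."""
    return min(p, n_cases - 1)


def max_pooled_rank(group_sizes, p):
    """Upper bound on the rank of the pooled within-groups covariance matrix.

    Each group contributes at most (n_g - 1) degrees of freedom,
    so the pooled matrix has at most N - g in total.
    """
    return min(p, sum(group_sizes) - len(group_sizes))


n_species = 75
group_sizes = [25, 20, 15, 12, 8]  # hypothetical split of the 80 sites

print("entries per covariance matrix:", covariance_entry_count(n_species))  # 2850
for n in group_sizes:
    rank_bound = max_group_rank(n, n_species)
    print(f"group of {n:2d} sites: rank <= {rank_bound}, "
          f"singular: {rank_bound < n_species}")
print("pooled within-groups rank <=", max_pooled_rank(group_sizes, n_species))  # 75
```

Under any split of 80 sites into 5 groups, every group has far fewer than 76 cases, so each group covariance matrix is singular, which is consistent with the Box's M messages. The pooled bound is 80 - 5 = 75, so the pooled matrix can in principle be non-singular, which would fit the poster getting a log determinant only for the "pooled within groups" row. These are upper bounds; the actual ranks may be lower.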
