LISTSERV at the University of Georgia
Date:         Tue, 20 Sep 2005 19:27:05 -0400
Reply-To:     Chiao-Wen Hsiao <>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Chiao-Wen Hsiao <>
Subject:      Re: sample size for cluster analysis
Content-Type: text/plain; charset=US-ASCII

Hector: Thank you so much for the suggestions. I tried running the analysis with different combinations of variables. All the analyses showed that our sample fell into three groups, and the three groups have meaningful and distinct differences in the variables of interest. In this case, I suppose we can use almost all of the variables, but we will need to choose carefully which ones to include based on the theory.


>>> Hector Maletta <> 9/20/2005 5:17:33 PM >>>
There is no specific rule for this. In linear regression, a commonly cited rule of thumb calls for at least 10-20 cases per variable. By that rule, with your 100 cases you should use at most 5 variables, possibly up to 10.
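The arithmetic behind that rule of thumb can be sketched in a few lines of Python. This is only an illustration of the 10-20 cases-per-variable heuristic borrowed from regression, not an established rule for cluster analysis; the function name `max_variables` is made up for this example.

```python
def max_variables(n_cases, cases_per_variable):
    """Largest number of variables allowed by a cases-per-variable heuristic."""
    if cases_per_variable <= 0:
        raise ValueError("cases_per_variable must be positive")
    # Integer division: a partial variable does not count.
    return n_cases // cases_per_variable

n_cases = 100  # sample size mentioned in this thread

# Stricter end of the heuristic: 20 cases per variable.
strict_limit = max_variables(n_cases, 20)
# Looser end of the heuristic: 10 cases per variable.
loose_limit = max_variables(n_cases, 10)

print(f"With {n_cases} cases: {strict_limit} to {loose_limit} variables")
```

With 100 cases this gives the 5 to 10 variables mentioned above; the 12 or 20 variables discussed in the thread would both exceed the looser limit.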

But it all depends on the variability among your cases. If your 100 cases fall neatly within a few groups, and the variables are highly correlated among themselves, then you may use more variables and still get meaningful results (i.e. meaningful groups of cases). But if your cases are dispersed across all values and combinations of values of the various variables, you may as well form three clusters or thirty clusters, use four variables or forty variables...

The general objective of a cluster analysis is to construct a few groups or clusters that are (a) internally homogeneous and (b) clearly distinct from other groups. If the cases are more or less equally distributed all over the variable-space, many will fall in the "gray area", more or less at an equal distance from various cluster centers. Attributing those cases to one cluster or to another would then be essentially arbitrary, and all solutions would be highly unstable (changing even slightly the value of a case in some of the variables would throw it into a different cluster). In that kind of situation, larger samples (and larger cases/variables ratios) would be needed.
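One rough way to see how many cases sit in that "gray area" is to compare each case's distance to its nearest cluster center with its distance to the second-nearest one. The sketch below is a hypothetical illustration in plain Python, not an SPSS procedure: the function names, the toy data, and the 0.8 threshold are all assumptions chosen for the example.

```python
import math

def ambiguity(point, centers):
    """Distance to the nearest center divided by distance to the second-nearest.

    Values near 0 mean the case clearly belongs to one cluster;
    values near 1 mean it is about equally far from two centers.
    """
    distances = sorted(math.dist(point, c) for c in centers)
    if distances[1] == 0:
        return 1.0  # point coincides with two identical centers
    return distances[0] / distances[1]

def gray_area_share(points, centers, threshold=0.8):
    """Fraction of cases whose ambiguity ratio is at or above the threshold."""
    flagged = [ambiguity(p, centers) >= threshold for p in points]
    return sum(flagged) / len(points)

# Toy data: two cluster centers and four cases in a two-variable space.
centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (9.0, 1.0), (5.0, 0.0), (4.5, 2.0)]

share = gray_area_share(points, centers)
print(f"Share of cases in the gray area: {share:.2f}")
```

In this toy example the first two cases lie close to one center each, while the last two are almost equidistant from both, so half of the cases are flagged. A large share on real data would suggest the kind of unstable solution described above.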


> -----Original Message-----
> From: Chiao-Wen Hsiao []
> Sent: Tuesday, September 20, 2005 5:42 PM
> To:
> Subject: RE: sample size for cluster analysis
>
> What is the general rule of thumb for determining sample size in cluster analysis? Is there any books/articles out there that you would recommend? Thanks.
>
> I am trying to reduce the number of variables based on our theory. What would be the acceptable number? Is 12 variables acceptable?
>
> Thank you so much!
>
> Joyce
>
> >>> "Hector Maletta" <> 9/20/2005 4:29:13 PM >>>
> It is probably too small a sample, and probably (even for a somewhat larger sample) 20 is too many clustering variables. Most probably some of the 20 are strongly correlated with other variables in the set, and thus redundant.
> Try to think of a few of the most essential variables you want to classify cases by (perhaps one for each conceptual dimension you are trying to cover), and run cluster analysis with those variables only.
>
> Using factor analysis to reduce your 20 variables to a few underlying factors, and then use the resulting factor scores for the clustering, may also in theory be a solution, but 100 cases are too few also for factor analysis of 20 variables.
>
> Hector
>
> > -----Original Message-----
> > From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Chiao-Wen Hsiao
> > Sent: Tuesday, September 20, 2005 5:02 PM
> > To: SPSSX-L@LISTSERV.UGA.EDU
> > Subject: sample size for cluster analysis
> >
> > Hi all,
> >
> > I am running a cluster analysis with 20 variables in a sample of 100 participants. Is the sample size too small? Should I try to reduce the number of variables? This is my first time running cluster analysis. Any help would be greatly appreciated! Thank you!!
> >
> > Joyce
