LISTSERV at the University of Georgia
Date:         Fri, 21 Nov 2003 17:55:38 -0600
Reply-To:     Paul R Swank <Paul.R.Swank@uth.tmc.edu>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Paul R Swank <Paul.R.Swank@uth.tmc.edu>
Subject:      Re: FW: Problems with cluster analysis
Comments: To: Sally Zitzer <sallyz@u.washington.edu>
In-Reply-To:  <Pine.A41.4.58.0311211455020.835830@mead12.u.washington.edu>
Content-Type: text/plain; charset="US-ASCII"

If you supply the original cluster centroids, then the solution will remain the same. Shuffling the data gives the procedure a different set of initial centroids to start from. If the resulting clusters are substantially different, then you may have a problem. It's like starting a maximum likelihood analysis with different starting values: you should end up at the same place if there is a real solution. Likewise, if there are clear clusters in the data, then different starting patterns should result in the same clusters being identified. I usually run a hierarchical cluster analysis first to determine the number of clusters and whether they are reasonable. Then I feed the centroids from the final hierarchical solution to k-means, which reclusters using those centroids as its starting point. In a hierarchical analysis an entity stays with its cluster once it has been merged, even if that cluster changes drastically through later merges; the k-means pass allows such entities to migrate to a better cluster.
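To illustrate the first point outside SPSS, here is a minimal sketch in plain Python, not SPSS syntax. It assumes an ordinary batch k-means, and the data, the `kmeans` helper and the `start` centroids are all made up for the example (the `start` values simply stand in for centroids taken from a hierarchical solution). With the starting centroids held fixed, the shuffled and unshuffled runs should agree.

```python
import random

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))

def kmeans(points, init_centers, max_iter=100):
    """Plain batch k-means started from the supplied centers."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else old
            )
        if updated == centers:
            break
        centers = updated
    # Report membership per case, so the result does not depend on row order.
    return centers, {p: nearest(p, centers) for p in points}

# Made-up data: three loose groups on an integer grid.
rng = random.Random(2003)
data = [(cx + rng.randint(-5, 5), cy + rng.randint(-5, 5))
        for cx, cy in [(0, 0), (20, 20), (40, 0)] for _ in range(50)]

# Hypothetical starting centroids, standing in for a hierarchical solution.
start = [(0, 0), (20, 20), (40, 0)]

shuffled = data[:]
rng.shuffle(shuffled)

centers_a, member_a = kmeans(data, start)
centers_b, member_b = kmeans(shuffled, start)
print(centers_a == centers_b, member_a == member_b)
```

If the starting centroids were instead picked from the cases themselves (for example the first k rows, which is only an assumed stand-in for whatever a given package does), the two runs would begin from different places and could end up with different clusters.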

Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Medical School
UT Health Science Center at Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Sally Zitzer
Sent: Friday, November 21, 2003 5:06 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: FW: Problems with cluster analysis

Hi Sharon and group,

I just ran KMEANS cluster twice using the same data, and in between the runs I used SORT CASES to shuffle the data (sorted on a continuous variable). The results were certainly different. In the second KMEANS run, 153 of the 2440 cases ended up in different clusters than in the first run. (I determined this by running CROSSTABS on the cluster membership variables I had saved from each run.)
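The crosstab comparison described above can be sketched in plain Python rather than SPSS syntax. This is only an illustration: the membership lists are invented, `crosstab` and `cases_moved` are hypothetical helper names, and the sketch assumes both runs produced the same number of clusters. Because cluster numbers are arbitrary and may be swapped between runs, it pairs the labels in whichever way gives the most agreement before counting the cases that moved.

```python
from collections import Counter
from itertools import permutations

def crosstab(run1, run2):
    """Count cases for every (cluster in run 1, cluster in run 2) pair."""
    return Counter(zip(run1, run2))

def cases_moved(run1, run2):
    """Number of cases that changed cluster, allowing for relabelled clusters."""
    table = crosstab(run1, run2)
    labels1 = sorted(set(run1))
    labels2 = sorted(set(run2))
    best_agreement = 0
    # Try every pairing of run-1 labels with run-2 labels (fine for small k).
    for perm in permutations(labels2, len(labels1)):
        agreement = sum(table[(a, b)] for a, b in zip(labels1, perm))
        best_agreement = max(best_agreement, agreement)
    return len(run1) - best_agreement

# Invented memberships for eight cases, listed in the same case order.
first_run = [1, 1, 1, 2, 2, 3, 3, 3]
second_run = [2, 2, 1, 1, 1, 3, 3, 3]

moved = cases_moved(first_run, second_run)
print(moved)
```

The two lists must be matched case by case (for example by a case ID), not by row position after the file has been re-sorted.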

I have no idea WHY that is, but it's not the way I thought KMEANS cluster would work.

Sally in Seattle

On Wed, 19 Nov 2003, Sharon Morris wrote:

> Hi all,
>
> after playing with my data for nearly 3 hours (!), I think I MAY know
> why I am having problems. Each time I run a cluster analysis, I use
> the SPLIT FILE command to split the file into cluster groups so I can
> profile them. When I do this, the cases in the datafile get resorted
> by cluster membership. Does the order of cases affect the choice of
> initial cluster centres, which in turn affects my final result?
>
> If so, should I be worried about the stability of my solution given
> that it depends to some extent on the order the data is in in the
> first place?
>
> Many thanks,
> Sharon Morris
>
> -----Original Message-----
> From: Sharon Morris [mailto:smorris@dbmcons.com.au]
> Sent: Wednesday, 19 November 2003 3:21 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Problems with cluster analysis
>
>
> Dear listers,
>
> I have been using SPSS for quite a few years now, generally without
> too many problems. However, about 3 months ago, I ran a segmentation
> analysis in which I used cluster analysis. A month or so later, I
> tried to replicate my clusters and could not do so, even though the
> file had not been changed and my syntax was saved. I even went
> through my syntax journal to ensure I wasn't doing anything wrong.
> Eventually, I reconstructed my working file from the "original" file
> (I always keep an unused original file), and was then able to
> replicate the cluster solution.
>
> I am now doing a different study with cluster analysis, and once
> again, the solution I produced two days ago can no longer be produced.
> I have not changed the datafile.
>
> Is there any known problem with cluster analysis (I am using K Means)?
> Has anyone else had this experience? How can I put any faith in my
> cluster solutions if this keeps happening?
>
> Many thanks,
> Dr Sharon Morris
> Senior Project Director
> DBM Consultants
> Market Research Professionals
> 5-7 Guest St
> Hawthorn, Victoria, Australia 3122
> ph 61 3 8862 5524
>

