Date: Fri, 21 Nov 2003 17:55:38 -0600
Reply-To: Paul R Swank <Paul.R.Swank@uth.tmc.edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Paul R Swank <Paul.R.Swank@uth.tmc.edu>
Subject: Re: FW: Problems with cluster analysis
In-Reply-To: <Pine.A41.4.58.0311211455020.835830@mead12.u.washington.edu>
Content-Type: text/plain; charset="US-ASCII"
If you supply the original cluster centroids, then the solution will remain
the same. Shuffling the data gives k-means a different set of initial
centroids to start from. If the resulting clusters are substantially
different, then you may have a problem. It's like starting a maximum
likelihood analysis with different starting values: you should end up at the
same place if there is a real solution. Likewise, if there are clear clusters
in the data, then different starting patterns should result in the same
clusters being identified. I usually run a hierarchical cluster analysis
first to determine the number of clusters and whether they are reasonable.
Then I feed the final solution from the hierarchical clustering to k-means,
which reclusters using those centroids as starting points. This allows
entities whose clusters may have changed drastically after those entities
were merged to migrate to a better cluster location.
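
To make the point about starting centroids concrete, here is a minimal sketch in plain Python. It is not SPSS's algorithm. The data, the `seeds` values, and the helper names `kmeans` and `membership` are all made up for illustration. The "first k cases in file order" start is a deliberate simplification of an order-dependent initial selection, not a description of what KMEANS actually does. The coordinates are integers so that cluster means come out identical whatever order the cases are summed in.

```python
import random

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd-style k-means from the given starting centers."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each case to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # no case changed cluster, so the solution has converged
        labels = new_labels
        # Recompute each center as the mean of its current members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

rng = random.Random(0)
# Hypothetical data: 300 cases on two integer-valued variables, no planted clusters.
data = [(rng.randint(0, 100), rng.randint(0, 100)) for _ in range(300)]
seeds = [(20, 20), (50, 80), (80, 30)]  # made-up supplied starting centroids

def membership(order, centers):
    """Run k-means on the cases in the given file order; return {case id: cluster}."""
    _, labels = kmeans([data[i] for i in order], centers)
    return dict(zip(order, labels))

original = list(range(len(data)))
shuffled = original[:]
rng.shuffle(shuffled)

# Supplied centroids: every case lands in the same cluster in either file order.
same_seeded = membership(original, seeds) == membership(shuffled, seeds)
print("same solution with supplied centroids:", same_seeded)

# Order-dependent start (assumption: first 3 cases in file order as centroids).
a = membership(original, [data[i] for i in original[:3]])
b = membership(shuffled, [data[i] for i in shuffled[:3]])
print("cluster sizes, original order:", sorted(list(a.values()).count(j) for j in range(3)))
print("cluster sizes, shuffled order:", sorted(list(b.values()).count(j) for j in range(3)))
```

With supplied centroids the two memberships match case for case. With the order-dependent start the cluster sizes may or may not differ, depending on how clear the clusters in the data are.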
Paul R. Swank, Ph.D.
Professor, Developmental Pediatrics
Medical School
UT Health Science Center at Houston
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Sally Zitzer
Sent: Friday, November 21, 2003 5:06 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: FW: Problems with cluster analysis
Hi Sharon and group,
I just ran KMEANS cluster twice using the same data, and in between the
runs I used SORT CASES to shuffle the data (sorted on a continuous
variable). The results were certainly different. In the second KMEANS run,
153 of 2440 cases ended up in different clusters than in the first KMEANS
run. (I determined this by running CROSSTABS on the cluster membership
variable, which I saved.)
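
As a rough illustration of that kind of comparison, the sketch below cross-tabulates two saved membership variables and counts the cases that changed cluster. It is plain Python, not SPSS syntax. The membership lists and the helper names `crosstab` and `cases_moved` are hypothetical. Cluster numbers are arbitrary labels, so the count is taken under the relabeling of the second run that agrees best with the first. Labels are assumed to run from 0 to k-1.

```python
from collections import Counter
from itertools import permutations

def crosstab(run1, run2):
    """Count cases for each (cluster in run 1, cluster in run 2) pair."""
    return Counter(zip(run1, run2))

def cases_moved(run1, run2, k):
    """Fewest cases in different clusters over all relabelings of run 2."""
    table = crosstab(run1, run2)
    # Try every one-to-one matching of run-1 labels to run-2 labels and
    # keep the matching under which the most cases agree.
    agree = max(
        sum(table[(i, perm[i])] for i in range(k))
        for perm in permutations(range(k))
    )
    return len(run1) - agree

# Hypothetical saved memberships for ten cases from two runs.
run1 = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
run2 = [2, 2, 0, 0, 0, 0, 1, 1, 1, 2]

for (c1, c2), n in sorted(crosstab(run1, run2).items()):
    print(f"run 1 cluster {c1}, run 2 cluster {c2}: {n} cases")
print("cases that changed cluster:", cases_moved(run1, run2, 3))
```

Trying every relabeling is only practical for a small number of clusters, which is the usual situation in a segmentation study.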
I have no idea WHY that is, but it's not the way I thought KMEANS cluster
would work.
Sally in Seattle
On Wed, 19 Nov 2003, Sharon Morris wrote:
> Hi all,
>
> after playing with my data for nearly 3 hours (!), I think I MAY know
> why I am having problems. Each time I run a cluster analysis, I use
> the SPLIT FILE command to split the file into cluster groups so I can
> profile them. When I do this, the cases in the datafile get resorted
> by cluster membership. Does the order of cases affect the choice of
> initial cluster centres, which in turn affects my final result?
>
> If so, should I be worried about the stability of my solution given
> that it depends to some extent on the order the data is in in the
> first place?
>
> Many thanks,
> Sharon Morris
>
> -----Original Message-----
> From: Sharon Morris [mailto:smorris@dbmcons.com.au]
> Sent: Wednesday, 19 November 2003 3:21 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Problems with cluster analysis
>
>
> Dear listers,
>
> I have been using SPSS for quite a few years now, generally without
> too many problems. However, about 3 months ago, I ran a segmentation
> analysis in which I used cluster analysis. A month or so later, I
> tried to replicate my clusters and could not do so, even though the
> file had not been changed and my syntax was saved. I even went
> through my syntax journal to ensure I wasn't doing anything wrong.
> Eventually, I reconstructed my working file from the "original" file
> (I always keep an unused original file), and was then able to
> replicate the cluster solution.
>
> I am now doing a different study with cluster analysis, and once
> again, the solution I produced two days ago can no longer be produced.
> I have not changed the datafile.
>
> Is there any known problem with cluster analysis (I am using K Means)?
> Has anyone else had this experience? How can I put any faith in my
> cluster solutions if this keeps happening?
>
> Many thanks,
> Dr Sharon Morris
> Senior Project Director
> DBM Consultants
> Market Research Professionals
> 5-7 Guest St
> Hawthorn, Victoria, Australia 3122
> ph 61 3 8862 5524
>