Date: Wed, 19 Nov 2003 10:03:05 +0100
Reply-To: Spousta Jan <JSpousta@CSAS.CZ>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Spousta Jan <JSpousta@CSAS.CZ>
Subject: Re: Problems with cluster analysis
Content-Type: text/plain; charset="iso-8859-1"
Hi Sharon,
I do not know the details of the SPSS implementation, but K-means algorithms usually begin with some starting points as initial centroids, picked either at random or by a simple rule. E.g. they can take the first K cases as the starting points. Therefore you can get different segmentations from the same file if the initial values and/or the sort order differ.
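To make the order effect concrete, here is a minimal sketch in plain Python. It is not the SPSS procedure: it assumes a textbook K-means whose initial centers are simply the first K cases, and the six data values and the function names are made up for illustration. It clusters a small file, re-sorts the cases by cluster membership (as sorting for SPLIT FILE would), and clusters again.

```python
def kmeans_first_k(values, k, max_iter=100):
    """Textbook K-means on 1-D data; the first k cases seed the centers (an assumption)."""
    centers = [float(v) for v in values[:k]]
    labels = []
    for _ in range(max_iter):
        # Assign each case to its nearest center.
        new_labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_labels == labels:
            break  # memberships stopped changing
        labels = new_labels
        # Move each center to the mean of the cases assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels


def partition(values, labels):
    """Clusters as sorted tuples, so the arbitrary cluster numbering does not matter."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return sorted(tuple(sorted(g)) for g in groups.values())


# Made-up data: three tight groups near 0, 10 and 20, in "original" file order.
original = [0, 10, 20, 1, 11, 21]
first_labels = kmeans_first_k(original, k=3)
first_partition = partition(original, first_labels)

# Re-sort the cases by cluster membership, then cluster the re-sorted file.
resorted = [v for _, v in sorted(zip(first_labels, original))]
second_partition = partition(resorted, kmeans_first_k(resorted, k=3))

print(first_partition)   # [(0, 1), (10, 11), (20, 21)]
print(second_partition)  # [(0,), (1,), (10, 11, 20, 21)]
```

After the re-sort, the first three cases no longer come from three different groups, so the same data and the same settings end in a different (and worse) solution.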
SPSS allows you to set the initial values yourself ("Centers -> Read initial from"), which could help you in this case.
But if you get _very_ different solutions every time you run K-means, it may be that your data are not suitable for clustering, or at least not for the number of clusters you have chosen.
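One possible way to put a number on "very different" is to save the cluster membership from two runs and count how many pairs of cases are treated the same way in both (the Rand index). The sketch below is only an illustration in plain Python; the membership vectors are hypothetical, and it assumes the two runs list the same cases in the same order (match them by a case ID first if the file has been re-sorted).

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of case pairs on which two cluster solutions agree (1.0 = same partition)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must cover the same cases in the same order")
    pairs = list(combinations(range(len(labels_a)), 2))
    # A pair agrees if both runs put it together, or both runs keep it apart.
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster memberships of the same six cases from three runs.
run_1 = [1, 1, 2, 2, 3, 3]
run_2 = [2, 2, 3, 3, 1, 1]  # same grouping, clusters merely renumbered
run_3 = [1, 2, 3, 3, 3, 3]  # a genuinely different grouping

same = rand_index(run_1, run_2)
different = rand_index(run_1, run_3)
print(same, different)
```

A value of 1.0 means the runs differ only in how the clusters are numbered, which is harmless. Values well below 1.0 across repeated runs would point to the instability described above.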
Greetings
Jan
-----Original Message-----
From: Sharon Morris [mailto:smorris@dbmcons.com.au]
Sent: Wednesday, November 19, 2003 6:37 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: FW: Problems with cluster analysis
Hi all,
after playing with my data for nearly 3 hours (!), I think I MAY know why I
am having problems. Each time I run a cluster analysis, I use the SPLIT
FILE command to split the file into cluster groups so I can profile them.
When I do this, the cases in the datafile get resorted by cluster
membership. Does the order of cases affect the choice of initial cluster
centres, which in turn affects my final result?
If so, should I be worried about the stability of my solution given that it
depends to some extent on the order the data is in in the first place?
Many thanks,
Sharon Morris
-----Original Message-----
From: Sharon Morris [mailto:smorris@dbmcons.com.au]
Sent: Wednesday, 19 November 2003 3:21 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Problems with cluster analysis
Dear listers,
I have been using SPSS for quite a few years now, generally without too many
problems. However, about 3 months ago, I ran a segmentation analysis in
which I used cluster analysis. A month or so later, I tried to replicate my
clusters and could not do so, even though the file had not been changed and
my syntax was saved. I even went through my syntax journal to ensure I
wasn't doing anything wrong. Eventually, I reconstructed my working file
from the "original" file (I always keep an unused original file), and was
then able to replicate the cluster solution.
I am now doing a different study with cluster analysis, and once again, the
solution I produced two days ago can no longer be produced. I have not
changed the datafile.
Is there any known problem with cluster analysis (I am using K Means)? Has
anyone else had this experience? How can I put any faith in my cluster
solutions if this keeps happening?
Many thanks,
Dr Sharon Morris
Senior Project Director
DBM Consultants
Market Research Professionals
5-7 Guest St
Hawthorn, Victoria, Australia 3122
ph 61 3 8862 5524