LISTSERV at the University of Georgia
Date:         Wed, 19 Nov 2003 10:03:05 +0100
Reply-To:     Spousta Jan <JSpousta@CSAS.CZ>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Spousta Jan <JSpousta@CSAS.CZ>
Subject:      Re: Problems with cluster analysis
Content-Type: text/plain; charset="iso-8859-1"

Hi Sharon,

I do not know the details of the SPSS implementation, but K-means algorithms usually begin with some arbitrary or random starting points as initial centroids; for example, they may take the first K cases as the starting points. You can therefore get different segmentations of the same file if the initial values and/or the sort order differ. SPSS allows you to set the initial values yourself ("Centers -> Read initial from"), which could help you in this case. But if you get _very_ different solutions every time you run K-means, it may be that your data are not suitable for clustering, or at least not for the number of clusters you have chosen.

Greetings,
Jan
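
As a rough illustration of the order dependence described above, here is a toy Python sketch. It is not the SPSS algorithm: it assumes a plain Lloyd-style K-means on one-dimensional data that takes the first K cases as initial centres, and the function name kmeans_1d, its initial_centers parameter, and the data values are all made up for the example. It shows the same nine cases giving two different two-cluster solutions depending on sort order, and the same solution in both orders once the initial centres are fixed.

```python
def kmeans_1d(points, k, initial_centers=None, max_iter=100):
    """Toy Lloyd-style K-means on 1-D data (hypothetical helper, not SPSS).

    If initial_centers is None, the first k cases are used as starting
    centres, so the result can depend on the order of the cases.
    Returns the final partition as a set of frozensets of values.
    """
    centers = list(initial_centers) if initial_centers is not None else list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assign each case to its nearest centre (lowest index wins ties).
        new_labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each centre to the mean of the cases assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return {frozenset(p for p, lab in zip(points, labels) if lab == j) for j in range(k)}


# Three clumps of cases, but we ask for only two clusters.
data = [0, 1, 2, 10, 11, 12, 20, 21, 22]
resorted = sorted(data, reverse=True)  # same cases, different order

# First-k initialisation: the solution depends on the case order.
by_original_order = kmeans_1d(data, 2)
by_resorted_order = kmeans_1d(resorted, 2)
print(by_original_order == by_resorted_order)

# Fixed initial centres: the case order no longer matters.
fixed_original = kmeans_1d(data, 2, initial_centers=[1.0, 16.0])
fixed_resorted = kmeans_1d(resorted, 2, initial_centers=[1.0, 16.0])
print(fixed_original == fixed_resorted)
```

Asking for two clusters when the data contain three clumps is deliberate: it is the kind of mismatch between data and number of clusters under which the solution becomes sensitive to the starting points.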

-----Original Message-----
From: Sharon Morris [mailto:smorris@dbmcons.com.au]
Sent: Wednesday, November 19, 2003 6:37 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: FW: Problems with cluster analysis

Hi all,

after playing with my data for nearly 3 hours (!), I think I MAY know why I am having problems. Each time I run a cluster analysis, I use the SPLIT FILE command to split the file into cluster groups so I can profile them. When I do this, the cases in the datafile get resorted by cluster membership. Does the order of cases affect the choice of initial cluster centres, which in turn affects my final result?

If so, should I be worried about the stability of my solution given that it depends to some extent on the order the data is in in the first place?

Many thanks, Sharon Morris

-----Original Message-----
From: Sharon Morris [mailto:smorris@dbmcons.com.au]
Sent: Wednesday, 19 November 2003 3:21 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Problems with cluster analysis

Dear listers,

I have been using SPSS for quite a few years now, generally without too many problems. However, about 3 months ago, I ran a segmentation analysis in which I used cluster analysis. A month or so later, I tried to replicate my clusters and could not do so, even though the file had not been changed and my syntax was saved. I even went through my syntax journal to ensure I wasn't doing anything wrong. Eventually, I reconstructed my working file from the "original" file (I always keep an unused original file), and was then able to replicate the cluster solution.

I am now doing a different study with cluster analysis, and once again, the solution I produced two days ago can no longer be produced. I have not changed the datafile.

Is there any known problem with cluster analysis (I am using K Means)? Has anyone else had this experience? How can I put any faith in my cluster solutions if this keeps happening?

Many thanks,
Dr Sharon Morris
Senior Project Director
DBM Consultants
Market Research Professionals
5-7 Guest St
Hawthorn, Victoria, Australia 3122
ph 61 3 8862 5524

