Date:         Mon, 31 Jul 2006 12:55:44 +0200
Reply-To:     Spousta Jan <JSpousta@CSAS.CZ>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Spousta Jan <JSpousta@CSAS.CZ>
Subject:      Re: Distance from cluster centre query.
Mark Webb
Content-Type: text/plain; charset="US-ASCII"

Hi Mark,

K-Means operates in a Euclidean metric space (or something similar), so it can easily define centroids, and it uses them during the computation. The hierarchical algorithms, by contrast, can be used in more general spaces where only pairwise dissimilarities are given and there are no well-defined centroids. Imagine clustering species; take the cluster {baboon, human, chimpanzee} - what is the centroid here? Michael Jackson? Really hard to say. That is perhaps why SPSS does not offer to save centroid-derived statistics for hierarchical clustering.

That said, if you think centroids really do make sense for your data, you can compute the centroid coordinates easily with Aggregate and add them to the file. Then you can compute the case-to-centroid distance using the familiar formula for the Euclidean distance.
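Written out, the two steps look roughly as follows. This is only a sketch of the familiar formulas; the symbols are chosen here for illustration and are not SPSS names: z_ij is the value of clustering variable j for case i, C_k is the set of the n_k cases assigned to cluster k, and k(i) is the cluster that case i belongs to.

```latex
\[
c_{kj} = \frac{1}{n_k} \sum_{i \in C_k} z_{ij},
\qquad
d_i = \sqrt{\sum_{j=1}^{p} \left( z_{ij} - c_{k(i)j} \right)^2}
\]
```

The first expression is the centroid coordinate (a per-cluster mean), the second is the distance of case i from the centroid of its own cluster over the p clustering variables.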

Unfortunately, my SPSS 14 is broken at the moment, so I will draft the example syntax in SPSS 12, which is more cumbersome because Aggregate there lacks the ADDVARIABLES mode.

GET FILE='C:\Program Files\SPSS\Cars.sav'.
SELE IF nmiss(mpg to cylinder)=0 and uniform(1) < 0.2.
DESCRIPTIVES mpg to accel /SAVE.
CLUSTER Zmpg to Zaccel /SAVE CLUSTER(5).

*Save the coordinates of the centroids.
AGGREGATE /OUTF='C:\Program Files\SPSS\aggr.sav'
 /BREAK=CLU5_1
 /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight Zaccel).

*Add them to the file.
SORT CASES BY CLU5_1 (A) .
MATCH FILES /FILE=* /TABLE='C:\Program Files\SPSS\aggr.sav' /BY CLU5_1.
exe.

*Compute the Euclidean distance case-centroid.
comp distance = 0.
do repe centr = Cmpg to Caccel /case = Zmpg to Zaccel.
- comp distance = distance + (centr-case)**2.
end repe.
comp distance = sqrt(distance).
var lab distance "Distance case-centroid".
exe.

*End of the example.
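
For readers on SPSS 13 or later, the "save the centroids" and "add them to the file" steps can presumably be collapsed into a single Aggregate call using the ADDVARIABLES mode. The following is an untested sketch that assumes the same variable names as the example (CLU5_1 and the Z-scores); no temporary file and no sorting should be needed in this mode.

```
*Sketch for SPSS 13+ (untested): add the centroid coordinates directly to the working file.
AGGREGATE
 /OUTFILE=* MODE=ADDVARIABLES
 /BREAK=CLU5_1
 /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight Zaccel).
```

The distance computation itself would stay unchanged, since the centroid variables end up in the working file under the same names.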



From: Mark Webb
Sent: Monday, July 31, 2006 7:43 AM
Subject: Distance from cluster centre query.

In K-Means it's possible to save this information as a variable. Is this possible in any of the hierarchical methods offered in SPSS? They offer a proximity matrix, which I see as different, as it shows distances between individual respondents, NOT distances from the classification mean. Am I missing something?

