```
=========================================================================
Date:         Mon, 31 Jul 2006 12:55:44 +0200
Reply-To:     Spousta Jan
Sender:       "SPSSX(r) Discussion"
From:         Spousta Jan
Subject:      Re: Distance from cluster centre query.
Comments:     To: Mark Webb
Content-Type: text/plain; charset="US-ASCII"

Hi Mark,

While K-Means operates in a metric Euclidean space or something similar, and therefore can easily define the centroids (and uses them during the computation), the Hierarchical algorithm can be used in more general topological spaces where there are no well-defined centroids. Imagine clustering species; take a cluster {baboon, human, chimpanzee} - what is the centroid here? Michael Jackson? Really hard to say. And that is perhaps the reason why SPSS does not prompt you to save the centroid-derived statistics.

Otherwise, if you think that they really do make sense, you can compute the centroid coordinates easily using Aggregate and add them to the file. And then you can compute the case-centroid distance using the familiar formula for the Euclidean distance.

Unfortunately, my SPSS 14 is broken now, so I will draft the example syntax in SPSS 12, which is more cumbersome because of the lack of ADDVARIABLES mode in Aggregate.

GET FILE='C:\Program Files\SPSS\Cars.sav'.
*Keep a random ~20% sample of the complete cases.
SELECT IF nmiss(mpg to cylinder)=0 and uniform(1) < 0.2.
*Save the z-scores (Zmpg to Zaccel) and cluster on them.
DESCRIPTIVES mpg to accel /SAVE.
CLUSTER Zmpg to Zaccel /SAVE CLUSTER(5).

*Save the coordinates of the centroids.
AGGREGATE
  /OUTFILE='C:\Program Files\SPSS\aggr.sav'
  /BREAK=CLU5_1
  /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight Zaccel).

*Add them to the file.
SORT CASES BY CLU5_1 (A).
MATCH FILES /FILE=*
  /TABLE='C:\Program Files\SPSS\aggr.sav'
  /BY CLU5_1.
EXECUTE.

*Compute the Euclidean distance case-centroid.
COMPUTE distance = 0.
DO REPEAT centr = Cmpg to Caccel /case = Zmpg to Zaccel.
- COMPUTE distance = distance + (centr-case)**2.
END REPEAT.
COMPUTE distance = sqrt(distance).
VARIABLE LABELS distance "Distance case-centroid".
EXECUTE.
*End of the example.

Greetings

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Mark Webb
Sent: Monday, July 31, 2006 7:43 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Distance from cluster centre query.

In K Means it's possible to save this information as a variable. Is this possible in any of the hierarchical methods offered in SPSS? They offer a proximity matrix - which I see as different - as this shows distances between individual respondents NOT the classification mean. Am I missing something?
```
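The reply notes that the SPSS 12 version is more cumbersome because Aggregate lacks the ADDVARIABLES mode. As a rough sketch of how the centroid step might look in a release that has that mode, the temporary aggregate file, the sort, and the MATCH FILES step could collapse into a single AGGREGATE command. The sketch assumes the same variable names as the example in the message (Zmpg to Zaccel, CLU5_1) and has not been run against Cars.sav.

```
* Sketch only: assumes Zmpg to Zaccel and CLU5_1 already exist in the active file.
* Append the mean z-scores of each cluster (its centroid) to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CLU5_1
  /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight Zaccel).

* Euclidean distance from each case to the centroid of its own cluster.
COMPUTE distance = 0.
DO REPEAT centr = Cmpg TO Caccel /case = Zmpg TO Zaccel.
- COMPUTE distance = distance + (centr - case)**2.
END REPEAT.
COMPUTE distance = SQRT(distance).
VARIABLE LABELS distance "Distance case-centroid".
EXECUTE.
```

Because the centroid variables are appended in the order they are listed, the `Cmpg TO Caccel` range in DO REPEAT should line up one-to-one with `Zmpg TO Zaccel`, as in the original example.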

