=========================================================================
Date: Mon, 31 Jul 2006 12:55:44 +0200
Reply-To: Spousta Jan <JSpousta@CSAS.CZ>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Spousta Jan <JSpousta@CSAS.CZ>
Subject: Re: Distance from cluster centre query.
Content-Type: text/plain; charset="US-ASCII"
Hi Mark,
While K-Means operates in a Euclidean metric space or something similar,
and can therefore easily define the centroids (and uses them during the
computation), the hierarchical algorithm can be used in more general
topological spaces where there are no well-defined centroids. Imagine
clustering species; take a cluster {baboon, human, chimpanzee} - what is
the centroid here? Michael Jackson? Really hard to say. And that is
perhaps the reason why SPSS does not offer to save the
centroid-derived statistics.
Otherwise, if you think that centroids really do make sense for your
data, you can compute the centroid coordinates easily using Aggregate
and add them to the file. Then you can compute the case-to-centroid
distance using the familiar formula for the Euclidean distance.
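For reference, the formula in question can be written out as below. The
symbols are shorthand introduced here for illustration, not SPSS names:
z_ij is the standardized value of variable j for case i, c_kj is the mean
of that variable over the cases in cluster k (the centroid coordinate),
and p is the number of clustering variables.

```latex
d_i = \sqrt{\sum_{j=1}^{p} \left( z_{ij} - c_{kj} \right)^2},
\qquad
c_{kj} = \frac{1}{n_k} \sum_{i \in \text{cluster } k} z_{ij}
```

Here k is the cluster to which case i was assigned and n_k is the number
of cases in that cluster.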
Unfortunately, my SPSS 14 is broken now, so I will draft the example
syntax in SPSS 12, which is more cumbersome because it lacks the
ADDVARIABLES mode in Aggregate.
GET FILE='C:\Program Files\SPSS\Cars.sav'.
SELE IF nmiss(mpg to cylinder)=0 and uniform(1) < 0.2.
DESCRIPTIVES mpg to accel /SAVE.
CLUSTER Zmpg to Zaccel /SAVE CLUSTER(5).
*Save the coordinates of the centroids.
AGGREGATE /OUTFILE='C:\Program Files\SPSS\aggr.sav' /BREAK=CLU5_1
/Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight
Zaccel).
*Add them to the file.
SORT CASES BY CLU5_1 (A) .
MATCH FILES /FILE=* /TABLE='C:\Program Files\SPSS\aggr.sav' /BY CLU5_1.
exe.
*Compute the Euclidean distance case-centroid.
comp distance = 0.
do repe centr = Cmpg to Caccel /case = Zmpg to Zaccel.
- comp distance = distance + (centr-case)**2.
end repe.
comp distance = sqrt(distance).
var lab distance "Distance case-centroid".
exe.
*End of the example.
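In a version that does have the ADDVARIABLES mode (SPSS 13 or later, as
far as I know), the Aggregate, Sort Cases and Match Files steps could
presumably be collapsed into a single command. The following is only a
sketch, not tested here; it assumes the same variable names as above and
that CLU5_1 has already been saved by CLUSTER.

```spss
* Append the cluster means directly to the active file.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CLU5_1
  /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight
  Zaccel).
* Compute the Euclidean distance case-centroid as before.
COMPUTE distance = 0.
DO REPEAT centr = Cmpg TO Caccel /case = Zmpg TO Zaccel.
- COMPUTE distance = distance + (centr-case)**2.
END REPEAT.
COMPUTE distance = SQRT(distance).
VARIABLE LABELS distance "Distance case-centroid".
EXECUTE.
```

Because the means are appended to the working file itself, no temporary
aggr.sav file is needed in this variant.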
Greetings
Jan
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Mark Webb
Sent: Monday, July 31, 2006 7:43 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Distance from cluster centre query.
In K-Means it's possible to save this information as a variable.
Is this possible in any of the hierarchical methods offered in SPSS?
They offer a proximity matrix - which I see as different - as this shows
distances between individual respondents, NOT the distance to the
classification mean.
Am I missing something?