=========================================================================
**Date:** Mon, 31 Jul 2006 13:27:04 +0200
**Reply-To:** Mark Webb <targetlk@iafrica.com>
**Sender:** "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
**From:** Mark Webb <targetlk@iafrica.com>
**Subject:** Re: Distance from cluster centre query.
**Content-Type:** text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Thanks for this Jan.
I may well use your suggestion and compute the centroids, but I would like to
discuss the idea of a cluster centroid in the context of what I'm trying to
do.
I'm finding that discriminant analysis [DA], with the clusters as the
dependent variable and the statements used to form the clusters as the
independent variables, is not working well in practice.
I would like to remove "weakly" associated respondents from each cluster and
put them into an additional cluster representing "unclassifiable".
I was hoping to define these weak respondents using the distance-from-centroid
idea, but I most often use hierarchical methods [Ward's], hence my
initial query.
Do you think what I'm suggesting is feasible?
I would then run DA on the original clusters plus one.
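
One way the reassignment step could look in syntax is sketched below. It is only an illustration under several assumptions: the variables CLU5_1 and distance already exist as in the example quoted further down, the release supports MODE=ADDVARIABLES in AGGREGATE, and "weak" is defined by an arbitrary cutoff of the cluster mean distance plus two standard deviations. The names dmean, dsd and clu_da are hypothetical, and the DISCRIMINANT variable list simply reuses the Cars example variables in place of the real statements.

```spss
* Assumes CLU5_1 and distance exist, as in the example quoted below.
* The cutoff (cluster mean + 2 SD) is an arbitrary illustration.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CLU5_1
  /dmean = MEAN(distance)
  /dsd = SD(distance).

* Copy the cluster membership and move far-out cases to a sixth group.
COMPUTE clu_da = CLU5_1.
IF (distance > dmean + 2*dsd) clu_da = 6.
VALUE LABELS clu_da 6 'Unclassifiable'.
EXECUTE.

* Run DA on the original five clusters plus the extra group.
DISCRIMINANT
  /GROUPS=clu_da(1 6)
  /VARIABLES=Zmpg Zengine Zhorse Zweight Zaccel
  /STATISTICS=TABLE.
```

Any other rule for "weak" membership (a fixed distance, a percentile within each cluster) would only change the IF line.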

Regards

Mark

----- Original Message -----
From: "Spousta Jan" <JSpousta@CSAS.CZ>
To: "Mark Webb" <targetlk@iafrica.com>; <SPSSX-L@LISTSERV.UGA.EDU>
Sent: Monday, July 31, 2006 12:55 PM
Subject: RE: Distance from cluster centre query.

Hi Mark,

While K-Means operates in a metric Euclidean space or something similar,
and therefore can easily define the centroids (and uses them during the
computation), the hierarchical algorithm can be used in more general
topological spaces where there are no well-defined centroids. Imagine
clustering species; take a cluster {baboon, human, chimpanzee} - what is
the centroid here? Michael Jackson? Really hard to say. And that is
perhaps the reason why SPSS does not offer to save the
centroid-derived statistics.

Otherwise, if you think that they really do make sense, you can
compute the centroid coordinates easily using Aggregate and add them to
the file. Then you can compute the case-to-centroid distance using
the familiar formula for the Euclidean distance.

Unfortunately, my SPSS 14 is broken now, so I will draft the example
syntax in SPSS 12, which is more cumbersome because it lacks the
ADDVARIABLES mode in Aggregate.

GET FILE='C:\Program Files\SPSS\Cars.sav'.
SELE IF nmiss(mpg to cylinder)=0 and uniform(1) < 0.2.
DESCRIPTIVES mpg to accel /SAVE.
CLUSTER Zmpg to Zaccel /SAVE CLUSTER(5).

*Save the coordinates of the centroids.
AGGREGATE /OUTF='C:\Program Files\SPSS\aggr.sav' /BREAK=CLU5_1
/Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight
Zaccel).

*Add them to the file.
SORT CASES BY CLU5_1 (A) .
MATCH FILES /FILE=* /TABLE='C:\Program Files\SPSS\aggr.sav' /BY CLU5_1.
exe.

*Compute the Euclidean distance case-centroid.
comp distance = 0.
do repe centr = Cmpg to Caccel /case = Zmpg to Zaccel.
- comp distance = distance + (centr-case)**2.
end repe.
comp distance = sqrt(distance).
var lab distance "Distance case-centroid".
exe.

*End of the example.
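
For readers on an SPSS release that supports MODE=ADDVARIABLES (SPSS 14, per the note above), the AGGREGATE, SORT CASES and MATCH FILES steps of the example can probably be collapsed into a single command. A minimal sketch, assuming the same variable names as in the example (CLU5_1 and the Z-scores saved by DESCRIPTIVES) and that no variables named Cmpg to Caccel exist in the file yet:

```spss
* Append the cluster centroids to every case in the working file.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=CLU5_1
  /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight Zaccel).
```

The distance computation from the example then applies unchanged, since the new centroid variables are appended to the file in the order listed.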

Greetings

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Mark Webb
Sent: Monday, July 31, 2006 7:43 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Distance from cluster centre query.

In K-Means it's possible to save this information as a variable.
Is this possible in any of the hierarchical methods offered in SPSS?
They offer a proximity matrix, which I see as different, since it shows
distances between individual respondents, NOT the distance to the cluster mean.
Am I missing something?

Regards
