=========================================================================
Date: Mon, 31 Jul 2006 10:46:34 -0400
Reply-To: Steve Peck <link@umich.edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Steve Peck <link@umich.edu>
Subject: Re: Distance from cluster centre query.
In-Reply-To: <006201c6b494$43d50030$0400000a@Work>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
A methodological framework designed to handle these (and many other
related) issues can be found here:
http://www.psychology.su.se/sleipner/
(e.g., you can remove multivariate outliers prior to clustering and
work directly with the centroids after clustering)
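
For readers outside SPSS, the distance-from-centroid idea discussed in the
quoted thread below can be sketched in plain Python. This is only an
illustration under stated assumptions, not SLEIPNER's or SPSS's procedure:
the function names, the label 0 for "unclassifiable", and the keep_fraction
cutoff are hypothetical choices, and the variables are assumed to be
standardized already (as with the Z-scores in Jan's syntax).

```python
import math
from collections import defaultdict

def centroid_distances(rows, labels):
    """Euclidean distance from each case to the centroid of its own cluster."""
    groups = defaultdict(list)
    for row, lab in zip(rows, labels):
        groups[lab].append(row)
    # Centroid = per-variable mean over the members of a cluster.
    centroids = {lab: [sum(col) / len(members) for col in zip(*members)]
                 for lab, members in groups.items()}
    return [math.dist(row, centroids[lab]) for row, lab in zip(rows, labels)]

def flag_unclassifiable(labels, distances, keep_fraction=0.9, new_label=0):
    """Move the most distant cases of each cluster into a catch-all cluster.

    keep_fraction and new_label are illustrative assumptions, not defaults
    taken from any package.
    """
    by_cluster = defaultdict(list)
    for d, lab in zip(distances, labels):
        by_cluster[lab].append(d)
    cutoffs = {}
    for lab, ds in by_cluster.items():
        ds = sorted(ds)
        # Keep the closest keep_fraction of the cluster (at least one case).
        k = max(1, math.ceil(keep_fraction * len(ds)))
        cutoffs[lab] = ds[k - 1]
    return [lab if d <= cutoffs[lab] else new_label
            for d, lab in zip(distances, labels)]

# Toy data: two clusters, each with one case far from the rest.
rows = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (3.0, 3.0),
        (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (9.0, 9.0)]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

distances = centroid_distances(rows, labels)
new_labels = flag_unclassifiable(labels, distances, keep_fraction=0.75)
print(new_labels)
```

With this toy data the two far-out cases are relabelled 0 and the rest keep
their cluster. Note that a per-cluster fraction always flags some cases in
every cluster, even a tight one; an absolute distance cutoff would behave
differently, and which is appropriate is a judgment call.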
Mark Webb wrote:
> Thanks for this Jan.
> I may well use your suggestion & compute the centroids BUT would like to
> discuss the idea of a cluster centroid in the context of what I'm trying
> to do.
> I'm finding that discriminant analysis [DA] based on the clusters [dep var]
> & the statements used to make the clusters [indep vars] is not working
> well in practice.
> I would like to remove "weakly" associated respondents from each cluster
> and put them into an additional cluster representing "unclassifiable".
> I was hoping to define these weak respondents by using the distance from
> centroid idea but I use Hierarchical methods [Ward's] most often - hence
> my initial query.
> Do you think what I'm suggesting is feasible?
> I would then run DA on the original clusters plus 1.
>
> Regards
>
> Mark
>
>
> ----- Original Message -----
> From: "Spousta Jan" <JSpousta@CSAS.CZ>
> To: "Mark Webb" <targetlk@iafrica.com>; <SPSSX-L@LISTSERV.UGA.EDU>
> Sent: Monday, July 31, 2006 12:55 PM
> Subject: RE: Distance from cluster centre query.
>
>
> Hi Mark,
>
> While K-Means operates in a metric Euclidean space or something similar,
> and therefore can easily define the centroids (and uses them during the
> computation), the Hierarchical algorithm can be used in more general
> topological spaces where there are no well-defined centroids. Imagine
> clustering species; take a cluster {baboon, human, chimpanzee} - what is
> the centroid here? Michael Jackson? Really hard to say. And that is
> perhaps the reason why SPSS does not prompt you to save the
> centroid-derived statistics.
>
> Otherwise, if you think that they really do make sense, you can
> compute the centroid coordinates easily using Aggregate and add them to
> the file. And then you can compute the case-to-centroid distance using
> the familiar formula for the Euclidean distance.
>
> Unfortunately, my SPSS 14 is broken now, so I will draft the example
> syntax in SPSS 12 which is more cumbersome because of the lack of
> ADDVARIABLES mode in Aggregate.
>
> GET FILE='C:\Program Files\SPSS\Cars.sav'.
> SELE IF nmiss(mpg to cylinder)=0 and uniform(1) < 0.2.
> DESCRIPTIVES mpg to accel /SAVE.
> CLUSTER Zmpg to Zaccel /SAVE CLUSTER(5).
>
> *Save the coordinates of the centroids.
> AGGREGATE /OUTF='C:\Program Files\SPSS\aggr.sav' /BREAK=CLU5_1
> /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight
> Zaccel).
>
> *Add them to the file.
> SORT CASES BY CLU5_1 (A) .
> MATCH FILES /FILE=* /TABLE='C:\Program Files\SPSS\aggr.sav' /BY CLU5_1.
> exe.
>
> *Compute the Euclidean distance case-centroid.
> comp distance = 0.
> do repe centr = Cmpg to Caccel /case = Zmpg to Zaccel.
> - comp distance = distance + (centr-case)**2.
> end repe.
> comp distance = sqrt(distance).
> var lab distance "Distance case-centroid".
> exe.
>
> *End of the example.
>
> Greetings
>
> Jan
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
> Mark Webb
> Sent: Monday, July 31, 2006 7:43 AM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Distance from cluster centre query.
>
> In K Means it's possible to save this information as a variable.
> Is this possible in any of the hierarchical methods offered in SPSS ?
> They offer a proximity matrix - which I see as different - as this shows
> distances between individual respondents NOT the classification mean.
> Am I missing something ?
>
> Regards
>