Further to my previous message:
Clustering starts with a collection of single "cases", which are
gradually aggregated into groups or "clusters" based on the cases'
similarity on a number of variables. This is done iteratively. At the
first step, the two closest cases are grouped into one cluster. At the
second step, either a third case goes into that cluster, or a new
cluster of two is formed. At each subsequent step, either two single
cases join together into a new cluster, or a case is added to a
cluster, or two clusters are merged into one.
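
To make the step-by-step process concrete, here is a minimal Python sketch of it. It is only an illustration under assumptions of my own: the function names and the five one-variable cases are made up, and the distance between two clusters is taken to be the distance between their closest pair of cases, which is just one of several possible rules.

```python
import math
from itertools import combinations

def closest_pair_distance(c1, c2):
    # Assumed merge rule: distance between the two closest members.
    return min(math.dist(a, b) for a in c1 for b in c2)

def agglomerate(cases):
    """Merge the two closest clusters until one is left; return the merge history."""
    clusters = [[c] for c in cases]   # every case starts as its own cluster
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters (by index) that are closest to each other.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: closest_pair_distance(clusters[p[0]], clusters[p[1]]),
        )
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return history

# Hypothetical data: five cases measured on a single variable.
cases = [(0.0,), (1.0,), (5.0,), (6.5,), (20.0,)]
history = agglomerate(cases)
for step, (left, right) in enumerate(history, start=1):
    print(f"step {step}: {left} + {right}")
```

With these made-up cases the four steps show each kind of move described above: two single cases join, a second pair forms its own cluster, the two clusters merge, and finally the last isolated case is added.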
It may help to recall that the most common methods for gradually
aggregating cases into clusters are called the "single linkage method"
and the "complete linkage method". Assume you have a cluster (made of
one or more cases) and other isolated cases at varying "distances" from
your cluster. (Distances can be measured in a variety of ways; ordinary
Euclidean distance, computed after all variables have been
standardized, is one of the simplest.)
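
As a rough sketch of that distance measure, the Python below standardizes each variable and then takes the ordinary Euclidean distance. The data and function names are hypothetical, and dividing by the sample standard deviation is my own assumption (a package may do it differently); it also assumes no variable is constant, since that would mean dividing by zero.

```python
import math
from statistics import mean, stdev

def standardize(cases):
    """Rescale every variable (column) to mean 0 and standard deviation 1."""
    columns = list(zip(*cases))
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]   # assumption: sample standard deviation
    return [
        [(x - m) / s for x, m, s in zip(case, means, sds)]
        for case in cases
    ]

def euclidean(a, b):
    """Ordinary Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical cases: (height in cm, yearly income).
cases = [[150.0, 20000.0], [160.0, 21000.0], [170.0, 90000.0]]
z = standardize(cases)

print(euclidean(z[0], z[1]))   # first and second case
print(euclidean(z[1], z[2]))   # second and third case
```

Without standardizing, the income variable would dominate the distance simply because its numbers are larger; after standardizing, both variables count on a comparable scale.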
The problem of clustering is to decide which case should be included in
the cluster next. The "single linkage method" selects the case that is
closest to ANY ONE of the cases already inside the cluster. The
"complete linkage method" selects the case whose distance to the
FURTHEST case in the cluster is smallest (thus ensuring it is even
closer to the rest). The "centroid method" selects the case that is
closest to the cluster's "center of gravity" (roughly, the average
position of the cases included in the cluster). These are some of the
clustering methods usually available (SPSS has these and more). The
choice of method depends on the purpose of the clustering and the
nature of the data.
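
The three rules can be compared side by side in a short Python sketch. Everything in it (the function names, the two-case cluster, the two candidate cases) is made up for illustration; it is not how SPSS or any other package implements these methods.

```python
import math

def distance_to_cluster(case, cluster, method):
    """Distance from one case to a cluster under the named rule."""
    if method == "single":      # distance to the closest member
        return min(math.dist(case, m) for m in cluster)
    if method == "complete":    # distance to the furthest member
        return max(math.dist(case, m) for m in cluster)
    if method == "centroid":    # distance to the average position
        centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
        return math.dist(case, centroid)
    raise ValueError(f"unknown method: {method}")

def next_case(cluster, candidates, method):
    """Pick the candidate with the smallest distance to the cluster."""
    return min(candidates, key=lambda c: distance_to_cluster(c, cluster, method))

# Hypothetical cluster of two cases and two isolated candidate cases.
cluster = [(0.0, 0.0), (4.0, 0.0)]
candidates = [(5.0, 0.0), (2.0, 2.5)]

for method in ("single", "complete", "centroid"):
    print(method, next_case(cluster, candidates, method))
```

In this made-up example single linkage picks (5.0, 0.0), because it sits right next to one member of the cluster, while complete linkage and the centroid method both pick (2.0, 2.5), which is moderately close to the cluster as a whole. That is the kind of difference that makes the choice of method matter.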
Brian Norton wrote:
>
> If anybody knows where I can find some information on linkage cluster
> analysis (what it is, how to do it to a small data set, anything) please let
> me know. Sooner is better.
> Thanks in advance.