Date: Fri, 8 Feb 2008 14:48:28 -0800
Reply-To: "Dennis G. Fisher, Ph.D." <dfisher@CSULB.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dennis G. Fisher, Ph.D." <dfisher@CSULB.EDU>
Subject: Re: Hierarchical agglomerative clustering - a couple of questions
The algorithm is not really the issue here; the issue is the proximity
measure. The measure designed specifically for data that mix
categorical and continuous variables is Gower's coefficient. Other
measures have been "forced" into service for these applications, but
knowledgeable reviewers may question their use.

As for the number of variables to use in a cluster analysis, this can
vary quite a bit. Using too few makes the analysis trivial, and using
too many may make the results hard to interpret. A "sweet spot" may be
6 to 18 variables for a "nice" cluster analysis, although people have
done good analyses with more or fewer than this. A good reference is
Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis.
Beverly Hills: Sage.
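
To make the idea behind Gower's coefficient concrete, here is a rough
sketch in Python (not SAS) of the dissimilarity form: each numeric
variable contributes its absolute difference divided by that variable's
observed range, each categorical variable contributes 0 for a match and
1 for a mismatch, and the contributions are averaged. The patient
records, variable names, and function names are made up for
illustration, and the sketch assumes equal weights and no missing
values, which the full coefficient can accommodate.

```python
def gower_distance(a, b, kinds, ranges):
    """Gower dissimilarity between two records (equal weights assumed).

    kinds[k] is "num" or "cat"; ranges[k] is the observed range of
    numeric variable k and is ignored for categorical variables.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            # Range-scaled absolute difference; a constant column adds 0.
            total += abs(x - y) / rng if rng else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if x == y else 1.0
    return total / len(kinds)


def gower_matrix(rows, kinds):
    """Full pairwise dissimilarity matrix as a list of lists."""
    ranges = []
    for k, kind in enumerate(kinds):
        if kind == "num":
            col = [r[k] for r in rows]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(None)
    n = len(rows)
    return [
        [gower_distance(rows[i], rows[j], kinds, ranges) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical patients: (age, systolic BP, sex, smoker)
patients = [
    (30, 120, "F", "no"),
    (50, 140, "M", "yes"),
    (30, 140, "F", "yes"),
]
kinds = ["num", "num", "cat", "cat"]
dist = gower_matrix(patients, kinds)
```

A matrix like dist can then be handed to whatever agglomerative
procedure you prefer, which is why the choice of merging algorithm
matters less than the choice of proximity measure.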
> Dear SAS L-ers,
> I was wondering whether a cluster analysis method exists that is
> capable of detecting groups of patients when they are characterised by
> both categorical and continuous variables. What is the distance used
> between patients? What is the algorithm for merging clusters
> together? Any reference for that?
> Another question: I've heard that it is
> recommended not to use too many variables to define clusters. Does anyone
> know a rule of thumb for the maximum number of descriptors to use
> according to the sample size? Any reference for that? (FYI: This
> topic is not considered in the book "Statistical Rules of Thumb")
> Thank you very much.
Dennis G. Fisher, Ph.D.
Professor and Director
Center for Behavioral Research and Services
1090 Atlantic Avenue
Long Beach, CA 90813
Ph: 562-495-2330 x121