Date: Thu, 7 Jul 2005 12:13:51 -0700
Reply-To: "Dennis G. Fisher" <dfisher@CSULB.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dennis G. Fisher" <dfisher@CSULB.EDU>
Organization: California State University, Long Beach
Subject: Re: Cluster analysis for binary data
Content-Type: text/html; charset=us-ascii
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<body text="#000000" bgcolor="#ffffff">
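Not to confuse the issue, but as a "little birdie" who is very good
with cluster analysis pointed out to me, for 0/1 data the simple matching
coefficient is mathematically equivalent to squared Euclidean distance:
the squared distance between two observations is just the number of
variables on which they disagree, i.e. p(1 - SM) for p variables. Hence
you may be able to skip the PROC DISTANCE step of the cluster analysis
and cluster the raw binary variables directly. I have not recently done
this myself. Also, FWIW, when the data are entirely binary and joint
absences (0-0 pairs) are ignored, the Gower coefficient is equivalent to
the Jaccard coefficient. <br>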
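A quick way to convince yourself of both equivalences is to check them on a pair of 0/1 vectors. The following is only a minimal sketch in Python (not SAS); the vectors `x` and `y` and all function names are made up for illustration, and the Gower function assumes every variable is treated as asymmetric binary, so 0-0 pairs get zero weight.

```python
import math

# Hypothetical 0/1 observations on six variables (illustrative values only).
x = [1, 0, 1, 1, 0, 0]
y = [1, 1, 0, 1, 0, 0]


def simple_matching(a, b):
    """Share of variables on which a and b agree (1-1 and 0-0 both count)."""
    return sum(ai == bi for ai, bi in zip(a, b)) / len(a)


def squared_euclidean(a, b):
    """Sum of squared differences; for 0/1 data this counts the mismatches."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def jaccard(a, b):
    """Joint presences divided by variables present in at least one vector."""
    both = sum(ai == 1 and bi == 1 for ai, bi in zip(a, b))
    either = sum(ai == 1 or bi == 1 for ai, bi in zip(a, b))
    return both / either if either else 0.0


def gower_asymmetric(a, b):
    """Gower similarity, assuming all variables are asymmetric binary:
    a 0-0 pair gets weight 0, every other pair gets weight 1."""
    used = [(ai, bi) for ai, bi in zip(a, b) if ai == 1 or bi == 1]
    if not used:
        return 0.0
    return sum(ai == bi for ai, bi in used) / len(used)


p = len(x)
# Squared Euclidean distance is a linear function of simple matching.
print(squared_euclidean(x, y), p * (1 - simple_matching(x, y)))
# Gower (asymmetric binary) and Jaccard give the same similarity.
print(gower_asymmetric(x, y), jaccard(x, y))
```

Because squared Euclidean distance is just p times (1 - SM), the two rank every pair of observations identically, which is why a method that works from squared Euclidean distances on the raw 0/1 variables should not need a separately computed matching matrix. If the binary variables are instead treated as symmetric, Gower reduces to simple matching rather than Jaccard.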
Jerry Davis wrote:<br>
<pre wrap="">Peter Flom wrote:
<pre wrap="">Can cluster analysis be done with binary data?
I did this recently with genetic marker data from soybean varieties.
Calculate a distance matrix and use it for the clustering. There is an
example of this under PROC CLUSTER in the SAS/STAT documentation. I think
version 9 includes PROC DISTANCE, which may replace the distance macro.
I used Ward's method based on some previous analyses.
I rarely do cluster analysis and make no claims to being an expert at it.
UGA, CAES, Griffin Campus
<pre class="moz-signature" cols="72">--
Dennis G. Fisher, Ph.D.
Professor and Director
Center for Behavioral Research and Services
California State University, Long Beach
1090 Atlantic Avenue
Long Beach, California 90813
(562) 495-2330 x121
fax (562) 983-1421</pre>