Date:         Thu, 7 Jul 2005 12:13:51 -0700
Reply-To:     "Dennis G. Fisher" <dfisher@CSULB.EDU>
From:         "Dennis G. Fisher" <dfisher@CSULB.EDU>
Subject:      Re: Cluster analysis for binary data
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"> <title></title> </head> <body text="#000000" bgcolor="#ffffff"> Not to confuse the issue, but as a "little birdie" who is very good with cluster analysis pointed out to me, for 0/1 data the simple matching coefficient is mathematically equivalent to squared Euclidean distance: the squared distance between two observations is just the number of variables on which they disagree, so the two measures rank pairs of observations identically. Hence you may be able to skip the PROC DISTANCE step of the cluster analysis.&nbsp; I have not recently done this myself.&nbsp; Also, FWIW, when the data are entirely binary and joint absences (0-0 pairs) are not counted as matches, the Gower coefficient is equivalent to the Jaccard coefficient.&nbsp; <br> Dennis Fisher<br> <br> Jerry Davis wrote:<br> <blockquote type="cite" cite=""> <pre wrap="">Peter Flom wrote:

</pre> <blockquote type="cite"> <pre wrap="">Can cluster analysis be done with binary data? </pre> </blockquote> <pre wrap=""> I did this recently with genetic marker data from soybean varieties. Calculate a distance matrix and use it for the clustering. There is an example of this under PROC CLUSTER in the STAT documentation. I think version 9 includes PROC DISTANCE, which may replace the distance macro. I used Ward's method based on some previous analyses.

I rarely do cluster analysis and make no claims to being expert at it.

Jerry

--
Jerry Davis
Experimental Statistics
UGA, CAES, Griffin Campus

--
Dennis G. Fisher, Ph.D.
Professor and Director
Center for Behavioral Research and Services
California State University, Long Beach
1090 Atlantic Avenue
Long Beach, California 90813
(562) 495-2330 x121
fax (562) 983-1421
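
For anyone who wants to check the coefficient relationships mentioned at the top of this message, here is a minimal sketch. It is plain Python rather than SAS, the two 0/1 vectors are made up for illustration, and the function names are my own, not anything from PROC DISTANCE. It only shows that, for binary data, squared Euclidean distance equals the number of mismatches, i.e. p * (1 - simple matching), and how Jaccard differs by ignoring 0-0 pairs.

```python
# Illustrative sketch only: hypothetical helper names and made-up 0/1 data.

def simple_matching(x, y):
    """Share of variables on which two binary vectors agree (1-1 or 0-0)."""
    matches = sum(1 for a, b in zip(x, y) if a == b)
    return matches / len(x)

def squared_euclidean(x, y):
    """Sum of squared differences; for 0/1 data this counts the mismatches."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def jaccard(x, y):
    """1-1 matches divided by variables where at least one vector has a 1.

    Joint absences (0-0 pairs) are left out of both numerator and denominator.
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else 0.0

# Two made-up observations measured on p = 8 binary variables.
x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 1, 0, 1, 0, 0, 1, 0]
p = len(x)

smc = simple_matching(x, y)
d2 = squared_euclidean(x, y)

# The two printed distances agree: d2 equals p * (1 - smc).
print("simple matching:", smc)
print("squared Euclidean:", d2)
print("p * (1 - simple matching):", p * (1 - smc))
print("Jaccard:", jaccard(x, y))
```

Because the squared distance is a decreasing linear function of the simple matching coefficient, both put pairs of observations in the same order, which is the sense in which they are interchangeable as input to a clustering step.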

