LISTSERV at the University of Georgia
Date:         Thu, 7 Jul 2005 16:27:15 +0100
Reply-To:     Ian Wakeling <ian.wakeling@HANANI.QISTATS.CO.UK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Wakeling <ian.wakeling@HANANI.QISTATS.CO.UK>
Subject:      Re: Cluster analysis for binary data
Content-Type: text/plain; charset="iso-8859-1"

> >>> Susie Li <Susie.Li@TVGUIDE.COM> 7/7/2005 8:21:29 AM >>>
>
> <<<
> With nominal and binary data, you are better off using
> multinomial/logistic regression instead of clustering, because you
> are violating too many clustering assumptions.
>
> With nominal data, you need to do some data transformation (changing
> them to binary) before logistic regressions.
> >>>

"Peter Flom" <flom@NDRI.ORG> replied:

> Logistic regression is not really a substitute for cluster analysis, as
> far as I can see. In logistic regression (whether binary or multinomial
> logistic) you need to know the categories BEFORE you start the analysis.
> With cluster analysis, you are attempting to determine the number of
> categories and which subjects go into which cluster.
>
> Can cluster analysis be done with binary data?
>
> Well, I am no expert; the OP might want to search the archives of
> SAS-L, I think this has been discussed before. Also, the OP might want
> to write to CLASS-L, which is all about classification and clustering.
> It's not very busy, but it's there.
>
> The problem I see is not with the clustering method, but with the
> determination of distance. But that's just a gut feeling, not backed by
> literature or research.
>
> I'd be interested in hearing what the statistics experts on this list
> think about this.

I don't claim to be an expert; however, I think Peter is right. In the SAS sample library I have the file C:\Program Files\SAS\SAS 9.1\stat\sample\distanx2.sas, which contains an interesting example of clustering binary data on conditions for divorce in US states. It uses the new DISTANCE procedure to compute a Jaccard dissimilarity coefficient. With this sort of data it's important to decide whether zero-zero matches should count as agreement, as this influences the choice of distance measure (the Jaccard coefficient ignores them, whereas a simple matching coefficient counts them).
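
To make the zero-zero point concrete, here is a minimal sketch in Python (not the code from the SAS sample, and not PROC DISTANCE itself). The function names and the two example vectors are made up for illustration, and returning 0.0 when both vectors are all zeros is just a convention I have assumed, since the Jaccard coefficient is undefined in that case.

```python
def match_counts(x, y):
    """Return (a, b, c, d): counts of 1-1, 1-0, 0-1 and 0-0 pairs
    for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard_dissimilarity(x, y):
    """Mismatches divided by pairs where at least one value is 1.
    Zero-zero matches are ignored."""
    a, b, c, _ = match_counts(x, y)
    if a + b + c == 0:
        return 0.0  # assumed convention: two all-zero vectors are identical
    return (b + c) / (a + b + c)


def simple_matching_dissimilarity(x, y):
    """Mismatches divided by all pairs. Zero-zero matches count as agreement."""
    a, b, c, d = match_counts(x, y)
    return (b + c) / (a + b + c + d)


# Hypothetical data: eight yes/no attributes for two subjects.
subject_1 = [1, 0, 0, 0, 0, 1, 0, 0]
subject_2 = [1, 1, 0, 0, 0, 0, 0, 0]

print(jaccard_dissimilarity(subject_1, subject_2))
print(simple_matching_dissimilarity(subject_1, subject_2))
```

Here the two subjects share one 1-1 match, differ on two attributes, and share five 0-0 matches, so the Jaccard dissimilarity is 2/3 while the simple matching dissimilarity is only 2/8. Which of the two is more sensible depends on whether a shared absence really tells you the subjects are alike.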

Ian.

