LISTSERV at the University of Georgia
Date:         Thu, 7 Jul 2005 11:19:21 -0400
Reply-To:     Susie Li <Susie.Li@TVGUIDE.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Susie Li <Susie.Li@TVGUIDE.COM>
Subject:      Re: Cluster analysis for binary data
Comments: To: Peter Flom <flom@ndri.org>
Content-Type: text/plain

Peter was right. I forgot about the flow.

How about CHAID analysis?

Susie Li
TV Guide
1211 Avenue of the Americas
New York, NY 10036
Tel 212.852.7453
Email susie.li@tvguide.com

-----Original Message-----
From: Peter Flom [mailto:flom@ndri.org]
Sent: Thursday, July 07, 2005 10:34 AM
To: SAS-L@LISTSERV.UGA.EDU; Susie Li
Subject: Re: Cluster analysis for binary data

>>> Susie Li <Susie.Li@TVGUIDE.COM> 7/7/2005 8:21:29 AM >>>

<<< With nominal and binary data, you are better off using multinomial/logistic regression instead of clustering, because you are violating too many clustering assumptions.

With nominal data, you need to do some data transformation (changing them to binary) before logistic regressions. >>>

Logistic regression is not really a substitute for cluster analysis, as far as I can see. In logistic regression (whether binary or multinomial logistic) you need to know the categories BEFORE you start the analysis. With cluster analysis, you are attempting to determine the number of categories and which subjects go into which cluster.

Can cluster analysis be done with binary data?

Well, I am no expert; the OP might want to search the archives of SAS-L, since I think this has been discussed before. Also, the OP might want to write to CLASS-L, which is all about classification and clustering. It's not very busy, but it's there.

The problem I see is not with the clustering method, but with the determination of distance. But that's just a gut feeling, not backed by literature or research.
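To make the distance point concrete, here is a minimal sketch in Python (not SAS, and not taken from the thread). It compares two common dissimilarity measures for 0/1 data: simple matching, which counts shared 0s as agreement, and Jaccard, which ignores them. The function names and the two example respondents are hypothetical, chosen only for illustration.

```python
def simple_matching_distance(a, b):
    """Share of all positions where the two 0/1 vectors disagree."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def jaccard_distance(a, b):
    """Disagreements divided by positions where at least one vector is 1.

    Positions where both vectors are 0 are ignored entirely.
    """
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # two all-zero vectors: treat as identical
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / either


# Hypothetical respondents answering eight yes/no items (assumed data).
u = [1, 1, 0, 0, 0, 0, 0, 0]
v = [1, 0, 1, 0, 0, 0, 0, 0]

smd = simple_matching_distance(u, v)
jd = jaccard_distance(u, v)
print(smd, jd)
```

The same pair looks fairly close under simple matching and fairly far apart under Jaccard, so whether a shared "no" should count as similarity may change which subjects end up in the same cluster.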

I'd be interested in hearing what the statistics experts on this list think about this.

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)

