Date: Thu, 7 Jul 2005 11:19:21 -0400
Reply-To: Susie Li <Susie.Li@TVGUIDE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Susie Li <Susie.Li@TVGUIDE.COM>
Subject: Re: Cluster analysis for binary data
Content-Type: text/plain
Peter was right. I forgot about the flow.
How about CHAID analysis?
Susie Li
TV Guide
1211 Avenue of the Americas
New York, NY 10036
Tel 212.852.7453
Email susie.li@tvguide.com
-----Original Message-----
From: Peter Flom [mailto:flom@ndri.org]
Sent: Thursday, July 07, 2005 10:34 AM
To: SAS-L@LISTSERV.UGA.EDU; Susie Li
Subject: Re: Cluster analysis for binary data
>>> Susie Li <Susie.Li@TVGUIDE.COM> 7/7/2005 8:21:29 AM >>>
<<<
With nominal and binary data, you are better off using multinomial/logistic
regression instead of clustering, because you are violating too many
clustering assumptions.
With nominal data, you need to do some data transformation (changing
them to binary) before running logistic regression.
>>>
Logistic regression is not really a substitute for cluster analysis, as
far as I can see. In logistic regression (whether binary or multinomial
logistic) you need to know the categories BEFORE you start the analysis.
With cluster analysis, you are attempting to determine the number of
categories and which subjects go into which cluster.
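To make the contrast concrete, here is a minimal sketch in Python rather than SAS. The data, the function names, and the linking rule (join subjects whose binary profiles differ in at most max_dist positions) are all made up for illustration and are not a recommended method. The point is only that the routine is handed the observations and nothing else: no outcome variable, no category labels. The groups come out of the analysis instead of going into it.

```python
def hamming(a, b):
    """Number of positions where two binary profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def threshold_clusters(rows, max_dist):
    """Group rows so that any two rows within max_dist of each other
    end up in the same cluster (a crude single-linkage rule).
    Note that no labels are passed in: only the observations."""
    clusters = []
    for row in rows:
        linked, rest = [], []
        for c in clusters:
            near = any(hamming(row, member) <= max_dist for member in c)
            (linked if near else rest).append(c)
        # Merge every cluster the new row touches, then add the row itself.
        merged = [member for c in linked for member in c] + [row]
        clusters = rest + [merged]
    return clusters


# Hypothetical subjects measured on four yes/no items.
subjects = [
    (1, 1, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
]

clusters = threshold_clusters(subjects, max_dist=1)
for i, c in enumerate(clusters, start=1):
    print("cluster", i, c)
```

A logistic regression on the same subjects could not even be set up until someone supplied a known category for each row.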
Can cluster analysis be done with binary data?
Well, I am no expert; the OP might want to search the archives of
SAS-L, since I think this has been discussed before. Also, the OP might want
to write to CLASS-L, which is all about classification and clustering.
It's not very busy, but it's there.
The problem I see is not with the clustering method, but with the
determination of distance. But that's just a gut feeling, not backed by
literature or research.
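A small sketch of why the choice of distance matters, again in Python rather than SAS and with made-up data. It compares two standard dissimilarity measures for binary variables: simple matching, which counts shared 0s as agreement, and Jaccard, which ignores them. The function names are my own, and returning 0.0 when neither subject has any 1s is just a convention assumed here.

```python
def simple_matching_distance(a, b):
    """Share of items on which the two profiles disagree.
    Joint absences (0, 0) count as agreement."""
    mismatches = sum(x != y for x, y in zip(a, b))
    return mismatches / len(a)


def jaccard_distance(a, b):
    """1 minus (items both have) / (items at least one has).
    Joint absences (0, 0) are ignored entirely."""
    both = sum(x == 1 and y == 1 for x, y in zip(a, b))
    either = sum(x == 1 or y == 1 for x, y in zip(a, b))
    if either == 0:
        return 0.0  # assumed convention: two all-zero profiles are identical
    return 1 - both / either


# Two hypothetical subjects who each endorse one item, but not the same one.
a = (1, 0, 0, 0, 0, 0, 0, 0)
b = (0, 1, 0, 0, 0, 0, 0, 0)

print(simple_matching_distance(a, b))  # 0.25
print(jaccard_distance(a, b))          # 1.0
```

The same pair looks fairly close under one measure and as far apart as possible under the other, so whichever clustering method is used, the clusters will depend heavily on whether a shared "no" should count as similarity.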
I'd be interested in hearing what the statistics experts on this list
think about this.
Peter
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)