Date: Thu, 7 Jul 2005 11:39:58 -0400
Reply-To: Wensui Liu <firstname.lastname@example.org>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Wensui Liu <liuwensui@GMAIL.COM>
Subject: Re: Cluster analysis for binary data
CHAID is similar to logistic regression in the sense that it is
designed for supervised learning: it needs a known target variable.
So CHAID is not suitable for cluster analysis, which is unsupervised
learning.
Paul might be right that a latent class model can be used instead of
cluster analysis. In that case, what you need is Latent GOLD rather
than SAS.
Am I right, Paul?
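
To make the latent class idea concrete, here is a minimal sketch in plain Python. It is not Latent GOLD's algorithm or anything from SAS. It assumes the simplest latent class model: each class has its own probability of a 1 on every binary item, items are independent within a class, and the model is fit by EM. The function name `fit_latent_class`, the evenly spaced starting rows, and the toy data are all made up for illustration.

```python
import math


def fit_latent_class(data, n_classes, n_iter=50, eps=1e-6):
    """Fit a latent class (Bernoulli mixture) model to 0/1 rows by EM.

    Returns (weights, probs, resp): class sizes, per-class item
    probabilities, and per-row class membership probabilities.
    """
    n, d = len(data), len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Illustrative starting point: evenly spaced rows, softened to 0.75/0.25.
    probs = []
    for k in range(n_classes):
        seed_row = data[(k * n) // n_classes]
        probs.append([0.75 if x else 0.25 for x in seed_row])

    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each row.
        resp = []
        for row in data:
            logp = []
            for k in range(n_classes):
                lp = math.log(weights[k])
                for j, x in enumerate(row):
                    p = probs[k][j]
                    lp += math.log(p if x else 1.0 - p)
                logp.append(lp)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate class sizes and item probabilities.
        for k in range(n_classes):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / n, eps)
            for j in range(d):
                ones = sum(r[k] * row[j] for r, row in zip(resp, data))
                pj = ones / max(nk, eps)
                # Clamp away from 0 and 1 so the logs stay finite.
                probs[k][j] = min(max(pj, eps), 1.0 - eps)
    return weights, probs, resp


# Toy binary data (invented): two loose response patterns.
data = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

weights, probs, resp = fit_latent_class(data, n_classes=2)
# Hard assignment: the class with the largest membership probability.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
print(labels)
```

Unlike CHAID or logistic regression, nothing here uses a target variable. The number of classes still has to be chosen by the analyst, and real software would compare fits across several values and several starting points.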
On 7/7/05, Susie Li <Susie.Li@tvguide.com> wrote:
> Peter was right. I forgot about the flow.
> How about CHAID analysis?
> Susie Li
> TV Guide
> 1211 Avenue of the Americas
> New York, NY 10036
> Tel 212.852.7453
> Email email@example.com
> -----Original Message-----
> From: Peter Flom [mailto:firstname.lastname@example.org]
> Sent: Thursday, July 07, 2005 10:34 AM
> To: SAS-L@LISTSERV.UGA.EDU; Susie Li
> Subject: Re: Cluster analysis for binary data
> >>> Susie Li <Susie.Li@TVGUIDE.COM> 7/7/2005 8:21:29 AM >>>
> With nominal and binary data, you are better off using
> regression instead of clustering, because such data violate too many
> clustering assumptions.
> With nominal data, you need to transform the variables (recoding
> them as binary indicators) before fitting a logistic regression.
> Logistic regression is not really a substitute for cluster analysis, as
> far as I can see. In logistic regression (whether binary or multinomial
> logistic) you need to know the categories BEFORE you start the analysis.
> With cluster analysis, you are attempting to determine the number of
> categories and which subjects go into which cluster.
> Can cluster analysis be done with binary data?
> Well, I am no expert. The OP might want to search the archives of
> SAS-L; I think this has been discussed before. Also, the OP might want
> to write to CLASS-L, which is all about classification and clustering.
> It's not very busy, but it's there.
> The problem I see is not with the clustering method, but with the
> determination of distance. But that's just a gut feeling, not backed by
> literature or research.
> I'd be interested in hearing what the statistics experts on this list
> think about this.
> Peter L. Flom, PhD
> Assistant Director, Statistics and Data Analysis Core
> Center for Drug Use and HIV Research
> National Development and Research Institutes
> 71 W. 23rd St
> New York, NY 10010
> (212) 845-4485 (voice)
> (917) 438-0894 (fax)
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children's Hospital Medical Center