LISTSERV at the University of Georgia
Date:         Thu, 7 Jul 2005 11:39:58 -0400
Reply-To:     Wensui Liu <liuwensui@gmail.com>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Wensui Liu <liuwensui@GMAIL.COM>
Subject:      Re: Cluster analysis for binary data
Comments: To: Susie Li <Susie.Li@tvguide.com>
In-Reply-To:  <A45FFB324336484D9AEF368378FA7FBD9EF98B@nyc1po2.int.tvguideinc.com>
Content-Type: text/plain; charset=ISO-8859-1

CHAID is similar to logistic regression in that it is designed for supervised learning: it needs a known target variable. So CHAID is not suitable for cluster analysis, which is unsupervised learning.

Paul might be right that a latent class model can be used instead of cluster analysis. In that case, what you need is Latent GOLD rather than SAS. Am I right, Paul?

On 7/7/05, Susie Li <Susie.Li@tvguide.com> wrote:
> Peter was right. I forgot about the flow.
>
> How about Chaid analysis?
>
> Susie Li
> TV Guide
> 1211 Avenue of the Americas
> New York, NY 10036
> Tel 212.852.7453
> Email susie.li@tvguide.com
>
> -----Original Message-----
> From: Peter Flom [mailto:flom@ndri.org]
> Sent: Thursday, July 07, 2005 10:34 AM
> To: SAS-L@LISTSERV.UGA.EDU; Susie Li
> Subject: Re: Cluster analysis for binary data
>
> >>> Susie Li <Susie.Li@TVGUIDE.COM> 7/7/2005 8:21:29 AM >>>
>
> <<<
> With nominal and binary data, you are better off using
> multinomial/logistic
> regression instead of clustering, because you are violating too many
> clustering assumptions.
>
> With nominal data, you need to do some data transformation (changing
> them to binary) before logistic regressions.
> >>>
>
> Logistic regression is not really a substitute for cluster analysis, as
> far as I can see. In logistic regression (whether binary or multinomial
> logistic) you need to know the categories BEFORE you start the analysis.
> With cluster analysis, you are attempting to determine the number of
> categories and which subjects go into which cluster.
>
> Can cluster analysis be done with binary data?
>
> Well, I am no expert; the OP might want to search the archives of
> SAS-L, I think this has been discussed before. Also, the OP might want
> to write to CLASS-L, which is all about classification and clustering.
> It's not very busy, but it's there.
>
> The problem I see is not with the clustering method, but with the
> determination of distance. But that's just a gut feeling, not backed by
> literature or research.
>
> I'd be interested in hearing what the statistics experts on this list
> think about this.
>
>
> Peter
>
> Peter L. Flom, PhD
> Assistant Director, Statistics and Data Analysis Core
> Center for Drug Use and HIV Research
> National Development and Research Institutes
> 71 W. 23rd St
> www.peterflom.com
> New York, NY 10010
> (212) 845-4485 (voice)
> (917) 438-0894 (fax)
>
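
On the distance question raised in the quoted message: the rough sketch below (Python, standard library only) shows two common ways to measure distance between binary records, Jaccard and simple matching. It is only an illustration, not something anyone in this thread proposed. The function names and the three toy subjects are made up for the example, and treating two all-zero records as distance 0 under Jaccard is an assumed convention.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 sequences.

    Joint absences (0, 0) are ignored; only positions where at least
    one record has a 1 count toward the denominator.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        return 0.0  # assumed convention: two all-zero records are identical
    return 1.0 - both / either


def simple_matching_distance(a, b):
    """Share of positions where the two records disagree.

    Unlike Jaccard, joint absences (0, 0) count as agreement.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("records must be non-empty and the same length")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


# Hypothetical toy data: three subjects, five yes/no items each.
subjects = {
    "s1": [1, 1, 0, 0, 0],
    "s2": [1, 1, 1, 0, 0],
    "s3": [0, 0, 0, 1, 1],
}

if __name__ == "__main__":
    names = sorted(subjects)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            dj = jaccard_distance(subjects[p], subjects[q])
            dm = simple_matching_distance(subjects[p], subjects[q])
            print(f"{p}-{q}: jaccard={dj:.3f} simple_matching={dm:.3f}")
```

The two measures differ in whether a shared "no" counts as similarity, so which one is appropriate depends on what a 0 means in the data. Either distance matrix could then be passed to a hierarchical clustering routine.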

--
WenSui Liu, MS MA
Senior Decision Support Analyst
Division of Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center

