```
Date:         Tue, 28 Feb 2006 19:02:41 +0100
Reply-To:     "adel F."
Sender:       "SAS(r) Discussion"
From:         "adel F."
Subject:      Re: QR:cluster analysis with binary variables
Comments:     To: Dennis Fisher
In-Reply-To:
Content-Type: text/plain; charset=iso-8859-1

Thanks Dennis for your help,
I do not know Clustan. Is it a package independent of SAS, and how can
I get it?
Thanks a lot
Adel

Dennis Fisher wrote:
Adel,
With 12 binary variables you can do cluster analysis directly and do
not need to do correspondence analysis first. With binary variables you
need to use proximity measures that are suitable for binary variables.
That is all. The two approaches answer somewhat different questions.
SAS has come a long way in its cluster analysis procedures. I usually
use CLUSTAN for cluster analysis and it has proximity measures
specifically for binary data, but I believe SAS now does also. With 12
variables this should make a very nice cluster analysis, if cluster
analysis will answer the question that you are asking. Aldenderfer and
Blashfield have a good explanation of some of the measures for binary
data.
HTH
Dennis

On Tue, 28 Feb 2006 15:01:34 +0100 (CET) "adel F." wrote:
> Thank you very much Dennis, your comments are very interesting.
>
> I have used 12 binary variables.
> First, I did the correspondence analysis to extract axes (first
> step); the first two axes explain 34% of the variance.
> I used 9 axes in the cluster step (second step); the 9 axes explain
> 92% of the variance.
>
> I ran the cluster analysis on the results from the correspondence
> analysis because my understanding is that cluster analysis is
> appropriate for continuous variables and not for binary variables,
> as in my case.
>
> Adel
>
> Dennis Fisher wrote:
> Adel,
> Part of the issue is how many binary variables do you have and how
> many axes do you think you will wind up with? My experience with
> correspondence analysis is that it is used for large two-way tables.
> I wind up with two axes. Two variables would not make a very
> interesting cluster analysis. An ideal number of variables for a
> cluster analysis would be in the neighborhood of about half a dozen
> up to about two dozen. This would be both interesting and manageable.
> If you do a correspondence analysis, I do not understand why you do
> not stop there and interpret your results and present them. If you
> want to do a cluster analysis, then just do a cluster analysis with
> binary variables. I do not understand why you want the additional
> complication of doing both.
> Dennis
>
> On Tue, 28 Feb 2006 11:04:58 +0100 (CET) "adel F." wrote:
>> Hi,
>> Do you think that a correspondence analysis of the binary
>> variables, followed by a cluster analysis on the axes from the
>> correspondence analysis step, is not appropriate for obtaining the
>> clusters?
>> Adel
>>
>> "Dennis G. Fisher" wrote:
>> I am glad you said this. In their book Aldenderfer and Blashfield
>> attack this idea a great deal. Their example is that people will do
>> a cluster analysis and then do a discriminant analysis on the same
>> variables that made up the cluster and discriminate the clusters.
>> They then try to claim that this is a measure of the validity of
>> the cluster analysis. A&B come out strongly against doing this.
>> Dennis Fisher
>>
>> David L Cassell wrote:
>>
>>> adel_tangi@YAHOO.FR wrote back:
>>>
>>>> Hi David,
>>>> Thank you very much for your interesting reply; this encourages
>>>> me to go further in my analysis. I will try to interpret the axes
>>>> at the correspondence analysis step, as you suggest, in order to
>>>> interpret the results of the cluster analysis easily.
>>>>
>>>> I am also thinking that after obtaining the clusters, I can do a
>>>> multinomial analysis (proc catmod) with the clusters as the
>>>> dependent variable and the original binary variables as
>>>> independent variables; this can also explain the association
>>>> between the original binary variables and the clusters.
>>>
>>> I don't recommend this last part. You built the cluster
>>> specifically as a clustering algorithm on linear combinations of
>>> (some of) the binary variables. This addition only makes your
>>> reasoning particularly circular. You will be forcing the results
>>> to show you the axes from the correspondence analysis. Any
>>> variables which were important in those axes will come out as
>>> important here, and any variables which were overlooked in those
>>> axes (the ones you used) will be overlooked here. I do not think
>>> that this extra analysis will help you any.
>>>
>>> HTH,
>>> David
>>> --
>>> David L. Cassell
>>> mathematical statistician
>>> Design Pathways
>>> 3115 NW Norwood Pl.
>>> Corvallis OR 97330
>>
>> --
>> Dennis G. Fisher, Ph.D.
>> Director
>> Center for Behavioral Research and Services
>> 1090 Atlantic Avenue
>> Long Beach, CA 90813
>> 562-495-2330
>> 562-983-1421 fax
```
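
Dennis's suggestion in the thread, clustering the binary variables directly with a proximity measure suited to binary data, can be sketched roughly as follows. This is only an illustration in Python, not SAS or CLUSTAN code. The choice of the Jaccard coefficient, the use of average linkage, the toy data, and the function names `jaccard_distance` and `average_linkage` are all assumptions made for the example; none of them come from the thread.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors.

    Joint absences (0, 0) are ignored; two all-zero vectors get distance 0.
    """
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(rows, n_clusters):
    """Naive agglomerative clustering with average linkage.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(rows))]

    def cluster_distance(c1, c2):
        # Mean pairwise Jaccard distance between members of the two clusters.
        total = sum(jaccard_distance(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: cluster_distance(clusters[pq[0]], clusters[pq[1]]),
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical toy data: 6 observations on 12 binary variables.
data = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1],
]

result = average_linkage(data, n_clusters=2)
print(result)  # -> [[0, 1, 2], [3, 4, 5]]
```

The Jaccard coefficient ignores variables on which both observations are 0, whereas a simple matching coefficient would count those as agreement. Which one is appropriate depends on whether a shared absence is informative in the data at hand.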
