| Date: | Tue, 28 Feb 2006 19:21:49 +0100 |
| Reply-To: | "adel F." <adel_tangi@YAHOO.FR> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "adel F." <adel_tangi@YAHOO.FR> |
| Subject: | Re: QR:cluster analysis with binary variables |
| In-Reply-To: | <000801c63c81$4ae44740$d3dca7c8@THINQ> |
| Content-Type: | text/plain; charset=iso-8859-1 |
Thanks a lot for all this information; it is really helpful, together with the information that Dennis gave.
I will try these suggestions and report back to the list.
Adel
Rogerio Porto <rdporto1@TERRA.COM.BR> wrote:
"adel F." wrote:
> I have used 12 binary variables.
> First, I have done the correspondence analysis to extract axes (first
> step); the first two axes explain 34% of the variance.
> I have used 9 axes in the cluster step (second step); the 9 axes
> explain 92% of the variance.
Technically, you are throwing away little information (8%). You could
use all 12 axes in your cluster analysis.
> I have used the cluster analysis, with the results from correspondence
> analysis, because my understanding is that cluster analysis is
> appropriate for continuous variables and not for binary variables, as in
> my case.
Actually, you can do cluster analysis with any kind of variable: nominal,
ordinal, interval or ratio scales. For each one you have to choose an
appropriate distance measure. There are tons of them, and you may
have to spend some time choosing the most suitable one. You can also create
your own distance measure, but that could take a little programming.
These distance measures can be computed using a macro supplied by
SAS (macro %distance):
http://support.sas.com/ctx/samples/index.jsp?sid=475
If you are using SAS 9.1, you can compute the distances using the new
PROC DISTANCE.
The %DISTANCE macro computes various measures of distance,
dissimilarity, or similarity between the observations (rows) of a SAS data
set.
These proximity measures are stored as a lower triangular or a square matrix
in an output data set, depending on the SHAPE= specification. That data set
can then be used as input to the CLUSTER, MDS, or MODECLUS procedures.
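To make the idea of a proximity matrix on binary variables concrete, here is a minimal sketch in Python. It is only an illustration and not what the %DISTANCE macro or PROC DISTANCE does internally. The choice of the Jaccard dissimilarity, the function names, and the toy data are all assumptions made for the example.

```python
def jaccard_dissimilarity(x, y):
    """Jaccard dissimilarity between two 0/1 vectors of equal length.

    Joint absences (0, 0) are ignored. If neither vector contains a 1,
    the dissimilarity is defined here as 0.0 (an assumption).
    """
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    if either == 0:
        return 0.0
    return 1.0 - both / either


def lower_triangular_distances(rows):
    """Return the lower triangle (diagonal included) of the distance matrix."""
    return [
        [jaccard_dissimilarity(rows[i], rows[j]) for j in range(i + 1)]
        for i in range(len(rows))
    ]


# Toy data: 3 observations on 4 binary variables (hypothetical values).
rows = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
dist = lower_triangular_distances(rows)
for line in dist:
    print(line)
```

A matrix like this is the kind of object a clustering procedure that accepts distances would take as input; a different measure (simple matching, for instance) can be swapped in by replacing the first function.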
I think this is in line with what Dennis Fisher
said about doing cluster analysis directly using the binary variables.
HTH,
Rogerio Porto.