Date: Tue, 1 Jun 2004 16:56:22 -0400
Reply-To: Art@DrKendall.org
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Art Kendall <Arthur.Kendall@verizon.net>
Organization: Social Research Consultants
Subject: Re: Cluster analysis and normality
In-Reply-To: <s0bc82fc.028@gwdom2-med.med.utah.edu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Cluster analysis is an exploratory procedure. There is no assumption of
normality. To draw parallel coordinate plots (aka profile graphs), it is
necessary to have all of the variables on a common scale. You already
have that.
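
As a rough illustration of what goes into a profile graph, the sketch below computes the mean of each variable within each cluster; those means are the points one would connect per cluster. The function name `cluster_profiles` and the tiny data set are made up for the example, and the 0 = "symptom absent" coding is only assumed from the question.

```python
from statistics import mean

def cluster_profiles(rows, labels):
    """Return {cluster label: [mean of each variable]} for rows on a common scale."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    # zip(*members) turns the rows of one cluster into columns (one per variable).
    return {
        label: [mean(column) for column in zip(*members)]
        for label, members in groups.items()
    }

# Hypothetical data: 3 symptoms scored 0-10 (0 = symptom absent), 2 clusters.
rows = [[0, 8, 7], [1, 9, 6], [7, 0, 0], [9, 1, 0]]
labels = [1, 1, 2, 2]

profiles = cluster_profiles(rows, labels)
for label, profile in sorted(profiles.items()):
    print(label, profile)
```

Plotting each list against the variable names, one line per cluster, gives the profile graph.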
Rarely is it appropriate to use just one procedure and one similarity
measure. You should try different approaches to ensure your results are
not particular to one method-coefficient combination. Treating the data
as categorical and as continuous in different runs will give you some
insight into the "reality" of your clusters.
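
One way to check whether two runs agree is to save the cluster membership from each and compare the two partitions. The sketch below uses the adjusted Rand index for that; the choice of index is my assumption, not something specific to SPSS, and the membership lists are invented. Values near 1 mean the two runs recover nearly the same grouping; values near 0 mean agreement is about what chance would give.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same cases."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both runs must label the same cases")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two cases")
    # Pairs of cases placed together in both runs, in run A, and in run B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # e.g. both runs put every case in one cluster
        return 1.0
    return (together_both - expected) / (maximum - expected)

# Hypothetical saved memberships from three runs on the same six cases.
run_continuous = [1, 1, 1, 2, 2, 2]
run_categorical = ["b", "b", "b", "a", "a", "a"]  # same grouping, different labels
run_other_method = [1, 1, 2, 2, 3, 3]

print(adjusted_rand_index(run_continuous, run_categorical))
print(adjusted_rand_index(run_continuous, run_other_method))
```

The label values themselves do not matter, only which cases end up together, so runs that number their clusters differently can still be compared.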
Discriminant function analyses, ignoring conventional interpretation of
the tests, are very useful in interpreting solutions. You would use the
cluster membership as the group variable and the variables the
clustering was based on as the predictors.
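
A minimal sketch of that setup, using Python with NumPy and scikit-learn rather than SPSS. The data are simulated and the `cluster` variable stands in for saved cluster membership; none of it comes from the symptom data in the question. The hit rate and the discriminant weights are read descriptively only, with no significance testing.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Simulated stand-in for the clustering variables: 3 variables, 3 groups of 50 cases.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 8.0, 7.0], [8.0, 1.0, 1.0], [4.0, 4.0, 9.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(50, 3)) for c in centers])

# Stand-in for the saved cluster membership (the group variable).
cluster = np.repeat([1, 2, 3], 50)

# Predictors are the same variables the clustering was based on.
lda = LinearDiscriminantAnalysis()
scores = lda.fit_transform(X, cluster)  # discriminant function scores per case
hit_rate = lda.score(X, cluster)        # share of cases classified into their own cluster

print(scores.shape)
print(hit_rate)
print(lda.scalings_)  # weights showing which variables separate the clusters
```

Plotting the cases on the first two discriminant functions, marked by cluster, is often the quickest way to see how distinct the groups are.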
This could be a very interesting application. I would like to hear what
you come up with.
Art
Art@DrKendall.org
Social Research Consultants
University Park, MD USA
(301) 864-5570
William Dudley wrote:
>I have symptom severity data across 10 symptoms (e.g. pain, anxiety,
>etc), with responses ranging from 1 (mild) to 10 (severe)
> that I would like to cluster analyze (my N is about 500).
>My problem is that not everyone has all symptoms.
>As many as 80% might report NOT having the symptom.
>In no case are the symptoms exclusionary as we might have if we had
>both males and females and asked about prostate enlargement for
>instance.
>That is there is always some non zero probability that a given patient
>may exhibit a symptom.
>
>
>If I recode NOT having the symptoms as a zero and create a new score
>ranging from 0 to 10, then I get
>very non normal distributions. Even if I recode the 1 - 10 scores into
>mild, moderate, or severe, I end up with non normal distributions.
>If I only use those cases reporting all symptoms, I end up losing over
>90% of my sample.
>Of course, this scale is only approximately interval level, even before
>the recoding.
>
>I have thought of using the two step cluster analysis which allows for
>categorical variables, however, the symptom severity numbers,
>although non normal are certainly NOT categorical.
>
>My question, "How robust are the cluster analysis routines to
>deviations from normality?"
>
>or
>
>"Any suggestions on how to proceed?"
>
>
>Thanks in advance,
>Bill
>
>
>**********************************************************************
>
>
> William N. Dudley, PhD
> Emma Eccels Jones Nursing Research Center
>
> University of Utah
>
> College of Nursing
>
> 10 South 2000 East
>
> Salt Lake City, UT 84122-5880
>
> http://www.nurs.utah.edu/faculty/william_dudley.htm
>
>**********************************************************************
>
>
>