Date: Tue, 1 Jun 2004 13:21:54 -0600
Reply-To: William Dudley <william.dudley@nurs.utah.edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: William Dudley <william.dudley@nurs.utah.edu>
Subject: Cluster analysis and normality
I have symptom-severity data on 10 symptoms (e.g., pain, anxiety),
with responses ranging from 1 (mild) to 10 (severe),
that I would like to cluster-analyze (my N is about 500).
My problem is that not everyone has all symptoms: for a given
symptom, as many as 80% of patients might report NOT having it.
In no case are the symptoms mutually exclusive, as they would be if,
for instance, we had both males and females and asked about prostate
enlargement.
That is, there is always some nonzero probability that a given patient
may exhibit a symptom.
If I recode NOT having a symptom as zero, creating a new score
ranging from 0 to 10, I get
very non-normal distributions. Even if I recode the 1-10 scores into
mild, moderate, or severe, I still end up with non-normal distributions.
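A minimal Python sketch of the two recodings described above, for readers who want to see why the distributions come out skewed. It assumes a missing response (None) means "symptom not reported"; the function names, the toy data, and the mild/moderate/severe cutpoints (1-3, 4-6, 7-10) are hypothetical, since the post does not give its cutpoints.

```python
from statistics import mean


def recode_absent_to_zero(responses):
    """Map None (symptom not reported) to 0; keep the 1-10 severity scores as-is."""
    return [0 if r is None else r for r in responses]


def severity_band(score):
    """Collapse a 0-10 score into bands. HYPOTHETICAL cutpoints: 1-3, 4-6, 7-10."""
    if score == 0:
        return "none"
    if score <= 3:
        return "mild"
    if score <= 6:
        return "moderate"
    return "severe"


def sample_skewness(xs):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


# Toy data for one symptom: 8 of 10 hypothetical patients report no symptom.
pain = [None, None, 5, None, None, 8, None, None, None, None]
scores = recode_absent_to_zero(pain)
print(scores)
print([severity_band(s) for s in scores])
print(round(sample_skewness(scores), 2))
```

With most values piled up at 0, the skewness is strongly positive. Banding the nonzero scores does not remove that spike, which matches the behavior reported for the recoded variables.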
If I use only the cases reporting all symptoms, I lose over
90% of my sample.
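A rough sketch of why complete-case analysis shrinks the sample so quickly. Under the simplifying assumption that each symptom occurs independently, the expected share of patients reporting all symptoms is the product of the per-symptom presence rates. Real symptoms tend to co-occur, so treat this only as a rough guide. The function names and rates below are hypothetical illustrations.

```python
from math import prod


def complete_case_fraction(presence_rates):
    """Expected share of patients reporting EVERY symptom.

    ASSUMES symptoms occur independently; correlated symptoms would
    retain more cases than this product suggests.
    """
    return prod(presence_rates)


def count_complete_cases(patients):
    """Count rows that have a severity score for every symptom (None = absent)."""
    return sum(all(s is not None for s in row) for row in patients)


# Even if each of 10 symptoms were reported by 80% of patients,
# only about 11% would report all 10.
print(round(complete_case_fraction([0.8] * 10), 3))
# At a 20% presence rate per symptom, essentially no one would be left.
print(complete_case_fraction([0.2] * 10))
```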
Of course, this scale is only approximately interval-level, even before
the recoding.
I have thought of using TwoStep cluster analysis, which allows for
categorical variables; however, the symptom-severity scores,
although non-normal, are certainly NOT categorical.
My question: "How robust are the cluster analysis routines to
deviations from normality?"
Or:
"Any suggestions on how to proceed?"
Thanks in advance,
Bill
**********************************************************************
William N. Dudley, PhD
Emma Eccles Jones Nursing Research Center
University of Utah
College of Nursing
10 South 2000 East
Salt Lake City, UT 84122-5880
http://www.nurs.utah.edu/faculty/william_dudley.htm
**********************************************************************