Date: Fri, 4 Apr 1997 00:55:27 -0800
Sender: "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From: "Hector E. Maletta" <hmaletta@OVERNET.COM.AR>
Subject: Re: cluster analysis question
Content-Type: text/plain; charset=us-ascii
Jerry Vogt, Z4, ex. 4945 VOGTJ1 - AAL wrote:
> As a part of a market segmentation study, we are
> doing cluster analysis on about 3000 consumers
> who responded to a mail survey. We are analyzing
> about 50 of the variables in the survey. Most
> of these variables involved a 1-10 preference
> rating about an attribute and thus could be considered
> interval data. However, a few of the variables are
> nominal. In my background reading on cluster analysis,
> this problem of different levels of measurement
> is not extensively discussed (Churchill's
> "Marketing Research" 6th ed. does devote a few
> pages to the topic).
The discussion can be very brief: only interval measures are allowed,
since cluster analysis (k-means in particular) is based on arithmetic
means. Ordinal measures might be acceptable, however, as long as you are
willing to bet that the distances between consecutive ranks are not very
different.
> My questions are:
> 1) how would you recommend I deal with this
> issue within SPSS using the K-means quick cluster?
> (standardize or transform the data, etc.)
It is absolutely necessary to standardize the data, producing z-scores
(this is done by the DESCRIPTIVES procedure, which includes an option to
copy z-scores as variables onto the working file). Otherwise, any
clustering would be dependent on the particular units of measurement
used for the variables (e.g. you change HEIGHT from inches to
centimeters and there go your clusters...)
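To make the units problem concrete, here is a small sketch in Python
(not SPSS syntax). The four respondents and their HEIGHT and RATING
values are invented for illustration, and the helper names are my own.
It uses the sample standard deviation for the z-scores, which is what I
would expect from DESCRIPTIVES, but treat that as an assumption.

```python
from statistics import mean, stdev

def zscores(values):
    """Standardize to mean 0, sd 1 (sample standard deviation assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(rows, i):
    """Index of the case closest to rows[i]."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: sq_dist(rows[i], rows[j]))

# Hypothetical data: height in inches and a 1-10 preference rating.
height_in = [60.0, 63.0, 60.0, 66.0]
rating = [2.0, 2.0, 6.0, 9.0]
height_cm = [h * 2.54 for h in height_in]

raw_in = list(zip(height_in, rating))
raw_cm = list(zip(height_cm, rating))
std_in = list(zip(zscores(height_in), zscores(rating)))
std_cm = list(zip(zscores(height_cm), zscores(rating)))

# Raw data: the nearest neighbour of case 0 depends on the unit chosen.
print(nearest(raw_in, 0), nearest(raw_cm, 0))
# Standardized data: the answer is the same in either unit.
print(nearest(std_in, 0), nearest(std_cm, 0))
```

With the raw values, case 0 is closest to case 1 when height is in
inches but closest to case 2 when height is in centimeters. After
standardizing, both versions agree, because a z-score does not change
when a variable is multiplied by a constant.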
If you don't know in advance how many clusters you want to create, try
the hierarchical CLUSTER procedure on the complete file or a sample,
then choose a reasonable number of clusters (k), and finally apply
k-means QUICK CLUSTER with that k.
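For readers who want to see what the final k-means step does once k is
fixed, here is a generic sketch in Python. It is a plain textbook
version of the algorithm, not the algorithm QUICK CLUSTER actually
implements, and the six standardized points and the two starting centers
are invented for illustration.

```python
def kmeans(points, centers, max_iter=20):
    """Textbook k-means; `centers` supplies the k starting points."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assign each case to its nearest center (squared Euclidean distance).
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])))
            for p in points
        ]
        # Recompute each center as the arithmetic mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[k])  # empty cluster: keep old center
        if new_centers == centers:
            break  # nothing moved, so the solution is stable
        centers = new_centers
    return labels, centers

# Hypothetical z-scored cases forming two obvious groups.
points = [(-1.0, -1.1), (-0.9, -1.0), (-1.1, -0.9),
          (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
labels, centers = kmeans(points, centers=[points[0], points[3]])
print(labels)
print(centers)
```

The update step is where the arithmetic means come in: every cluster
center is the mean of its members on each variable, which is why the
variables have to be (at least approximately) interval measures.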
> 2) what good background sources are recommended
> for cluster analysis, especially in the context
> of market segmentation work?
More than a source, a suggestion: if the variables are nominal, or they
are ordinal and you have qualms about treating them as interval, try
CHAID for segmentation.
> Thanks in advance
Universidad del Salvador
Buenos Aires, Argentina