LISTSERV at the University of Georgia
Date:         Mon, 27 Oct 2003 10:11:09 +0100
Reply-To:     Spousta Jan <JSpousta@CSAS.CZ>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Spousta Jan <JSpousta@CSAS.CZ>
Subject:      Re: workaround for clustering
Content-Type: text/plain; charset="iso-8859-1"

Dear Frank,

I would probably prefer your workaround B (try both B1 and B2). It can at least make some use of the information contained in the 70% of cases dropped in the first step.

I am afraid you are right about TwoStep. This procedure often gives puzzling results of poor quality. If you are considering Bayesian clustering, you might try something else, e.g. AutoClass (freeware from NASA).

http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/

Does anybody know how to use TwoStep in real situations and how to obtain reliable results from it? In the coming months I am going to segment some 2,000,000+ clients of our bank, and I feel rather uneasy when I think of TwoStep, too.

Greetings

Jan

-----Original Message-----
From: Frank Thomas [mailto:news.ftr@FREE.FR]
Sent: Friday, October 24, 2003 7:27 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: workaround for clustering

Hi, here comes my next problem, now that the first one has been solved with your help.

I want to classify about 2000 cases on 40 yes/no items with hierarchical clustering. I use CLUSTER (hierarchical clustering) because I have no idea how many types the classification might produce. But this approach does not work, because the number of cells is too large (over 900,000).
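Hierarchical clustering of yes/no items needs a dissimilarity between two answer profiles. A minimal sketch in Python of two common choices for 0/1 data follows: simple matching (the share of items on which two cases disagree) and Jaccard (which ignores items both cases answered "no"). Which measure suits this survey, and how it maps onto the options of the SPSS CLUSTER command, is an assumption left to the analyst; the example vectors are hypothetical and use 5 items instead of 40.

```python
def simple_matching_distance(a, b):
    """Share of items on which two cases give different answers (0 = identical)."""
    if len(a) != len(b):
        raise ValueError("cases must have the same number of items")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def jaccard_distance(a, b):
    """1 - (items both answered 'yes') / (items at least one answered 'yes')."""
    both_yes = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    any_yes = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if any_yes == 0:
        return 0.0  # two all-'no' profiles are treated as identical
    return 1.0 - both_yes / any_yes


# Hypothetical profiles (1 = yes, 0 = no).
case_1 = [1, 0, 1, 1, 0]
case_2 = [1, 1, 0, 1, 0]
print(simple_matching_distance(case_1, case_2))  # 0.4
print(jaccard_distance(case_1, case_2))          # 0.5
```

The two measures differ only in how shared "no" answers count: simple matching treats them as agreement, Jaccard leaves them out.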

Two possible workarounds:

Workaround A:
1. Draw a 30% sample, which reduces the number of cells.
2. Run a hierarchical clustering on the sample.
3. Feed the resulting classification variable into a discriminant analysis.
4. Let the discriminant analysis classify the cases that were not sampled.
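The last two steps of workaround A (learn the clusters from the sample, then assign the remaining 70%) can be sketched in Python as below. This is not what a discriminant analysis computes: a nearest-centroid rule is swapped in as a deliberately simpler stand-in that ignores the covariance structure of the items. The data and the sample labels are hypothetical; in practice the labels would come from the hierarchical run on the sample.

```python
def centroids_from_labels(cases, labels):
    """Mean profile (share of 'yes' per item) of each cluster in the sample."""
    sums, counts = {}, {}
    for case, label in zip(cases, labels):
        if label not in sums:
            sums[label] = [0.0] * len(case)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], case)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def squared_distance(a, b):
    """Squared Euclidean distance between a case and a centroid."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_rest(rest, centroids):
    """Give each non-sampled case the label of its nearest sample centroid."""
    return [
        min(centroids, key=lambda lab: squared_distance(case, centroids[lab]))
        for case in rest
    ]


# Hypothetical sample (4 items instead of 40) and hypothetical cluster labels.
sample = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
sample_labels = [1, 1, 2, 2]
rest = [[1, 0, 0, 0], [0, 0, 1, 1]]

cents = centroids_from_labels(sample, sample_labels)
print(classify_rest(rest, cents))  # [1, 2]
```

A real discriminant analysis would additionally weight items by their pooled within-cluster covariance; the nearest-centroid rule is the special case where that covariance is treated as the identity.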

Workaround B:
1. Draw a 30% sample, which reduces the number of cells.
2. Run a hierarchical clustering on the sample.
3. Use the resulting classification variable in a k-means cluster analysis, either
   B1: as the starting configuration, or
   B2: only as a guide to how many clusters to request.
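Alternative B1 amounts to running k-means on all cases, starting from the cluster means found in the sample. A minimal Python sketch of plain k-means (Lloyd's algorithm) with user-supplied starting centers follows. It assumes squared Euclidean distance on the 0/1 items; the function name `kmeans_from_start` and the data are hypothetical, and this is not the SPSS QUICK CLUSTER implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 0/1 (or mean) profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_start(cases, start_centers, max_iter=100):
    """Plain k-means started from given centers.

    Returns (labels, centers); labels are 0-based indexes into start_centers.
    """
    centers = [list(c) for c in start_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest current center.
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(case, centers[j]))
            for case in cases
        ]
        if new_labels == labels:
            break  # no case changed cluster, so the solution is stable
        labels = new_labels
        # Update step: move each center to the mean profile of its members.
        for j in range(len(centers)):
            members = [case for case, lab in zip(cases, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical data: 4 items instead of 40; the start centers stand in for
# the cluster means of an (assumed) hierarchical run on the 30% sample.
all_cases = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1],
             [0, 1, 1, 1], [1, 0, 0, 0], [0, 0, 1, 1]]
start = [[1.0, 1.0, 0.5, 0.0], [0.0, 0.5, 1.0, 1.0]]
labels, centers = kmeans_from_start(all_cases, start)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

For B2, only the number of clusters would be carried over, and the starting centers would be chosen some other way (for example, several random starts), so the hierarchical result no longer pins down where k-means begins.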

What do you think of these workarounds? Or are there more elegant ways? For instance, I could not find a way to determine how many cells are required, so that I could set SET MXCELLS to the appropriate level.
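One way to estimate the cell demand is to count pairwise proximities, which grow with the square of the number of cases. The Python sketch below counts both the lower triangle, n(n-1)/2, and a full n-by-n matrix. Which of these (if either) SPSS CLUSTER actually counts against MXCELLS is an assumption, not something stated in this thread.

```python
def proximity_cells(n_cases, full_matrix=False):
    """Number of pairwise proximities for n cases.

    Lower triangle without the diagonal: n(n-1)/2 values.
    Full square matrix: n*n values.
    (Which one SPSS counts against MXCELLS is an assumption here.)
    """
    if full_matrix:
        return n_cases * n_cases
    return n_cases * (n_cases - 1) // 2


for n in (2000, 600):  # all cases vs. a 30% sample
    print(n, proximity_cells(n), proximity_cells(n, full_matrix=True))
# 2000 1999000 4000000
# 600 179700 360000
```

Because the count is quadratic, a 30% sample needs only about 9% (0.3 squared) of the cells required by the full data set.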

BTW: I tried TwoStep Clustering, but this procedure produced a large and variable number of outliers (depending on how noise in the data was handled) and very few classes, often just one. Also, I don't like to work with a procedure that lacks a good manual; TwoStep Clustering remains a mystery to me.

Thanks for your hints,
Frank Thomas

--
Dr. Frank Thomas
FTR Internet Research
Rosny, France

