LISTSERV at the University of Georgia
Date:   Fri, 21 Apr 2006 15:32:43 +0400
Reply-To:   Anton.Balabanov@fup.unn.ru
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Anton Balabanov <Anton.Balabanov@fup.unn.ru>
Subject:   Re: Question about initial cluster centers for k-means
In-Reply-To:   <741218F5-2AB6-440E-9E3E-CA5674639170@duke.edu>
Content-Type:   text/plain; charset="iso-8859-1"

I agree that the algorithm description does not make clear which cases are used as the VERY initial seeds. The implication, however, is 'the first k cases'. The syntax below is perhaps not a proof, but it illustrates that SPSS does use the first k cases as those unexplained (very initial) seeds.

The 4 points form a square on the scatterplot. If we request a two-cluster solution, then points 1 and 2 have the same 'right' to be initial seeds as points 3 and 4 (each pair lies on a diagonal, so both pairs are equally the most distant points). Clearly, if points 1 and 2 are the first cases, they are chosen as the initial seeds. If points 3 and 4 come first, they are chosen. For the order 1-3-2-4, point 2 replaces point 3, so points 1 and 2 are the initial seeds again, and so on.

Best, Anton

DATA LIST LIST /point x y.
BEGIN DATA
1 1 1
2 2 2
3 1 2
4 2 1
END DATA.

GRAPH /SCATTERPLOT(BIVAR)=x WITH y BY point (NAME) /MISSING=LISTWISE .

QUICK CLUSTER x y /CRITERIA= CLUSTER(2) /METHOD=KMEANS(NOUPDATE) /PRINT INITIAL.

SORT CASES BY point (D).

QUICK CLUSTER x y /CRITERIA= CLUSTER(2) /METHOD=KMEANS(NOUPDATE) /PRINT INITIAL.
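
The seed-selection pass described in the quoted message below can also be tried outside SPSS. What follows is a minimal Python sketch, not SPSS's own code: `pick_initial_seeds` and `seeds_for` are hypothetical names, distances are assumed to be Euclidean, and the choice of which seed of the closest pair to drop (the one nearer to the new case) is an assumption. The second test in the loop is not stated anywhere in this thread; it is an assumed continuation of the procedure, included because the first rule alone would not let point 2 replace point 3 in the 1-3-2-4 order.

```python
import math
from itertools import combinations


def pick_initial_seeds(cases, k):
    """One pass over `cases`; returns the positions of the k chosen seeds.

    Hypothetical helper, not SPSS code. Needs k >= 2 and no missing values.
    """
    def d(i, j):
        # Euclidean distance between two cases (an assumption).
        return math.dist(cases[i], cases[j])

    seeds = list(range(k))  # the first k cases are the very first seeds
    for c in range(k, len(cases)):
        # Closest pair among the current seeds.
        a, b = min(combinations(seeds, 2), key=lambda pair: d(*pair))
        # Current seeds ordered by their distance from the candidate case c.
        by_dist = sorted(seeds, key=lambda s: d(c, s))
        q, p = by_dist[0], by_dist[1]
        if d(c, q) > d(a, b):
            # Rule from the message: c is farther from every seed than the
            # two closest seeds are from each other, so it replaces one of
            # that pair (assumed here: the one nearer to c).
            out = a if d(c, a) < d(c, b) else b
        elif d(c, p) > min(d(q, s) for s in seeds if s != q):
            # ASSUMED second test, not stated in this thread: c replaces its
            # nearest seed q when c is farther from its second-nearest seed
            # than q is from its own nearest neighbouring seed.
            out = q
        else:
            continue
        seeds[seeds.index(out)] = c
    return seeds


# The four points of the square, keyed by the `point` variable.
square = {1: (1, 1), 2: (2, 2), 3: (1, 2), 4: (2, 1)}


def seeds_for(order):
    """Point labels chosen as seeds when the cases arrive in `order`."""
    picked = pick_initial_seeds([square[p] for p in order], k=2)
    return [order[i] for i in picked]


for order in ([1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]):
    print(order, "->", seeds_for(order))
```

With the four points of the square this gives points 1 and 2 for the orders 1-2-3-4 and 1-3-2-4, and points 4 and 3 after the descending sort, which agrees with the behaviour described above.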

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]On Behalf Of Kyongje Sung
Sent: Thursday, April 20, 2006 10:14 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Question about initial cluster centers for k-means

Thanks for your information.

I have the PDF file from SPSS that shows the exact algorithm you mentioned. What you explained in your message looks exactly the same as the algorithm in that file. According to this algorithm from SPSS, the quick cluster algorithm consists of three steps, and the very first step is what you described in your message, except that it does not explain how the initial seeds are chosen to begin with. My question is all about these unexplained starting seeds.

It may be that the first step is not actually part of the quick cluster algorithm. But whatever the situation is, I'm now clear about the k-means process, and thank you guys.

K. Sung


On Apr 20, 2006, at 12:48 PM, Anton Balabanov wrote:

> Exactly so. There is a syntax at Ray's site which demonstrates how to use
> cluster centers from a hierarchical analysis for k-means analysis:
> http://www.spsstools.net/Syntax/ClusterA/CentersOfHiearchicalCAasInitialValOfK-means.txt
>
> Turning back to the original question. The algorithm used for QUICK CLUSTER
> in SPSS is described among the algorithms (the collection of files which can
> be found on the SPSS tech. support pages). There is a special algorithm for
> choosing well-spaced initial centroids. It is not the same as QUICK CLUSTER
> itself, and the initial seeds are not chosen randomly.
>
> The general idea is to take the first k nonmissing cases as initial seeds
> and then consider case k+1. If its minimum distance to the first k seeds is
> greater than the minimum distance between any pair of the k seeds, case k+1
> replaces one of the seeds from the closest pair of seeds. Et cetera. At the
> last case we will have k well-spaced seeds. Then the clustering itself
> begins. That is why the result of the k-means clustering depends slightly on
> the order of cases in the data file.
>
> A much more detailed explanation is given in Subhash Sharma, Applied
> Multivariate Techniques, Wiley, 1996. Three different algorithms are
> explained. One of them is said to be used by SAS; I think one of the 3 is
> used by SPSS as well. If someone is interested, I could scan the
> corresponding pages and send them to your private e-mails.
>
> Best,
> Anton
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU]On Behalf Of
> Hector Maletta
> Sent: Thursday, April 20, 2006 8:21 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: Question about initial cluster centers for k-means
>
> Your strategy is sound, Dan, but I don't see the relation to choosing SAS
> over SPSS: you can do exactly the same with SPSS. QUICK CLUSTER accepts a
> matrix of starting cluster means to begin the analysis, and the matrix could
> of course come from a previous hierarchical clustering exercise.
> Hector
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
> Dan Zetu
> Sent: Thursday, April 20, 2006 1:11 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: Question about initial cluster centers for k-means
>
> This is one reason I use SAS for clustering. I typically start with
> hierarchical clustering, find the optimal number of clusters I need, save
> their centroids and subsequently submit them as "seeds" to a k-means
> technique. As of now, I have not been able to figure out an equivalent
> procedure in SPSS.
>
> Dan
>
>> From: Kyongje Sung <kjsung@duke.edu>
>> Reply-To: Kyongje Sung <kjsung@duke.edu>
>> To: SPSSX-L@LISTSERV.UGA.EDU
>> Subject: Question about initial cluster centers for k-means
>> Date: Thu, 20 Apr 2006 10:23:35 -0400
>>
>> Hi, everyone...
>>
>> I have gone through all the postings about cluster analysis and am still
>> not clear about the way K-means analysis chooses initial cluster centers.
>> When I looked at the manual for K-means analysis about initial centers,
>> it says,
>>
>> "By default, a number of well-spaced cases equal to the number of clusters
>> is selected from the data"
>>
>> I'm not sure about the part "well-spaced cases". Since the K-means
>> procedure uses the QUICK CLUSTER algorithm and syntax, does "well-spaced
>> cases" mean that it used the quick cluster algorithm to find the initial
>> centers? Since the quick cluster algorithm goes through all data points
>> and makes the initial centers as far apart as possible, I guess this
>> process is what "well-spaced cases" means...
>>
>> If it is, Quick Cluster still needs to start with some kind of seed (?)
>> value to find the initial cluster centers. How does SPSS choose these
>> initial values for the quick cluster algorithm so that it can calculate
>> the real(?) initial cluster centers for k-means?
>> Does SPSS choose the seed points for quick cluster randomly from the data?
>>
>> K. Sung

============================================
Kyongje Sung, Ph.D.
--------------------------------------------
Postdoctoral Fellow, Purves Lab
Center for Cognitive Neuroscience
Levine Science Research Center (Rm. B243E)
Box 90999
Duke University
Durham, NC 27708
--------------------------------------------
Email: k.sung@duke.edu
Tel: (919) 684-6276
Fax: (919) 681-0815
www.mind.duke.edu
www.purveslab.net
============================================

