I agree that the algorithm description does not make clear which cases are used as the VERY initial seeds. The implication, however, is 'the first k cases'. The syntax below is perhaps not a proof, but it is an illustration that SPSS does use the first k cases as those unexplained (very initial) seeds.

The 4 points form a square on the scatterplot. If we request a two-cluster solution, then points 1 and 2 have the same 'right' to be initial seeds as points 3 and 4 (each pair is equally a pair of most distant points). Clearly, if points 1 and 2 are the first cases, they will be chosen as the initial seeds. If points 3 and 4 come first, they will be chosen. For the 1-3-2-4 order, point 2 will replace point 3, so points 1 and 2 will be the initial seeds again, and so on.

DATA LIST LIST /point x y.
BEGIN DATA
1 1 1
2 2 2
3 1 2
4 2 1
END DATA.

GRAPH /SCATTERPLOT(BIVAR)=x WITH y BY point (NAME) /MISSING=LISTWISE .

QUICK CLUSTER x y /CRITERIA= CLUSTER(2) /METHOD=KMEANS(NOUPDATE) /PRINT INITIAL.

SORT CASES BY point (D).

QUICK CLUSTER x y /CRITERIA= CLUSTER(2) /METHOD=KMEANS(NOUPDATE) /PRINT INITIAL.
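For readers without SPSS at hand, the seed-replacement behaviour described above for the 1-3-2-4 order can be sketched in Python. This is only my reading of the one-pass replacement rule, not SPSS's actual code: the function name `initial_seeds` and the two distance tests as written here are assumptions made for illustration, and the sketch assumes k is at least 2.

```python
from itertools import combinations
from math import dist


def initial_seeds(cases, k):
    """Hypothetical sketch: take the first k cases as seeds, then let each
    later case replace a seed if it passes one of two distance tests."""
    seeds = [tuple(c) for c in cases[:k]]
    for case in cases[k:]:
        case = tuple(case)
        # Seed indices ordered from nearest to farthest from this case.
        order = sorted(range(k), key=lambda i: dist(case, seeds[i]))
        nearest, second = order[0], order[1]
        # The two seeds that are closest to each other.
        m, n = min(combinations(range(k), 2),
                   key=lambda p: dist(seeds[p[0]], seeds[p[1]]))
        if dist(case, seeds[nearest]) > dist(seeds[m], seeds[n]):
            # Test (a): the case lies farther from every seed than the two
            # closest seeds lie from each other; it replaces whichever of
            # those two seeds is nearer to it.
            target = m if dist(case, seeds[m]) < dist(case, seeds[n]) else n
            seeds[target] = case
        else:
            # Test (b): the case is farther from its second-nearest seed
            # than its nearest seed is from any other seed; it replaces
            # the nearest seed.
            gap = min(dist(seeds[nearest], seeds[i])
                      for i in range(k) if i != nearest)
            if dist(case, seeds[second]) > gap:
                seeds[nearest] = case
    return seeds


# The four corners of the square from the DATA LIST above.
points = {1: (1, 1), 2: (2, 2), 3: (1, 2), 4: (2, 1)}

for order in [(1, 2, 3, 4), (4, 3, 2, 1), (1, 3, 2, 4)]:
    print(order, initial_seeds([points[p] for p in order], 2))
```

Under these assumptions the 1-2-3-4 order keeps points 1 and 2 as seeds, the 4-3-2-1 order keeps points 4 and 3, and in the 1-3-2-4 order point 2 displaces point 3, which is the pattern the two QUICK CLUSTER runs are meant to show.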

I have the PDF file from SPSS that documents the exact algorithm you mentioned, and what you explained in your message matches the algorithm in that file. According to this SPSS document, the quick cluster algorithm consists of three steps, and the very first step is what you described in your message, except that it does not explain how the initial seeds are chosen to begin with. My question is entirely about these unexplained starting seeds.

It may be that the first step is not actually part of the quick cluster algorithm. Whatever the situation is, I'm now clear about the k-means process. Thank you, guys.

The algorithm does not explain how to choose the initial seeds.

