Date: Mon, 7 Sep 1998 23:12:43 -0400
Reply-To: Stefan Jonsson <stefanj@POP.PSU.EDU>
Sender: "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From: Stefan Jonsson <stefanj@POP.PSU.EDU>
Subject: Cluster analysis - input data
Content-Type: text/plain; charset="us-ascii"
Dear all SPSS-ers,
I have two related technical questions on Cluster analysis.
The first concerns the input format, and the second concerns memory and
time limitations.
I am going to perform a cluster analysis on a sample with n=5500. I am going to calculate the distances between cases in another application (TDA).
Is it possible to insert the distance matrix or distance list from a file looking like this?
27 26 6.00
28 1 2.29
28 4 2.29
28 6 3.58
28 7 4.29
28 9 2.29
28 10 9.75
28 11 10.60
28 12 12.57
28 14 9.40
28 17 4.60
28 18 6.20
28 21 9.00
28 23 10.75
28 24 3.29
28 25 10.00
The first column is the ID of case A, the second column is the ID of case B, and the last column is the distance between these two individuals. Since this is the output format from TDA, the lower triangular matrix would be too big for me to construct.
If it is not possible to input these data into SPSS, suggestions of an alternative program for cluster analysis would be highly appreciated.
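To make the format concrete, here is a minimal Python sketch of how a pair list like the one above could be rearranged into a lower triangular matrix. The function name `pairs_to_lower_triangle` is hypothetical, the sample is just the first four lines of the listing, and leaving unlisted pairs as `None` is an assumption, since nothing above says how absent pairs should be treated. It has not been tried at n=5500.

```python
def pairs_to_lower_triangle(lines):
    """Turn 'idA idB distance' lines into a lower triangular matrix.

    Returns (ids, rows): ids is the sorted list of case IDs, and rows[i]
    holds the distances from ids[i] to ids[0..i], diagonal included.
    Pairs that never appear in the input stay None (an assumption).
    """
    pairs = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        a, b, d = int(parts[0]), int(parts[1]), float(parts[2])
        pairs[(max(a, b), min(a, b))] = d  # key with the larger ID first

    ids = sorted({case for pair in pairs for case in pair})
    rows = []
    for i, a in enumerate(ids):
        row = [pairs.get((a, b)) for b in ids[:i]]
        row.append(0.0)  # distance from a case to itself
        rows.append(row)
    return ids, rows


# First four lines of the listing above, used as a tiny sample.
sample = ["27 26 6.00", "28 1 2.29", "28 4 2.29", "28 6 3.58"]
ids, rows = pairs_to_lower_triangle(sample)
for case, row in zip(ids, rows):
    print(case, ["-" if d is None else f"{d:.2f}" for d in row])
```

Each printed row starts with the case ID, followed by its distances to all lower-numbered cases and a 0.00 on the diagonal.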
The second question is not as important, as I will probably find out once I have read in the data. Given that I will be able to squeeze the data into SPSS in the format mentioned above, do I still face the same memory and time limitations? In other words, when using Cluster, is the distance calculation more or less demanding in time and memory than the clustering of the cases once the distances are given? Where is the memory and time bottleneck in cluster analysis located?
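For a sense of scale, the following back-of-the-envelope Python sketch counts the distances involved for n=5500. The helper `distance_storage` is hypothetical, and the 8 bytes per value is an assumption (one double-precision number per distance); how SPSS actually stores distances internally is not something this estimate can tell me.

```python
def distance_storage(n, bytes_per_value=8):
    """Return (pair count, lower-triangle bytes, full-square bytes)."""
    pairs = n * (n - 1) // 2  # distinct unordered pairs of cases
    return pairs, pairs * bytes_per_value, n * n * bytes_per_value


pairs, tri_bytes, square_bytes = distance_storage(5500)
print(f"{pairs:,} pairs")
print(f"lower triangle: {tri_bytes / 1e6:.0f} MB")
print(f"full square:    {square_bytes / 1e6:.0f} MB")
```

This covers storage only; it says nothing about how long either the distance calculation or the clustering itself would take.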
Best regards
Stefan Jonsson
Graduate student in sociology
Penn State University