LISTSERV at the University of Georgia
=========================================================================
Date:         Fri, 14 Jul 2006 12:25:08 -0400
Reply-To:     "Thomas M. Guterbock" <tmg1p@cms.mail.virginia.edu>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Thomas M. Guterbock" <tmg1p@cms.mail.virginia.edu>
Subject:      Achieving robust solution in two-step cluster analysis
Comments: cc: "Hartman, David" <deh9q@virginia.edu>
Content-Type: text/plain; charset=us-ascii; format=flowed

Hello: I'm brand new to this list, but have been using SPSS in its various forms for some 35 years. I'm engaged with colleagues in a fairly large project that seeks to 'segment' members of the public according to their preferences and practices in seeking health information. We have collected survey data from a sample of 1,200 Virginia adults, and have several hundred variables in our data set. Some variables are categorical and some are interval-level.

We have set to work using two-step cluster to segment the data, and our initial work yielded a 7-cluster solution that seemed to make sense in relation to our theories. (Autocluster gave us only two clusters, so we asked for more and walked up to seven clusters before things got weird.) But then we found that any minor change in the list of basis variables, or in the way we specified these variables, leads the two-step cluster procedure to yield a very different clustering of the cases. (We detected this by simply cross-tabbing the cluster IDs from one solution with those from the next.) We have tried to manipulate the 'outlier handling' function in the program, but this has not led to a more stable solution under various fairly similar specifications.

I have been wondering whether the data set may include some cases that cluster tightly and others that are not easily classified and have an undue effect on the cluster outcome. If so, I'd like such cases to be excluded. Is there a way to change the solution specifications so that more cases will be seen as outliers by the program? Could that lead to a more stable result? Again, just setting the 'outliers' subcommand to a positive value doesn't seem to hold out many cases (about 30 out of 1,200).

I also encountered a note somewhere on the internet that suggested the procedure is sensitive to the order in which the cases are read. Is that true? Should I manipulate the case sort order, and could that be helpful in getting a more stable result? Any ideas would be most welcome.
Thanks in advance, Tom Guterbock

Thomas M. Guterbock              Voice: (434)243-5223
Director                         CSR Main Number: (434)243-5222
Center for Survey Research       FAX: (434)243-5233
University of Virginia           EXPRESS DELIVERY: 2400 Old Ivy Road
P. O. Box 400767                 Suite 223
Charlottesville, VA 22904-4767   Charlottesville, VA 22903
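
The cross-tabulation check described in the message (comparing the cluster IDs saved from one solution with those from the next) can be sketched outside SPSS as follows. This is only an illustration under stated assumptions: the two label lists stand in for cluster-membership variables exported from two runs, the function names are hypothetical, and the adjusted Rand index is a summary measure added here for convenience, not something the original post uses. A value near 1 would suggest the two solutions group the cases the same way (cluster numbering is ignored), while a value near 0 would suggest agreement no better than chance.

```python
from collections import Counter
from math import comb


def crosstab(labels_a, labels_b):
    """Count cases for each (cluster in solution A, cluster in solution B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must label the same cases")
    return Counter(zip(labels_a, labels_b))


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same cases.

    Assumption: this measure is not mentioned in the post; it is one common
    way to condense a cross-tabulation of cluster IDs into a single number.
    """
    table = crosstab(labels_a, labels_b)
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cases: nothing to disagree about

    # Pairs of cases placed together in both solutions, in A only, in B only.
    same_in_both = sum(comb(count, 2) for count in table.values())
    same_in_a = sum(comb(count, 2) for count in Counter(labels_a).values())
    same_in_b = sum(comb(count, 2) for count in Counter(labels_b).values())

    expected = same_in_a * same_in_b / total_pairs
    maximum = (same_in_a + same_in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. one cluster in both solutions
    return (same_in_both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Hypothetical cluster IDs for six cases from two runs.
    run_1 = [1, 1, 1, 2, 2, 2]
    run_2 = [2, 2, 2, 1, 1, 1]  # same grouping, different cluster numbers
    for (a, b), count in sorted(crosstab(run_1, run_2).items()):
        print(f"run 1 cluster {a} x run 2 cluster {b}: {count} cases")
    print("adjusted Rand index:", adjusted_rand_index(run_1, run_2))
```

The same comparison could be used to probe the case-order question: save the cluster IDs from runs on differently sorted copies of the file, match cases by an ID variable, and compare the two label lists.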

