LISTSERV at the University of Georgia
Date:         Thu, 14 Mar 2002 13:37:58 -0500
Reply-To:     "Powhatan J. Wooldridge, Ph.D." <pjw@ACSU.BUFFALO.EDU>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Powhatan J. Wooldridge, Ph.D." <pjw@ACSU.BUFFALO.EDU>
Subject:      Re: analysis for best cluster solution
Comments: To: Hector Maletta <hmaletta@fibertel.com.ar>
In-Reply-To:  <3C8F5EA9.251689DC@fibertel.com.ar>
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Wed, 13 Mar 2002, Hector Maletta wrote:

(SNIP)

> E.g. sometimes one looks for broad clusters, more or less homogeneous
> but not extremely homogeneous, more or less different from each other
> but not extremely different, for purposes of constructing a broad
> typology. But if you're searching for clusters of units with unusual
> characteristics, this might not be good.

(SNIP)

> I have one or two books on cluster analysis but strangely enough this
> matters are seldom treated.

To use a similar and perhaps even more fundamental and common example, inventories of causes (such as sources of social support) are often summed to arrive at a single scale value. There is really no a priori theoretical reason to expect that the presence of one source of support would be positively correlated with another source (having relatives close by, having relatives far away, and going to church regularly are three of the items in one such scale, for example); but reviewers of journal articles often ask for the "psychometric properties" of the scale. I usually argue that assessments of internal consistency are inappropriate under such circumstances; but I do sometimes have a nagging doubt that I may be missing something. Am I? Do you know of any books on psychometric properties that explicitly address handling scales of this type? (Predictive validity for the subject's perceived overall level of social support comes to mind; perhaps in conjunction with canonical analysis.)
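To make the internal-consistency point concrete, here is a minimal sketch of Cronbach's alpha computed on a summed inventory. The item names and the 0/1 response data are hypothetical, made up only for illustration, and are not taken from any published scale. The function uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score).

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical items: [relatives close by, relatives far away, attends church].
# Every combination occurs once, so the items are mutually uncorrelated.
uncorrelated = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]

# Hypothetical contrast case: the three items always agree.
redundant = [[0, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]

alpha_uncorrelated = cronbach_alpha(uncorrelated)
alpha_redundant = cronbach_alpha(redundant)
print(alpha_uncorrelated, alpha_redundant)
```

With mutually uncorrelated items alpha comes out at about zero, and with perfectly redundant items at about one, even though in both cases the sum may be a perfectly sensible count of support sources. A low alpha here says only that the items do not covary, which is what one would expect when items are causes rather than reflections of a common trait.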

-- Pow

PS: I have no objection to your replying to the net, with my query attached, if you believe the matter to be of sufficient general interest. I didn't post directly to the net because I wasn't sure that I wasn't either a) digressing too far from the original topic, or b) asking a question to which the answer was obvious (and/or widely known). I know that you much prefer to have your correspondence be disseminated as widely as possible, however, so do whatever in your judgement seems most appropriate.

The issue isn't just hypothetical, by the way. I was in the process of composing an answer to a reviewer's request for the psychometric properties of Bell's social support scale when I saw your posting.

***************************************************************************
Powhatan J. Wooldridge, Assoc. Professor, Nursing, State Univ. NY at Buffalo

On Wed, 13 Mar 2002, Hector Maletta wrote:

> Thanks, Craig. I'll have a look at it. I have noticed in the past that
> (as I remarked in my response on this matter) the purpose of clustering
> varies and so does the criterion for acceptance of a particular
> solution. It may well be that one measure such as the cophenetic coeff.
> is appropriate for some purposes and not for others.
> E.g. sometimes one looks for broad clusters, more or less homogeneous
> but not extremely homogeneous, more or less different from each other
> but not extremely different, for purposes of constructing a broad
> typology. But if you're searching for clusters of units with unusual
> characteristics, this might not be good.
> Suppose for instance your units are counties or
> communities, and your subject is ethnicity. You have a large list of
> ethnic communities, with one variable each (% of population) and other
> variables reflecting social behavior in the community level (crime,
> public libraries, average education, income and the like). Suppose
> you're looking for small clusters where one or two ethnic communities
> are concentrated (groups of communities with high proportions of Swedes
> AND Turkish but low proportions of Hispano and Jewish people). Once you
> did so, you end up with a number of such small highly homogeneous
> clusters plus one or more large not-so-homogeneous clusters of
> not-so-special communities. For this kind of purpose perhaps other
> criteria should be used.
>
> I have one or two books on cluster analysis but strangely enough this
> matters are seldom treated.
>
> Hector Maletta
> Universidad del Salvador
> Buenos Aires, Argentina
>
> Hector
>
> Craig Kolb wrote:
> >
> > Hi,
> >
> > This comes from the NCSS help file. Unfortunately I don't have further
> > references.
> >
> > "Given the large number of techniques, it is often difficult to decide which
> > is best. One criterion that has become popular is to use the result that has
> > largest cophenetic correlation coefficient.
> > This is the correlation between
> > the original distances and those that result from the cluster configuration.
> > Values above 0.75 are felt to be good. The Group Average method appears to
> > produce high values of this statistic. This may be one reason that it is so
> > popular.
> > A second measure of goodness of fit called delta is described in Mather
> > (1976). These statistics measure degree of distortion rather than degree of
> > resemblance (as with the cophenetic correlation). The two delta coefficients
> > are given by
> >
> > where A is either 0.5 or 1 and is the distance obtained from the cluster
> > configuration. Values close to zero are desirable.
> > Mather (1976) suggests that the Group Average method is the safest to use as
> > an exploratory method, although he goes on to suggest that several methods
> > should be tried and the one with the largest cophenetic correlation be
> > selected for further investigation."
> >
> > Craig Kolb
> > Research Analyst - Enterprise Solutions
> > Tel: +27 11 803 6412
> > Fax: +27 11 803 7840
> > __________________
> > BMI-T / IDC Africa
> > Your knowledge partner in IT, telecoms and the Internet.
> >
> > Visit BMI-T online at http://www.bmi-t.co.za
> > Confidentiality Warning
> > =======================
> > The contents of this e-mail and any accompanying documentation
> > are confidential and any use thereof, in what ever form, by
> > anyone other than the addressee is strictly prohibited.
> >
> > ----- Original Message -----
> > From: "Hector Maletta" <hmaletta@fibertel.com.ar>
> > To: "Craig Kolb" <craig@bmi-t.co.za>
> > Sent: Wednesday, March 13, 2002 2:40 PM
> > Subject: Re: analysis for best cluster solution
> >
> > > I never used it, nor am familiar with it. Sorry. Please explain and
> > > suggest sources.
> > >
> > > Hector Maletta
> > > Universidad del Salvador
> > > Buenos Aires, Argentina
> > >
> > >
> > >
> > > Craig Kolb wrote:
> > > >
> > > > Hi,
> > > >
> > > > Hector, what do you think of the "cophenetic correlation coefficient"? I use
> > > > that a lot as a quantitative measure of cluster heterogeneity.
> > > >
> > > > Regards,
> > > > Craig Kolb
> > > > Research Analyst - Enterprise Solutions
> > > > Tel: +27 11 803 6412
> > > > Fax: +27 11 803 7840
> > > > __________________
> > > > BMI-T / IDC Africa
> > > > Your knowledge partner in IT, telecoms and the Internet.
> > > >
> > > > Visit BMI-T online at http://www.bmi-t.co.za
> > > > Confidentiality Warning
> > > > =======================
> > > > The contents of this e-mail and any accompanying documentation
> > > > are confidential and any use thereof, in what ever form, by
> > > > anyone other than the addressee is strictly prohibited.
> > > >
> > > > ----- Original Message -----
> > > > From: "Hector Maletta" <hmaletta@fibertel.com.ar>
> > > > Newsgroups: bit.listserv.spssx-l
> > > > To: <SPSSX-L@LISTSERV.UGA.EDU>
> > > > Sent: Tuesday, March 12, 2002 8:33 PM
> > > > Subject: Re: analysis for best cluster solution
> > > >
> > > > > Svetlana:
> > > > > If your seven providers do provide 100 services, and you have these 100
> > > > > services characterized by a measure of proximity, you have the essential
> > > > > elements for a cluster analysis, but your cases would not be seven:
> > > > > you'll want to think you have 100 cases.
> > > > > If each of the seven participants sorted ALL the services (not only
> > > > > those each provider provides) by similarity, you'd have seven sortings
> > > > > of the same services, and this constitutes seven variables that could be
> > > > > used to group the services by their alleged similarity.
> > > > > On the other
> > > > > hand, if each participant ranked only the small number of services it
> > > > > provided, each service would be evaluated for similarity with other
> > > > > services coming from the same provider, but not for similarity with the
> > > > > services offered by other providers; if this is the case, you don't have
> > > > > a measure of similarity of all the services (i.e. the similarity on
> > > > > every service to every other service), and you cannot define a sensible
> > > > > way of clustering them unless you have other variables (of an interval
> > > > > nature) that can be used to characterize them.
> > > > >
> > > > > In the first alternative, you have a matrix of 100x100, each cell with
> > > > > seven measurements of perceived similarity which may be reduced to an
> > > > > average measure of perceived similarity between each pair of services.
> > > > > Also, the SPSS procedure PROXIMITIES could be used to generate the
> > > > > matrix of proximities.
> > > > >
> > > > > The CLUSTER command with the MATRIX IN subcommand would read a matrix of
> > > > > proximities as input, instead of the raw data.
> > > > >
> > > > > Now to your CLUSTER output confusion. You're right, the output is
> > > > > horrendous. In particular, the graphs are made of ASCII characters and
> > > > > (for more than a few rows and columns) produce a jumble of symbols and a
> > > > > stream of incomplete pages, certainly unreadable by any but the most
> > > > > trained of cluster buffs.
> > > > >
> > > > > With N cases CLUSTER produces N solutions, from the initial one in which
> > > > > each single case is a cluster of one, to a final one in which the entire
> > > > > sample is one giant cluster of N. Of course, only intermediate results
> > > > > count. As I explained before there is no general indicator to rank the
> > > > > various intermediate solutions from best to worse.
> > > > > Moreover, even when
> > > > > you consider one particular solution (e.g. the grouping of cases into
> > > > > six clusters) you're not sure the grouping achieved is the "best" way of
> > > > > grouping the cases into six groups, since the solution is influenced by
> > > > > factors such as the measure of similarity chosen, the number of
> > > > > iterations, the criteria for terminating the iteration, etc.
> > > > > An evaluation of the solutions should surely be external. By this I mean
> > > > > applying certain statistical criteria in order to judge whether one
> > > > > solution is superior to others. For instance, if your idea is producing
> > > > > "homogeneous" groups of cases in terms of a certain criterion variable
> > > > > Y, you may apply ANOVA with the clustering variable as a factor and Y as
> > > > > the dependent variable, and choose the number of clusters that yields
> > > > > the maximum F. But this is not a good idea: the best grouping would be
> > > > > no grouping at all, for within-group variance is certainly minimized if
> > > > > each group is composed of only one case. To use ANOVA you should aim at
> > > > > a satisfying balance, trying to get high internal homogeneity, maximum
> > > > > separation between clusters, and a reasonably low number of clusters.
> > > > >
> > > > > On the other hand, if you're interested only in "abnormal" clusters, you
> > > > > should concentrate on them only, choosing the solution that produces
> > > > > more homogeneity and maximum separation between the small "abnormal"
> > > > > clusters, leaving aside the large "ordinary" clusters.
> > > > >
> > > > > I'm afraid formalizing these criteria into quantitative indicators that
> > > > > could be applied everywhere would be impossible.
> > > > >
> > > > > Hector Maletta
> > > > > Universidad del Salvador
> > > > > Buenos Aires, Argentina
> > > > >
> > > > >
> > > > > "Yampolskaya, Svetlana" wrote:
> > > > > >
> > > > > > Hector:
> > > > > >
> > > > > > I will try to clarify my question and a little bit of history of it. Yes, we
> > > > > > have seven participants (they are cases) who are Mental Health service
> > > > > > providers. We did "Concept mapping" (Trochim, 1989) with them. I don't know
> > > > > > if you are familiar with this program but it is a combination of mds (though
> > > > > > it cannot give you more than two-dimensional solution) and hierarchical
> > > > > > cluster analysis.
> > > > > >
> > > > > > According to the concept mapping procedure, the participants produced 100
> > > > > > (actually 103 but we had to do a data reduction because SPSS cannot process
> > > > > > more than 100 for mds analysis) statements. Each statement represents a
> > > > > > service these participants (or mental health service providers) provide.
> > > > > >
> > > > > > We wanted to see a) what services they provide; b) the structure of these
> > > > > > services; c) how similar (the participants sorted the statements based on
> > > > > > perceived similarity) the perceived certain services; finally, d) what
> > > > > > clusters or groups these services can form.
> > > > > >
> > > > > > After we finished this concept mapping procedure I wanted to do these
> > > > > > analyses in SPSS. MDS was quite successful in terms of we got very similar
> > > > > > picture (at least in two-dimensional space) but the cluster analysis output
> > > > > > got me puzzled. Then I thought that if mds has a stress measure, maybe
> > > > > > something like that can be calculated or done to figure out "best" cluster
> > > > > > solution based on similarity of the services.
> > > > > >
> > > > > > I deeply appreciate all your time and your help.
> > > > > >
> > > > > > Svetlana
> >
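As an illustration of the cophenetic correlation coefficient discussed in the thread (the correlation between the original distances and those implied by the cluster configuration), here is a minimal pure-Python sketch. It assumes group-average linkage and a small, made-up set of one-dimensional points; the function names and data are hypothetical and this is not the NCSS or SPSS implementation.

```python
from itertools import combinations
from math import sqrt

def average_linkage_cophenetic(dist):
    """Return {(i, j): cophenetic distance} for i < j, using group-average linkage."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph = {}
    next_id = n
    while len(clusters) > 1:
        # Find the two clusters with the smallest average inter-member distance
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every pair first joined by this merge gets the merge height
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = d
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cophenetic_correlation(dist):
    """Correlate original pairwise distances with cophenetic distances."""
    coph = average_linkage_cophenetic(dist)
    pairs = sorted(coph)
    return pearson([dist[i][j] for i, j in pairs], [coph[p] for p in pairs])

# Hypothetical data: two tight pairs and one outlier on a line
points = [0, 1, 5, 6, 20]
dist = [[abs(p - q) for q in points] for p in points]
coph = average_linkage_cophenetic(dist)
c = cophenetic_correlation(dist)
print(round(c, 3))
```

A value near 1 means the tree distorts the original distances very little. As the thread points out, a single figure like this does not settle whether a solution suits a particular purpose, for example when only a few small unusual clusters are of interest.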

