LISTSERV at the University of Georgia
Date:         Mon, 27 Dec 2004 13:03:28 -0800
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: Classifying observations to an already obtained cluster
              solution
In-Reply-To:  <7d2875ae04122708053b8f9bc@mail.gmail.com>
Content-type: text/plain; charset=US-ASCII

Nomi <sajeelm@GMAIL.COM> wrote:
> I'm trying to classify a set of observations to an already available
> cluster solution. I have the cluster means and their standard
> deviations. What would the best way of classifying the new set be?
>
> I have the scores for each observations on all the variables that went
> into the original cluster schema.

If your already-generated clusters are guaranteed to be spherical, which can be a really major (and often unwarranted) assumption, then you can simply:

  1. take the coordinates of the point to be identified,
  2. compute the Euclidean distance to each of the cluster means,
  3. divide each distance by the standard deviation of the corresponding cluster, and
  4. pick the cluster with the minimum scaled distance.
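
As a rough illustration, the recipe above might be sketched in Python as follows. This is only a sketch under the spherical-cluster assumption: the function name `classify`, the example means, and the example standard deviations are all made up for illustration, and each cluster is assumed to be summarized by one scalar standard deviation.

```python
import math

def classify(point, means, stds):
    """Return (index, scaled distance) of the cluster nearest to `point`.

    Assumes spherical clusters, each described by a mean vector and a
    single scalar standard deviation (hypothetical inputs).
    """
    best_idx, best_dist = None, math.inf
    for idx, (mean, std) in enumerate(zip(means, stds)):
        # Euclidean distance to the cluster mean, scaled by that cluster's std
        dist = math.dist(point, mean) / std
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist

# Made-up example: a tight cluster at the origin and a diffuse one at (10, 10)
means = [(0.0, 0.0), (10.0, 10.0)]
stds = [1.0, 5.0]

idx, dist = classify((4.0, 4.0), means, stds)
print(idx, round(dist, 3))
```

In this made-up example the point (4, 4) is closer to the first mean in raw distance, but it is assigned to the second cluster once each distance is divided by the cluster's standard deviation.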

This breaks down as soon as you have any deviation from the above assumption. Even moving to ellipsoids instead of spheres will cause difficulties, since you're not properly accounting for the volume of the ellipsoids with a single mean and a single 'standard deviation'. And if the clusters came from a more complex clustering algorithm, then the above method can be downright misleading.
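
To make the ellipsoid problem concrete, here is a small made-up Python example. Everything in it is an assumption for illustration only: cluster A is stretched along x, cluster B is spherical, and the single 'standard deviation' for each cluster is taken to be the root mean of its per-axis variances (one of several ways such a summary could have been computed). The per-axis scaling is shown purely as a contrast, not as a recommended method.

```python
import math

def spherical_scaled(point, mean, std):
    # Euclidean distance divided by one scalar std (the spherical recipe)
    return math.dist(point, mean) / std

def per_axis_scaled(point, mean, axis_stds):
    # Distance with each coordinate scaled by its own std (axis-aligned ellipsoid)
    return math.sqrt(sum(((p - m) / s) ** 2
                         for p, m, s in zip(point, mean, axis_stds)))

def pooled(axis_stds):
    # Assumed way of collapsing per-axis stds into a single number
    return math.sqrt(sum(s * s for s in axis_stds) / len(axis_stds))

# Hypothetical clusters: A is elongated along x, B is spherical
clusters = {
    "A": {"mean": (0.0, 0.0), "axis_stds": (5.0, 0.5)},
    "B": {"mean": (0.0, 4.0), "axis_stds": (1.0, 1.0)},
}

point = (0.0, 2.5)

spherical = {name: spherical_scaled(point, c["mean"], pooled(c["axis_stds"]))
             for name, c in clusters.items()}
per_axis = {name: per_axis_scaled(point, c["mean"], c["axis_stds"])
            for name, c in clusters.items()}

spherical_pick = min(spherical, key=spherical.get)
per_axis_pick = min(per_axis, key=per_axis.get)
print(spherical_pick, per_axis_pick)
```

With these numbers the single-std rule assigns the point to A, even though it sits five of A's y-direction standard deviations from A's mean; scaling each axis separately assigns it to B.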

Do you have anything besides just means and stds? Do you know what method of clustering was used? Do you know whether the method used was appropriate for the given data?

David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

