Date: Fri, 21 Sep 2007 08:54:46 -0500
Reply-To: "Swank, Paul R" <Paul.R.Swank@UTH.TMC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Swank, Paul R" <Paul.R.Swank@UTH.TMC.EDU>
Subject: Re: Cluster analysis help needed
In-Reply-To: <385670.72446.qm@web33314.mail.mud.yahoo.com>
Content-Type: text/plain; charset="us-ascii"
Once you have clusters for the 20,000-observation sample, compute the
cluster centroids and feed them to PROC FASTCLUS as initial seeds for the
full data set, specifying the same number of clusters you found
originally. You will probably see some "drift" in the cluster centroids
on the larger data set, but if the original 20,000 are fairly
representative of the whole sample, the final centroids should be pretty
close to the original ones.
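
A rough sketch of those two steps, assuming the sample itself was clustered with FASTCLUS (if you used another procedure, you would first need to compute the centroids yourself, for example with PROC MEANS by cluster). The data set names, the variable list x1-x5, and the choice of 5 clusters are placeholders, not anything from your data:

```sas
/* Step 1: cluster the 20,000-obs random sample and save the centroids. */
/* work.sample, work.full, x1-x5 and MAXCLUSTERS=5 are placeholders.    */
proc fastclus data=work.sample maxclusters=5 maxiter=100
              outseed=work.centroids out=work.sample_clus;
   var x1-x5;
run;

/* Step 2: seed FASTCLUS on the full data set with those centroids.     */
/* MAXITER=0 only assigns each obs to its nearest sample centroid;      */
/* a positive MAXITER recomputes the centroids and so lets them drift.  */
proc fastclus data=work.full seed=work.centroids maxclusters=5 maxiter=0
              out=work.full_clus;
   var x1-x5;
run;
```

The OUT= data set from the second step should carry a CLUSTER variable giving the assignment for every observation. Whether to use MAXITER=0 or let the centroids update is a judgment call that depends on how much you trust the sample.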
Paul R. Swank, Ph.D. Professor
Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
cherub
Sent: Friday, September 21, 2007 8:40 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Cluster analysis help needed
Any help will be highly appreciated!
I am running a cluster analysis on a large dataset (more than
400,000 obs). I randomly selected 20,000 obs to cluster, and I
want to use the results from this random sample as guidance for
the remaining obs.
However, how do I use the results from the random sample to score the
rest and assign a cluster to each of the remaining obs?
Thanks very much.