LISTSERV at the University of Georgia
Date:         Wed, 2 Jul 2003 10:47:19 -0700
Reply-To:     Bin <bztt@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Bin <bztt@MSN.COM>
Organization: http://groups.google.com/
Subject:      clustering with known class
Content-Type: text/plain; charset=ISO-8859-1

Hi, all,

I hope this is the right list. If not, I am sorry to bother you.

My task is to cluster a set of protein sequences (for example, 3000 sequences, for which I can compute the distance matrix). Some of the sequences (1000) belong to known classes (protein folds); the others (2000) are of unknown class. This is not a supervised classification problem, because the sequences contain many new classes.

My question is: how can I take advantage of the known classes to determine the number of classes?

I want to run a systematic series of trials, clustering all 3000 sequences with different numbers of classes (200, 300, ...), and compute the purity or entropy of each result with respect to the known classes, so that I can choose the clustering result that has reasonable purity/entropy.
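
To make the scoring step concrete, here is a minimal sketch in Python of how purity and entropy could be computed for one trial clustering, using only the sequences whose class is known. The function name purity_and_entropy, the use of None to mark an unknown class, and the toy data are assumptions made for illustration; the clustering itself is assumed to have been done elsewhere from the distance matrix.

```python
from collections import Counter, defaultdict
from math import log2


def purity_and_entropy(cluster_ids, known_classes):
    """Score one clustering against the partially known class labels.

    cluster_ids   -- cluster label of every sequence (e.g. 3000 entries)
    known_classes -- class label of every sequence, None where unknown
    Only sequences with a known class contribute to the scores.
    """
    # Count, per cluster, how many labelled members fall in each known class.
    members = defaultdict(Counter)
    for cid, cls in zip(cluster_ids, known_classes):
        if cls is not None:
            members[cid][cls] += 1

    n = sum(sum(counts.values()) for counts in members.values())
    if n == 0:
        raise ValueError("no sequences with a known class")

    # Purity: fraction of labelled sequences that belong to the
    # majority class of their cluster (higher is better).
    purity = sum(max(counts.values()) for counts in members.values()) / n

    # Entropy: size-weighted average of the class entropy within
    # each cluster, in bits (lower is better).
    entropy = 0.0
    for counts in members.values():
        size = sum(counts.values())
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * h

    return purity, entropy


# Toy example (hypothetical data): six sequences, one of unknown class.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["a", "a", "b", None, "b", "b"]
purity, entropy = purity_and_entropy(clusters, classes)
```

The function would be called once per trial (200 classes, 300, ...) and the scores compared across trials. One caveat: purity can only stay the same or improve as the number of clusters grows, and it reaches 1 when every sequence is its own cluster, so the best raw score alone probably should not decide the number of classes.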

I would appreciate any suggestions.

Bin

