Date: Wed, 31 Mar 2004 23:03:17 -0500
Reply-To: "Chang Y. Chung" <chang_y_chung@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Chang Y. Chung" <chang_y_chung@HOTMAIL.COM>
Subject: Re: How to clustering the data variables instead of data points
On Wed, 31 Mar 2004 20:31:52 -0600, Fred
<fzh113@HECKY.IT.NORTHWESTERN.EDU> wrote:
>Dear SAS users,
>
>Do you know if there are some specific functions under SAS
>to do clustering?
>In the general case, given a set of sample data, we just use
>clustering algorithm to classify these data sample into some groups.
>
>Now my problems is different from the above.
>
>To be specific, suppose I have a d-dimensional vector
>x = [x1, x2, ..., xd]',
>and wish to clustering these d variables of x into some finite groups
>using a user-defined distance measure a.
>The distance measure a was defined to measure the similarity between
>any two data variables xi and xj (1<= i, j, <= d).
Hi, Fred,
SAS/STAT has proc cluster, which accepts a data set of type=distance as its
input. You can readily make such a data set in many different ways. Let me
try a data step. Here is a simple example. HTH.
Cheers,
Chang
data xes;
  x1=0.73902;
  x2=0.27248;
  x3=0.70953;
  x4=0.31916;
  x5=0.36785;
  x6=0.10449;
run;

data ds(type=distance keep=d1-d6 i);
  set xes;
  array x[1:6] x1-x6;
  array d[1:6] d1-d6;
  do i = 1 to 6;
    do j = 1 to i;
      /* assumes that the distance measure d is
         simply the absolute difference */
      d[j] = abs(x[i] - x[j]);
    end;
    output;
  end;
run;

proc print data=ds;
  format d1-d6 6.4;
run;

/* on lst
Obs      d1      d2      d3      d4      d5      d6    i
  1  0.0000       .       .       .       .       .    1
  2  0.4665  0.0000       .       .       .       .    2
  3  0.0295  0.4371  0.0000       .       .       .    3
  4  0.4199  0.0467  0.3904  0.0000       .       .    4
  5  0.3712  0.0954  0.3417  0.0487  0.0000       .    5
  6  0.6345  0.1680  0.6050  0.2147  0.2634  0.0000    6
*/

/* average link -- this part from sas/stat online doc */
proc cluster data=ds method=average pseudo;
  id i;
run;

proc tree horizontal spaces=2;
  id i;
run;
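
The example above builds the distances from a single observation. If the variables are instead observed over several sample rows, one possible way to get a variable-by-variable distance data set is sketched below. It is only a sketch under stated assumptions: the data set names (sample, corrout, vdist), the sample values, and the choice of 1 - |correlation| as the distance are all made up for illustration and are not from the original question. Substitute your own measure in the data step.

```sas
/* hypothetical sample data: 5 observations on 4 variables */
data sample;
  input x1-x4;
  datalines;
1 2 9 4
2 4 7 5
3 5 6 3
4 8 3 6
5 9 1 2
;
run;

/* pairwise Pearson correlations among the variables;
   outp= writes them to a data set with _TYPE_ and _NAME_ */
proc corr data=sample outp=corrout noprint;
  var x1-x4;
run;

/* keep only the correlation rows and turn each correlation r
   into an assumed distance 1 - |r| (the diagonal becomes 0) */
data vdist(type=distance keep=_name_ x1-x4);
  set corrout;
  where _type_ = 'CORR';
  array x[*] x1-x4;
  do j = 1 to dim(x);
    x[j] = 1 - abs(x[j]);
  end;
run;

/* cluster the variables; _name_ labels each row by variable name */
proc cluster data=vdist method=average;
  id _name_;
run;
```

Here the full symmetric matrix is written out, with one row and one column per variable, so the number of distance variables matches the number of observations as a type=distance data set requires.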