```
Date:         Wed, 31 Mar 2004 23:03:17 -0500
Reply-To:     "Chang Y. Chung"
Sender:       "SAS(r) Discussion"
From:         "Chang Y. Chung"
Subject:      Re: How to clustering the data variables instead of data points
Comments: To: fzh113@HECKY.IT.NORTHWESTERN.EDU

On Wed, 31 Mar 2004 20:31:52 -0600, Fred wrote:

>Dear SAS users,
>
>Do you know if there are some specific functions under SAS
>to do clustering?
>In the general case, given a set of sample data, we just use
>clustering algorithm to classify these data sample into some groups.
>
>Now my problems is different from the above.
>
>To be specific, suppose I have a d-dimensional vector x = [x1,x2, ..., xd]',
>and wish to clustering these d variables of x into some finite groups
>using a user-defined distance measure a.
>The distance measure a was defined to measure the similarity between
>any two data variables xi and xj (1<= i, j, <= d).

Hi, Fred,

SAS/STAT has proc cluster, which accepts a dataset of type=distance as
its input. You can readily make such a dataset in many different ways.
Let me try a data step. Here is a simple example. HTH.

Cheers,
Chang

data xes;
  x1=0.73902;
  x2=0.27248;
  x3=0.70953;
  x4=0.31916;
  x5=0.36785;
  x6=0.10449;
run;

/* build the lower triangle of the distance matrix, one row per variable */
data ds(type=distance keep=d1-d6 i);
  set xes;
  array x[1:6] x1-x6;
  array d[1:6] d1-d6;
  do i = 1 to 6;
    do j = 1 to i;
      /* assumes that the distance measure d is
         simply the absolute difference */
      d[j] = abs(x[i] - x[j]);
    end;
    output;
  end;
run;

proc print data=ds;
  format d1-d6 6.4;
run;
/* on lst
Obs      d1        d2        d3        d4        d5        d6      i

 1     0.0000     .         .         .         .         .        1
 2     0.4665    0.0000     .         .         .         .        2
 3     0.0295    0.4371    0.0000     .         .         .        3
 4     0.4199    0.0467    0.3904    0.0000     .         .        4
 5     0.3712    0.0954    0.3417    0.0487    0.0000     .        5
 6     0.6345    0.1680    0.6050    0.2147    0.2634    0.0000    6
*/

/* average link -- this part from sas/stat online doc */
proc cluster data=ds method=average pseudo;
  id i;
run;

proc tree horizontal spaces=2;
  id i;
run;
```
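
The reply assumes the distance is the absolute difference, while the question asks about a user-defined measure. As a minimal sketch of where a different measure would go, the same data step can be rerun with only the assignment inside the inner loop changed. The squared difference and the dataset name `ds_sq` below are placeholders chosen for illustration; they are not from the original post.

```sas
/* same six values as in the post */
data xes;
  x1=0.73902; x2=0.27248; x3=0.70953;
  x4=0.31916; x5=0.36785; x6=0.10449;
run;

/* ds_sq is a hypothetical name; only the distance expression differs */
data ds_sq(type=distance keep=d1-d6 i);
  set xes;
  array x[1:6] x1-x6;
  array d[1:6] d1-d6;
  do i = 1 to 6;
    do j = 1 to i;
      /* placeholder measure (an assumption): squared difference.
         Replace this expression with your own measure. */
      d[j] = (x[i] - x[j])**2;
    end;
    output;
  end;
run;

proc cluster data=ds_sq method=average;
  id i;
run;
```

Whatever expression is substituted should be symmetric in `x[i]` and `x[j]` and zero when `i = j`, since only the lower triangle and the diagonal are filled in here.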
