Date: Mon, 12 Mar 2001 06:54:06 GMT
Reply-To: Ya Huang <huanga@WORLDNET.ATT.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ya Huang <huanga@WORLDNET.ATT.NET>
Organization: AT&T Worldnet
Subject: Re: Grouping within the dataset
Rahul,
I am sorry, but I am having trouble understanding your sample.
You said var1=1 and var1=2 are in the same group because
their var2 is common, but how? As far as I can see, what they have
in common is not var2 but var3: they both have var3=10002. The same
rule applies to your second group, whose members have
var3=10003 in common.
If I understand your question correctly, you want to group
by any var (var2, var3, ...) that has a common value. If this is what
you want, then here is my solution:
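/* Sample data from the original post */
data xx;
  input var1 var2 var3;
cards;
1 101 10001
1 102 10001
1 102 10002
2 103 10002
2 104 10002
3 105 10003
3 105 10004
4 106 10005
4 107 10003
;
run;

/* Stack var2 and var3 into one key column: one row per (var1, value) */
data long;
  set xx;
  array v{*} var2-var3;
  do i = 1 to dim(v);
    commkey = v{i};
    output;
  end;
  drop i var2-var3;
run;

/* Keep only the keys shared by at least two different var1 values */
proc sql;
  create table xx as
  select distinct a.*
  from long a, long b
  where a.commkey = b.commkey and a.var1 ne b.var1
  order by commkey, var1;
quit;

/* Number the groups: one group per shared key */
data xx;
  set xx;
  by commkey var1;
  if first.commkey then group + 1;  /* the sum statement retains group */
run;

proc print data=xx;
run;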
-----------------
Obs    var1    commkey    group
  1       1      10002        1
  2       2      10002        1
  3       3      10003        2
  4       4      10003        2
The last data step is not really necessary, since
commkey can already serve as the group variable.
Let me know if I understood correctly.
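To illustrate that point, here is a minimal sketch of a report that uses commkey directly as the grouping variable. It only assumes that xx is the table produced above, still sorted by commkey and var1; the NOOBS option and the choice to print var1 alone are my own picks for the illustration.

```sas
/* Sketch: report the groups using commkey itself as the group id. */
/* Assumes xx holds var1 and commkey and is sorted by commkey.     */
proc print data=xx noobs;
  by commkey;   /* one BY group per shared key value */
  var var1;     /* members of that group             */
run;
```

Each BY group in the listing is then one group of related var1 values, labelled by the value they share.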
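One caveat: the join above only pairs var1 values that directly share a key, so a var1 that shares different keys with different partners would show up under more than one commkey. If the groups are meant to chain (A is related to B, B to C, so all three get one number), something like the following sketch might be closer. It is not tested on your real data. The names raw, long, lab, keymin, newlab and the macro %propagate with its maxiter= parameter are all hypothetical, and it assumes that values of var2 and var3 never collide, as in your sample. The idea is to give every var1 its own label and then repeatedly pull each label down to the smallest label reachable through any shared key, until nothing changes.

```sas
/* Sketch only: chained grouping by repeated minimum-label propagation. */
/* All dataset and macro names here are hypothetical.                   */
data raw;
  input var1 var2 var3;
cards;
1 101 10001
1 102 10001
1 102 10002
2 103 10002
2 104 10002
3 105 10003
3 105 10004
4 106 10005
4 107 10003
;
run;

/* One row per (var1, key value) */
data long;
  set raw;
  array v{*} var2-var3;
  do i = 1 to dim(v);
    commkey = v{i};
    output;
  end;
  keep var1 commkey;
run;

%macro propagate(maxiter=50);
  %local changed iter;
  %let changed = 1;
  %let iter = 0;

  /* Start with every var1 as its own group */
  proc sql noprint;
    create table lab as
    select distinct var1, var1 as grp from long;
  quit;

  %do %while (&changed > 0 and &iter < &maxiter);
    %let iter = %eval(&iter + 1);
    proc sql noprint;
      /* Smallest label among the var1 values holding each key */
      create table keymin as
      select l.commkey, min(g.grp) as kgrp
      from long l, lab g
      where l.var1 = g.var1
      group by l.commkey;

      /* Smallest label reachable from each var1 through its keys */
      create table newlab as
      select l.var1, min(k.kgrp) as grp
      from long l, keymin k
      where l.commkey = k.commkey
      group by l.var1;

      /* Count how many labels moved in this pass */
      select count(*) into :changed
      from lab o, newlab n
      where o.var1 = n.var1 and o.grp ne n.grp;
    quit;
    %let changed = &changed;   /* strip leading blanks */

    data lab;
      set newlab;
    run;
  %end;
%mend propagate;

%propagate(maxiter=50)

proc print data=lab;
run;
```

The labels that come out are the smallest var1 in each chain rather than 1, 2, 3, ..., so a final renumbering step would still be needed if consecutive group numbers matter. The number of passes grows with the length of the longest chain, which is why maxiter is there as a safety stop.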
Ya Huang
Rahul Chahal wrote in message
<200103111505.f2BF5Fq146694@listserv.cc.uga.edu>...
>I am trying to solve a programming question using SAS.
>
>The data I have looks like:
>var1 var2 var3
>1 101 10001
>1 102 10001
>1 102 10002
>2 103 10002
>2 104 10002
>3 105 10003
>3 105 10004
>4 106 10005
>4 107 10003
>
>I am trying to find all relationships between records
>so that I can create a new variable which gives the
>same number to all records with some association hence
>the results look like -
>var1 group
>1 1
>2 1
>3 2
>4 2
>
>Of course group=1 for var1=1 and var1=2 since var2 is
>common ..... group=2 for var1=3 since it is not
>related to var1=1&2 but related to var1=4 through var3.
>The program I have has to deal with 150,000 records and 10
>variables, and it takes about 10 days to run. I am
>using a combination of freq and merges in a "do loop" for each
>of the unique var1's; however, I am looking for a more
>efficient way to solve this. Would appreciate your help.
>
>Thanks.
>
>-Rahul Chahal