| Date: | Tue, 7 Jan 2003 20:41:08 -0500 |
| Reply-To: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Subject: | Re: Q: Identifying repeated patterns in different groups |
|
| Content-Type: | text/plain; charset="iso-8859-1" |
|---|
For what it's worth, this version of a pattern numbering program
successfully uses the MIN() summary function and keeps the pattern numbers
in the right place. For some reason the undocumented MONOTONIC() function
interferes with the SQL MIN summary function. Just a word of warning ....
data test;
input group code @5 codetext $Char10.;
cards;
1 1 Yes
1 2 No
2 1 No
2 2 Yes
3 1 Yes
3 2 No
4 1 Yeah
4 2 Naw
5 1 Yes
5 2 No
5 3 Maybe
;
run;
proc sort data=test;
by group code;
run;
proc transpose data=test (keep=group codetext) out=tesTrans (drop=_name_);
by group;
var codetext;
run;
data tesTrans;
set tesTrans;
pattern=_N_;
run;
proc sql;
create table patterns as
select t1.group,t2.pattern,t1.col1,t1.col2,t1.col3
from tesTrans as t1 left join (select col1,col2,col3,pattern
from tesTrans
group by col1,col2,col3
having pattern=min(pattern)
) as t2
on t1.col1=t2.col1
and t1.col2=t2.col2
and t1.col3=t2.col3
order by group,pattern
;
quit;
Sig
-----Original Message-----
From: Richard A. DeVenezia [mailto:radevenz@IX.NETCOM.COM]
Sent: Monday, January 06, 2003 2:43 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Q: Identifying repeated patterns in different groups
"Charles Patridge" <Charles_S_Patridge@PRODIGY.NET> wrote in message
news:200301061729.h06HTcg16180@listserv.cc.uga.edu...
> So Richard,
>
> Given the REAL "S", how would you want to have these groups "matched"
when
> string lengths can be different?
>
> Do not want to provide a solution until I know on basis to consider a
> pattern to be a match.
>
> Regards,
> Charles Patridge
Let me try again...
Presume a table has columns
group code codetext
where each group has multiple codes and corresponding codetexts.
The 'in-order' (as read from table) pattern of a group would be the
sequence of code||codetext down the group.
I want to add a new column codeSeqPatternId that maps a groups pattern to a
identifying number.
e.g.
group code codetext
1 1 Yes
1 2 No
2 1 No
2 2 Yes
3 1 Yes
3 2 No
4 1 Yeah
4 2 Naw
5 1 Yes
5 2 No
5 3 Maybe
This data would have 4 group patterns (and thus be assigned
codeSeqPatternId values 1,2,3,4 respectively)
patternid=1 -> 1-Yes,2-No (group 1 and 3)
patternid=2 -> 1-No,2-Yes (group 2)
patternid=3 -> 1-Yeah,2-Naw (group 4)
patternid=4 -> 1-Yes,2-No,3-Maybe (group 5)
Later on I would want to also 'reduce' the number of patterns by combining
shorter ones with longer ones they preface.
i.e. patternseq4 =: patternseq1
--
Richard A. DeVenezia
|