Date: Wed, 6 Aug 2003 16:51:48 -0400
Reply-To: "Zhang, Jianying" <Jianying.Zhang@UMASSMED.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Zhang, Jianying" <Jianying.Zhang@UMASSMED.EDU>
Subject: Re: how to select the population
Content-Type: text/plain; charset="us-ascii"
Hi, Ya:
Thank you for the quick response.
I guess I gave a very bad sample data set. Actually, the diagnosis codes
are numbers, or combinations of numbers and letters, 3 to 5 bytes long,
such as '311', '39064', 'E36781'.
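For multi-character codes like these, compressing out single letters no longer applies. A minimal sketch of one alternative, not the method from the reply quoted below: reshape to one row per matching code, then count distinct codes and distinct records per id. The names targets, long, recno and code are made up for this sketch, the three codes listed are placeholders for the real list of interest, and dx1-dx5 are assumed to be character variables wide enough to hold the codes.

```sas
/* Hypothetical target list; replace with the real diagnosis codes of interest. */
%let targets = '311','39064','E36781';

/* One output row per (record, matching code). */
data long;
   set one;                    /* assumes ONE holds id and character dx1-dx5 */
   length code $ 8;            /* assumed wide enough for any code */
   array dx{5} dx1-dx5;
   recno = _n_;                /* record number within ONE */
   do i = 1 to 5;
      if dx{i} in (&targets) then do;
         code = dx{i};
         output;
      end;
   end;
   keep id recno code;
run;

/* Keep ids with at least two different target codes spread over */
/* at least two different records.                                */
proc sql;
   select *
   from one
   where id in (select id
                from long
                group by id
                having count(distinct code)  >= 2
                   and count(distinct recno) >= 2);
quit;
```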
-----Original Message-----
From: Huang, Ya [mailto:yhuang@amylin.com]
Sent: Wednesday, August 06, 2003 4:32 PM
To: Zhang, Jianying; SAS-L@LISTSERV.UGA.EDU
Subject: RE: how to select the population
Here is one:
data one;
input id dx1 $2. dx2 $2. dx3 $2. dx4 $2. dx5 $2.;
cards;
1 A H D L T
1 N F L K .
1 K B . . .
1 M F H O S
1 R . . . .
2 C K F . .
2 G K C . .
2 N T L K .
3 E F G H .
3 I K P M .
4 A K B . M
4 C L D . .
;
proc sql;
select *
from one
group by id
having count(distinct
compress(dx1||dx2||dx3||dx4||dx5,'EFGHIJKLMNOPQRSTUVWXYZ ')) >=2
;
quit;
--------------
id  dx1  dx2  dx3  dx4  dx5
---------------------------
 1  K    B
 1  N    F    L    K
 1  M    F    H    O    S
 1  R
 1  A    H    D    L    T
 4  C    L    D
 4  A    K    B         M
This assumes that each diag code is one of the 26 capital letters.
Concatenating dx1-dx5 and compressing out 'EFGH....' leaves a string
containing only A, B, C and D. Counting the unique combinations of
these four letters within each id is then sufficient to subset the
population: an id is kept if its number of unique combinations is
greater than one.
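To see what the HAVING clause is counting, it may help to list the per-record key on its own. This is only an illustrative sketch that reuses data set one from above; the column name abcd_key is made up here.

```sas
/* List the string left on each record after removing every character */
/* except A, B, C and D.                                                */
proc sql;
   select id,
          compress(dx1||dx2||dx3||dx4||dx5,
                   'EFGHIJKLMNOPQRSTUVWXYZ ') as abcd_key
   from one
   order by id;
quit;
```

For id 2 the only non-blank key is C, which appears on two records, so there is one distinct value and the id is dropped. For id 4 the keys are AB and CD, so it is kept. Records with none of the four letters give a blank key, which SAS treats as missing, and count(distinct ...) does not count missing values.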
Kind regards,
Ya Huang
-----Original Message-----
From: Zhang, Jianying [mailto:Jianying.Zhang@UMASSMED.EDU]
Sent: Wednesday, August 06, 2003 1:08 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: how to select the population
Dear SAS-L:
I have a data set that looks like the following (ID, diagnosis 1, 2, 3, 4, 5):
data one;
input id dx1 $ dx2 $ dx3 $ dx4 $ dx5 $;
cards;
1 A H D L T
1 N F L K
1 K B
1 M F H O S G
1 R
2 C K F
2 G K C
2 N T L K
;
RUN;
I am only interested in the people who had diagnosis A, B, C or D,
with at least two different diagnoses on two different records.
From the above data set, I expect to select ID 1, who had two records,
one with A and the other with B, but not ID 2. ID 2 had two records, but
both with C.
Could you please help me with the coding?
Thanks in advance.
Jianying Zhang