Date: Tue, 20 Oct 2009 16:53:26 -0700
Reply-To: "Huang, Ya" <Ya.Huang@AMYLIN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Huang, Ya" <Ya.Huang@AMYLIN.COM>
Subject: Re: how to do this kind of tricky selection?
In-Reply-To: <200910202309.n9KGQnQ3018758@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Ok, Try this:
proc sql;
select *
from patientnew
where patientid in (select distinct patientid
from (select *, count(distinct disease_type) as ndt
from patientnew
group by patientid,dx_count,dx_date_latest)
group by patientid
having count(distinct disease_type) >= 2 and
dx_count=max(dx_count) and ndt >=2)
;
-----Original Message-----
From: Jerry [mailto:i89rt5@GMAIL.COM]
Sent: Tuesday, October 20, 2009 4:10 PM
To: SAS-L@LISTSERV.UGA.EDU; Huang, Ya
Cc: Jerry
Subject: Re: how to do this kind of tricky selection?
Ya,
I really appreciate your quick and easy-to-understand SQL solution,
however, it does not address the "SAME HIGHTEST diagnosis counts" part
of my criteria, which I'm quoting here again
Purpose: I want to select the kind of patients if they meet both
criteria below 1, have more than one disease type 2, if TWO OR MORE
disease types s/he suffers have the SAME HIGHEST diagnosis counts and
SAME LATEST diagnosis date
Yes, your code did select out Patient A and D with my provided data.
But, if I add 2 more records to Patient C, such as
C hiv 4 05/10/92
C flu 4 05/10/92
and change this record "C eye 4 02/16/93" to "C eye 5 02/16/93"
(as you see, just change the count for eye disease from 4 to 5 for
patient C)
So the new modified data would be
/*
modified input data (only difference with previous input data is patient
C got 2 more records above, and the count for patient C's eye disease
changed from 4 to 5), please change the input data name in your code to
"patientNEW"
*/
data patientNEW;
input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
datalines; A flu 2 12/27/75 A cancer 3 12/30/76 A copd 3 12/30/76 B adhd
5 01/06/88 C infec 2 05/22/91
C dep 3 04/18/92
C lek 3 04/18/92
C hiv 4 05/10/92
C flu 4 05/10/92
C eye 5 02/16/93
D dep 3 05/01/05
D flu 3 05/01/05
E adhd 2 06/12/92
E copd 2 09/12/92
F eye 3 11/18/05
F cancer 4 01/18/02
;
run;
Your code will select out patient C, in addition to patient A and D.
But patient C is NOT eligible based on my selection criteria, because
s/he just have ONE disease type (eye disease) with HIGHEST diagnosis
counts (5)
Any possible fix to your solution? Your help and time is really
appreciated.
Jerry
On Tue, 20 Oct 2009 18:01:13 -0400, Ya Huang <ya.huang@AMYLIN.COM>
wrote:
>Here is one solution:
>
>proc sql;
>select *
>from (select *, count(distinct disease_type) as ndt
> from patient
> group by patientid,dx_count,dx_date_latest)
>group by patientid
>having count(distinct disease_type) >= 2 and max(ndt) >=2 ;
>
> dx_date_
>patientID disease_type dx_count latest ndt
>---------------------------------------------------------
>A flu 2 5839 1
>A copd 3 6208 2
>A cancer 3 6208 2
>D flu 3 14731 2
>D dep 3 14731 2
>
>
>On Tue, 20 Oct 2009 17:32:54 -0400, Jerry <i89rt5@GMAIL.COM> wrote:
>
>>(Or not so tricky for you). Anyway, here is my question:
>>
>>I have a data (shown below) with 4 variables:
>>
>>patientID: ID for each patient
>>disease_type: the type of disease s/he has
>>dx_count: how many times s/he has been diagnosed with this type of
>>diseae
>>dx_date_latest: the latest diagnosis date for this type of disease
>>
>>The data is already sorted by patientID, dx_count, and dx_date_latest
>>
>>data patient;
>>input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
>>datalines; A flu 2 12/27/75 A cancer 3 12/30/76 A copd 3 12/30/76 B
>>adhd 5 01/06/88 C infec 2 05/22/91
>>C dep 3 04/18/92
>>C lek 3 04/18/92
>>C eye 4 02/16/93
>>D dep 3 05/01/05
>>D flu 3 05/01/05
>>E adhd 2 06/12/92
>>E copd 2 09/12/92
>>F eye 3 11/18/05
>>F cancer 4 01/18/02
>>;
>>run;
>>
>>
>>Purpose: I want to select the kind of patients if they meet both
>>criteria
>below
>>1, have more than one disease type
>>2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST
>>diagnosis counts and SAME LATEST diagnosis date
>>
>>It would be easier to understand my criteria if you could spare a few
>>minutes to go thru the explanations I have for each patient, RE:
>>whether or not s/he should be selected
>>
>>So based the criteria above,
>>
>>Patient A will be selected, because s/he has cancer and copd, both
>>have the SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis
>>date
>(12/30/76)
>>
>>Patient B won't be selected, because s/he has one type of disease only
>>
>>Patient C won't be selected, because the number of disease type with
>HIGHEST
>> diagnosis counts (4) is just one, which is eye disease, even though
>>s/he has dep and lek both diagnosed 3 times and on the same latest
>>diagnosis
>date
>>(04/18/92)
>>
>>Patient D will be selected, because s/he has dep and flu, both have
>>the SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date
>(05/01/05)
>>
>>Patient E won't be selected, because even though s/he has adhd and
>>copd,
>and
>>both have the SAME HIGHTEST diagnosis counts (2), but these 2 types
>>of disease have DIFFERENT latest diagnosis date
>>
>>Patient F won't be selected, because s/he just have ONE disease type
>>(eye
>>disease) with HIGHEST diagnosis counts (4)
>>
>>Could anyone please show me how to achieve my purpose stated above?
>>Your help and time is much appreciated.
>>
>>Let me know if my explantion is not clear.
>>
>>Thanks,
>>Jerry
|