Date: Tue, 20 Oct 2009 21:46:38 -0500
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: how to do this kind of tricky selection?
In-Reply-To: <200910202309.n9KGQnQ3018758@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
Here's my version with the new data:
data patientNEW;
input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
datalines;
A flu 2 12/27/75
A cancer 3 12/30/76
A copd 3 12/30/76
B adhd 5 01/06/88
C infec 2 05/22/91
C dep 3 04/18/92
C lek 3 04/18/92
C hiv 4 05/10/92
C flu 4 05/10/92
C eye 5 02/16/93
D dep 3 05/01/05
D flu 3 05/01/05
E adhd 2 06/12/92
E copd 2 09/12/92
F eye 3 11/18/05
F cancer 4 01/18/02
;
run;
proc sql;
select P.patientID from patientNEW P, (select patientID, max(dx_date_latest)
as dx_date_latest from patientNEW group by patientID) D
where P.patientID=D.patientID
and P.dx_date_latest =D.dx_date_latest
group by P.patientID
having count(1) > 1;
quit;
Slightly different from the one I posted earlier, instead of merging the
dataset to itself I only merge it to the max(date) portion of that which
will be somewhat faster.
-Joe
On Tue, Oct 20, 2009 at 6:09 PM, Jerry <i89rt5@gmail.com> wrote:
> Ya,
>
> I really appreciate your quick and easy-to-understand SQL solution,
> however,
> it does not address the "SAME HIGHTEST diagnosis counts" part of my
> criteria, which I'm quoting here again
>
> Purpose: I want to select the kind of patients if they meet both criteria
> below
> 1, have more than one disease type
> 2, if TWO OR MORE disease types s/he suffers have the SAME HIGHEST
> diagnosis
> counts and SAME LATEST diagnosis date
>
>
> Yes, your code did select out Patient A and D with my provided data. But,
> if
> I add 2 more records to Patient C, such as
>
> C hiv 4 05/10/92
> C flu 4 05/10/92
>
> and change this record "C eye 4 02/16/93" to "C eye 5 02/16/93"
> (as you see, just change the count for eye disease from 4 to 5 for patient
> C)
>
> So the new modified data would be
>
> /*
> modified input data (only difference with previous input data is patient C
> got 2 more records above, and the count for patient C's eye disease changed
> from 4 to 5), please change the input data name in your code to
> "patientNEW"
> */
>
> data patientNEW;
> input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
> datalines;
> A flu 2 12/27/75
> A cancer 3 12/30/76
> A copd 3 12/30/76
> B adhd 5 01/06/88
> C infec 2 05/22/91
> C dep 3 04/18/92
> C lek 3 04/18/92
> C hiv 4 05/10/92
> C flu 4 05/10/92
> C eye 5 02/16/93
> D dep 3 05/01/05
> D flu 3 05/01/05
> E adhd 2 06/12/92
> E copd 2 09/12/92
> F eye 3 11/18/05
> F cancer 4 01/18/02
> ;
> run;
>
> Your code will select out patient C, in addition to patient A and D.
>
> But patient C is NOT eligible based on my selection criteria, because s/he
> just have ONE disease type (eye disease) with HIGHEST diagnosis counts (5)
>
> Any possible fix to your solution? Your help and time is really
> appreciated.
>
> Jerry
>
>
>
>
> On Tue, 20 Oct 2009 18:01:13 -0400, Ya Huang <ya.huang@AMYLIN.COM> wrote:
>
> >Here is one solution:
> >
> >proc sql;
> >select *
> >from (select *, count(distinct disease_type) as ndt
> > from patient
> > group by patientid,dx_count,dx_date_latest)
> >group by patientid
> >having count(distinct disease_type) >= 2 and max(ndt) >=2
> >;
> >
> > dx_date_
> >patientID disease_type dx_count latest ndt
> >---------------------------------------------------------
> >A flu 2 5839 1
> >A copd 3 6208 2
> >A cancer 3 6208 2
> >D flu 3 14731 2
> >D dep 3 14731 2
> >
> >
> >On Tue, 20 Oct 2009 17:32:54 -0400, Jerry <i89rt5@GMAIL.COM> wrote:
> >
> >>(Or not so tricky for you). Anyway, here is my question:
> >>
> >>I have a data (shown below) with 4 variables:
> >>
> >>patientID: ID for each patient
> >>disease_type: the type of disease s/he has
> >>dx_count: how many times s/he has been diagnosed with this type of diseae
> >>dx_date_latest: the latest diagnosis date for this type of disease
> >>
> >>The data is already sorted by patientID, dx_count, and dx_date_latest
> >>
> >>data patient;
> >>input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
> >>datalines;
> >>A flu 2 12/27/75
> >>A cancer 3 12/30/76
> >>A copd 3 12/30/76
> >>B adhd 5 01/06/88
> >>C infec 2 05/22/91
> >>C dep 3 04/18/92
> >>C lek 3 04/18/92
> >>C eye 4 02/16/93
> >>D dep 3 05/01/05
> >>D flu 3 05/01/05
> >>E adhd 2 06/12/92
> >>E copd 2 09/12/92
> >>F eye 3 11/18/05
> >>F cancer 4 01/18/02
> >>;
> >>run;
> >>
> >>
> >>Purpose: I want to select the kind of patients if they meet both criteria
> >below
> >>1, have more than one disease type
> >>2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST
> >>diagnosis counts and SAME LATEST diagnosis date
> >>
> >>It would be easier to understand my criteria if you could spare a few
> >>minutes to go thru the explanations I have for each patient, RE: whether
> or
> >>not s/he should be selected
> >>
> >>So based the criteria above,
> >>
> >>Patient A will be selected, because s/he has cancer and copd, both have
> the
> >>SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date
> >(12/30/76)
> >>
> >>Patient B won't be selected, because s/he has one type of disease only
> >>
> >>Patient C won't be selected, because the number of disease type with
> >HIGHEST
> >> diagnosis counts (4) is just one, which is eye disease, even though s/he
> >>has dep and lek both diagnosed 3 times and on the same latest diagnosis
> >date
> >>(04/18/92)
> >>
> >>Patient D will be selected, because s/he has dep and flu, both have the
> >>SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date
> >(05/01/05)
> >>
> >>Patient E won't be selected, because even though s/he has adhd and copd,
> >and
> >>both have the SAME HIGHTEST diagnosis counts (2), but these 2 types of
> >>disease have DIFFERENT latest diagnosis date
> >>
> >>Patient F won't be selected, because s/he just have ONE disease type (eye
> >>disease) with HIGHEST diagnosis counts (4)
> >>
> >>Could anyone please show me how to achieve my purpose stated above? Your
> >>help and time is much appreciated.
> >>
> >>Let me know if my explantion is not clear.
> >>
> >>Thanks,
> >>Jerry
>
|