LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (October 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 20 Oct 2009 21:46:38 -0500
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: how to do this kind of tricky selection?
Comments: To: Jerry <i89rt5@gmail.com>
In-Reply-To:  <200910202309.n9KGQnQ3018758@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1

Here's my version with the new data: data patientNEW; input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; datalines; A flu 2 12/27/75 A cancer 3 12/30/76 A copd 3 12/30/76 B adhd 5 01/06/88 C infec 2 05/22/91 C dep 3 04/18/92 C lek 3 04/18/92 C hiv 4 05/10/92 C flu 4 05/10/92 C eye 5 02/16/93 D dep 3 05/01/05 D flu 3 05/01/05 E adhd 2 06/12/92 E copd 2 09/12/92 F eye 3 11/18/05 F cancer 4 01/18/02 ; run;

proc sql; select P.patientID from patientNEW P, (select patientID, max(dx_date_latest) as dx_date_latest from patientNEW group by patientID) D where P.patientID=D.patientID and P.dx_date_latest =D.dx_date_latest group by P.patientID having count(1) > 1; quit;

Slightly different from the one I posted earlier, instead of merging the dataset to itself I only merge it to the max(date) portion of that which will be somewhat faster.

-Joe

On Tue, Oct 20, 2009 at 6:09 PM, Jerry <i89rt5@gmail.com> wrote:

> Ya, > > I really appreciate your quick and easy-to-understand SQL solution, > however, > it does not address the "SAME HIGHTEST diagnosis counts" part of my > criteria, which I'm quoting here again > > Purpose: I want to select the kind of patients if they meet both criteria > below > 1, have more than one disease type > 2, if TWO OR MORE disease types s/he suffers have the SAME HIGHEST > diagnosis > counts and SAME LATEST diagnosis date > > > Yes, your code did select out Patient A and D with my provided data. But, > if > I add 2 more records to Patient C, such as > > C hiv 4 05/10/92 > C flu 4 05/10/92 > > and change this record "C eye 4 02/16/93" to "C eye 5 02/16/93" > (as you see, just change the count for eye disease from 4 to 5 for patient > C) > > So the new modified data would be > > /* > modified input data (only difference with previous input data is patient C > got 2 more records above, and the count for patient C's eye disease changed > from 4 to 5), please change the input data name in your code to > "patientNEW" > */ > > data patientNEW; > input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; > datalines; > A flu 2 12/27/75 > A cancer 3 12/30/76 > A copd 3 12/30/76 > B adhd 5 01/06/88 > C infec 2 05/22/91 > C dep 3 04/18/92 > C lek 3 04/18/92 > C hiv 4 05/10/92 > C flu 4 05/10/92 > C eye 5 02/16/93 > D dep 3 05/01/05 > D flu 3 05/01/05 > E adhd 2 06/12/92 > E copd 2 09/12/92 > F eye 3 11/18/05 > F cancer 4 01/18/02 > ; > run; > > Your code will select out patient C, in addition to patient A and D. > > But patient C is NOT eligible based on my selection criteria, because s/he > just have ONE disease type (eye disease) with HIGHEST diagnosis counts (5) > > Any possible fix to your solution? Your help and time is really > appreciated. > > Jerry > > > > > On Tue, 20 Oct 2009 18:01:13 -0400, Ya Huang <ya.huang@AMYLIN.COM> wrote: > > >Here is one solution: > > > >proc sql; > >select * > >from (select *, count(distinct disease_type) as ndt > > from patient > > group by patientid,dx_count,dx_date_latest) > >group by patientid > >having count(distinct disease_type) >= 2 and max(ndt) >=2 > >; > > > > dx_date_ > >patientID disease_type dx_count latest ndt > >--------------------------------------------------------- > >A flu 2 5839 1 > >A copd 3 6208 2 > >A cancer 3 6208 2 > >D flu 3 14731 2 > >D dep 3 14731 2 > > > > > >On Tue, 20 Oct 2009 17:32:54 -0400, Jerry <i89rt5@GMAIL.COM> wrote: > > > >>(Or not so tricky for you). Anyway, here is my question: > >> > >>I have a data (shown below) with 4 variables: > >> > >>patientID: ID for each patient > >>disease_type: the type of disease s/he has > >>dx_count: how many times s/he has been diagnosed with this type of diseae > >>dx_date_latest: the latest diagnosis date for this type of disease > >> > >>The data is already sorted by patientID, dx_count, and dx_date_latest > >> > >>data patient; > >>input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; > >>datalines; > >>A flu 2 12/27/75 > >>A cancer 3 12/30/76 > >>A copd 3 12/30/76 > >>B adhd 5 01/06/88 > >>C infec 2 05/22/91 > >>C dep 3 04/18/92 > >>C lek 3 04/18/92 > >>C eye 4 02/16/93 > >>D dep 3 05/01/05 > >>D flu 3 05/01/05 > >>E adhd 2 06/12/92 > >>E copd 2 09/12/92 > >>F eye 3 11/18/05 > >>F cancer 4 01/18/02 > >>; > >>run; > >> > >> > >>Purpose: I want to select the kind of patients if they meet both criteria > >below > >>1, have more than one disease type > >>2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST > >>diagnosis counts and SAME LATEST diagnosis date > >> > >>It would be easier to understand my criteria if you could spare a few > >>minutes to go thru the explanations I have for each patient, RE: whether > or > >>not s/he should be selected > >> > >>So based the criteria above, > >> > >>Patient A will be selected, because s/he has cancer and copd, both have > the > >>SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date > >(12/30/76) > >> > >>Patient B won't be selected, because s/he has one type of disease only > >> > >>Patient C won't be selected, because the number of disease type with > >HIGHEST > >> diagnosis counts (4) is just one, which is eye disease, even though s/he > >>has dep and lek both diagnosed 3 times and on the same latest diagnosis > >date > >>(04/18/92) > >> > >>Patient D will be selected, because s/he has dep and flu, both have the > >>SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date > >(05/01/05) > >> > >>Patient E won't be selected, because even though s/he has adhd and copd, > >and > >>both have the SAME HIGHTEST diagnosis counts (2), but these 2 types of > >>disease have DIFFERENT latest diagnosis date > >> > >>Patient F won't be selected, because s/he just have ONE disease type (eye > >>disease) with HIGHEST diagnosis counts (4) > >> > >>Could anyone please show me how to achieve my purpose stated above? Your > >>help and time is much appreciated. > >> > >>Let me know if my explantion is not clear. > >> > >>Thanks, > >>Jerry >


Back to: Top of message | Previous page | Main SAS-L page