LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (October 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 20 Oct 2009 16:53:26 -0700
Reply-To:     "Huang, Ya" <Ya.Huang@AMYLIN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Huang, Ya" <Ya.Huang@AMYLIN.COM>
Subject:      Re: how to do this kind of tricky selection?
Comments: To: Jerry <i89rt5@GMAIL.COM>
In-Reply-To:  <200910202309.n9KGQnQ3018758@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

Ok, Try this:

proc sql; select * from patientnew where patientid in (select distinct patientid from (select *, count(distinct disease_type) as ndt from patientnew group by patientid,dx_count,dx_date_latest) group by patientid having count(distinct disease_type) >= 2 and dx_count=max(dx_count) and ndt >=2) ;

-----Original Message----- From: Jerry [mailto:i89rt5@GMAIL.COM] Sent: Tuesday, October 20, 2009 4:10 PM To: SAS-L@LISTSERV.UGA.EDU; Huang, Ya Cc: Jerry Subject: Re: how to do this kind of tricky selection?

Ya,

I really appreciate your quick and easy-to-understand SQL solution, however, it does not address the "SAME HIGHTEST diagnosis counts" part of my criteria, which I'm quoting here again

Purpose: I want to select the kind of patients if they meet both criteria below 1, have more than one disease type 2, if TWO OR MORE disease types s/he suffers have the SAME HIGHEST diagnosis counts and SAME LATEST diagnosis date

Yes, your code did select out Patient A and D with my provided data. But, if I add 2 more records to Patient C, such as

C hiv 4 05/10/92 C flu 4 05/10/92

and change this record "C eye 4 02/16/93" to "C eye 5 02/16/93" (as you see, just change the count for eye disease from 4 to 5 for patient C)

So the new modified data would be

/* modified input data (only difference with previous input data is patient C got 2 more records above, and the count for patient C's eye disease changed from 4 to 5), please change the input data name in your code to "patientNEW" */

data patientNEW; input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; datalines; A flu 2 12/27/75 A cancer 3 12/30/76 A copd 3 12/30/76 B adhd 5 01/06/88 C infec 2 05/22/91 C dep 3 04/18/92 C lek 3 04/18/92 C hiv 4 05/10/92 C flu 4 05/10/92 C eye 5 02/16/93 D dep 3 05/01/05 D flu 3 05/01/05 E adhd 2 06/12/92 E copd 2 09/12/92 F eye 3 11/18/05 F cancer 4 01/18/02 ; run;

Your code will select out patient C, in addition to patient A and D.

But patient C is NOT eligible based on my selection criteria, because s/he just have ONE disease type (eye disease) with HIGHEST diagnosis counts (5)

Any possible fix to your solution? Your help and time is really appreciated.

Jerry

On Tue, 20 Oct 2009 18:01:13 -0400, Ya Huang <ya.huang@AMYLIN.COM> wrote:

>Here is one solution: > >proc sql; >select * >from (select *, count(distinct disease_type) as ndt > from patient > group by patientid,dx_count,dx_date_latest) >group by patientid >having count(distinct disease_type) >= 2 and max(ndt) >=2 ; > > dx_date_ >patientID disease_type dx_count latest ndt >--------------------------------------------------------- >A flu 2 5839 1 >A copd 3 6208 2 >A cancer 3 6208 2 >D flu 3 14731 2 >D dep 3 14731 2 > > >On Tue, 20 Oct 2009 17:32:54 -0400, Jerry <i89rt5@GMAIL.COM> wrote: > >>(Or not so tricky for you). Anyway, here is my question: >> >>I have a data (shown below) with 4 variables: >> >>patientID: ID for each patient >>disease_type: the type of disease s/he has >>dx_count: how many times s/he has been diagnosed with this type of >>diseae >>dx_date_latest: the latest diagnosis date for this type of disease >> >>The data is already sorted by patientID, dx_count, and dx_date_latest >> >>data patient; >>input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; >>datalines; A flu 2 12/27/75 A cancer 3 12/30/76 A copd 3 12/30/76 B >>adhd 5 01/06/88 C infec 2 05/22/91 >>C dep 3 04/18/92 >>C lek 3 04/18/92 >>C eye 4 02/16/93 >>D dep 3 05/01/05 >>D flu 3 05/01/05 >>E adhd 2 06/12/92 >>E copd 2 09/12/92 >>F eye 3 11/18/05 >>F cancer 4 01/18/02 >>; >>run; >> >> >>Purpose: I want to select the kind of patients if they meet both >>criteria >below >>1, have more than one disease type >>2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST >>diagnosis counts and SAME LATEST diagnosis date >> >>It would be easier to understand my criteria if you could spare a few >>minutes to go thru the explanations I have for each patient, RE: >>whether or not s/he should be selected >> >>So based the criteria above, >> >>Patient A will be selected, because s/he has cancer and copd, both >>have the SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis

>>date >(12/30/76) >> >>Patient B won't be selected, because s/he has one type of disease only >> >>Patient C won't be selected, because the number of disease type with >HIGHEST >> diagnosis counts (4) is just one, which is eye disease, even though >>s/he has dep and lek both diagnosed 3 times and on the same latest >>diagnosis >date >>(04/18/92) >> >>Patient D will be selected, because s/he has dep and flu, both have >>the SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date >(05/01/05) >> >>Patient E won't be selected, because even though s/he has adhd and >>copd, >and >>both have the SAME HIGHTEST diagnosis counts (2), but these 2 types >>of disease have DIFFERENT latest diagnosis date >> >>Patient F won't be selected, because s/he just have ONE disease type >>(eye >>disease) with HIGHEST diagnosis counts (4) >> >>Could anyone please show me how to achieve my purpose stated above? >>Your help and time is much appreciated. >> >>Let me know if my explantion is not clear. >> >>Thanks, >>Jerry


Back to: Top of message | Previous page | Main SAS-L page