```Date: Tue, 20 Oct 2009 22:15:02 -0400 Reply-To: Muthia Kachirayan Sender: "SAS(r) Discussion" From: Muthia Kachirayan Subject: Re: how to do this kind of tricky selection? In-Reply-To: <200910202309.n9KGQnQ3018758@malibu.cc.uga.edu> Content-Type: text/plain; charset=ISO-8859-1 Jerry, I missed your requirements. The data step solution is longer than SQL solutions you have received. Now it takes 2 steps. First is done to find the maximum of dx_count within patientID. The new data set and the 2 steps follow. data patient; input patientID \$ disease_type \$ dx_count dx_date_latest mmddyy8.; datalines; A flu 2 12/27/75 A cancer 3 12/30/76 A copd 3 12/30/76 B adhd 5 01/06/88 C infec 2 05/22/91 C dep 3 04/18/92 C lek 3 04/18/92 C eye 4 02/16/93 D dep 3 05/01/05 D flu 3 05/01/05 E adhd 2 06/12/92 E copd 2 09/12/92 F eye 3 11/18/05 F cancer 4 01/18/02 ; run; data step; do until(last.patientID); set patient; by patientID; max_count = max(max_count, dx_count); end; do until(last.patientID); set patient; by patientID; output; end; run; data need; do until(last.patientID); set step; by patientID; if first.patientID then do; prev_disease = disease_type; prev_date = dx_date_latest; end; else do; ind = ( (dx_count = max_count) and (dx_date_latest = prev_date) and (disease_type ne prev_disease) ); if ind then output; prev_disease = disease_type; prev_date = dx_date_latest; end; end; drop prev_date ind max_count; run; proc print data = need; format dx_date_latest mmddyy8.; run; Kind regards, Muthia Kachirayan On Tue, Oct 20, 2009 at 7:09 PM, Jerry wrote: > Ya, > > I really appreciate your quick and easy-to-understand SQL solution, > however, > it does not address the "SAME HIGHTEST diagnosis counts" part of my > criteria, which I'm quoting here again > > Purpose: I want to select the kind of patients if they meet both criteria > below > 1, have more than one disease type > 2, if TWO OR MORE disease types s/he suffers have the SAME HIGHEST > diagnosis > counts and SAME LATEST diagnosis date > > > Yes, your code did select out Patient A and D with my provided data. But, > if > I add 2 more records to Patient C, such as > > C hiv 4 05/10/92 > C flu 4 05/10/92 > > and change this record "C eye 4 02/16/93" to "C eye 5 02/16/93" > (as you see, just change the count for eye disease from 4 to 5 for patient > C) > > So the new modified data would be > > /* > modified input data (only difference with previous input data is patient C > got 2 more records above, and the count for patient C's eye disease changed > from 4 to 5), please change the input data name in your code to > "patientNEW" > */ > > data patientNEW; > input patientID \$ disease_type \$ dx_count dx_date_latest mmddyy8.; > datalines; > A flu 2 12/27/75 > A cancer 3 12/30/76 > A copd 3 12/30/76 > B adhd 5 01/06/88 > C infec 2 05/22/91 > C dep 3 04/18/92 > C lek 3 04/18/92 > C hiv 4 05/10/92 > C flu 4 05/10/92 > C eye 5 02/16/93 > D dep 3 05/01/05 > D flu 3 05/01/05 > E adhd 2 06/12/92 > E copd 2 09/12/92 > F eye 3 11/18/05 > F cancer 4 01/18/02 > ; > run; > > Your code will select out patient C, in addition to patient A and D. > > But patient C is NOT eligible based on my selection criteria, because s/he > just have ONE disease type (eye disease) with HIGHEST diagnosis counts (5) > > Any possible fix to your solution? Your help and time is really > appreciated. > > Jerry > > > > > On Tue, 20 Oct 2009 18:01:13 -0400, Ya Huang wrote: > > >Here is one solution: > > > >proc sql; > >select * > >from (select *, count(distinct disease_type) as ndt > > from patient > > group by patientid,dx_count,dx_date_latest) > >group by patientid > >having count(distinct disease_type) >= 2 and max(ndt) >=2 > >; > > > > dx_date_ > >patientID disease_type dx_count latest ndt > >--------------------------------------------------------- > >A flu 2 5839 1 > >A copd 3 6208 2 > >A cancer 3 6208 2 > >D flu 3 14731 2 > >D dep 3 14731 2 > > > > > >On Tue, 20 Oct 2009 17:32:54 -0400, Jerry wrote: > > > >>(Or not so tricky for you). Anyway, here is my question: > >> > >>I have a data (shown below) with 4 variables: > >> > >>patientID: ID for each patient > >>disease_type: the type of disease s/he has > >>dx_count: how many times s/he has been diagnosed with this type of diseae > >>dx_date_latest: the latest diagnosis date for this type of disease > >> > >>The data is already sorted by patientID, dx_count, and dx_date_latest > >> > >>data patient; > >>input patientID \$ disease_type \$ dx_count dx_date_latest mmddyy8.; > >>datalines; > >>A flu 2 12/27/75 > >>A cancer 3 12/30/76 > >>A copd 3 12/30/76 > >>B adhd 5 01/06/88 > >>C infec 2 05/22/91 > >>C dep 3 04/18/92 > >>C lek 3 04/18/92 > >>C eye 4 02/16/93 > >>D dep 3 05/01/05 > >>D flu 3 05/01/05 > >>E adhd 2 06/12/92 > >>E copd 2 09/12/92 > >>F eye 3 11/18/05 > >>F cancer 4 01/18/02 > >>; > >>run; > >> > >> > >>Purpose: I want to select the kind of patients if they meet both criteria > >below > >>1, have more than one disease type > >>2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST > >>diagnosis counts and SAME LATEST diagnosis date > >> > >>It would be easier to understand my criteria if you could spare a few > >>minutes to go thru the explanations I have for each patient, RE: whether > or > >>not s/he should be selected > >> > >>So based the criteria above, > >> > >>Patient A will be selected, because s/he has cancer and copd, both have > the > >>SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date > >(12/30/76) > >> > >>Patient B won't be selected, because s/he has one type of disease only > >> > >>Patient C won't be selected, because the number of disease type with > >HIGHEST > >> diagnosis counts (4) is just one, which is eye disease, even though s/he > >>has dep and lek both diagnosed 3 times and on the same latest diagnosis > >date > >>(04/18/92) > >> > >>Patient D will be selected, because s/he has dep and flu, both have the > >>SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date > >(05/01/05) > >> > >>Patient E won't be selected, because even though s/he has adhd and copd, > >and > >>both have the SAME HIGHTEST diagnosis counts (2), but these 2 types of > >>disease have DIFFERENT latest diagnosis date > >> > >>Patient F won't be selected, because s/he just have ONE disease type (eye > >>disease) with HIGHEST diagnosis counts (4) > >> > >>Could anyone please show me how to achieve my purpose stated above? Your > >>help and time is much appreciated. > >> > >>Let me know if my explantion is not clear. > >> > >>Thanks, > >>Jerry > ```

Back to: Top of message | Previous page | Main SAS-L page