LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (October 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 20 Oct 2009 17:37:18 -0500
Reply-To:     Joe Matise <snoopy369@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Joe Matise <snoopy369@GMAIL.COM>
Subject:      Re: how to do this kind of tricky selection?
Comments: To: Arthur Tabachneck <art297@netscape.net>
In-Reply-To:  <200910202229.n9KIvSZC018735@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1

It limits the selection to patientID groups with 2 rows. Not sure which part of it you are asking about, so I'll answer both halves:

Count(1) could be count(anything), (1) is simplest (often you see count(*), not sure if in SAS this is any different, but in Oracle it seems to be faster to do count(1) then count(*).)

Having ( ) is like where( ) but applied after group by instead of before the group by, so give or take identical to

select * from (my query minus having) where count(1) > 1;

-Joe

On Tue, Oct 20, 2009 at 5:29 PM, Arthur Tabachneck <art297@netscape.net>wrote:

> Joe, > > Your solution definitely appears to provide the result the OP requested. > Please explain what the 'having count(1) > 1' does. > > Thanks, > Art > ------- > On Tue, 20 Oct 2009 16:49:41 -0500, Joe Matise <snoopy369@GMAIL.COM> > wrote: > > >Quick possible answer below. It may not be terribly efficient because of > >the join of patient to itself with NE as the operator in the where clause, > >but it doesn't complain about being a cartesian join, so who knows. It > >selects A and D, as specified. > > > >data patient; > >input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; > >datalines; > >A flu 2 12/27/75 > >A cancer 3 12/30/76 > >A copd 3 12/30/76 > >B adhd 5 01/06/88 > >C infec 2 05/22/91 > >C dep 3 04/18/92 > >C lek 3 04/18/92 > >C eye 4 02/16/93 > >D dep 3 05/01/05 > >D flu 3 05/01/05 > >E adhd 2 06/12/92 > >E copd 2 09/12/92 > >F eye 3 11/18/05 > >F cancer 4 01/18/02 > >; > >run; > > > >proc sql; > > select patientID from ( > > select P.patientID from patient P, patient R > > where P.patientID=R.patientID > > and P.disease_type ne R.disease_Type > > and P.dx_count=R.dx_count > > and P.dx_date_latest=R.dx_date_latest > > ) group by patientid > > having count(1) > 1; > >quit; > > > > > >-Joe > > > >On Tue, Oct 20, 2009 at 4:32 PM, Jerry <i89rt5@gmail.com> wrote: > > > >> (Or not so tricky for you). Anyway, here is my question: > >> > >> I have a data (shown below) with 4 variables: > >> > >> patientID: ID for each patient > >> disease_type: the type of disease s/he has > >> dx_count: how many times s/he has been diagnosed with this type of > diseae > >> dx_date_latest: the latest diagnosis date for this type of disease > >> > >> The data is already sorted by patientID, dx_count, and dx_date_latest > >> > >> data patient; > >> input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.; > >> datalines; > >> A flu 2 12/27/75 > >> A cancer 3 12/30/76 > >> A copd 3 12/30/76 > >> B adhd 5 01/06/88 > >> C infec 2 05/22/91 > >> C dep 3 04/18/92 > >> C lek 3 04/18/92 > >> C eye 4 02/16/93 > >> D dep 3 05/01/05 > >> D flu 3 05/01/05 > >> E adhd 2 06/12/92 > >> E copd 2 09/12/92 > >> F eye 3 11/18/05 > >> F cancer 4 01/18/02 > >> ; > >> run; > >> > >> > >> Purpose: I want to select the kind of patients if they meet both > criteria > >> below > >> 1, have more than one disease type > >> 2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST > >> diagnosis counts and SAME LATEST diagnosis date > >> > >> It would be easier to understand my criteria if you could spare a few > >> minutes to go thru the explanations I have for each patient, RE: whether > or > >> not s/he should be selected > >> > >> So based the criteria above, > >> > >> Patient A will be selected, because s/he has cancer and copd, both have > the > >> SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date > >> (12/30/76) > >> > >> Patient B won't be selected, because s/he has one type of disease only > >> > >> Patient C won't be selected, because the number of disease type with > >> HIGHEST > >> diagnosis counts (4) is just one, which is eye disease, even though > s/he > >> has dep and lek both diagnosed 3 times and on the same latest diagnosis > >> date > >> (04/18/92) > >> > >> Patient D will be selected, because s/he has dep and flu, both have the > >> SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date > >> (05/01/05) > >> > >> Patient E won't be selected, because even though s/he has adhd and copd, > >> and > >> both have the SAME HIGHTEST diagnosis counts (2), but these 2 types of > >> disease have DIFFERENT latest diagnosis date > >> > >> Patient F won't be selected, because s/he just have ONE disease type > (eye > >> disease) with HIGHEST diagnosis counts (4) > >> > >> Could anyone please show me how to achieve my purpose stated above? Your > >> help and time is much appreciated. > >> > >> Let me know if my explantion is not clear. > >> > >> Thanks, > >> Jerry > >> >


Back to: Top of message | Previous page | Main SAS-L page