Date: Tue, 20 Oct 2009 17:37:18 -0500
Reply-To: Joe Matise <snoopy369@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: how to do this kind of tricky selection?
In-Reply-To: <200910202229.n9KIvSZC018735@malibu.cc.uga.edu>
Content-Type: text/plain; charset=ISO-8859-1
It limits the selection to patientID groups with 2 rows. Not sure which
part of it you are asking about, so I'll answer both halves:
Count(1) could be count(anything), (1) is simplest (often you see count(*),
not sure if in SAS this is any different, but in Oracle it seems to be
faster to do count(1) then count(*).)
Having ( ) is like where( ) but applied after group by instead of before the
group by, so give or take identical to
select * from (my query minus having) where count(1) > 1;
-Joe
On Tue, Oct 20, 2009 at 5:29 PM, Arthur Tabachneck <art297@netscape.net>wrote:
> Joe,
>
> Your solution definitely appears to provide the result the OP requested.
> Please explain what the 'having count(1) > 1' does.
>
> Thanks,
> Art
> -------
> On Tue, 20 Oct 2009 16:49:41 -0500, Joe Matise <snoopy369@GMAIL.COM>
> wrote:
>
> >Quick possible answer below. It may not be terribly efficient because of
> >the join of patient to itself with NE as the operator in the where clause,
> >but it doesn't complain about being a cartesian join, so who knows. It
> >selects A and D, as specified.
> >
> >data patient;
> >input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
> >datalines;
> >A flu 2 12/27/75
> >A cancer 3 12/30/76
> >A copd 3 12/30/76
> >B adhd 5 01/06/88
> >C infec 2 05/22/91
> >C dep 3 04/18/92
> >C lek 3 04/18/92
> >C eye 4 02/16/93
> >D dep 3 05/01/05
> >D flu 3 05/01/05
> >E adhd 2 06/12/92
> >E copd 2 09/12/92
> >F eye 3 11/18/05
> >F cancer 4 01/18/02
> >;
> >run;
> >
> >proc sql;
> > select patientID from (
> > select P.patientID from patient P, patient R
> > where P.patientID=R.patientID
> > and P.disease_type ne R.disease_Type
> > and P.dx_count=R.dx_count
> > and P.dx_date_latest=R.dx_date_latest
> > ) group by patientid
> > having count(1) > 1;
> >quit;
> >
> >
> >-Joe
> >
> >On Tue, Oct 20, 2009 at 4:32 PM, Jerry <i89rt5@gmail.com> wrote:
> >
> >> (Or not so tricky for you). Anyway, here is my question:
> >>
> >> I have a data (shown below) with 4 variables:
> >>
> >> patientID: ID for each patient
> >> disease_type: the type of disease s/he has
> >> dx_count: how many times s/he has been diagnosed with this type of
> diseae
> >> dx_date_latest: the latest diagnosis date for this type of disease
> >>
> >> The data is already sorted by patientID, dx_count, and dx_date_latest
> >>
> >> data patient;
> >> input patientID $ disease_type $ dx_count dx_date_latest mmddyy8.;
> >> datalines;
> >> A flu 2 12/27/75
> >> A cancer 3 12/30/76
> >> A copd 3 12/30/76
> >> B adhd 5 01/06/88
> >> C infec 2 05/22/91
> >> C dep 3 04/18/92
> >> C lek 3 04/18/92
> >> C eye 4 02/16/93
> >> D dep 3 05/01/05
> >> D flu 3 05/01/05
> >> E adhd 2 06/12/92
> >> E copd 2 09/12/92
> >> F eye 3 11/18/05
> >> F cancer 4 01/18/02
> >> ;
> >> run;
> >>
> >>
> >> Purpose: I want to select the kind of patients if they meet both
> criteria
> >> below
> >> 1, have more than one disease type
> >> 2, if TWO OR MORE disease types s/he suffers have the SAME HIGHTEST
> >> diagnosis counts and SAME LATEST diagnosis date
> >>
> >> It would be easier to understand my criteria if you could spare a few
> >> minutes to go thru the explanations I have for each patient, RE: whether
> or
> >> not s/he should be selected
> >>
> >> So based the criteria above,
> >>
> >> Patient A will be selected, because s/he has cancer and copd, both have
> the
> >> SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date
> >> (12/30/76)
> >>
> >> Patient B won't be selected, because s/he has one type of disease only
> >>
> >> Patient C won't be selected, because the number of disease type with
> >> HIGHEST
> >> diagnosis counts (4) is just one, which is eye disease, even though
> s/he
> >> has dep and lek both diagnosed 3 times and on the same latest diagnosis
> >> date
> >> (04/18/92)
> >>
> >> Patient D will be selected, because s/he has dep and flu, both have the
> >> SAME HIGHTEST diagnosis counts (3) and SAME LATEST diagnosis date
> >> (05/01/05)
> >>
> >> Patient E won't be selected, because even though s/he has adhd and copd,
> >> and
> >> both have the SAME HIGHTEST diagnosis counts (2), but these 2 types of
> >> disease have DIFFERENT latest diagnosis date
> >>
> >> Patient F won't be selected, because s/he just have ONE disease type
> (eye
> >> disease) with HIGHEST diagnosis counts (4)
> >>
> >> Could anyone please show me how to achieve my purpose stated above? Your
> >> help and time is much appreciated.
> >>
> >> Let me know if my explantion is not clear.
> >>
> >> Thanks,
> >> Jerry
> >>
>
|