| Date: | Sun, 10 Feb 2008 08:41:22 -0800 |
| Reply-To: | gopilth@YAHOO.COM |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | gopilth@YAHOO.COM |
| Organization: | http://groups.google.com |
| Subject: | Re: Selecting pts with only cancer diagnosis |
|
| Content-Type: | text/plain; charset=ISO-8859-1 |
On Feb 8, 4:57 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote:
> Gopi:
> While you can look across rows of a dataset using procedural Data step
> programs with retains, flags, and first.x loops, what you have really
> amounts to a existential quantifier in predicate logic. SAS SQL has a
> solution built into its syntax.
>
> data test;
> input pt: 2. diacod: $char6. ;
> cards;
> 11 200
> 11 120
> 11 188.00
> 20 220
> 20 120.1
> 20 120.9
> 31 120
> 22 217.9
> 22 170.70
> 22 542.9
> 25 120.6
> 25 120.9
> 25 120
> ;
> run;
> proc sql;
> create table solution as
> select distinct pt from test as t1
> where substr(left(diacod),1,3) IN ('120')
> and
> NOT EXISTS (select 1 from test as t2 where t1.pt=t2.pt and diacod IN
> ('188.00','188.10','188.20','188.30','188.40','188.50','188.60','188.70'
> ,
>
> '188.80','188.90','170.00','170.10','170.20','170.30','170.40','170.50',
> '170.60','170.70','170.80',
>
> '170.90','191.00','191.10','191.20','191.30','191.40','191.50','191.60',
> '191.70','191.80','191.90',
>
> '174.00','174.10','174.20','174.30','174.40','174.50','174.60','174.70')
> )
> ;
> quit;
>
> Note that the Boolean logic WHERE condition looks at each individual
> tuple (row) in the dataset test. The NOT EXISTS operator, in contrast,
> looks across the entire dataset for tuples that match on pt and contain
> one of the infectious disease codes.
>
> The shorter version of the WHERE condition in the program not only saves
> typing, it may capture more of the variations in ICD codes (say,
> '120.'). I'd do the same with the list of infectious disease codes. Too
> specific criteria for selection may create as many problems as too loose
> criteria.
>
> A "reflexive join" could implement an existential quantifier in a more
> efficient way. I'd reserve that extra effort for datasets with more than
> 10,000 tuples.
>
> For extra credit, a programmer would look ahead to similar programming
> problems and build a solution that operates on lists of ICD codes in
> datasets. Both the IN operator accepts (SELECT <attribute> from
> <dataset> ...) as a list of attribute values in a column of a dataset.
> S
>
> I've changed the input data to include a wider range of data....
> S
>
>
>
> -----Original Message-----
> From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu]
>
> On Behalf Ofgopi...@yahoo.com
> Sent: Friday, February 08, 2008 2:13 PM
> To: sa...@uga.edu
> Subject: Re: Selecting pts with only cancer diagnosis
>
> On Jan 24, 10:26 am, mlhow...@avalon.net (Mary) wrote:
> > Yes, really you've got to come up with a more specific list of
> > diagnosis = to exclude than just any other diagnosis- it would seem
> > that you would = want to tailor what these specific diagnosis are- you
>
> > may want to keep a = patient with HIV and depression, for instance,
> > but throw out the patient = that has both HIV and influenza.
>
> > Here's a website to narrow that down:
>
> >http://icd9cm.chrisendres.com/index.php?action=3Dalpha
>
> > Once you get that far, then one approach might be to put all the =
> > diagnosises of a patient on one observation with variables DIAG1 to =
> > DIAG1000, and then use an Array or IML to scan for those that have
> > your = desired diagnosis (HIV) and at the same time do not have any of
>
> > your = undesired diagnosis codes (infections).
>
> > So the way I'd approach it is with 3 data sets
> > 1) HIV diagnosis codes
> > 2) non-desired diagnosis codes (infections)
> > 3) data with one record per patient and all diagnosis codes accross.
>
> > Then you'd process the third data set as to whether to "keep" or =
> > "reject" based on the values of the first and second data set. You =
> > could do so with SQL or IML; write back if you want to go this way.=20
>
> > Here's some of the diagnosis codes below:
>
> > -Mary
>
> > Influenza, influenzal 487.1=20
> > with=20
> > bronchitis 487.1=20
> > bronchopneumonia 487.0=20
> > cold (any type) 487.1=20
> > digestive manifestations 487.8=20
> > hemoptysis 487.1=20
> > involvement of=20
> > gastrointestinal tract 487.8=20
> > nervous system 487.8=20
> > laryngitis 487.1=20
> > manifestations NEC 487.8=20
> > respiratory 487.1=20
> > pneumonia 487.0=20
> > pharyngitis 487.1=20
> > pneumonia (any form classifiable to 480-483, 485-486) 487.0=20
> > respiratory manifestations NEC 487.1=20
> > sinusitis 487.1=20
> > sore throat 487.1=20
> > tonsillitis 487.1=20
> > racheitis 487.1=20
> > upper respiratory infection (acute) 487.1=20
>
> > ----- Original Message -----=20
> > From:gopi...@YAHOO.COM=20
> > To: SA...@LISTSERV.UGA.EDU=20
> > Sent: Tuesday, January 22, 2008 8:56 PM
> > Subject: Selecting pts with only cancer diagnosis
>
> > Hi,
> > Iam trying to clean data for some analysis and want to select only
> the
> > patients who have HIV only and no other infectious disease based on
> > the diagnosis code.In other words I want pts with diagnosis code
> > 120,120.1,120.2....120.9(all these codes are for hiv) and exclude
> > every person with a different diagnosis code as well as if they have
> > any other diagnosis code along with HIV diag code. The data set I
> have
> > is something like this.
>
> > pt diacod
> > 11 200
> > 11 120
> > 11 120.2
> > 20 220
> > 20 120.1
> > 20 120.9
> > 31 120
> > 22 217.9
> > 22 120.8
> > 22 542.9
> > 25 120.6
> > 25 120.9
> > 25 120
>
> > Thanks in advance.
> > Gopi.
>
> > .- Hide quoted text -
>
> > - Show quoted text -
>
> Hi Mary,
> Following is what I want to get from the list of pt's.
> All patients have multiple diagnoses. A patient (1)can have HIV and
> diabetes for example. Another patient(2) can have HIV, influenza and
> heart disease. I want patient who have HIV and no other infectious
> disease - they can have other diagnoses, but just no other infection
> diagnoses. So in my above example, I would want the first patient but
> not the second patient.
>
> My goal is to get every pt with HIV only and also pt's with HIV with
> other non infectious diseases. Since the data I have has diagnosis codes
> for HIV and other infectious diseases(I donot have diagnosis code for
> non infectious disease) I tried to get my final list by excluding the
> pt's with infectious disease, so I had to give extensive list of codes
> in my program as you can see below, and perhaps that's the reason my
> program is not working properly, as I still end up getting pt's with
> infection diagnosis in my final list. Please help.
>
> By using the following method I am unable to get the desired outcome.
>
> data test ;
> set test1;
> retain zzz other;
> by pt;
> if first.pt then do;
> zzz=0;
> other=0;
> end;
> if Diag_Cd in
> ('188.00','188.10','188.20','188.30','188.40','188.50','188.60','188.70'
> ,'188.80','188.90','170.00','170.10','170.20','170.30','170.40','170.50'
> ,'170.60','170.70','170.80','170.90','191.00','191.10','191.20','191.30'
> ,'191.40','191.50','191.60','191.70','191.80','191.90','174.00','174.10'
> ,'174.20','174.30','174.40','174.50','174.60','174.70')
> then zzz=1;
> else other=1;
> if pt and not(zzz) and other then output;
> run;
>
> Can you please give me the code for the method you were suggesting, it
> would be a huge help. Thanks in advance .Gopi.- Hide quoted text -
>
> - Show quoted text -
Thank you
|