Date: Sun, 10 Feb 2008 08:41:22 -0800 gopilth@YAHOO.COM "SAS(r) Discussion" gopilth@YAHOO.COM http://groups.google.com Re: Selecting pts with only cancer diagnosis To: sas-l@uga.edu text/plain; charset=ISO-8859-1

On Feb 8, 4:57 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote: > Gopi: > While you can look across rows of a dataset using procedural Data step > programs with retains, flags, and first.x loops, what you have really > amounts to a existential quantifier in predicate logic. SAS SQL has a > solution built into its syntax. > > data test; > input pt: 2. diacod: \$char6. ; > cards; > 11 200 > 11 120 > 11 188.00 > 20 220 > 20 120.1 > 20 120.9 > 31 120 > 22 217.9 > 22 170.70 > 22 542.9 > 25 120.6 > 25 120.9 > 25 120 > ; > run; > proc sql; > create table solution as > select distinct pt from test as t1 > where substr(left(diacod),1,3) IN ('120') > and > NOT EXISTS (select 1 from test as t2 where t1.pt=t2.pt and diacod IN > ('188.00','188.10','188.20','188.30','188.40','188.50','188.60','188.70' > , > > '188.80','188.90','170.00','170.10','170.20','170.30','170.40','170.50', > '170.60','170.70','170.80', > > '170.90','191.00','191.10','191.20','191.30','191.40','191.50','191.60', > '191.70','191.80','191.90', > > '174.00','174.10','174.20','174.30','174.40','174.50','174.60','174.70') > ) > ; > quit; > > Note that the Boolean logic WHERE condition looks at each individual > tuple (row) in the dataset test. The NOT EXISTS operator, in contrast, > looks across the entire dataset for tuples that match on pt and contain > one of the infectious disease codes. > > The shorter version of the WHERE condition in the program not only saves > typing, it may capture more of the variations in ICD codes (say, > '120.'). I'd do the same with the list of infectious disease codes. Too > specific criteria for selection may create as many problems as too loose > criteria. > > A "reflexive join" could implement an existential quantifier in a more > efficient way. I'd reserve that extra effort for datasets with more than > 10,000 tuples. > > For extra credit, a programmer would look ahead to similar programming > problems and build a solution that operates on lists of ICD codes in > datasets. Both the IN operator accepts (SELECT <attribute> from > <dataset> ...) as a list of attribute values in a column of a dataset. > S > > I've changed the input data to include a wider range of data.... > S > > > > -----Original Message----- > From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu] > > On Behalf Ofgopi...@yahoo.com > Sent: Friday, February 08, 2008 2:13 PM > To: sa...@uga.edu > Subject: Re: Selecting pts with only cancer diagnosis > > On Jan 24, 10:26 am, mlhow...@avalon.net (Mary) wrote: > > Yes, really you've got to come up with a more specific list of > > diagnosis = to exclude than just any other diagnosis- it would seem > > that you would = want to tailor what these specific diagnosis are- you > > > may want to keep a = patient with HIV and depression, for instance, > > but throw out the patient = that has both HIV and influenza. > > > Here's a website to narrow that down: > > >http://icd9cm.chrisendres.com/index.php?action=3Dalpha > > > Once you get that far, then one approach might be to put all the = > > diagnosises of a patient on one observation with variables DIAG1 to = > > DIAG1000, and then use an Array or IML to scan for those that have > > your = desired diagnosis (HIV) and at the same time do not have any of > > > your = undesired diagnosis codes (infections). > > > So the way I'd approach it is with 3 data sets > > 1) HIV diagnosis codes > > 2) non-desired diagnosis codes (infections) > > 3) data with one record per patient and all diagnosis codes accross. > > > Then you'd process the third data set as to whether to "keep" or = > > "reject" based on the values of the first and second data set. You = > > could do so with SQL or IML; write back if you want to go this way.=20 > > > Here's some of the diagnosis codes below: > > > -Mary > > > Influenza, influenzal 487.1=20 > > with=20 > > bronchitis 487.1=20 > > bronchopneumonia 487.0=20 > > cold (any type) 487.1=20 > > digestive manifestations 487.8=20 > > hemoptysis 487.1=20 > > involvement of=20 > > gastrointestinal tract 487.8=20 > > nervous system 487.8=20 > > laryngitis 487.1=20 > > manifestations NEC 487.8=20 > > respiratory 487.1=20 > > pneumonia 487.0=20 > > pharyngitis 487.1=20 > > pneumonia (any form classifiable to 480-483, 485-486) 487.0=20 > > respiratory manifestations NEC 487.1=20 > > sinusitis 487.1=20 > > sore throat 487.1=20 > > tonsillitis 487.1=20 > > racheitis 487.1=20 > > upper respiratory infection (acute) 487.1=20 > > > ----- Original Message -----=20 > > From:gopi...@YAHOO.COM=20 > > To: SA...@LISTSERV.UGA.EDU=20 > > Sent: Tuesday, January 22, 2008 8:56 PM > > Subject: Selecting pts with only cancer diagnosis > > > Hi, > > Iam trying to clean data for some analysis and want to select only > the > > patients who have HIV only and no other infectious disease based on > > the diagnosis code.In other words I want pts with diagnosis code > > 120,120.1,120.2....120.9(all these codes are for hiv) and exclude > > every person with a different diagnosis code as well as if they have > > any other diagnosis code along with HIV diag code. The data set I > have > > is something like this. > > > pt diacod > > 11 200 > > 11 120 > > 11 120.2 > > 20 220 > > 20 120.1 > > 20 120.9 > > 31 120 > > 22 217.9 > > 22 120.8 > > 22 542.9 > > 25 120.6 > > 25 120.9 > > 25 120 > > > Thanks in advance. > > Gopi. > > > .- Hide quoted text - > > > - Show quoted text - > > Hi Mary, > Following is what I want to get from the list of pt's. > All patients have multiple diagnoses. A patient (1)can have HIV and > diabetes for example. Another patient(2) can have HIV, influenza and > heart disease. I want patient who have HIV and no other infectious > disease - they can have other diagnoses, but just no other infection > diagnoses. So in my above example, I would want the first patient but > not the second patient. > > My goal is to get every pt with HIV only and also pt's with HIV with > other non infectious diseases. Since the data I have has diagnosis codes > for HIV and other infectious diseases(I donot have diagnosis code for > non infectious disease) I tried to get my final list by excluding the > pt's with infectious disease, so I had to give extensive list of codes > in my program as you can see below, and perhaps that's the reason my > program is not working properly, as I still end up getting pt's with > infection diagnosis in my final list. Please help. > > By using the following method I am unable to get the desired outcome. > > data test ; > set test1; > retain zzz other; > by pt; > if first.pt then do; > zzz=0; > other=0; > end; > if Diag_Cd in > ('188.00','188.10','188.20','188.30','188.40','188.50','188.60','188.70' > ,'188.80','188.90','170.00','170.10','170.20','170.30','170.40','170.50' > ,'170.60','170.70','170.80','170.90','191.00','191.10','191.20','191.30' > ,'191.40','191.50','191.60','191.70','191.80','191.90','174.00','174.10' > ,'174.20','174.30','174.40','174.50','174.60','174.70') > then zzz=1; > else other=1; > if pt and not(zzz) and other then output; > run; > > Can you please give me the code for the method you were suggesting, it > would be a huge help. Thanks in advance .Gopi.- Hide quoted text - > > - Show quoted text -

Thank you

Back to: Top of message | Previous page | Main SAS-L page