LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 2008, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Sun, 10 Feb 2008 08:41:22 -0800
Reply-To:   gopilth@YAHOO.COM
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   gopilth@YAHOO.COM
Organization:   http://groups.google.com
Subject:   Re: Selecting pts with only cancer diagnosis
Comments:   To: sas-l@uga.edu
Content-Type:   text/plain; charset=ISO-8859-1

On Feb 8, 4:57 pm, HERMA...@WESTAT.COM (Sigurd Hermansen) wrote: > Gopi: > While you can look across rows of a dataset using procedural Data step > programs with retains, flags, and first.x loops, what you have really > amounts to a existential quantifier in predicate logic. SAS SQL has a > solution built into its syntax. > > data test; > input pt: 2. diacod: $char6. ; > cards; > 11 200 > 11 120 > 11 188.00 > 20 220 > 20 120.1 > 20 120.9 > 31 120 > 22 217.9 > 22 170.70 > 22 542.9 > 25 120.6 > 25 120.9 > 25 120 > ; > run; > proc sql; > create table solution as > select distinct pt from test as t1 > where substr(left(diacod),1,3) IN ('120') > and > NOT EXISTS (select 1 from test as t2 where t1.pt=t2.pt and diacod IN > ('188.00','188.10','188.20','188.30','188.40','188.50','188.60','188.70' > , > > '188.80','188.90','170.00','170.10','170.20','170.30','170.40','170.50', > '170.60','170.70','170.80', > > '170.90','191.00','191.10','191.20','191.30','191.40','191.50','191.60', > '191.70','191.80','191.90', > > '174.00','174.10','174.20','174.30','174.40','174.50','174.60','174.70') > ) > ; > quit; > > Note that the Boolean logic WHERE condition looks at each individual > tuple (row) in the dataset test. The NOT EXISTS operator, in contrast, > looks across the entire dataset for tuples that match on pt and contain > one of the infectious disease codes. > > The shorter version of the WHERE condition in the program not only saves > typing, it may capture more of the variations in ICD codes (say, > '120.'). I'd do the same with the list of infectious disease codes. Too > specific criteria for selection may create as many problems as too loose > criteria. > > A "reflexive join" could implement an existential quantifier in a more > efficient way. I'd reserve that extra effort for datasets with more than > 10,000 tuples. > > For extra credit, a programmer would look ahead to similar programming > problems and build a solution that operates on lists of ICD codes in > datasets. Both the IN operator accepts (SELECT <attribute> from > <dataset> ...) as a list of attribute values in a column of a dataset. > S > > I've changed the input data to include a wider range of data.... > S > > > > -----Original Message----- > From: owner-sa...@listserv.uga.edu [mailto:owner-sa...@listserv.uga.edu] > > On Behalf Ofgopi...@yahoo.com > Sent: Friday, February 08, 2008 2:13 PM > To: sa...@uga.edu > Subject: Re: Selecting pts with only cancer diagnosis > > On Jan 24, 10:26 am, mlhow...@avalon.net (Mary) wrote: > > Yes, really you've got to come up with a more specific list of > > diagnosis = to exclude than just any other diagnosis- it would seem > > that you would = want to tailor what these specific diagnosis are- you > > > may want to keep a = patient with HIV and depression, for instance, > > but throw out the patient = that has both HIV and influenza. > > > Here's a website to narrow that down: > > >http://icd9cm.chrisendres.com/index.php?action=3Dalpha > > > Once you get that far, then one approach might be to put all the = > > diagnosises of a patient on one observation with variables DIAG1 to = > > DIAG1000, and then use an Array or IML to scan for those that have > > your = desired diagnosis (HIV) and at the same time do not have any of > > > your = undesired diagnosis codes (infections). > > > So the way I'd approach it is with 3 data sets > > 1) HIV diagnosis codes > > 2) non-desired diagnosis codes (infections) > > 3) data with one record per patient and all diagnosis codes accross. > > > Then you'd process the third data set as to whether to "keep" or = > > "reject" based on the values of the first and second data set. You = > > could do so with SQL or IML; write back if you want to go this way.=20 > > > Here's some of the diagnosis codes below: > > > -Mary > > > Influenza, influenzal 487.1=20 > > with=20 > > bronchitis 487.1=20 > > bronchopneumonia 487.0=20 > > cold (any type) 487.1=20 > > digestive manifestations 487.8=20 > > hemoptysis 487.1=20 > > involvement of=20 > > gastrointestinal tract 487.8=20 > > nervous system 487.8=20 > > laryngitis 487.1=20 > > manifestations NEC 487.8=20 > > respiratory 487.1=20 > > pneumonia 487.0=20 > > pharyngitis 487.1=20 > > pneumonia (any form classifiable to 480-483, 485-486) 487.0=20 > > respiratory manifestations NEC 487.1=20 > > sinusitis 487.1=20 > > sore throat 487.1=20 > > tonsillitis 487.1=20 > > racheitis 487.1=20 > > upper respiratory infection (acute) 487.1=20 > > > ----- Original Message -----=20 > > From:gopi...@YAHOO.COM=20 > > To: SA...@LISTSERV.UGA.EDU=20 > > Sent: Tuesday, January 22, 2008 8:56 PM > > Subject: Selecting pts with only cancer diagnosis > > > Hi, > > Iam trying to clean data for some analysis and want to select only > the > > patients who have HIV only and no other infectious disease based on > > the diagnosis code.In other words I want pts with diagnosis code > > 120,120.1,120.2....120.9(all these codes are for hiv) and exclude > > every person with a different diagnosis code as well as if they have > > any other diagnosis code along with HIV diag code. The data set I > have > > is something like this. > > > pt diacod > > 11 200 > > 11 120 > > 11 120.2 > > 20 220 > > 20 120.1 > > 20 120.9 > > 31 120 > > 22 217.9 > > 22 120.8 > > 22 542.9 > > 25 120.6 > > 25 120.9 > > 25 120 > > > Thanks in advance. > > Gopi. > > > .- Hide quoted text - > > > - Show quoted text - > > Hi Mary, > Following is what I want to get from the list of pt's. > All patients have multiple diagnoses. A patient (1)can have HIV and > diabetes for example. Another patient(2) can have HIV, influenza and > heart disease. I want patient who have HIV and no other infectious > disease - they can have other diagnoses, but just no other infection > diagnoses. So in my above example, I would want the first patient but > not the second patient. > > My goal is to get every pt with HIV only and also pt's with HIV with > other non infectious diseases. Since the data I have has diagnosis codes > for HIV and other infectious diseases(I donot have diagnosis code for > non infectious disease) I tried to get my final list by excluding the > pt's with infectious disease, so I had to give extensive list of codes > in my program as you can see below, and perhaps that's the reason my > program is not working properly, as I still end up getting pt's with > infection diagnosis in my final list. Please help. > > By using the following method I am unable to get the desired outcome. > > data test ; > set test1; > retain zzz other; > by pt; > if first.pt then do; > zzz=0; > other=0; > end; > if Diag_Cd in > ('188.00','188.10','188.20','188.30','188.40','188.50','188.60','188.70' > ,'188.80','188.90','170.00','170.10','170.20','170.30','170.40','170.50' > ,'170.60','170.70','170.80','170.90','191.00','191.10','191.20','191.30' > ,'191.40','191.50','191.60','191.70','191.80','191.90','174.00','174.10' > ,'174.20','174.30','174.40','174.50','174.60','174.70') > then zzz=1; > else other=1; > if pt and not(zzz) and other then output; > run; > > Can you please give me the code for the method you were suggesting, it > would be a huge help. Thanks in advance .Gopi.- Hide quoted text - > > - Show quoted text -

Thank you


Back to: Top of message | Previous page | Main SAS-L page