LISTSERV at the University of Georgia
Date:         Mon, 29 Nov 2004 14:43:51 -0500
Reply-To:     "Chin, Stanley C" <stanley.c.chin@LMCO.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Chin, Stanley C" <stanley.c.chin@LMCO.COM>
Subject:      Re: Perl regular expressions - prxparse help needed
Content-type: text/plain; charset=us-ascii

Karen Olson writes:

>Can someone help me write a regular expression? I want to grab some text
>that follows the word INFLUENZA. This word can appear one or more times
>and so I wanted to write a regular expression and use the PRXNEXT function
>to get all instances. The glitch for me is that the word PARAINFLUENZA
>can also appear one or more times and I do not care about what follows
>that word. I am having trouble writing a PRXPARSE statement that
>says "get INFLUENZA only when it is not preceded by PARA." INFLUENZA is
>often preceded by a space except when it is the very first word and
>preceded by nothing.
>If I write:
>  prx1=prxparse("/( |)INFLUENZA( VIRUS)?( TYPE)?/");
>I end up with parainfluenza records that I do not want.
>
>If I write:
>  prx1=prxparse("/[^A]INFLUENZA( VIRUS)?( TYPE)?/");
>I do not get the records where INFLUENZA is the very first word in the
>text.
>
>What I'm aiming for is knowing whether the text refers to flu A or B or
>both.

WARNING, UNTESTED (we don't have SAS 9): if PRXPARSE understands Perl regex metacharacters, then use \b to bound the word "influenza", that is:

/\binfluenza\b( virus)?( type)?/i

The "i" modifier at the end makes the whole match case-insensitive, which is another thread entirely.

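\b matches at a word boundary: the position between a word character and a non-word character, where the start and end of the string count as non-word. So it matches before "influenza" both when the word follows a space and when it begins the string, but not inside "parainfluenza"; see e.g. http://www.oreilly.com/catalog/regexppr/chapter/part1B.pdf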
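
Since the SAS version is untested, here is a minimal sketch of the same pattern in Python instead, on the assumption that Python's re module treats \b the same way Perl does for plain ASCII text like this. The sample strings and the helper name flu_matches are made up for illustration; finditer stands in roughly for a PRXNEXT loop.

```python
import re

# Same pattern as in the message; re.IGNORECASE plays the role of the trailing /i.
FLU = re.compile(r"\binfluenza\b( virus)?( type)?", re.IGNORECASE)

# Hypothetical sample records, invented for this sketch.
samples = [
    "INFLUENZA VIRUS TYPE A",
    "PARAINFLUENZA TYPE 3",
    "PARAINFLUENZA 1; INFLUENZA TYPE B",
]

def flu_matches(text):
    """Return every matched substring, roughly what a PRXNEXT loop would collect."""
    return [m.group(0) for m in FLU.finditer(text)]

for s in samples:
    print(s, "->", flu_matches(s))
```

The PARAINFLUENZA-only record yields no match, because there is no word boundary between the "A" of PARA and the "I" of INFLUENZA.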

If PRXPARSE doesn't understand \b properly, then you could alter your second search to look either for the word preceded by any character other than "a" or for the word at the very beginning of the string, i.e. /([^a]influenza|^influenza)/i
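
A rough check of this fallback pattern, again in Python rather than SAS and assuming the two engines agree on negated character classes under case-insensitive matching. The helper name has_flu and the sample strings are hypothetical.

```python
import re

# Fallback pattern from the message: a non-"a" character before the word,
# or the word at the very start of the string.
ALT = re.compile(r"([^a]influenza|^influenza)", re.IGNORECASE)

def has_flu(text):
    """True when the text mentions INFLUENZA not directly preceded by 'a' or 'A'."""
    return ALT.search(text) is not None

for s in ["INFLUENZA TYPE A", "RSV, INFLUENZA B", "PARAINFLUENZA 3"]:
    print(s, "->", has_flu(s))
```

Note that when the first alternative matches, the matched text includes the preceding character (here a space), so any position or length taken from the match is off by one compared with the \b version.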

hth

-stanley

