LISTSERV at the University of Georgia
Date:   Thu, 10 Mar 2011 14:19:05 -0800
Reply-To:   "Sprague, Webb (OFM)" <Webb.Sprague@OFM.WA.GOV>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Sprague, Webb (OFM)" <Webb.Sprague@OFM.WA.GOV>
Subject:   Re: "Unpacking" variable in a datastep into a "macro function"
Comments:   To: toby dunn <tobydunn@HOTMAIL.COM>
In-Reply-To:   A<BLU152-w3022409EA1EF0525C12DB6DEC80@phx.gbl>
Content-Type:   text/plain; charset="us-ascii"

I am not sure why you think I needed help with RE's (my question was about macros) but for the sake of all our edification, I have replied below.

Note that these RE's are the last attempt to filter out a housing type when other, more deterministic, approaches have already failed. So I am shooting for fuzzy.

> %SysFunc( PRXMatch( /...../ , <VarName> ) )
>
> Only works in 9.2 and higher; in prior versions the PrxMatch function,
> when used in the macro facility, assumed the pattern was already
> precompiled via a PRXparse function call.

I couldn't get it to work, so I decided to go with a simpler approach.

> /(RE*MO*DE*L)|(A*LTE*RA*T[IO]*N*)|(ADDI*T[IO]*)|(REHAB)|(REROOF)|(REMOVE)/
>
> I'd lose the * and replace it with a +. I'm pretty sure the 'E' and 'O'

No, the * is what I want, because often when people abbreviate they leave out the vowels. RMDL => Remodel. REEMODEL => Remodel. But RAMODIL probably not Remodel.
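
To make the difference between the two quantifiers concrete, here is a minimal sketch. It uses Python's `re` module purely as a stand-in for the Perl-style syntax that the SAS PRX functions accept (an assumption made for illustration, not the code from this thread), and the helper name `found` is hypothetical.

```python
import re

# Fragment of the pattern from the post: '*' lets each vowel be absent or repeated.
STAR = re.compile(r"RE*MO*DE*L")
# The suggested alternative: '+' requires at least one of each vowel.
PLUS = re.compile(r"RE+MO+DE+L")

def found(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern occurs anywhere in text (unanchored search)."""
    return pattern.search(text) is not None

# Print each sample word with the result for '*' and then for '+'.
for word in ("RMDL", "REEMODEL", "RAMODIL"):
    print(word, found(STAR, word), found(PLUS, word))
```

With `*`, the vowel-free abbreviation RMDL is still caught, while RAMODIL is rejected by both forms because the pattern allows only E (or nothing) between R and M.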

> in RE*MO*DE*L, for example, aren't optional.
> Rather, the poster wants one or more occurrences of these.
> If it is optional, then I would use the ?.
> The * says 0 or more, which more often than not is not exactly what the
> coder wants. In short, the * is one
> of the most overused and misunderstood quantifiers.

Misunderstood by some people, but not by me. ;)

> Depending on how many alternatives are to be searched for, I would be
> tempted to lose the capturing parens or use
> non-capturing parens, and if there is a preference for which one is
> to be matched first, I would look at the order in which
> the alternatives are specified in the pattern.

Why lose the parens? I thought you needed them with alternation "|". If not, then yes, they should go, as I am not substituting the captured groups into anything. Putting the most frequent alternative first is a good idea.
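
On the parentheses question: in Perl-style regular expressions, `|` has the lowest precedence, so a top-level alternation does not need grouping. The sketch below shows this with Python's `re` module as a stand-in for SAS PRX (an assumption), using a shortened version of the pattern. The helper `matches` and the sample strings are hypothetical.

```python
import re

# Three ways to write the same alternation (shortened from the post).
capturing = re.compile(r"(RE*MO*DE*L)|(REHAB)|(REROOF)")
non_capturing = re.compile(r"(?:RE*MO*DE*L)|(?:REHAB)|(?:REROOF)")
bare = re.compile(r"RE*MO*DE*L|REHAB|REROOF")

def matches(pattern, text):
    """Return True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None

# Hypothetical, already-compressed input strings.
samples = ["KITCHENRMDL", "REHABGARAGE", "NEWSFR"]

# Print the number of capture groups and the match results for each form.
for p in (capturing, non_capturing, bare):
    print(p.groups, [matches(p, s) for s in samples])
```

All three forms accept and reject the same strings; only the number of capture groups differs, which matters only when the matched pieces are reused later.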

> If distinct words are to be matched and not parts of a word, I would add
> the \b word boundary metasequences.

Yeah, I know how that works too. I compress, delete punctuation, and thus \b doesn't apply.
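
A small sketch of why `\b` stops being useful after this kind of normalization. Python's `re` module stands in for SAS PRX here, and the `squash` helper is a hypothetical approximation of the compress/delete-punctuation step, not the actual code. The sample string is also made up.

```python
import re

def squash(text: str) -> str:
    """Uppercase the text and drop everything except letters and digits.

    This is an assumed stand-in for the compress/delete-punctuation step.
    """
    return re.sub(r"[^A-Z0-9]", "", text.upper())

raw = "kitchen re-model, 2nd floor"   # hypothetical free-text entry
clean = squash(raw)

bounded = re.compile(r"\bREMODEL\b")  # requires word boundaries on both sides
unbounded = re.compile(r"REMODEL")    # matches anywhere in the string

print(clean)
print(bounded.search(clean) is not None)
print(unbounded.search(clean) is not None)
```

Once spaces and punctuation are gone, the word sits inside one long run of word characters, so the bounded pattern finds no boundary to anchor on while the unbounded pattern still matches.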

> As it is now it could match REMODEL, NONREMODEL, REMODELED.... you get
> the idea.

That's what we want. I am glad you confirm. ;) NONREMODELBUTREALLYSFR would fall through the cracks, but such input is unlikely.

> If there is only one word and not a bunch of words in the target string,
> I would add the ^ and $ line anchors to allow
> the regex engine optimizer to take over.

Again, I am looking for the words anywhere in the string, so I don't use anchors (except once).

> I'd definitely add the /o pattern modifier to the RegEx pattern.

I will look into that.

> Finally, it may be faster to break the alternatives down into a series
> of PrxMatch function calls rather than one big
> honker pattern and function call. This, however, takes knowing one's
> data, what is to be matched and what is not.

Actually, I know my data fairly well, ... thanks. I keep the REs as big honkers because lots of different inputs (SFR, SINGLEFAM, SINGFAM, etc. => SFR) all yield the same result, and I like the way this keeps things organized. Like you say, putting the most common of the bunch near the front would be best.
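
The "one big pattern per result" organization can be sketched as a small rule table. This is only an illustration in Python (`re` standing in for SAS PRX). The `RULES` table, the labels, and the `classify` helper are hypothetical; only the SFR variants and the remodel fragments come from the thread.

```python
import re

# Hypothetical rule table of (label, pattern) pairs, one big alternation per label.
# Within a pattern, alternatives are tried left to right, so the most common
# variant is listed first.
RULES = [
    ("SFR", re.compile(r"SFR|SINGLEFAM|SINGFAM")),
    ("REMODEL", re.compile(r"RE*MO*DE*L|REHAB|REROOF")),
]

def classify(text):
    """Return the label of the first rule whose pattern is found, else None."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None

# Hypothetical, already-compressed input strings.
for s in ("SINGFAMRES", "NEWSINGLEFAMILY", "GARAGERMDL", "DUPLEX"):
    print(s, classify(s))
```

Keeping every variant that maps to the same result in one alternation means each label has a single place to edit. Splitting a rule into several smaller patterns, as suggested in the quoted text, would trade that tidiness for possible speed.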
