Date: Wed, 18 May 2011 19:05:43 -0400
Reply-To: Scott Bass <sas_l_739@YAHOO.COM.AU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Scott Bass <sas_l_739@YAHOO.COM.AU>
Subject: Re: Regular expression help needed
Hi Toby,
I thought the Perl Regular Expressions were supposed to be compliant with
Perl functionality. Is this assumption incorrect? Why can Perl work it out
but SAS cannot?
Changing the regular expressions to:
rx1=prxparse("s/\s{4,}/"||"09"x||"/o");
rx2=prxparse("s/ /_/o");
should work.
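For reference, here is a minimal sketch of the whole step using those two expressions. The hard-coded test string is my own assumption (four blanks between variables), not Art's actual data, and the hex dump is only there to check for the 09 byte in the log:

```sas
data _null_;
  length have want $200;
  /* Hypothetical test value: four blanks separate the variables */
  have = "Household Income    Gender    Average Age 1";
  /* "09"x puts a real tab byte into the replacement text, so no */
  /* escape sequence has to be interpreted in the replacement    */
  rx1 = prxparse("s/\s{4,}/" || "09"x || "/o");
  /* Literal blank rather than \s, because \s would also match   */
  /* the tabs that rx1 has just inserted                         */
  rx2 = prxparse("s/ /_/o");
  want = prxchange(rx1, -1, trim(have));
  want = prxchange(rx2, -1, trim(want));
  /* The hex dump makes the tab (09) visible in the log */
  put want= / want= $hex90.;
run;
```

The order still matters: rx1 has to run before rx2, otherwise the runs of blanks are already underscores by the time rx1 looks for them.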
Regards,
Scott
On Mon, 16 May 2011 13:13:19 +0000, toby dunn <tobydunn@HOTMAIL.COM> wrote:
>Art,
>
>I don't have the time at the moment to work up a RegEx. But let me answer
>the \t in the replacement part of the PrxChange.
>
>By and large you cannot use metacharacters/metasequences (depending on who
>you ask, they will call them one or the other) in the replacement part of the
>PRXChange.
>The few you can use are as follows: \l, \L, \u, \U, \E, \Q.
>
>If you think about it long enough you will figure out that many of the
>metasequences map to multiple hex characters, so if they were used in
>the replacement part of the PrxChange call, which hex character should they
>use? For example, \d maps to the digits 0-9; which one should it use?
>So only a few metasequences are allowed, those being the ones that control the
>case of the returned results.
>
>Toby Dunn
>
>
>"I'm a hell bent 100% Texan til I die"
>
>"Don't touch my Willie, I don't know you that well"
>
>
>
>
>> Date: Sat, 14 May 2011 11:21:11 -0400
>> From: art297@ROGERS.COM
>> Subject: Re: Regular expression help needed
>> To: SAS-L@LISTSERV.UGA.EDU
>>
>> Scott,
>>
>> Almost exactly what I was looking for, except the way it is written it
>> precedes the changed variables with "\t" rather than a tab character.
>>
>> I presume that would only require a simple tweak to the prxparse call, but I
>> don't know what has to be changed.
>>
>> Art
>> -------
>> On Fri, 13 May 2011 23:49:08 -0400, Scott Bass <sas_l_739@YAHOO.COM.AU>
>> wrote:
>>
>> >Hi,
>> >
>> >This is close, and might be a bit more flexible than previous approaches if
>> >the data is a bit dirty.
>> >
>> >data test;
>> > length have want $200;
>> > infile datalines dlm="|" truncover;
>> > input have;
>> > want=have;
>> > rx1=prxparse("s/\s{4,}/\t/o");
>> > rx2=prxparse("s/\s+/_/o");
>> > put (rx1 rx2) (=);
>> > if prxmatch(rx1,want) then want=prxchange(rx1,-1,trim(want));
>> > if prxmatch(rx2,want) then want=prxchange(rx2,-1,trim(want));
>> > put (have want) (=/);
>> > datalines;
>> >variable 1 variable 2 a variable3 1
>> >Household Income Gender Average Age 1 Average Age 2
>> >Household Income Gender Average Age 1 Average Age 2
>> >variable 1 variable 2 a variable3 1 variable3_a_b c
>> >;
>> >run;
>> >
>> >Obviously the regex's need to execute in that order ;-).
>> >
>> >Assumptions:
>> >1) What differentiates a "new" variable? 4 or more spaces between
>> >"tokens".
>> >2) There could be 1-3 spaces that comprise a single variable. If it will
>> >*always* be 1, you can tweak the above regex. (change \s+ to \s{1} or
>> ><space>{1}).
>> >
>> >For this problem statement, I find two regex's to be easier, rather than
>> >trying to write an uber-expression that will do it in one.
>> >
>> >The above regex's are described as follows:
>> >
>> >rx1: substitute 4 or more spaces (\s{4,}) with a tab (\t). I'm not sure if
>> >this would work with say two tab characters, which would appear as > 4
>> >spaces. You'd need to test all examples of dirty data.
>> >
>> >rx2: substitute one or more spaces (\s+) with a single underscore.
>> >
>> >The second record (your example) does not yield three variables because
>> >there are only two spaces between the 2nd and 3rd variables. I assume that
>> >is a typo?
>> >
>> >However, this doesn't exactly work, since \t as a replacement metacharacter
>> >becomes literally "\t". \x09 as a replacement metacharacter also fails.
>> >
>> >Does anyone know why this is the case? Is this an incomplete/incorrect
>> >implementation of Perl regular expressions by SAS?
>> >
>> >I say this because these Perl one-liners in Perl v5.12.2 on Windows work:
>> >
>> >test.txt:
>> >ABC
>> >DBF
>> >GBI
>> >
>> >perl -pi.bak -e "s/B/\t/g" test.txt
>> >perl -pi.bak -e "s/B/\x09/g" test.txt
>> >
>> >Both Perl one-liners yield:
>> >
>> >A<tab>C
>> >D<tab>F
>> >G<tab>I
>> >
>> >Regards,
>> >Scott
>