Date: Tue, 11 May 2004 19:17:47 +0100
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Crawford2 <peter.crawford@BLUEYONDER.CO.UK>
Subject: Re: Can rxparse be useful for this address cleaning work??
Content-Type: text/plain; charset="utf-8"
It seemed too good a chance to follow up... here in Montreal - at the Futures Forum
Perhaps there might be some interest in an informat to support regular expressions ...
Meanwhile, if you use, or think you would use, regular expressions (if simpler),
it might be worth forwarding the idea to firstname.lastname@example.org, indicating the
kind of business benefit you see in supporting regular expression informats.
Model (non-functional ... yet):
proc format ;
   invalue $inpicture
      '<complex regular expression string>'(regExp) = _same_ ;
run ;
data ;
   length my_data $30. ;
   infile '<loads of text data file>' ls=32000 ;
   input my_data $inpicture32000. ;
run ;
From: Jack Hamilton [mailto:email@example.com]
Sent: Tue 5/11/2004 2:30 AM
Subject: RE: [SAS-L] Can rxparse be useful for this address cleaning work??
I had asked Rick Langston for regex capabilities in formats at the last
SUGI, and he told me today that he was still thinking about it (maybe has
some code written, but not released). So yes, I think it's worth
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
Sent: Monday, May 10, 2004 8:39 PM
Subject: Re: [SAS-L] Can rxparse be useful for this address cleaning
If we could use a regular expression as an informat - almost
like an inpicture, it might simplify the implementation of this
Does anyone think it is worth escalating ?
Would regular expression informats make regular expressions easier to use ?
On Mon, 8 Mar 2004 22:34:31 -0500, Richard A. DeVenezia
>Duck-Hye Yang wrote:
>> My address data look like this:
>> data one;
>> length line $50;
>> line = "0S 810 SPRING GREEN"; output;
>> line = "0S0 42 PEARL ROAD"; output;
>> line = "0 S 336 EAST STREET"; output;
>> line = "0 SOUTH 531 JEFFERSON"; output;
>> line = "0 S 356 MADISON"; output;
>> line = "1 S 356 MADISON"; output;
>> line = "1 NORTH 356 MADISON"; output;
>> run;
>> My goal is to first combine the three or two components into one
>> component so that the desired output is like the following:
>> "0S810 SPRING GREEN"
>> "0S042 PEARL ROAD"
>> "0S336 EAST STREET"
>> "0SOUTH531 JEFFERSON"
>> "0S356 MADISON"
>> "1S356 MADISON"
>> "1NORTH356 MADISON"
>> For the last 4 days, I have been trying to do this daunting work using
>> rxparse function as shown by Chang Y. Chung.
>> I gave up finally. Can anybody help me with this?
>I haven't followed the thread, but the output appears to indicate you want to
>- remove all spaces prior to the last digit encountered
>This SAS regular expression does that (well almost, it retains all
>characters A-z0-9 prior to last digit found) :
> data one;
>   length line $50;
>   line = "0S 810 SPRING GREEN"; output;
>   line = "0S0 42 PEARL ROAD"; output;
>   line = "0 S 336 EAST STREET"; output;
>   line = "0 SOUTH 531 JEFFERSON"; output;
>   line = "0 S 356 MADISON"; output;
>   line = "1 S 356 MADISON"; output;
>   line = "1 NORTH 356 MADISON"; output;
> run;
>
>* retain only letters and digits up to and including last digit found;
> data _null_;
>   set one;
>   if _n_ = 1 then do;
>     retain rx;
>     rx = rxparse (" ~'A-z0-9'*<$'A-z0-9'>*~'A-z0-9'*<$D> to =1=2 ");
>     * shorter slightly different alternative;
>     * rx = rxparse (" $W*<$C>*$W*<$D> to =1=2 ");
>     put rx=;
>   end;
>   length scrunch $50;
>   call rxchange (rx,99,line,scrunch);
>   put @1 line= @30 scrunch=;
> run;
>Richard A. DeVenezia
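
For anyone on SAS 9, where the Perl-style PRX functions are available, the "remove all spaces prior to the last digit" idea might also be sketched as below. This is only an illustration, not the rxparse solution Richard posted. The pattern, the use of PRXCHANGE as a function (rather than the CALL routine), and the reuse of the variable name scrunch are assumptions of this sketch. It also assumes your release supports lookahead in PRX patterns.

```sas
/* Sample rows copied from the original question */
data one;
  length line $50;
  line = "0S 810 SPRING GREEN";   output;
  line = "0S0 42 PEARL ROAD";     output;
  line = "0 S 336 EAST STREET";   output;
  line = "0 SOUTH 531 JEFFERSON"; output;
  line = "0 S 356 MADISON";       output;
  line = "1 S 356 MADISON";       output;
  line = "1 NORTH 356 MADISON";   output;
run;

data _null_;
  set one;
  length scrunch $50;
  /* Delete every run of white space that still has a digit    */
  /* somewhere to its right; -1 means "replace all matches".   */
  /* Blanks after the last digit fail the lookahead and stay.  */
  scrunch = prxchange('s/\s+(?=.*\d)//', -1, line);
  put @1 line= @30 scrunch=;
run;
```

Like the rxparse version, this keys on the last digit in the whole line, so a street name that itself contains a number (a route number, say) would be scrunched too. If that matters, the pattern would need to be anchored more tightly.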