Date: Tue, 31 May 2005 07:12:03 -0700
Reply-To: chris <fast_rabbit@GMX.CH>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: chris <fast_rabbit@GMX.CH>
Organization: http://groups.google.com
Subject: Re: documentation of Perl regex directly in the code
Content-Type: text/plain; charset="iso-8859-1"
Richard A. DeVenezia wrote:
> Rune:
>
> One perlish way to comment uses trailing #mycomment and the /x option. This
> does not work in SAS. Your best bet is probably to use the CATS function with
> SAS comments. Note: CATS trims spaces, so blending in a bare space can be
> frustrating. Use \x20 to match a space when building a regex pattern with
> CATS.
>
> In this sample I cheat a little. I use a capture group (\x20) to store a
> space and then use it in the replacement. This was only done because CATS
> trims out leading and trailing spaces and the original replacement part
> would have been reduced to $2$1 without a space.
>
> data namechange;
> set staff;
> re = prxparse (
> CATS
> ('s' /* Perform a substitution */
> ,'/' /* Start regular expression*/
> ,'(' /* Start capture buffer # 1 to store the last name*/
> ,'[^,]+' /* match one or more non-comma characters */
> ,')' /* end capture buffer # 1 */
> ,',' /* match a comma*/
> ,'(\x20)' /* capture buffer # 2: match a space */
> ,'(' /* start capture buffer # 3 to store the first name */
> ,'\w+' /* match a word character one or more times */
> ,'(' /* start capture buffer # 4, nested inside buffer # 3 */
> ,'\s+' /* match one or more white space characters */
> ,'\w+' /* match a word character one or more times */
> ,')' /* end capture buffer # 4, the optional middle name */
> ,'?' /* match buffer # 4 zero or one time */
> ,')' /* end capture buffer # 3, first name and optional middle name */
> ,'/' /* end regular expression and start replacement text */
> ,'$3' /* insert capture buffer # 3 (first name) */
> ,'$2' /* insert capture buffer # 2 (the space) */
> ,'$1' /* insert capture buffer # 1 (last name) */
> ,'/' /* end replacement text*/
> )
> );
> NewName = prxchange(re,1,Name);
> run;
>
> --
> Richard A. DeVenezia
> http://www.devenezia.com/
This is a good way to do this, but does it really help readability?
I would stick with the initial regex and document it separately on the
following lines:
re = prxparse('s/([^,]+), (\w+(\s+\w+)?)/$2 $1/');
/* regex to swap the text before a comma with the one or two words
   following the comma
   ([^,]+)          capture one or more non-comma characters into $1
   ", "             match a comma followed by a blank
   (\w+(\s+\w+)?)   capture one or two words separated by
                    white space into $2
*/
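
As a minimal sketch of how this fits into a complete step: only STAFF,
Name, re and NewName come from the thread. The two sample rows, the
length of 40 and the _n_ = 1 / retain idiom are assumptions added for
illustration.

```sas
/* Hypothetical sample data standing in for the STAFF data set */
data staff;
   infile datalines truncover;
   input Name $40.;
   datalines;
Smith, John
Jones, Mary Ann
;
run;

data namechange;
   set staff;
   length NewName $40;
   retain re;
   /* compile the pattern once, on the first observation only */
   if _n_ = 1 then re = prxparse('s/([^,]+), (\w+(\s+\w+)?)/$2 $1/');
   /* ([^,]+)          last name, captured into $1             */
   /* (\w+(\s+\w+)?)   first and optional middle name, into $2 */
   NewName = prxchange(re, 1, Name);
   drop re;
run;
```

With these two rows NewName should come out as "John Smith" and
"Mary Ann Jones". The regex stays on one line, and the comments sit next
to it without being spliced into the pattern itself.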
chris