Date: Wed, 19 May 2004 14:12:35 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Sort by numeric suffix ?
"Richard A. DeVenezia" <radevenz@IX.NETCOM.COM> wrote [in part]:
> 2. foundation is prxparse ('/(.*?)(\d*?)\s*?$/'); * v9 only;
> - important things here, ? is non-greedy specifier, and \s*?$ is
> match (\d*?) properly at the end. Why ? The trailing spaces in sas
> character variables are 'delivered' to the perl subsystem. This
> confounds the initial expectation one might have...
You could say:
re = prxparse('/(.*?)(\d*)/');
instead. By using the '$' you are *forcing* Perl to match to the
end of the line. So you have to include the \s* in case you have
whitespace after your digits. By dropping the 'end of string' part
of your regex, you can pitch the 'fill with whitespace to end of string'
part as well. Note that the 'comprehensiveness' of the '.' wildcard
means that you have to use the non-greedy form of the '*' multiplier, or
else risk ending up with the first parens matching EVERYTHING. After all,
'*' means '0 or more', and you certainly have zero digits followed by zero
spaces, if the first part matches the whole string.
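The greedy-versus-lazy point is easy to check outside SAS. Here is a minimal
sketch in Python, whose re module follows Perl's rules for these particular
constructs. The sample value and its trailing blanks are made up to mimic a
padded SAS character variable. The last line also tries the unanchored
pattern: in this engine, at least, the lazy group is satisfied with nothing
at all, so that form is worth testing in SAS before relying on it.

```python
import re

# Assumed sample: a SAS character variable arrives padded with trailing blanks.
value = "abc123   "

# Greedy '.*' with the end anchor: the first group swallows everything,
# leaving zero digits and zero spaces for the rest of the pattern.
greedy = re.search(r"(.*)(\d*)\s*$", value)
print(greedy.groups())      # ('abc123   ', '')

# Lazy '.*?' with the end anchor: the first group grows only as far as it
# must, so the digits land in the second group.
lazy = re.search(r"(.*?)(\d*)\s*$", value)
print(lazy.groups())        # ('abc', '123')

# Lazy '.*?' with no anchor: both groups can match the empty string at
# position 0, and the engine accepts that first successful match.
unanchored = re.search(r"(.*?)(\d*)", value)
print(unanchored.groups())  # ('', '')
```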
But I see you are including cases where the \d never matches. Okay, you
might also consider simplifying by just telling Perl what you actually
want. You might try using a character class in the first part:
re = prxparse('/([a-zA-Z]*)(\d*)/');
By clearly specifying non-digit and digit portions, you don't need to
sweat the greedy-vs-nongreedy features.
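As a rough illustration of the character-class version, here is a small
Python sketch that splits each name and then sorts by prefix and numeric
suffix. The function name split_name, the sample names, and the choice to
treat a missing suffix as 0 are all assumptions made for the example, not
anything from the original thread.

```python
import re

# Letters-only prefix followed by an optional run of digits, as in the post.
pattern = re.compile(r"([a-zA-Z]*)(\d*)")

def split_name(value):
    """Return (prefix, numeric suffix); the suffix is 0 when no digits follow."""
    # Both groups are optional, so search() always finds a match at position 0.
    prefix, digits = pattern.search(value).groups()
    return prefix, int(digits) if digits else 0

# Hypothetical names, padded with blanks the way SAS would deliver them.
names = ["item10  ", "item2   ", "item    ", "item1   "]

# Sorting on the (prefix, number) pair puts item2 ahead of item10.
ordered = sorted(names, key=split_name)
print([n.strip() for n in ordered])  # ['item', 'item1', 'item2', 'item10']
```

Because neither group can match a blank, the trailing padding never needs
to be mentioned in the pattern.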
On the other hand, if you want to include numbers and underscores in
the prefix portion, with a non-digit character as the end of the prefix,
followed by digits only, then you can use the non-greedy multiplier and
the \w wildcard:
re = prxparse('/(\w*?)(\d*)/');
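Since \w matches digits as well as letters and underscores, the lazy form
matters here. A short Python sketch of the difference follows. Note the
assumptions: the sample value is invented, and the '^' and '\s*$' anchors
are my addition to the pattern above, put there so the lazy group has to
stretch until only digits and blanks remain.

```python
import re

# Hypothetical name with digits inside the prefix and trailing blanks.
value = "var_2a15  "

# Greedy \w* also matches digits, so it takes the numeric suffix with it.
greedy = re.search(r"^(\w*)(\d*)\s*$", value)
print(greedy.groups())  # ('var_2a15', '')

# Lazy \w*? stops as soon as the rest of the pattern can finish the match,
# which happens right after the last non-digit character.
lazy = re.search(r"^(\w*?)(\d*)\s*$", value)
print(lazy.groups())    # ('var_2a', '15')
```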
Or perhaps I have missed a key aspect of your concerns. HTH anyway,
David Cassell, CSC
Senior computing specialist