Date: Fri, 14 Nov 1997 15:37:50 -0600
Reply-To: REXX Programming discussion list <REXXLIST@UGA.CC.UGA.EDU>
Sender: REXX Programming discussion list <REXXLIST@UGA.CC.UGA.EDU>
From: Doug Quale <qualed@MAIL.STATE.WI.US>
Organization: State of Wisconsin
Subject: Re: Why REXX is not my favorite scripting language (was Re:
regular expression matching)
Content-Type: text/plain; charset=us-ascii
I don't want to make this seem like a picky tit for tat thing where I
try to shoot down anything anyone else has to say. There is one very
interesting question about regular expressions in here that I did want
to say something about, so I'll write about a few more things too.
pforhan@millcomm.com wrote:
>
> In <346B918A.4709@mail.state.wi.us>, Doug Quale <qualed@mail.state.wi.us> writes:
> >It is true that PARSE is a central point of REXX. Regular expressions
> >are a central point of Perl and awk. RE's are more powerful than PARSE,
> >hence Perl and awk are more powerful than REXX. If I weren't familiar
> >with RE's I would probably think PARSE is wonderful, but by comparison
> >to RE's, PARSE just doesn't carry the freight.
>
> I would dare say that REs and PARSE are almost different applications.
> Maybe PARSE could be called "the working man's RE." In a way. But
> their purposes are somewhat different. PARSE does more in the way
> of, well, parsing; splitting something up based on characteristics
> and positioning and what-not. REs are more for finding things.
In general this is true. RE's can also be used to split things up,
however, and this is common in awk and perl.
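To make the splitting use concrete, here is a minimal sketch. It is written in Python rather than awk or perl purely so that it is self-contained and runnable; the sample record and the variable names are made up for illustration.

```python
import re

# Hypothetical record with inconsistent separators (not from the original post).
record = "alpha, beta;gamma ,delta"

# Split on a comma or semicolon, swallowing any blanks around it.
fields = re.split(r" *[,;] *", record)

print(fields)
```

The pattern describes the separators rather than the fields, which is the usual way an RE is pressed into service for splitting.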
> There is no question that REs are powerful. But they cannot do what
> PARSE does, without extensions. To my knowledge, there is no such
> syntax as "^%1.*%2$" (where %1, %2 are variables to be loaded...)
> in any RE, is there? Come to think of it, I doubt you can do that
> query exactly in REXX, either. Here, my lack of knowledge of both
> REXX and Perl shows through... so forgive me if I am really wrong.
No, you are really right. The key is "without extensions". I, like
almost everyone else, use somewhat sloppy nomenclature by saying
"regular expressions". In fact, what people usually think of as regular
expressions are enhanced or extended RE's. And there is no single
enhanced RE design, but rather a good many. Without going through all
the detail (which I would in any case surely get wrong), I would just
note that basic RE's would include only literal characters, alternation
("or", symbolized by |) and concatenation ("and", usually symbolized by
juxtaposition). That's it. From that you can build character classes,
but the bracket notation ([A-Z]) is much easier to use. Negated
character classes are possible without special support, but would be
extremely clumsy and would suffer character length portability
problems. The key enhancement is the closure operator (*), meaning 0 or
more matches. With that you can get the + operator (1 or more matches)
and the curly-brace form {n,m} (at least n but no more than m matches).
So everything but concatenation, alternation and closure
is syntactic sugar.
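The "syntactic sugar" claim can be spot-checked mechanically. The sketch below uses Python's re module as an assumed stand-in for any enhanced RE engine; the pattern pairs and the helper name same_language are my own illustration. It compares each sugared form against a spelling that uses only concatenation, alternation and closure, over every short string on a small alphabet. It is a bounded check, not a proof.

```python
import re
from itertools import product

# Each pair: (sugared form, the same language spelled with only
# concatenation, alternation and closure).
pairs = [
    (r"[abc]", r"a|b|c"),      # character class -> alternation
    (r"a+", r"aa*"),           # one or more -> one, then closure
    (r"a?b", r"ab|b"),         # optional -> alternation
    (r"a{2,3}", r"aa|aaa"),    # bounded repeat -> alternation of concatenations
    (r"a{2,}", r"aaa*"),       # open-ended repeat needs closure
]

def same_language(p, q, alphabet="abc", max_len=5):
    """Compare two patterns on every string over alphabet up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(re.fullmatch(p, s)) != bool(re.fullmatch(q, s)):
                return False
    return True

results = [same_language(p, q) for p, q in pairs]
print(results)
```

Only the open-ended forms (+ and {n,}) actually need closure; the bounded ones reduce to alternation and concatenation alone.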
One more extension is needed to parse with RE's -- we need a way to
refer to pieces that we have matched. Unix tools that use RE's to
specify substitutions do it by enclosing the matches to memorize in
parens and then referring to them with \1, \2, etc. So, to reverse the
first two words on a line you might say (in extended RE syntax; classic
sed wants the parens written \( and \) and has no + operator)
s/([a-z]+) +([a-z]+)/\2 \1/
A similar use works for a programming language, so in perl you say
($first, $second) = /(RE for first match) stuff (RE for second match)/;
In fact in perl, you can do a whole lot more with RE's than that.
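For anyone who wants to try both uses, here is a small sketch, again in Python only so that it is self-contained. The sample line is made up, and the pattern is the one from the substitution above.

```python
import re

line = "hello world again"   # made-up sample input

# Swap the first two words, as in the s/// command above.
swapped = re.sub(r"([a-z]+) +([a-z]+)", r"\2 \1", line, count=1)

# Pull the two remembered pieces out into variables,
# as in the perl list assignment.
m = re.match(r"([a-z]+) +([a-z]+)", line)
first, second = m.groups()

print(swapped)
print(first, second)
```

The same parenthesized groups serve both purposes: the replacement text refers to them by number, and the program refers to them by position in the returned tuple.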
> >Personally I find other aspects of the syntax of REXX a bit annoying,
> >particularly conditionals. I don't like the special cases involving the
> >semicolon
> >
>
> Well, my first recommendation is not to do this at all.
> Always split up if/then/else statements.
Good advice :).
> >The design of the REXX IF instruction also allows only a single
> >expression in the THEN and ELSE clauses, requiring frequent use of
> >DO...END blocks in conditionals. (This mistake is also in Pascal, but
>
> Well, many languages are like this, including C and C++. I prefer Ada,
> which, IIRC, always forces the use of an END IF. Modula-2 is like this
> also, no?
You are right that most languages get this wrong, including C. (C is
not a language I would use as a model of good syntax. Except the
expression syntax. C has wonderful expressions, but hideous variable
type declarations.) Wirth fixed this problem between Pascal and
Modula-2, and I'm sure it's ok in Oberon as well.
>
> >REXX scripts accept only a single argument. That means that every REXX
> >script must parse its own arguments, rather than letting the calling
> >environment handle argument parsing. (This is a design flaw that was
>
> Not that it is hard, using parse and friends. Far easier than C, IMHO.
>
The real problem here is that it can't be done reliably. If I invoke a
Unix script like this
example "script should see two args" "but this is tough for REXX"
the REXX example script gets one argument, a concatenated string of the
two initial arguments. There is in general no way to solve this problem
-- REXX really believes that scripts can only get a single argument.
This fits the command model of CMS and MVS just fine, but it loses in an
environment that permits multiple arguments. Unix REXX scripts must
either use a non-portable, Unix-specific function to obtain command line
arguments or the user must be careful to use extra quotes whenever
invoking a script written in REXX (making invocation of REXX scripts
different from any other program on Unix).
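The loss of information is easy to demonstrate. The sketch below is a model only, under the assumption that the interpreter joins the argument list with single blanks before the script sees it; the function name and the second invocation are made up for illustration.

```python
def as_single_string(argv):
    """Model an interpreter that hands the script one blank-joined string."""
    return " ".join(argv)

# Two different invocations: the first is the one shown above,
# the second is a made-up variant with the quotes placed differently.
call_a = ["script should see two args", "but this is tough for REXX"]
call_b = ["script", "should see two args but this is tough for REXX"]

joined_a = as_single_string(call_a)
joined_b = as_single_string(call_b)

# The list form keeps the boundaries; the joined form cannot tell
# the two calls apart.
print(len(call_a), len(call_b))
print(joined_a == joined_b)
```

Once the boundaries are gone, no amount of parsing inside the script can recover them, which is why the fix has to come from the calling side.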
> >repeated in PCDOS/MSDOS.) I frown at the "GREAT RUNES" IBM mainframe
> >mentality of uppercase translation by ARG and PULL not to mention
>
> As I am sure you know, you can use parse arg and parse pull to
> eliminate this problem.
True, but the point is that the behavior of the short versions is poorly
chosen. Instead, ARG and PULL should not do case translation, and PARSE
UPPER ARG and PARSE UPPER PULL should be used if you want to smash case.
>
> >default values for undefined variables. I don't like the interface to
> >external environments. Although sending unrecognized expressions to the
> >external environment may have seemed like a great idea for a scripting
> >language, I find it confusing and error prone. It also necessitates the
> >odd CALL instruction. Therefore
>
> My suggestion would be to use one or the other consistently. If you like
> communication with the OS, though, use the x=oscmd() form.
Unfortunately REXX doesn't have an x=oscmd() form. This may be supported
in OS/2, but it's not a part of standard REXX and it won't work in MVS
and CMS. In fact, Cowlishaw had envisioned that external commands would
write their output to the external data queue and the invoking REXX
program would use PULL to get at it. This is a poor, CMS-specific
mechanism, and in many cases requires the external commands to be
rewritten to work with REXX. The IBM MVS REXX implementation uses
OUTTRAP to capture output from *some* external commands. It's clumsy,
not portable and does not work in all cases.
>
> >REXX does not provide a good means of communication with commands
> >submitted to external environments. Although it is suggested to use the
> >data queue for this purpose, the data queue itself is a strange idea
> >that limits you to a single process environment. Unix backticks are
> >infinitely superior in this regard.
>
> Why not just separate the commands into separate steps, as I am sure
> unix shells do, anyway?
>
> /* contrived example below: */
> x=ls("s*illy")
> y=cat(x)
> /* someone correct my syntax here, it's been a while...*/
Unfortunately even if this works in OS/2, it isn't a standard part of
REXX and it doesn't work in MVS. The Unix counterparts
@x = `ls s*illy`;
chomp(@x);
@y = `cat @x`;
work great in perl.
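The general shape of the backtick idea, running a command, collecting its output as lines, and feeding those lines to the next command, can be sketched as follows. This is in Python only so that it is self-contained; the helper name backtick is made up, and the Python interpreter itself stands in for ls and cat so the example does not depend on any particular files.

```python
import subprocess
import sys

def backtick(argv):
    """Run a command and return its standard output as a list of lines."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout.splitlines()

# Stand-in for `ls`: a child process that prints two made-up names.
names = backtick([sys.executable, "-c", "print('silly'); print('stilly')"])

# Stand-in for `cat`: a child process that is handed the first command's
# output lines as its arguments and echoes them back on one line.
echoed = backtick([sys.executable, "-c",
                   "import sys; print(' '.join(sys.argv[1:]))"] + names)

print(names)
print(echoed)
```

Nothing here requires the child program to know anything about the caller; it just writes to standard output, which is the point of the comparison with the data queue.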