Use Regular Expressions to Streamline Your Code
by Howard Fosdick (C) 2023
This article covers two topics. First, what are "regular expressions" and why might you want to use them?
And second, how do you access them in Rexx?
What are Regular Expressions and Why Use Them?
Regular expressions are a way to match strings. They let you describe a string-matching pattern in a single concise phrase, instead of coding the whole matching algorithm yourself. They're extremely powerful.
One regular expression can often replace a whole series of complicated string manipulation statements. That's useful for two reasons. First, string manipulation is a common problem in programming. And second, you can often express all sorts of different problems through the paradigm of manipulating strings.
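To make that claim concrete, here is a small sketch in Python, chosen only because its standard library includes a regular expression module. It compares a hand-coded check with a single regular expression for a hypothetical rule: a US-style ZIP code of five digits, optionally followed by a hyphen and four more digits. The rule and the function names are illustrative assumptions, not part of the original article.

```python
import re

DIGITS = "0123456789"

def is_zip_manual(s):
    """Hand-coded check: five digits, optionally followed by '-' and four more digits."""
    if len(s) not in (5, 10):
        return False
    if not all(ch in DIGITS for ch in s[:5]):
        return False
    if len(s) == 10:
        # The long form needs a hyphen in the sixth position and four trailing digits.
        return s[5] == "-" and all(ch in DIGITS for ch in s[6:])
    return True

# The same rule expressed as one regular expression.
ZIP_RE = re.compile(r"[0-9]{5}(-[0-9]{4})?")

def is_zip_regex(s):
    """Regex check: fullmatch() requires the whole string to fit the pattern."""
    return ZIP_RE.fullmatch(s) is not None

samples = ["12345", "12345-6789", "1234", "123456", "12345-678", "abcde", "12345 6789"]
for s in samples:
    print(f"{s!r:14} manual={is_zip_manual(s)} regex={is_zip_regex(s)}")
```

Both functions give the same answers; the regular expression simply states the rule in one line instead of a chain of length and character tests.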
Once you write the regular expression, the programming language (or tool) does all the work of applying the pattern to the text you search. Here’s a simple example. To list all files in a Unix or Linux directory, you might issue this operating system command:
    ls *
Or, at the Windows command line:
    dir *
The asterisk (*) is a wildcard, a simple shell pattern that is a close relative of a regular expression. It matches every file name in the directory, so the command succinctly lists all files. Patterns become more powerful when they describe something more specific. Here we change the example to match only files ending in the characters .exe:
    ls *.exe
This next expression lists files beginning with any one of the three lower-case letters a, b, or c, followed by exactly two more characters:
    ls [a-c]??
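As you can see, regular expressions rely on an extensive set of characters or symbols that are assigned special contextual meanings. For example, here the bracket expression [a-c] matches any one of the three specified characters (a, b, or c), while each question mark matches any single character. Full regular expression syntax uses the same bracket notation but writes "any single character" as a period; there, a question mark instead makes the preceding item optional.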
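Python's standard library supports both notations, which makes the difference easy to see. In this sketch (Python is used only for illustration, and the file names are made-up sample data), `fnmatch.fnmatchcase` applies shell-style wildcards and `re.fullmatch` applies the equivalent regular expressions.

```python
import fnmatch
import re

# Made-up sample file names.
names = ["abc", "bxy", "c12", "dog", "ab", "abcd", "a.exe", "setup.exe"]

# Shell-style wildcards: [a-c] is one of a, b, c; each ? is any single character.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "[a-c]??")]

# The same test as a regular expression: "any single character" is '.',
# and fullmatch() makes the pattern cover the whole name, as a glob does.
regex_hits = [n for n in names if re.fullmatch(r"[a-c]..", n)]

# "*.exe" as a glob, and its regex equivalent: '.*' means "any run of characters",
# and the literal dot before "exe" must be escaped as '\.'.
exe_glob = [n for n in names if fnmatch.fnmatchcase(n, "*.exe")]
exe_regex = [n for n in names if re.fullmatch(r".*\.exe", n)]

print(glob_hits, regex_hits)  # both lists: ['abc', 'bxy', 'c12']
print(exe_glob, exe_regex)    # both lists: ['a.exe', 'setup.exe']
```

The two notations select the same names; what differs is the spelling of "any character" and "any run of characters".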
You can build much more complexity into a regular expression, which makes regular expressions a very economical way to express pattern-matching algorithms. Because so many programming problems can be viewed as exercises in string and pattern matching, regular expressions make for terse, powerful programs.
For example, say an outside company sends you a data file of customer records in which the personal data is inconsistently formatted. Regular expressions can help you isolate individual data fields despite those inconsistencies and convert them into a single consistent format. You can imagine how fields like these often vary across naming conventions or national borders:
- Phone numbers
- Birth dates
- Email addresses
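As a sketch of the phone-number case, the Python code below accepts several common ways of writing a 10-digit North American number and rewrites them all in one format. The accepted formats, the NNN-NNN-NNNN output, and the name `normalize_phone` are assumptions chosen for illustration; real customer data, especially international numbers, would need a broader rule.

```python
import re

# Hypothetical rule for 10-digit North American numbers. re.VERBOSE lets the
# pattern span several lines, with a comment on each part.
PHONE_RE = re.compile(
    r"""
    \s*
    (?:\+?1[\s.-]*)?    # optional country code: 1 or +1
    \(?([0-9]{3})\)?    # area code, optionally in parentheses (deliberately lenient)
    [\s.-]*             # any spaces, dots, or hyphens
    ([0-9]{3})          # exchange
    [\s.-]*
    ([0-9]{4})          # line number
    \s*
    """,
    re.VERBOSE,
)

def normalize_phone(raw):
    """Return the number as NNN-NNN-NNNN, or None if it does not fit the rule."""
    m = PHONE_RE.fullmatch(raw)
    if m is None:
        return None
    return "-".join(m.groups())

for raw in ["(555) 123-4567", "555.123.4567", "+1 555 123 4567", "555-1234"]:
    print(repr(raw), "->", normalize_phone(raw))
```

Inputs that do not fit the rule return None, so they can be set aside for manual review rather than silently mangled.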
One possible downside to regular expressions is that they can appear cryptic to someone who has to maintain your code (or even to you, if you haven’t looked at your code for a while!). Therefore you should always fully comment any complex regular expression you embed in your code, and consider breaking a single highly complicated regular expression into a few simpler ones. Few things are more frustrating than having to decipher and debug a very complex regular expression.
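Following that advice, the Python sketch below builds one date-matching regular expression from small, individually commented pieces rather than a single dense string. It converts a month-first date such as 7/4/1990 into YYYY-MM-DD form; the month-first convention, the output format, and the names `US_DATE_RE` and `to_iso` are assumptions for illustration.

```python
import re

# Each piece is small enough to read and test on its own.
MONTH = r"(?P<month>0?[1-9]|1[0-2])"        # 1-12, optional leading zero
DAY = r"(?P<day>0?[1-9]|[12][0-9]|3[01])"    # 1-31; does NOT check month lengths
YEAR = r"(?P<year>[0-9]{4})"                 # four-digit year
SEP = r"[/.-]"                               # slash, dot, or hyphen

# Assumed input convention: month first (US style), e.g. 7/4/1990.
US_DATE_RE = re.compile(MONTH + SEP + DAY + SEP + YEAR)

def to_iso(raw):
    """Convert a month-first date to YYYY-MM-DD, or return None if it does not match."""
    m = US_DATE_RE.fullmatch(raw.strip())
    if m is None:
        return None
    return "{}-{:02d}-{:02d}".format(
        m.group("year"), int(m.group("month")), int(m.group("day"))
    )

for raw in ["7/4/1990", "12-25-2023", "13/01/2020"]:
    print(repr(raw), "->", to_iso(raw))
```

Because each named piece has its own comment, a maintainer can see at a glance that the pattern accepts impossible dates such as 2/30; that check would belong in ordinary code after the match.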
How to Access Regular Expressions in Rexx
Regular expressions are not built into the ANSI-1996 or TRL-2 language standards for classic, procedural Rexx. Instead, free external function packages provide them. You establish access to the external Rexx function library with a line or two of code in your script, and then you can code regular expressions as if they were part of the Rexx language.
By contrast, Open Object Rexx (ooRexx) and NetRexx ship with regular expression support.
Windows and other platforms: You can download the RexxRE Regular Expression Library, a free function library for regular expressions that works across many platforms, including Windows.
OS/2 and similar platforms: For OS/2 and similar platforms look here.
ooRexx: Regular expressions ship with Open Object Rexx in the RegularExpression class (loaded with the directive ::requires "rxregexp.cls"), whose methods are init, match, parse, pos, and position. The class is described in the ooRexx reference documentation.
NetRexx: NetRexx provides regular expressions in the regex (or grep) and changeregex stages, which use Java regular expressions. See the NetRexx Pipelines manual here.
You’ll want to use regular expressions in your Rexx code to succinctly express complicated pattern matching. Just be sure to document what the regular expressions do.