What are Regular Expressions?


Regular expressions are a way to match strings. They allow you to describe a string or pattern matching algorithm in a single phrase, instead of encoding that whole algorithm yourself. They're extremely powerful.

One regular expression can often replace a whole series of complicated string manipulation statements. That's useful for two reasons. First, string manipulation is a common problem in programming. And second, you can often express all sorts of different problems through the paradigm of manipulating strings.

Once you form the regular expression, the programming language takes on all the work of applying the pattern to the text you search. Here’s a simple example. To list all files in a Unix or Linux directory, you might issue this operating system command:

ls *

Or, at the Windows command line:

dir *

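The asterisk (*) is a wildcard, a simple cousin of the regular expression. Shell wildcards use a somewhat different syntax than true regular expressions, but the idea is the same: a short pattern stands in for a whole set of strings. Here the asterisk matches every file name in the directory, so the command succinctly lists all the files. The power of pattern matching becomes apparent when you match more complicated patterns. Here we change the example to match only files ending in the characters .exe: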

ls *.exe

---or---

dir *.exe

This next expression lists files whose names begin with any one of the three lower-case letters a, b, or c, followed by any two other characters:

ls [a-c]??

As you can see, regular expressions rely on an extensive set of characters or symbols that are assigned special contextual meanings. For example, here the phrase [a-c] matches any one of the three specified characters (a, b, or c), while the question marks each match any single character.
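To make the pattern above concrete, here is a minimal sketch in Python, chosen purely for illustration. It applies the same wildcard to a list of file names and then expresses the same test as a true regular expression. The file names are invented for the example, and the regular expression shown is just one of several ways to write an equivalent pattern.

```python
import fnmatch
import re

# Hypothetical file names, invented for this illustration.
names = ["abc", "bat", "cat", "dog", "Axe", "ab", "abcd"]

# Shell-style wildcard: one of a, b, or c, then any two characters.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "[a-c]??")]

# A regular expression describing the same set of names.
# ^ and $ anchor the match to the whole name; each dot matches one character.
regex = re.compile(r"^[a-c]..$")
regex_hits = [n for n in names if regex.match(n)]

print(wildcard_hits)
print(regex_hits)
```

Both lists contain only the three-character names that start with a, b, or c. Note that the wildcard's question mark corresponds to the dot in regular expression syntax.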

You can build much more complexity into a regular expression, so regular expressions become a very economical way of expressing pattern matching algorithms. The beauty of this is that so many programming problems can be viewed as exercises in string and pattern matching. Regular expressions make for terse, powerful programs.

For example, say an outside company sends you a data file of customers, but their personal data is inconsistently formatted. Regular expressions can help you isolate individual data fields, regardless of those formatting inconsistencies, and regularize them into a single consistent format. You can imagine how these fields could often be inconsistent across naming conventions or national borders:

  • Names
  • Phone numbers
  • Addresses
  • Birth dates
  • Email addresses
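As a rough sketch of this kind of cleanup, the Python example below normalizes one of those fields, phone numbers, into a single format. The sample data, the pattern, and the helper name normalize_phone are all assumptions made for illustration. The pattern handles only ten-digit, US-style numbers, and real customer data would need a more forgiving approach.

```python
import re

# Hypothetical sample data: one US-style phone number written four ways.
raw_numbers = ["(555) 123-4567", "555.123.4567", "555 123 4567", "5551234567"]

# Three digits, three digits, four digits, with optional punctuation between.
phone_pattern = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")

def normalize_phone(text):
    """Return the number as NNN-NNN-NNNN, or None if it does not match."""
    match = phone_pattern.fullmatch(text.strip())
    if match is None:
        return None
    return "-".join(match.groups())

normalized = [normalize_phone(n) for n in raw_numbers]
print(normalized)
```

A single pattern absorbs all four input styles, which is the economy the text describes. The parentheses in the pattern capture the three groups of digits so they can be reassembled in the chosen format.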

The downside to regular expressions is that they can appear very complex to someone who has to maintain your code (or even to you, if you haven’t looked at your code for a while!). Therefore you should always thoroughly comment any complex regular expressions you embed in your code. Nothing is worse than having to figure out, and debug, a very complex regular expression in someone else’s undocumented code.
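One way to follow this advice, where the language supports it, is to put the comments inside the pattern itself. The sketch below uses Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows a comment on each line. The date format and the group names are assumptions chosen for illustration. In languages without such a feature, ordinary comments placed next to the pattern serve the same purpose.

```python
import re

# Hypothetical example: a date written as YYYY-MM-DD.
date_pattern = re.compile(r"""
    (?P<year>\d{4})     # four-digit year
    -                   # literal separator
    (?P<month>\d{2})    # two-digit month
    -                   # literal separator
    (?P<day>\d{2})      # two-digit day
""", re.VERBOSE)

match = date_pattern.fullmatch("1987-03-14")
parts = match.groupdict()
print(parts)
```

Written on one line, the same pattern would be (?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}), which is already harder to read at a glance.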

Regular Expressions in Rexx

Regular expressions are not built in to the ANSI-1996 or TRL-2 language standards for classic, procedural Rexx. Instead, there are free function packages that enable you to code regular expressions in Rexx. One that works across many platforms is the RexxRE Regular Expression Library.

From the coding standpoint, it makes no difference whether regular expressions are built into the language or come as part of an add-in function library. You just establish access to the external Rexx function library with a line or two of code in your script. Then you can code as if regular expressions were just another part of the Rexx language.

Regular expressions come built in as part of Open Object Rexx (ooRexx). They are provided through the RegularExpression class, with its methods init, match, parse, pos, and position.

You’ll want to use regular expressions in your Rexx code to succinctly express complicated pattern matching. Just make sure you write code that is still readable, and document what the regular expressions do!