Regular expressions match text against a pattern and can optionally capture portions of the text, referred to as groups.
eXPL regular expressions are implemented using the Java Pattern class. An eXPL regular expression behaves like the ? operator in that a failed match causes execution to short circuit.
Format
A regular expression declaration starts with the keyword regex, optionally followed by an option set in parentheses. Next comes a match expression and then an optional set of one or more groups enclosed in braces:
regex[( option-set )] match-expression [ { group-set } ]
Match Expression
The match expression has the format input ? regular-expression. The input is a variable, optionally assigned the value of an expression, from which the text to match is extracted. The regular expression is either a string literal or a variable. A variable is useful for building a regular expression by concatenating components instead of writing one unwieldy string literal.
A backslash (\) in a regular expression must be escaped by writing two backslashes (\\).
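Java string literals follow the same escaping rule, so the underlying Pattern behaviour can be shown in plain Java. This is a minimal Java sketch, not eXPL code. It builds one pattern by concatenating components, each writing every regex backslash as \\. The component names DATE, SEP and AMOUNT and the sample input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ComposedRegex {
    // Hypothetical components; each regex backslash is written as \\ in the literal.
    static final String DATE = "(\\d{4})-(\\d{2})-(\\d{2})"; // regex: (\d{4})-(\d{2})-(\d{2})
    static final String SEP = "\\s+";                         // regex: \s+
    static final String AMOUNT = "(\\d+\\.\\d{2})";           // regex: (\d+\.\d{2})

    // Concatenating the components avoids one long, hard-to-read literal.
    static final Pattern ENTRY = Pattern.compile(DATE + SEP + AMOUNT);

    // Returns the amount (group 4, after the three date groups) or null on no match.
    static String amountOf(String line) {
        Matcher m = ENTRY.matcher(line);
        return m.matches() ? m.group(4) : null;
    }

    public static void main(String[] args) {
        System.out.println(amountOf("2024-03-01   19.95")); // 19.95
        System.out.println(amountOf("2024-03-01 nineteen")); // null
    }
}
```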
Groups
The group set is a comma-delimited list of identifiers, in the same order as the groups defined in the regular expression. If a group is made optional in the pattern using the ? operator, its variable may not be set when a match occurs, so the variable needs to be set to a default value before the match is performed. A group variable can also be given a type if the captured string needs to be converted to a number type.
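At the Java Pattern level, Matcher.group(int) returns null for a group that did not take part in the match. That is why an optional group's variable needs a default. This is a minimal Java sketch of that behaviour, plus a string-to-int conversion of a group, assuming a hypothetical "name" or "name:age" input format. It is not the eXPL implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroup {
    // Group 1: a name. Group 2: digits inside an optional ":age" suffix.
    static final Pattern PERSON = Pattern.compile("(\\w+)(?::(\\d+))?");

    static String describe(String input) {
        Matcher m = PERSON.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        String name = m.group(1);
        int age = -1;                        // default set before the optional group is read
        String ageText = m.group(2);         // null when the optional group did not match
        if (ageText != null) {
            age = Integer.parseInt(ageText); // convert the captured string to a number
        }
        return name + " " + age;
    }

    public static void main(String[] args) {
        System.out.println(describe("Rex:7")); // Rex 7
        System.out.println(describe("Tom"));   // Tom -1
    }
}
```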
Options
The option set is comma-delimited. The options are the Java Pattern class flag names in lower case, e.g. case_insensitive for Pattern.CASE_INSENSITIVE, which enables case-insensitive matching.
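An option set has to be combined into a single int before Pattern.compile(String, int) can use it, and Java does this with bitwise OR. This is a hypothetical sketch of how lower-case option names could map onto Pattern flags. The lookup table, which covers only a subset of the flags, and the toFlags helper are illustrative assumptions, not eXPL's actual code.

```java
import java.util.Map;
import java.util.regex.Pattern;

public class RegexOptions {
    // Hypothetical lookup from lower-case option names to a subset of Pattern flags.
    static final Map<String, Integer> FLAGS = Map.of(
        "case_insensitive", Pattern.CASE_INSENSITIVE,
        "multiline", Pattern.MULTILINE,
        "dotall", Pattern.DOTALL,
        "unicode_case", Pattern.UNICODE_CASE,
        "comments", Pattern.COMMENTS,
        "literal", Pattern.LITERAL);

    // Combines a comma-delimited option set into one flags value using bitwise OR.
    static int toFlags(String optionSet) {
        int flags = 0;
        for (String option : optionSet.split(",")) {
            Integer flag = FLAGS.get(option.trim());
            if (flag == null) {
                throw new IllegalArgumentException("Unknown option: " + option.trim());
            }
            flags |= flag;
        }
        return flags;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("dog", toFlags("case_insensitive"));
        System.out.println(p.matcher("Dog").matches()); // true
    }
}
```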
Short Circuit Behaviour
A regular expression can be used either as a term in a template or calculator, or as the condition of a calculator conditional block. When used as a term, a failed match causes evaluation to short circuit, so the regular expression acts as a filter. When a match occurs, the term contains the input to the regular expression, which may be useful for providing a query solution.
When used as the condition of a conditional block, a failed match causes evaluation to skip over the block instead. If a regular expression is only intended to capture group values, an empty block can be used to prevent the term short circuit behaviour.
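The difference between the two uses can be modelled in plain Java inside a loop. This is a rough analogy, not eXPL semantics. The sketch assumes a short circuit inside a loop ends the loop, as the tutorial example below suggests. It also assumes a whole-input match via Matcher.matches(). The digit pattern and sample inputs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ShortCircuit {
    static final Pattern NUMBER = Pattern.compile("\\d+");

    // Term-style use: the first failed match short circuits and ends the loop.
    static List<String> asTerm(List<String> inputs) {
        List<String> matched = new ArrayList<>();
        for (String input : inputs) {
            if (!NUMBER.matcher(input).matches()) {
                break;              // short circuit: nothing after this point is evaluated
            }
            matched.add(input);     // on a match the term holds the regex input
        }
        return matched;
    }

    // Conditional-block use: a failed match skips only the block, and the loop continues.
    static List<String> asCondition(List<String> inputs) {
        List<String> matched = new ArrayList<>();
        for (String input : inputs) {
            if (NUMBER.matcher(input).matches()) {
                matched.add(input); // block body runs only on a match
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("12", "x", "7");
        System.out.println(asTerm(inputs));      // [12]
        System.out.println(asCondition(inputs)); // [12, 7]
    }
}
```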
Case-insensitive Matching
Application Pets of tutorial11 takes advantage of the case_insensitive option to handle input that has a mix of cases. It works with pet information in XML format to print out details on dogs. The species elements contain a mix of "dog", "Dog", "cat" and "Cat". The application traverses the input data using a cursor in an unconditional loop. To prevent the loop from exiting prematurely when the first cat is encountered, the regular expression is used as the condition of a conditional block. Here is the loop:
regex(case_insensitive) dog = (pet++) ? petRegex { name, color }
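The same idea can be expressed directly with Pattern.CASE_INSENSITIVE in Java. This is a hypothetical sketch, not the tutorial's code. The one-line XML record layout, the PET_REGEX pattern standing in for petRegex, and the sample pets are all assumptions. It shows a case-insensitive match on the species, capture of the name and color groups, and a conditional test so that a cat does not end the traversal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Pets {
    // Hypothetical stand-in for petRegex: species "dog" in any case; groups are name and color.
    static final Pattern PET_REGEX = Pattern.compile(
        "<pet><name>([^<]*)</name><species>dog</species><color>([^<]*)</color></pet>",
        Pattern.CASE_INSENSITIVE);

    static List<String> dogs(List<String> pets) {
        List<String> out = new ArrayList<>();
        for (String pet : pets) {               // visit every pet record
            Matcher m = PET_REGEX.matcher(pet);
            if (m.matches()) {                  // conditional: a cat is skipped, not fatal
                out.add(m.group(1) + " is " + m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> pets = List.of(
            "<pet><name>Lassie</name><species>Dog</species><color>Sable</color></pet>",
            "<pet><name>Cleo</name><species>cat</species><color>Black</color></pet>",
            "<pet><name>Rex</name><species>dog</species><color>Brown</color></pet>");
        dogs(pets).forEach(System.out::println); // Lassie is Sable, then Rex is Brown
    }
}
```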
