PERLREQUICK
perlrequick - Perl regular expressions quick start
This page covers the very basics of understanding, creating and using regular expressions ('regexes') in Perl.
Simple word matching
The simplest regex is simply a word, or more generally, a string of characters. A regex consisting of a word matches any string that contains that word:
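As a minimal sketch of such a match, the following script tests the string "Hello World" against two word regexes. The variable names and the second regex, /Mars/, are chosen only for illustration.

```perl
use strict;
use warnings;

# A match expression yields a true value on success and a false value otherwise.
my $found   = "Hello World" =~ /World/;   # 'World' occurs in the string
my $missing = "Hello World" =~ /Mars/;    # 'Mars' does not

print "World: ", ($found   ? "matches" : "doesn't match"), "\n";
print "Mars: ",  ($missing ? "matches" : "doesn't match"), "\n";
```

Storing the result in a variable is only one way to use it; the match expression can appear anywhere a true or false value is expected.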
In the statement "Hello World" =~ /World/, World is a regex and the // enclosing /World/ tells perl to search a string for a match. The operator =~ associates the string with the regex match and produces a true value if the regex matched, or false if the regex did not match. In our case, World matches the second word in "Hello World", so the expression is true. This idea has several variations.
Expressions like this are useful in conditionals:
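A short sketch of a match used as a condition might look like the following. The sample string and the @report array are assumptions made for illustration. The example also uses the !~ operator, which reverses the sense of the match.

```perl
use strict;
use warnings;

my $text = "Hello World";   # sample string, chosen for illustration
my @report;

# A match used as a statement modifier.
push @report, "It matches" if $text =~ /World/;

# The !~ operator is true when the regex does NOT match.
push @report, "It doesn't match" if $text !~ /Mars/;

# A match used as the condition of an if/else block.
if ($text =~ /Hello/) {
    push @report, "Greeting found";
} else {
    push @report, "No greeting";
}

print "$_\n" for @report;
```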
Finally, the // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front:
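The following sketch tries three alternative delimiters. The variable names are illustrative; the point is only that the regex between the delimiters is the same as it would be between slashes.

```perl
use strict;
use warnings;

my $bang   = "Hello World"   =~ m!World!;   # delimited by '!'
my $braces = "Hello World"   =~ m{World};   # delimited by a matching '{' and '}'
my $quotes = "/usr/bin/perl" =~ m"/perl";   # '/' is now an ordinary character

print "all three match\n" if $bang && $braces && $quotes;
```

Choosing a delimiter that does not occur in the regex, as in the last line, avoids having to backslash every '/'.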
Regexes must match a part of the string exactly in order for the statement to be true:
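A minimal sketch, assuming the same "Hello World" string, shows how case and spaces affect the outcome. The %matched hash and the labels are hypothetical helpers for reporting.

```perl
use strict;
use warnings;

my $string = "Hello World";
my @cases = (
    [ 'world',  qr/world/  ],   # wrong case: matching is case-sensitive
    [ 'o W',    qr/o W/    ],   # ' ' is an ordinary character
    [ 'World ', qr/World / ],   # the string has no ' ' after 'World'
);

my %matched;
for my $case (@cases) {
    my ($label, $regex) = @$case;
    $matched{$label} = $string =~ $regex ? 1 : 0;
    printf "/%s/ %s\n", $label, $matched{$label} ? "matches" : "doesn't match";
}
```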
perl will always match at the earliest possible point in the string:
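One way to observe this is to print the offset at which the match starts. The sketch below reads that offset from $-[0], the start position of the last successful match; the sample strings are chosen for illustration.

```perl
use strict;
use warnings;

my @cases = (
    [ "Hello World",     qr/o/   ],   # two 'o's in the string
    [ "That hat is red", qr/hat/ ],   # 'hat' occurs inside 'That' and on its own
);

my @offsets;
for my $case (@cases) {
    my ($string, $regex) = @$case;
    if ($string =~ $regex) {
        my $offset = $-[0];           # where the match began (0-based)
        push @offsets, $offset;
        printf "%-17s first match at offset %d\n", qq("$string"), $offset;
    }
}
```

The offsets are 4 and 1: the 'o' in 'Hello' and the 'hat' inside 'That', not the later occurrences.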
Not all characters can be used 'as is' in a match. Some characters, called metacharacters, are reserved for use in regex notation. The metacharacters are
    {}[]()^$.|*+?\
A metacharacter can be matched by putting a backslash before it:
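The sketch below contrasts an unescaped metacharacter with its backslashed form. The variable names are illustrative only.

```perl
use strict;
use warnings;

# '+' is a metacharacter, so /2+2/ does not look for a literal plus sign.
my $unescaped = "2+2=4" =~ /2+2/  ? 1 : 0;

# '\+' is treated like an ordinary '+'.
my $escaped   = "2+2=4" =~ /2\+2/ ? 1 : 0;

# A literal backslash is written as '\\' inside the regex.
my $backslash = 'C:\WIN32' =~ /C:\\WIN/ ? 1 : 0;

# Each '/' is backslashed because '/' also delimits the regex.
my $slashes   = "/usr/bin/perl" =~ /\/usr\/bin\/perl/ ? 1 : 0;

print "$unescaped $escaped $backslash $slashes\n";   # prints "0 1 1 1"
```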
In a regex such as /\/usr\/bin\/perl/, the forward slash '/' is also backslashed, because it is used to delimit the regex.
Non-printable ASCII characters are represented by escape sequences. Common examples are \t for a tab, \n for a newline, and \r for a carriage return. Arbitrary bytes are represented by octal escape sequences, e.g., \033, or hexadecimal escape sequences, e.g., \x1B:
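A brief sketch of these escapes in use follows. The test strings are assumptions picked for illustration, and the octal and hexadecimal values assume an ASCII character set.

```perl
use strict;
use warnings;

# '\t' in the regex matches the tab in the string.
my $tab    = "1000\t2000" =~ m(0\t2) ? 1 : 0;

# Octal 143 and hex 61, 74 are 'c', 'a', 't' in ASCII.
my $spelt  = "cat" =~ /\143\x61\x74/ ? 1 : 0;

# Octal 033 and hex 1B name the same byte (the escape character).
my $escape = "\x1B" =~ /\033/ ? 1 : 0;

print "$tab $spelt $escape\n";   # prints "1 1 1"
```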
Regexes are treated mostly as double-quoted strings, so variable substitution works:
    $foo = 'house';
    'cathouse' =~ /cat$foo/;   # matches
    'housecat' =~ /${foo}cat/; # matches
With all of the regexes above, if the regex matched anywhere in the string, it was considered a match. To specify where it should match, we would use the anchor metacharacters ^ and $. The anchor ^ means match at the beginning of the string and the anchor $ means match at the end of the string, or before a newline at the end of the string. Some examples:
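The following sketch applies both anchors to the word 'housekeeper', a sample string chosen for illustration. The @cases and @results arrays are hypothetical helpers.

```perl
use strict;
use warnings;

my @cases = (
    [ "housekeeper",   qr/keeper/        ],   # unanchored: matches anywhere
    [ "housekeeper",   qr/^keeper/       ],   # 'keeper' is not at the beginning
    [ "housekeeper",   qr/keeper$/       ],   # 'keeper' is at the end
    [ "housekeeper\n", qr/keeper$/       ],   # '$' also matches before a final newline
    [ "housekeeper",   qr/^housekeeper$/ ],   # the regex spans the whole string
);

my @results;
for my $case (@cases) {
    my ($string, $regex) = @$case;
    push @results, ($string =~ $regex ? 1 : 0);
}

print "@results\n";   # prints "1 0 1 1 1"
```

Using both ^ and $ together, as in the last case, requires the regex to account for the entire string.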
Using character classes
A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regex. Character classes are denoted by brackets [...], with the set of characters to be possibly matched inside. Here are some examples:
    /[bcr]at/;        # matches 'bat', 'cat', or 'rat'
    "abc" =~ /[cab]/; # matches 'a'
In the last statement, even though 'c' is the first character in the class, the earliest point at which the regex can match is 'a'.
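To make the behaviour concrete, here is a small sketch. The word list is an assumption, and the parentheses in the second regex are used only to capture the character that matched, a feature not covered in this section.

```perl
use strict;
use warnings;

# Keep only the words in which 'b', 'c', or 'r' is followed by 'at'.
my @rhymes = grep { /[bcr]at/ } qw(bat cat rat hat);

# Capture the single character matched by the class.
my ($first) = "abc" =~ /([cab])/;

print "rhymes: @rhymes\n";   # prints "rhymes: bat cat rat"
print "first: $first\n";     # prints "first: a"
```

The order of characters inside the brackets has no effect on which one is matched; only the position in the string does.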
    /[yY][eE][sS]/; # match 'yes' in a case-insensitive way
    /yes/i;         # also match 'yes' in a case-insensitive way
The last example shows a match with an 'i' modifier, which makes the match case-insensitive.
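The two spellings can be compared side by side. In this sketch the list of answers is hypothetical; both filters are expected to keep the same entries.

```perl
use strict;
use warnings;

my @answers = qw(yes Yes YES yEs no);   # sample input, chosen for illustration

my @by_class    = grep { /[yY][eE][sS]/ } @answers;   # one class per letter
my @by_modifier = grep { /yes/i }         @answers;   # the 'i' modifier

print "class:    @by_class\n";      # prints "class:    yes Yes YES yEs"
print "modifier: @by_modifier\n";   # prints "modifier: yes Yes YES yEs"
```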
Character classes also have ordinary and special characters, but the sets of ordinary and special characters inside a character class are different than those outside a character class. The special characters for a character class are -]\^$ and are matched using an escape:
    $x = 'bcr';
    /[$x]at/;   # matches 'bat', 'cat', or 'rat'
    /[\$x]at/;  # matches '$at' or 'xat'
    /[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'
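The effect of each backslash can be checked with a short sketch. The word list below is an assumption; note that the strings are single-quoted so that '$' and '\' stay literal.

```perl
use strict;
use warnings;

my $x     = 'bcr';
my @words = ('cat', '$at', 'xat', '\at');   # sample input, chosen for illustration

my @plain   = grep { /[$x]at/ }   @words;   # $x is interpolated: class is b, c, r
my @dollar  = grep { /[\$x]at/ }  @words;   # literal '$' and 'x'
my @slashed = grep { /[\\$x]at/ } @words;   # literal '\' plus the interpolated b, c, r

print "plain:   @plain\n";     # prints "plain:   cat"
print "dollar:  @dollar\n";    # prints "dollar:  $at xat"
print "slashed: @slashed\n";   # prints "slashed: cat \at"
```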
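The special character '-' acts as a range operator within character classes, so that [0123456789] can be written as [0-9]:
    /[0-9a-fA-F]/; # matches a hexadecimal digit
If '-' is the first or last character in a character class, it is treated as an ordinary character.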
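A minimal sketch of ranges, and of '-' as an ordinary character, might look like this. The sample lists are assumptions; the ^ and $ anchors restrict each test to a single character.

```perl
use strict;
use warnings;

# A range inside a longer regex.
my @items = grep { /item[0-9]/ } qw(item0 item7 itemX);

# Three ranges combined into one class: a single hexadecimal digit.
my @hex = grep { /^[0-9a-fA-F]$/ } qw(7 b F g Z);

# '-' in the first or last position is an ordinary character.
my @leading  = grep { /^[-ab]$/ } ('a', '-', 'c');
my @trailing = grep { /^[ab-]$/ } ('a', '-', 'c');

print "items: @items\n";         # prints "items: item0 item7"
print "hex: @hex\n";             # prints "hex: 7 b F"
print "leading: @leading\n";     # prints "leading: a -"
print "trailing: @trailing\n";   # prints "trailing: a -"
```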
The special character ^ in the first position of a character class denotes a negated character class, which matches any character but those in the brackets. Both [...] and [^...] must match a character, or the match fails. Then
    /[^a]at/;  # doesn't match 'aat' or 'at', but matches
               # all other 'bat', 'cat', '0at', '%at', etc.
    /[^0-9]/;  # matches a non-numeric character
    /[a^]at/;  # matches 'aat' or '^at'; here '^' is ordinary
Perl has several abbreviations for common character classes:
\d is a digit and represents [0-9]
\s is a whitespace character and represents [\ \t\r\n\f]
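As a rough check, the abbreviations can be compared with the explicit classes they stand for. The sketch below assumes plain ASCII input; the sample characters are chosen for illustration.

```perl
use strict;
use warnings;

my @samples = ('7', 'x', ' ', "\t", '_');   # sample single-character strings

my @digits      = grep { /\d/ }             @samples;   # abbreviation
my @digit_class = grep { /[0-9]/ }          @samples;   # explicit class

my @spaces      = grep { /\s/ }             @samples;   # abbreviation
my @space_class = grep { /[\ \t\r\n\f]/ }   @samples;   # explicit class

printf "digits: %d, whitespace: %d\n", scalar @digits, scalar @spaces;
# prints "digits: 1, whitespace: 2"
```

For these inputs the abbreviated and explicit forms select the same strings.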