PERLREQUICK

NAME

perlrequick - Perl regular expressions quick start

DESCRIPTION

This page covers the very basics of understanding, creating and using regular expressions ('regexes') in Perl.

The Guide

Simple word matching

The simplest regex is simply a word, or more generally, a string of characters. A regex consisting of a word matches any string that contains that word:

"Hello World" =~ /World/; # matches

In this statement, World is a regex and the // enclosing /World/ tells perl to search a string for a match. The operator =~ associates the string with the regex match and produces a true value if the regex matched, or false if the regex did not match. In our case, World matches the second word in "Hello World", so the expression is true. This idea has several variations.

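Expressions like this are useful in conditionals:

print "It matches\n" if "Hello World" =~ /World/;

The sense of the match can be reversed by using the !~ operator:

print "It doesn't match\n" if "Hello World" !~ /World/;

The literal string in the regex can be replaced by a variable:

$greeting = "World";
print "It matches\n" if "Hello World" =~ /$greeting/;

If you're matching against $_, the $_ =~ part can be omitted:

$_ = "Hello World";
print "It matches\n" if /World/;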

Finally, the // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' out front.
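As a minimal sketch of alternative delimiters (the strings and variable names are illustrative assumptions), the following matches the same word using '!' and '{}' as delimiters, then uses '"' so that '/' becomes an ordinary character:

```perl
use strict;
use warnings;

# Same regex, different delimiters; each result is 1 on a match, 0 otherwise.
my $with_bang   = "Hello World" =~ m!World! ? 1 : 0;   # '!' as the delimiter
my $with_braces = "Hello World" =~ m{World} ? 1 : 0;   # a matching '{}' pair

# With '"' as the delimiter, '/' is an ordinary character and needs no backslash.
my $with_quotes = "/usr/bin/perl" =~ m"/perl" ? 1 : 0;

print "bang: $with_bang, braces: $with_braces, quotes: $with_quotes\n";
```

Choosing a delimiter that does not occur in the pattern is mostly a readability decision, for example when matching file paths.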

Regexes must match a part of the string exactly in order for the statement to be true.
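To illustrate exact matching, here is a short sketch (the test string and variable names are assumptions chosen for illustration). It shows that letter case and spaces both count:

```perl
use strict;
use warnings;

my $string = "Hello World";

# Each result is 1 on a match, 0 otherwise.
my $wrong_case  = $string =~ /world/  ? 1 : 0;  # 0: matching is case-sensitive
my $with_space  = $string =~ /o W/    ? 1 : 0;  # 1: ' ' is an ordinary character
my $extra_space = $string =~ /World / ? 1 : 0;  # 0: no space follows "World"

print "world: $wrong_case, 'o W': $with_space, 'World ': $extra_space\n";
```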

perl will always match at the earliest possible point in the string.
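One way to observe the earliest-match rule is to look at where a match starts. The sketch below uses a hypothetical helper, match_offset, built on Perl's @- array ($-[0] holds the start offset of the last successful match); the example strings are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the offset at which the match starts,
# or undef if the regex does not match at all.
sub match_offset {
    my ($string, $regex) = @_;
    return $string =~ $regex ? $-[0] : undef;
}

# "Hello World" contains two 'o's; the first one (in "Hello") is matched.
my $first_o = match_offset("Hello World", qr/o/);

# "hat" occurs inside "That" before the standalone word "hat".
my $first_hat = match_offset("That hat is red", qr/hat/);

print "o at offset $first_o, hat at offset $first_hat\n";
```

Offsets are counted from zero, so the 'o' in "Hello" is at offset 4 and the "hat" inside "That" is at offset 1.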

Not all characters can be used 'as is' in a match. Some characters, called metacharacters, are reserved for use in regex notation. The metacharacters are

{}[]()^$.|*+?\

A metacharacter can be matched by putting a backslash before it.
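A brief sketch of backslash escaping follows; the strings and variable names are illustrative assumptions. It contrasts an unescaped '+' with an escaped one and escapes brackets, a parenthesis, a period, and the '/' delimiter:

```perl
use strict;
use warnings;

# Each result is 1 on a match, 0 otherwise.
my $unescaped = "2+2=4" =~ /2+2/  ? 1 : 0;  # 0: '+' is a metacharacter here
my $escaped   = "2+2=4" =~ /2\+2/ ? 1 : 0;  # 1: '\+' is treated as an ordinary '+'

# '[', ')' and '.' are all metacharacters, so each one is backslashed.
my $interval = "The interval is [0,1)." =~ /\[0,1\)\./ ? 1 : 0;  # 1

# '/' is backslashed only because it is the delimiter of this regex.
my $slashes = "/usr/bin/perl" =~ /\/usr\/bin\/perl/ ? 1 : 0;     # 1

print "unescaped: $unescaped, escaped: $escaped, ",
      "interval: $interval, slashes: $slashes\n";
```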

In a regex such as /\/usr\/bin\/perl/, the forward slash '/' is also backslashed, because it is used to delimit the regex.

Non-printable ASCII characters are represented by escape sequences. Common examples are \t for a tab, \n for a newline, and \r for a carriage return. Arbitrary bytes are represented by octal escape sequences, e.g., \033, or hexadecimal escape sequences, e.g., \x1B.
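The escape sequences above can be tried out with a short sketch; the strings and variable names are assumptions, and the 'cat' example presumes an ASCII-compatible platform:

```perl
use strict;
use warnings;

# Each result is 1 on a match, 0 otherwise.

# \t inside the regex matches the tab character inside the string.
my $tab = "1000\t2000" =~ m(0\t2) ? 1 : 0;

# Octal 143 is 'c', hex 61 is 'a', hex 74 is 't' in ASCII:
# an unusual but valid way to spell "cat".
my $cat = "cat" =~ /\143\x61\x74/ ? 1 : 0;

# Octal 033 and hex 1B denote the same byte (the escape character).
my $esc = "\x1B" =~ /\033/ ? 1 : 0;

print "tab: $tab, cat: $cat, esc: $esc\n";
```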

Regexes are treated mostly as double-quoted strings, so variable substitution works:

$foo = 'house';
'cathouse' =~ /cat$foo/; # matches
'housecat' =~ /${foo}cat/; # matches

With all of the regexes above, if the regex matched anywhere in the string, it was considered a match. To specify where it should match, we would use the anchor metacharacters ^ and $. The anchor ^ means match at the beginning of the string and the anchor $ means match at the end of the string, or before a newline at the end of the string.
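The effect of the two anchors can be sketched as follows; the word "housekeeper" and the variable names are illustrative assumptions:

```perl
use strict;
use warnings;

# Each result is 1 on a match, 0 otherwise.
my $anywhere       = "housekeeper"   =~ /keeper/        ? 1 : 0;  # 1: unanchored
my $at_start       = "housekeeper"   =~ /^keeper/       ? 1 : 0;  # 0: not at the start
my $at_end         = "housekeeper"   =~ /keeper$/       ? 1 : 0;  # 1: at the end
my $before_newline = "housekeeper\n" =~ /keeper$/       ? 1 : 0;  # 1: before final newline
my $whole_string   = "housekeeper"   =~ /^housekeeper$/ ? 1 : 0;  # 1: both anchors

print "anywhere: $anywhere, start: $at_start, end: $at_end, ",
      "before newline: $before_newline, whole: $whole_string\n";
```

Using ^ and $ together, as in the last line, requires the regex to account for the whole string rather than a part of it.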

Using character classes

A character class allows a set of possible characters, rather than just a single character, to match at a particular point in a regex. Character classes are denoted by brackets [...], with the set of characters to be possibly matched inside. Here are some examples:

/cat/; # matches 'cat'
/[bcr]at/; # matches 'bat', 'cat', or 'rat'
"abc" =~ /[cab]/; # matches 'a'

In the last statement, even though 'c' is the first character in the class, the earliest point at which the regex can match is 'a'.

/[yY][eE][sS]/; # match 'yes' in a case-insensitive way
                # 'yes', 'Yes', 'YES', etc.
/yes/i; # also match 'yes' in a case-insensitive way

The last example shows a match with an 'i' modifier, which makes the match case-insensitive.

Character classes also have ordinary and special characters, but the sets of ordinary and special characters inside a character class are different from those outside a character class. The special characters for a character class are -]\^$ and are matched using an escape:

/[\]c]def/; # matches ']def' or 'cdef'
$x = 'bcr';
/[$x]at/; # matches 'bat', 'cat', or 'rat'
/[\$x]at/; # matches '$at' or 'xat'
/[\\$x]at/; # matches '\at', 'bat', 'cat', or 'rat'
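As a runnable sketch of the escapes above, the following applies each class to a concrete string; the test strings and result variable names are assumptions added for illustration:

```perl
use strict;
use warnings;

my $x = 'bcr';

# Each result is 1 on a match, 0 otherwise.
my $bracket      = ']def' =~ /[\]c]def/ ? 1 : 0;  # 1: '\]' is a literal ']'
my $interpolated = 'rat'  =~ /[$x]at/   ? 1 : 0;  # 1: class becomes [bcr]
my $dollar       = '$at'  =~ /[\$x]at/  ? 1 : 0;  # 1: class is '$' or 'x'
my $dollar_miss  = 'bat'  =~ /[\$x]at/  ? 1 : 0;  # 0: 'b' is not '$' or 'x'
my $backslash    = '\at'  =~ /[\\$x]at/ ? 1 : 0;  # 1: class is '\', 'b', 'c', 'r'

print "bracket: $bracket, interpolated: $interpolated, dollar: $dollar, ",
      "dollar miss: $dollar_miss, backslash: $backslash\n";
```

The strings are single-quoted so that '$at' and '\at' are taken literally rather than interpolated.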

The special character '-' acts as a range operator within character classes, so that the unwieldy [0123456789] and [abc...xyz] become the svelte [0-9] and [a-z]:

/item[0-9]/; # matches 'item0' or ... or 'item9'
/[0-9a-fA-F]/; # matches a hexadecimal digit

If '-' is the first or last character in a character class, it is treated as an ordinary character.
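A small sketch of ranges, and of '-' as an ordinary character, might look like this (the test strings and variable names are assumptions):

```perl
use strict;
use warnings;

# Each result is 1 on a match, 0 otherwise.
my $item       = 'item7' =~ /item[0-9]/   ? 1 : 0;  # 1: '7' is in the range 0-9
my $hex_digit  = 'f'     =~ /[0-9a-fA-F]/ ? 1 : 0;  # 1: 'f' is a hexadecimal digit
my $not_hex    = 'g'     =~ /[0-9a-fA-F]/ ? 1 : 0;  # 0: 'g' is outside every range
my $dash_last  = '-'     =~ /[ab-]/       ? 1 : 0;  # 1: trailing '-' is ordinary
my $dash_range = '-'     =~ /[a-c]/       ? 1 : 0;  # 0: here '-' only forms a range

print "item: $item, hex: $hex_digit, not hex: $not_hex, ",
      "dash last: $dash_last, dash range: $dash_range\n";
```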

The special character ^ in the first position of a character class denotes a negated character class, which matches any character but those in the brackets. Both [...] and [^...] must match a character, or the match fails. For example:

/[^a]at/; # doesn't match 'aat' or 'at', but matches
          # all other 'bat', 'cat', '0at', '%at', etc.
/[^0-9]/; # matches a non-numeric character
/[a^]at/; # matches 'aat' or '^at'; here '^' is ordinary

Perl has several abbreviations for common character classes:

\d is a digit and represents [0-9]

\s is a whitespace character and represents [\ \t\r\n\f]
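The negated classes and the \d and \s abbreviations described above can be exercised with a short sketch; the test strings and variable names are illustrative assumptions:

```perl
use strict;
use warnings;

# Each result is 1 on a match, 0 otherwise.

# Negated class: [^a] must match some character other than 'a'.
my $negated_hit  = 'bat' =~ /[^a]at/ ? 1 : 0;  # 1: 'b' is not 'a'
my $negated_miss = 'aat' =~ /[^a]at/ ? 1 : 0;  # 0: only 'a' precedes "at"
my $negated_none = 'at'  =~ /[^a]at/ ? 1 : 0;  # 0: no character before "at"
my $caret_plain  = '^at' =~ /[a^]at/ ? 1 : 0;  # 1: '^' is ordinary when not first

# Abbreviations: \d for a digit, \s for a whitespace character.
my $two_digits  = 'Room 42' =~ /\d\d/ ? 1 : 0;  # 1: "42"
my $space_digit = 'Room 42' =~ /\s\d/ ? 1 : 0;  # 1: " 4"
my $no_space    = 'Room42'  =~ /\s/   ? 1 : 0;  # 0: no whitespace present

print "negated: $negated_hit $negated_miss $negated_none $caret_plain, ",
      "abbreviations: $two_digits $space_digit $no_space\n";
```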
