Diff: perlretut(1) - Waikato Linux Users Group

Differences between version 2 and previous revision of perlretut(1).


Newer page:	version 2	Last edited on Monday, June 3, 2002 6:50:50 pm	by perry
Older page:	version 1	Last edited on Monday, June 3, 2002 6:50:50 pm	by perry

@@ -320,9 +320,9 @@

# non-word char, followed by a word char

/..rt/; # matches any two chars, followed by 'rt'

/end\./; # matches 'end.'

/end[.]/; # same thing, matches 'end.'

-Because a period is a metacharacter, it needs to be escaped to match as an ordinary period. Because, for example, \d and \w are sets of characters, it is incorrect to think of [^\d\w] as [\D\W]; in fact [^\d\w] is the same as [^\w], which is the same as [\W]. Think DeMorgan's laws.

+Because a period is a metacharacter, it needs to be escaped to match as an ordinary period. Because, for example, \d and \w are sets of characters, it is incorrect to think of [^\d\w] as [\D\W]; in fact [^\d\w] is the same as [^\w], which is the same as [\W]. Think ! DeMorgan's laws.
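The following small Perl check illustrates the two claims above: escaping the period, and DeMorgan's laws for negated classes. The variable names and sample strings are illustrative only. The class comparison sweeps just the ASCII range to keep the sketch short.

```perl
use strict;
use warnings;

# Compare [^\d\w] with \W on every ASCII character. Because \d is a
# subset of \w, "neither a digit nor a word char" is just "not a word char".
sub classes_agree {
    for my $code (0 .. 127) {
        my $c = chr $code;
        my $neg_union = ($c =~ /[^\d\w]/) ? 1 : 0;
        my $not_word  = ($c =~ /\W/)      ? 1 : 0;
        return 0 if $neg_union != $not_word;
    }
    return 1;
}

# [\D\W] is a different class: "non-digit OR non-word char".
# A letter is a non-digit, so it matches [\D\W] but not [^\d\w].
my $a_in_D_or_W = ('a' =~ /[\D\W]/)  ? 1 : 0;   # 1
my $a_in_negset = ('a' =~ /[^\d\w]/) ? 1 : 0;   # 0

# Escaping the period: \. and [.] match only a literal '.', a bare . matches any char.
my $escaped_dot   = ('end.' =~ /end\./)  ? 1 : 0;   # 1
my $escaped_on_s  = ('ends' =~ /end\./)  ? 1 : 0;   # 0
my $bare_dot_on_s = ('ends' =~ /end./)   ? 1 : 0;   # 1
my $bracket_dot   = ('end.' =~ /end[.]/) ? 1 : 0;   # 1

print "agree=", classes_agree(), "\n";   # prints "agree=1"
```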

An anchor useful in basic regexps is the __word anchor__

\b. This matches a boundary between a word character

@@ -1409,10 +1409,10 @@

Here is the association between some Perl named classes and the traditional Unicode classes:

Perl class name Unicode class name or regular expression

- IsAlpha /^[LM]/

-IsAlnum /^[LMN]/

+ ! IsAlpha /^[LM]/

+! IsAlnum /^[LMN]/

IsASCII $code

You can also use the official Unicode class names with \p and \P, like \p{L} for Unicode 'letters', \p{Lu} for uppercase letters, or \P{Nd} for non-digits. If a name is just one letter, the braces can be dropped. For instance, \pM is the character class of Unicode 'marks'.
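The following Perl sketch shows \p{L}, \p{Lu}, \P{Nd} and the brace-less \pM in use. It assumes a reasonably modern perl 5. The sample string is made up, and it is built from \x{...} escapes so the source stays plain ASCII.

```perl
use strict;
use warnings;

# Sample text: U+00DC (U with diaeresis), 'n', U+00EF (i with diaeresis),
# "code 42", U+00E9 (e acute). Eleven characters in total.
my $text = "\x{DC}n\x{EF}code 42\x{E9}";
utf8::upgrade($text);    # store as UTF-8 internally; \p{} uses Unicode rules regardless

my @letters  = $text =~ /\p{L}/g;     # Unicode letters
my @upper    = $text =~ /\p{Lu}/g;    # uppercase letters
my @nondigit = $text =~ /\P{Nd}/g;    # anything that is not a decimal digit

# One-letter property names need no braces: \pM matches a Unicode mark,
# here U+0301 COMBINING ACUTE ACCENT following a plain 'e'.
my @marks = "e\x{301}" =~ /\pM/g;

printf "letters=%d upper=%d nondigit=%d marks=%d\n",
    scalar @letters, scalar @upper, scalar @nondigit, scalar @marks;
# prints "letters=8 upper=1 nondigit=9 marks=1"
```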

@@ -1441,16 +1441,16 @@

\w), and blank (a GNU

extension). If utf8 is being used, then these

classes are defined the same as their corresponding perl

Unicode classes: [:upper:] is the same as

-\p{IsUpper}, etc. The POSIX character

+\p{! IsUpper}, etc. The POSIX character

classes, however, don't require using utf8. The

[:digit:], [:word:], and

[:space:] correspond to the familiar \d,

\w, and \s character classes. To negate a

POSIX class, put a ^ in front of the

name, so that, e.g., [:^digit:] corresponds to

-\D and under utf8, \P{IsDigit}. The

+\D and under utf8, \P{! IsDigit}. The

Unicode and POSIX character classes can be

used just like \d, both inside and outside of

character classes: