A RegularExpression is a way of describing search patterns. Letters and the digits 0-9 match themselves (case-sensitively), and "." matches any single character. "*" matches the previous character zero or more times. "\" prevents the next character from having its special meaning. So
a*b..e

(any number of a's, followed by a b, two characters and an e)

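will match
aaaaabcde
and
bcde
but not
axcde

(which has no b). Note that tools such as grep(1) look for the pattern anywhere in a line, so a line like axbcde is still reported as a match because it contains bcde.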
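
As a rough illustration of the pattern above, the following sketch feeds a few made-up lines to grep. The variable names and sample lines are only for the example. Plain grep reports a line if the pattern matches anywhere in it, while the -x option requires the whole line to match.

```shell
# Sample input: one candidate string per line (illustrative data only).
lines='aaaaabcde
bcde
axcde
axbcde'

# Pattern may match anywhere in the line.
anywhere=$(printf '%s\n' "$lines" | grep 'a*b..e')

# -x: the pattern must match the entire line.
whole=$(printf '%s\n' "$lines" | grep -x 'a*b..e')

printf 'anywhere:\n%s\n\nwhole line:\n%s\n' "$anywhere" "$whole"
```

Here axbcde shows up in the first result only, because it merely contains a matching substring.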

Quick cheatsheet

|^Character|^Matches
| . | Any single character
| ^ | Beginning of line
| $ | End of line
| \''character'' | That character exactly (even if it's a special character)
| [''character group''] | Any single character in the group
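
A small sketch of the cheatsheet entries in action, using grep on a few invented words (the word list and variable names are assumptions made for the example):

```shell
# Illustrative input, one word per line.
words='cat
concat
cats
c.t'

starts=$(printf '%s\n' "$words" | grep '^cat')     # lines beginning with "cat"
ends=$(printf '%s\n' "$words" | grep 'cat$')       # lines ending with "cat"
any_char=$(printf '%s\n' "$words" | grep 'c.t')    # "." matches any character
literal=$(printf '%s\n' "$words" | grep 'c\.t')    # "\." matches only a real dot
group=$(printf '%s\n' "$words" | grep '^c[ao]')    # "c" followed by "a" or "o"

printf '%s\n' "$starts" --- "$ends" --- "$any_char" --- "$literal" --- "$group"
```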

Things that alter the previous expression

|^Character|^Alteration
| ? | Match the previous expression exactly zero or one times
| * | Match the previous expression zero or more times
| + | Match the previous expression one or more times
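
The three modifiers can be compared side by side with a sketch like the one below. It assumes grep -E (extended syntax), since in grep's basic syntax ? and + are ordinary characters; -x makes the whole line have to match. The sample strings are invented for the example.

```shell
# Illustrative input: "a", then zero, one or two "b", then "c".
words='ac
abc
abbc'

zero_or_one=$(printf '%s\n' "$words" | grep -E -x 'ab?c')
zero_or_more=$(printf '%s\n' "$words" | grep -E -x 'ab*c')
one_or_more=$(printf '%s\n' "$words" | grep -E -x 'ab+c')

printf '?: %s\n' "$(echo $zero_or_one)"
printf '*: %s\n' "$(echo $zero_or_more)"
printf '+: %s\n' "$(echo $one_or_more)"
```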

regex(7) explains all the neat things you can do with RegularExpressions and the different types. perlre(1) explains Perl's extended regexes.


grep(1) is a command that looks for a regex in a file, e.g.
grep 'foo' /tmp/baz.txt
will look for the string "foo" in /tmp/baz.txt. More usefully,
grep 'wlug\.linuxcare\.co\.nz' *

will search for every occurrence of "wlug.linuxcare.co.nz" in all the files in this directory.
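
To see why the dots are escaped, here is a hedged sketch that builds two throwaway files in a temporary directory (the file names and contents are invented) and lists which files match with and without the backslashes. It uses grep -l, which prints only the names of matching files.

```shell
# Scratch directory with two illustrative files.
dir=$(mktemp -d)
printf 'see wlug.linuxcare.co.nz\n' > "$dir/a.txt"
printf 'see wlugXlinuxcare-co-nz\n' > "$dir/b.txt"

# Escaped dots: only a literal "." is accepted.
escaped=$(cd "$dir" && grep -l 'wlug\.linuxcare\.co\.nz' *)

# Unescaped dots: each "." accepts any character, so b.txt matches too.
unescaped=$(cd "$dir" && grep -l 'wlug.linuxcare.co.nz' *)

printf 'escaped:   %s\n' "$(echo $escaped)"
printf 'unescaped: %s\n' "$(echo $unescaped)"
rm -rf "$dir"
```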


sed(1) is a "stream editor" which uses regexes. sed is usually used for its amazing search and replace capability. For a (simple) example,
sed 's/foo/baz/g' <a.txt >b.txt

will search for "foo" in a.txt, replace every occurrence with "baz", and write the result to b.txt.
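
The trailing g matters: without it sed replaces only the first match on each line. A minimal sketch, using a made-up input line piped in with printf instead of a.txt:

```shell
# With g: every "foo" on the line is replaced.
with_g=$(printf 'foo foo\n' | sed 's/foo/baz/g')

# Without g: only the first "foo" on the line is replaced.
without_g=$(printf 'foo foo\n' | sed 's/foo/baz/')

printf 'with g:    %s\nwithout g: %s\n' "$with_g" "$without_g"
```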


perl(1) can also be used for in-place substitutions, like so:

perl -pi -e 's/foo/bar/g' a.txt

This will replace all instances of "foo" with "bar" in a.txt, overwriting the file.

See also:


awk(1) is a tool for processing record-oriented files. It allows you to specify different actions to perform based on regexes.
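
As a sketch of the idea, the awk program below attaches one action to lines starting with "error" and another to lines starting with "info". The log lines and variable names are made up for the example.

```shell
# Illustrative input: one record per line, fields separated by spaces.
log='error disk full
info started
error timeout'

summary=$(printf '%s\n' "$log" | awk '
  /^error/ { n++ }                    # count records that start with "error"
  /^info/  { print "info:", $2 }      # print the second field of "info" records
  END      { print "errors:", n }     # report the count at the end
')

printf '%s\n' "$summary"
```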


See also: File Globs

Tricks and Traps:

  • When specifying regexes on the command line, surround them in single quotes ('). This stops the shell from interpreting characters such as *, $ and \ before the command sees them.
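
A quick way to see what can go wrong is to print what the shell would hand to the command. In this sketch (the directory and file name are invented), an unquoted a* is expanded into a file name, while the quoted version arrives untouched:

```shell
# Scratch directory containing one file whose name starts with "a".
dir=$(mktemp -d)
touch "$dir/ab.txt"

# Unquoted: the shell expands a* to matching file names first.
unquoted=$(cd "$dir" && printf '%s\n' a*)

# Single-quoted: the pattern is passed through literally.
quoted=$(cd "$dir" && printf '%s\n' 'a*')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
rm -rf "$dir"
```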

Examples of single-character expressions

To match any lowercase vowel: /[aeiou]/

To match any lowercase or uppercase vowel: /[aeiouAEIOU]/

To match any single digit: /[0123456789]/

The same thing: /[0-9]/

Any single digit or minus: /[0-9\-]/

Any lowercase letter: /[a-z]/

The ^ character can be used to negate a [] pattern:

To match anything except a lowercase letter: /[^a-z]/

To match anything except a lowercase or uppercase letter, digit or underscore: /[^a-zA-Z0-9_]/

These can be used with * too, so:

/[0-9]*/

matches any number of digits, including no digits.
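
The following sketch tries a few of these groups with sed on an invented line of text. The last two commands show the "including no digits" case: [0-9]* happily matches the empty string at the start of a line that does not begin with a digit.

```shell
text='Order 66 shipped on 2024-05-01'

# Negated group: delete everything that is not a digit.
digits_only=$(printf '%s\n' "$text" | sed 's/[^0-9]//g')

# Plain group: delete every vowel, upper or lower case.
no_vowels=$(printf '%s\n' "$text" | sed 's/[aeiouAEIOU]//g')

# Wrap the first match of [0-9]* in <>; & stands for the matched text.
leading=$(printf '123abc\n' | sed 's/[0-9]*/<&>/')
empty=$(printf 'abc123\n' | sed 's/[0-9]*/<&>/')

printf '%s\n' "$digits_only" "$no_vowels" "$leading" "$empty"
```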

Character abbreviations:

Note: These apply to Perl regular expressions. Other regex tools may not support all of them; GNU grep and GNU sed, for example, understand \w and \s but not \d, so check the tool's documentation before relying on them.

To match any digit: /[\d]/ (Equivalent to /[0-9]/)

To match any 'word' character: /[\w]/ (Equivalent to /[a-zA-Z0-9_]/)

To match any space character: /[\s]/ (Equivalent to /[ \r\t\n\f]/)

\D, \W and \S are the negated versions of \d, \w and \s:

/[\D]/ is equivalent to /[^0-9]/

/[\W]/ is equivalent to /[^a-zA-Z0-9_]/

/[\S]/ is equivalent to /[^ \r\t\n\f]/


Perl

As mentioned before, Perl uses an extended regular expression syntax, explained on the perlre(1) man page.

Having said that though, here are some hints.

DON'T interpret variables as a regular expression

If we have

$text='abc[c]defc'; $search='[c]';

then

$text =~ s/\Q$search\E/XX/ ;

would replace the substring '[c]' in $text with "XX", while

$text =~ s/$search/XX/ ;

would treat [c] as a character group and replace the first occurrence of the character 'c' with "XX" (or every occurrence, if the g modifier is added).