A RegularExpression is a way of describing search patterns. Letters and digits match themselves (case-sensitively), and "." matches any single character. "*" matches the previous character zero or more times. A backslash (\) stops the next character from having a special meaning, so "\." matches only a literal dot. For example, the pattern
a*b..e

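(any number of a's, followed by a b, any two characters and an e)

when it has to match the whole string, will match
aaaaabcde
and
bcde
but not
axbcde

Note that grep(1) and most other tools report a match if the pattern matches anywhere in the line, so they would also find bcde inside both bbcde and axbcde.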
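Whether a*b..e "matches" a string depends on whether the pattern has to cover the whole string or only part of it. The small shell sketch below compares grep -x (whole lines only) with plain grep (anywhere in the line). The function names and the extra sample bbcde are made up for illustration.

```shell
#!/bin/sh
# Compare whole-line matching (grep -x) with grep's default
# "match anywhere in the line" behaviour for the pattern a*b..e.
samples='aaaaabcde
bcde
bbcde
axbcde'

# Lines where a*b..e matches the whole line
whole_line() { printf '%s\n' "$samples" | grep -x 'a*b..e'; }

# Lines where a*b..e matches somewhere inside the line
anywhere() { printf '%s\n' "$samples" | grep 'a*b..e'; }

echo 'whole line:'; whole_line    # aaaaabcde and bcde
echo 'anywhere:';   anywhere      # all four lines
```

With grep -x only aaaaabcde and bcde are printed; without it, bbcde and axbcde also match because each contains bcde.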

regex(7) explains all the neat things you can do with RegularExpressions, and the different types (basic and extended). perlre(1) explains Perl's extended regexes.


grep(1) searches files for lines that match a regex, e.g.
grep 'foo' /tmp/baz.txt
will print every line of /tmp/baz.txt that contains the string "foo". More usefully,
grep 'wlug\.linuxcare\.co\.nz' *

will find every line containing "wlug.linuxcare.co.nz" in all the files in the current directory. The dots are escaped with \ so that each one matches only a literal ".", not any character.
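Escaping the dots in the hostname pattern matters because an unescaped "." matches any character. A minimal sketch, assuming two made-up input lines (the second is not a real hostname) and illustrative function names:

```shell
#!/bin/sh
# Show why the dots in 'wlug\.linuxcare\.co\.nz' are escaped.
lines='wlug.linuxcare.co.nz
wlugXlinuxcareYcoZnz'

# Unescaped ".": matches any character, so both lines are counted
loose()  { printf '%s\n' "$lines" | grep -c 'wlug.linuxcare.co.nz'; }

# Escaped "\.": matches only a literal dot, so only the real hostname counts
strict() { printf '%s\n' "$lines" | grep -c 'wlug\.linuxcare\.co\.nz'; }

loose     # 2
strict    # 1
```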


sed(1) is a "stream editor" that uses regexes. sed is usually used for its amazing search-and-replace capability. For a (simple) example,
sed 's/foo/baz/g' <a.txt >b.txt

will replace every "foo" with "baz" in a.txt and write the result to b.txt. The trailing g means every occurrence on a line is replaced; without it, only the first one on each line is.
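The effect of the g flag is easiest to see side by side. This sketch reads from a pipe instead of <a.txt >b.txt (redirecting files works the same way); the input text and function names are illustrative only.

```shell
#!/bin/sh
# Compare s/foo/baz/g with s/foo/baz/ (no g flag).
input='foo foo
bar foo'

# With g: every "foo" on each line is replaced
replace_all()   { printf '%s\n' "$input" | sed 's/foo/baz/g'; }

# Without g: only the first "foo" on each line is replaced
replace_first() { printf '%s\n' "$input" | sed 's/foo/baz/'; }

replace_all     # baz baz / bar baz
replace_first   # baz foo / bar baz
```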


awk(1) is a tool for processing record-oriented files. It lets you specify different actions to perform on the records that match different regexes.
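A minimal sketch of awk's pattern/action style, assuming a made-up log format where each line (record) starts with a level word. Each /regex/ selects the records its action runs on, and $2 is the record's second whitespace-separated field.

```shell
#!/bin/sh
# Each regex picks out the records its action runs on.
# The log format here is made up purely for illustration.
log='ERROR disk full
INFO started
ERROR network down
WARN low memory'

summarise() {
    printf '%s\n' "$log" | awk '
        /^ERROR/ { print "error in " $2 }   # $2 is the second field
        /^WARN/  { warnings++ }
        END      { print warnings + 0 " warning(s)" }
    '
}

summarise   # error in disk / error in network / 1 warning(s)
```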


See also: File Globs

Tricks and Traps:

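  • When specifying regexes on the command line, surround them in single quotes ('), so that the shell passes characters such as *, ?, $ and \ through to the program untouched instead of expanding or interpreting them first.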
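To see why the quotes matter, this sketch creates a throwaway directory containing a file whose name happens to match the glob a*b. Unquoted, the shell replaces a*b with that file name before any program runs, so something like grep a*b file would really search for "aab". The file and function names are hypothetical.

```shell
#!/bin/sh
# Why single quotes matter: the shell sees an unquoted regex first.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch aab            # a file whose name happens to match the glob a*b

# Unquoted: the shell expands a*b to the matching file name(s)
unquoted() { echo a*b; }

# Single-quoted: the program receives exactly a*b
quoted()   { echo 'a*b'; }

unquoted   # aab
quoted     # a*b
```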

Examples of single-character expressions

To match any lowercase vowel: /[aeiou]/

To match any lowercase or uppercase vowel: /[aeiouAEIOU]/

To match any single digit: /[0123456789]/

The same thing: /[0-9]/

Any single digit or minus: /[0-9\-]/ (in Perl; grep and sed treat \ literally inside [], so there put the minus last instead: /[0-9-]/)

Any lowercase letter: /[a-z]/
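Each bracket expression matches exactly one character. This sketch prints, run together, every character of a string that a given set matches. It assumes a grep with the -o option (GNU, BSD and BusyBox grep have it); the matching function and sample strings are illustrative.

```shell
#!/bin/sh
# Print, run together, every character of $2 that matches the bracket expression $1.
export LC_ALL=C   # make ranges like a-z mean plain ASCII letters
matching() { printf '%s\n' "$2" | grep -o -e "$1" | tr -d '\n'; echo; }

matching '[aeiou]'      'Regular Expression'   # euaeio
matching '[aeiouAEIOU]' 'Regular Expression'   # euaEeio
matching '[0-9]'        'Call 555-0123'        # 5550123
matching '[0-9-]'       'Call 555-0123'        # 555-0123
matching '[a-z]'        'Call 555-0123'        # all
```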

A ^ as the first character inside [] negates the set:

To match anything except a lowercase letter: /[^a-z]/

To match anything except a lowercase or uppercase letter, digit or underscore: /[^a-zA-Z0-9_]/
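A common use of a negated set is deleting everything that is not in the set. A minimal sed sketch, with made-up function names and input:

```shell
#!/bin/sh
# Use negated sets with sed to delete every character that is NOT in the set.
export LC_ALL=C   # make ranges like a-z mean plain ASCII letters
only_lower() { printf '%s\n' "$1" | sed 's/[^a-z]//g'; }
only_word()  { printf '%s\n' "$1" | sed 's/[^a-zA-Z0-9_]//g'; }

only_lower 'Hello, World 42!'   # elloorld
only_word  'Hello, World 42!'   # HelloWorld42
```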

These can be used with * too, so:

/[0-9]*/

matches any number of digits, including no digits.
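Because [0-9]* is allowed to match zero digits, it "matches" every line, even one with no digits at all. To require at least one digit, write [0-9][0-9]* (or [0-9]+ in Perl and grep -E). A sketch with illustrative sample lines:

```shell
#!/bin/sh
# [0-9]* can match zero digits, so grep counts every line as a match.
lines='abc
123
x9'
count_star() { printf '%s\n' "$lines" | grep -c '[0-9]*'; }        # every line
count_one()  { printf '%s\n' "$lines" | grep -c '[0-9][0-9]*'; }   # lines with a digit

count_star   # 3
count_one    # 2
```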

Character abbreviations:

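Note: These are Perl regular expression features, and the equivalences below hold for plain ASCII text (on Unicode text Perl's versions match more characters). Support in other tools varies: GNU grep and GNU sed understand \w and \s, but \d usually needs a Perl-compatible mode such as grep -P, and in GNU sed \d means something else entirely.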

To match any digit: /\d/ (equivalent to /[0-9]/)

To match any 'word' character: /\w/ (equivalent to /[a-zA-Z0-9_]/)

To match any whitespace character: /\s/ (roughly equivalent to /[ \t\r\n\f]/)

\D, \W and \S are the negated versions of \d, \w and \s:

/\D/ is equivalent to /[^0-9]/

/\W/ is equivalent to /[^a-zA-Z0-9_]/

/\S/ is roughly equivalent to /[^ \t\r\n\f]/


Perl

As mentioned before, Perl uses an extended regular expression syntax, explained in the perlre(1) man page.

Having said that, here are some hints.

DON'T let variables be interpreted as regular expressions by accident

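If we have

$text='abc[c]defc'; $search='[c]';

then

$text =~ s/\Q$search\E/XX/;

would replace the substring '[c]' in $text with "XX" (\Q...\E escapes the special characters in $search), leaving 'abcXXdefc', while

$text =~ s/$search/XX/g;

would treat '[c]' as a character class matching 'c', and replace every occurrence of the character 'c' with "XX", leaving 'abXX[XX]defXX'.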