Regular Expression - Waikato Linux Users Group


A RegularExpression is a way of describing search patterns. Letters and digits match themselves (case-sensitively), "." matches any single character, and "*" matches the previous character zero or more times. "\" prevents the next character from having special meaning. So: a*b..e

(any number of a's, including none, followed by a b, any two characters and an e)
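As a rough illustration of the pattern above, here is a minimal sketch using Python's re module, which is an assumption of this example and not something the page itself uses. The sample strings are made up.

```python
import re

# The pattern from the text: zero or more 'a', a 'b', any two characters, an 'e'.
pattern = re.compile(r"a*b..e")

candidates = ["bxye", "aaabcde", "ab12e", "abe", "aaxye"]
# fullmatch() requires the whole string to fit the pattern.
matches_full = [s for s in candidates if pattern.fullmatch(s)]
print(matches_full)

# A backslash removes the special meaning of the next character,
# so this pattern matches a literal '*' between 'a' and 'b'.
literal_star = re.compile(r"a\*b")
star_ok = literal_star.fullmatch("a*b") is not None
star_not_repeat = literal_star.fullmatch("aab") is None
```

"abe" fails because there are not two characters between the b and the e, and "aaxye" fails because it has no b at all.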

regex(7) explains all the neat things you can do with RegularExpressions and the different types of regex syntax. perlre(1) explains Perl's extended regexes.

grep(1) is a command that looks for a regex in a file. For example: grep 'foo' /tmp/baz.txt
will look for the string "foo" in /tmp/baz.txt. More usefully: grep 'wlug\.linuxcare\.co\.nz' *

will search for every occurrence of "wlug.linuxcare.co.nz" in all the files in the current directory. The dots are escaped so that they match only a literal ".".
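To see why the dots are escaped, here is a small sketch in Python (an assumption of this example; grep itself is not involved). The two sample lines are invented for illustration.

```python
import re

# Escaped dots match only a literal '.'; unescaped dots match any character.
escaped = re.compile(r"wlug\.linuxcare\.co\.nz")
unescaped = re.compile(r"wlug.linuxcare.co.nz")

lines = [
    "see wlug.linuxcare.co.nz for details",
    "wlugXlinuxcareYcoZnz is not a hostname",
]

# search() looks for the pattern anywhere in the line, much as grep does.
escaped_hits = [line for line in lines if escaped.search(line)]
unescaped_hits = [line for line in lines if unescaped.search(line)]

print(len(escaped_hits), len(unescaped_hits))
```

The unescaped pattern also accepts the second line, which is probably not what was intended.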

sed(1) is a "stream editor" which uses regexes. sed is usually used for its amazing search and replace capability. A (simple) example: sed 's/foo/baz/g' <a.txt >b.txt

will replace every occurrence of "foo" in a.txt with "baz" and write the result to b.txt.
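The same kind of substitution can be sketched in Python with re.sub (an assumption of this example, standing in for sed). The sample text is invented. Without the trailing g, sed replaces only the first match on each line, which the second expression imitates.

```python
import re

text = "foo bar foo\nfood for thought\n"

# Roughly like sed 's/foo/baz/g': replace every match.
replaced_all = re.sub(r"foo", "baz", text)

# Roughly like sed 's/foo/baz/': replace only the first match on each line.
replaced_first = "".join(
    re.sub(r"foo", "baz", line, count=1)
    for line in text.splitlines(keepends=True)
)

print(replaced_all, end="")
print(replaced_first, end="")
```

Note that "food" becomes "bazd": the pattern matches the three letters wherever they appear, not just whole words.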

awk(1) is a tool for processing record-oriented files. It allows you to specify different actions to perform based on regexes.
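The idea of pairing regexes with actions can be sketched in Python (an assumption of this example; this is not awk and only imitates its general shape). The records and rules below are hypothetical.

```python
import re

# Hypothetical records: one per line, fields separated by whitespace.
records = [
    "alice 30 admin",
    "bob 25 user",
    "carol 41 admin",
]

# Each rule pairs a regex with an action, loosely like awk's /regex/ { action }.
rules = [
    (re.compile(r"admin$"), lambda fields: f"{fields[0]} is an administrator"),
    (re.compile(r"^b"), lambda fields: f"{fields[0]} starts with b"),
]

output = []
for record in records:
    fields = record.split()  # fields[0] plays the role of awk's $1
    for pattern, action in rules:
        if pattern.search(record):
            output.append(action(fields))

print("\n".join(output))
```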

To match any lowercase vowel: /[aeiou]/

To match any lowercase or uppercase vowel: /[aeiouAEIOU]/

To match any single digit: /[0123456789]/

The same thing: /[0-9]/

Any single digit or minus: /[0-9\-]/

Any lowercase letter: /[a-z]/

The ^ character can be used to negate a [] pattern:

To match anything except a lowercase letter: /[^a-z]/

To match anything except a lowercase or uppercase letter, digit or underscore: /[^a-zA-Z0-9_]/

These can be used with * too, so:

/[0-9]*/

matches any number of digits, including no digits.
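A few of these character classes can be tried out with Python's re module (an assumption of this example; the classes themselves are written the same way in Perl). The sample strings are invented.

```python
import re

# Any lowercase vowel, then any vowel of either case.
lower_vowels = re.findall(r"[aeiou]", "Regular Expression")
any_vowels = re.findall(r"[aeiouAEIOU]", "Regular Expression")

# Any single digit or minus sign.
digits_or_minus = re.findall(r"[0-9\-]", "x = -42")

# A leading ^ negates the class: anything except a lowercase letter.
non_lower = re.findall(r"[^a-z]", "aB3_c")

# With *, the class may repeat any number of times, including zero.
empty_ok = re.fullmatch(r"[0-9]*", "") is not None
digits_ok = re.fullmatch(r"[0-9]*", "2003") is not None
mixed_ok = re.fullmatch(r"[0-9]*", "20a3") is not None

print(lower_vowels, digits_or_minus, non_lower)
```

Because * allows zero repetitions, the empty string is accepted, while "20a3" is rejected when the whole string has to match.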

Note: These apply to Perl regular expressions. They will most likely work in other regex parsers such as sed, but there may be subtle differences.

To match any digit: /[\d]/ (Equivalent to /[0-9]/)

To match any 'word' character: /[\w]/ (Equivalent to /[a-zA-Z0-9_]/)

To match any space character: /[\s]/ (Equivalent to /[ \r\t\n\f]/)

\D, \W and \S are the negated versions of \d, \w and \s:

/[\D]/ is equivalent to /[^0-9]/

/[\W]/ is equivalent to /[^a-zA-Z0-9_]/

/[\S]/ is equivalent to /[^ \r\t\n\f]/
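Python's re module happens to support the same shorthand classes, so the equivalences can be checked with a short sketch (Python is an assumption of this example). The re.ASCII flag is used because Python's \d and \w otherwise also match non-ASCII digits and letters, and Python's \s additionally matches the vertical tab.

```python
import re

sample = "a-b_1!"

# \d, \w and \s, restricted to ASCII so they line up with the bracket forms.
digits = re.findall(r"\d", "v5 x_1", re.ASCII)
words = re.findall(r"\w", sample, re.ASCII)
spaces = re.findall(r"\s", "a b\tc\n", re.ASCII)

# The uppercase forms are the negations.
non_digits = re.findall(r"\D", "a1b2", re.ASCII)
non_words = re.findall(r"\W", sample, re.ASCII)
non_spaces = re.findall(r"\S", "a b\tc\n", re.ASCII)

# Each shorthand gives the same result as the explicit class on this sample.
same_digits = digits == re.findall(r"[0-9]", "v5 x_1")
same_words = words == re.findall(r"[a-zA-Z0-9_]", sample)
same_non_words = non_words == re.findall(r"[^a-zA-Z0-9_]", sample)

print(digits, words, non_words)
```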

As mentioned before, Perl uses an extended regular expression syntax explained on the perlre(1) man page.

Having said that though, here are some hints.

DON'T interpret variables as a regular expression

if we have

$text='abc[c]defc'; $search='[c]';

then

$text =~ s/\Q$search\E/XX/ ;

would replace the substring '[c]' in $text with "XX", while

$text =~ s/$search/XX/ ;

would treat '[c]' as a character class and replace the first occurrence of the character 'c' with "XX" (or every occurrence, if the /g modifier were added).
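The same pitfall exists outside Perl. As a sketch in Python (an assumption of this example), re.escape plays a role similar to \Q...\E, and count=1 imitates a Perl substitution without /g.

```python
import re

text = "abc[c]defc"
search = "[c]"

# Escaped: the search string is treated literally, like \Q$search\E in Perl.
literal = re.sub(re.escape(search), "XX", text, count=1)

# Unescaped: "[c]" is a character class that matches the letter 'c'.
as_regex_first = re.sub(search, "XX", text, count=1)
as_regex_all = re.sub(search, "XX", text)

print(literal)
print(as_regex_first)
print(as_regex_all)
```

Only the escaped version touches the bracketed substring; the unescaped ones replace plain 'c' characters and leave the brackets in place.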


Version 5, saved on Friday, March 14, 2003 12:04:34 pm by JohnMcPherson