Regular Expression - Waikato Linux Users Group

A RegularExpression is a way of describing search patterns. Letters and digits match themselves (case-sensitively), "." matches any single character, and "*" matches the preceding character zero or more times. A backslash (\) stops the next character from having a special meaning, so \. matches a literal dot. For example:

a*b..e

matches any number of a's (including none), followed by a b, any two characters and an e.
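
To see the pattern in action, this sketch feeds a few made-up lines to grep(1). 'abe' fails because only one character follows the b, while 'xxbzzez' succeeds because the pattern is not anchored and can match in the middle of a line.

```shell
# Each made-up line is checked against a*b..e; grep prints the ones that match.
# The pattern is not anchored, so it can match in the middle of a line.
printf '%s\n' 'aaabcde' 'bxye' 'abe' 'xxbzzez' | grep 'a*b..e'
# Output:
# aaabcde
# bxye
# xxbzzez
```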

Things that alter the previous expression

regex(7) explains all the neat things you can do with RegularExpressions and the different types (basic and extended). perlre(1) explains Perl's extended regexes.

grep(1) is a command that searches files for lines matching a regex. For example:

grep 'foo' /tmp/baz.txt

will look for the string "foo" in /tmp/baz.txt. More usefully:

grep 'wlug\.linuxcare\.co\.nz' *

will print every line containing "wlug.linuxcare.co.nz" in all the files in the current directory.
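
The dots in the hostname above are escaped because an unescaped "." matches any character, so the pattern would also match lines that merely look similar. A sketch with two made-up input lines, using grep -c to count matching lines:

```shell
# Two made-up lines: the real hostname, and one with other characters where the dots were.
# grep -c prints how many lines match.
printf '%s\n' 'wlug.linuxcare.co.nz' 'wlugXlinuxcareYcoZnz' | grep -c 'wlug.linuxcare.co.nz'     # 2
printf '%s\n' 'wlug.linuxcare.co.nz' 'wlugXlinuxcareYcoZnz' | grep -c 'wlug\.linuxcare\.co\.nz'  # 1
```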

sed(1) is a "stream editor" that uses regexes. sed is usually used for its amazing search-and-replace capability. For a (simple) example:

sed 's/foo/baz/g' <a.txt >b.txt

will replace every "foo" with "baz" in a.txt and write the result to b.txt.
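
The trailing g in the sed command matters: without it, sed replaces only the first match on each line. A small sketch:

```shell
# Without g only the first "foo" on the line is replaced; with g, every one is.
echo 'foo foo foo' | sed 's/foo/baz/'    # baz foo foo
echo 'foo foo foo' | sed 's/foo/baz/g'   # baz baz baz
```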

perl(1) can also be used for in-place substitutions, like so:

perl -pi -e 's/foo/bar/g' a.txt

will replace all instances of "foo" with "bar" in a.txt
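
A sketch of the in-place edit on a throwaway file, assuming mktemp(1) is available. It gives -i a backup suffix (-i.bak) so the original is kept as a .bak file; plain -i, as in the command above, keeps no backup.

```shell
# Scratch file so nothing real is modified (mktemp is assumed to be available).
f=$(mktemp)
printf 'foo and foo\n' > "$f"

# -i.bak edits "$f" in place, first saving the original as "$f.bak".
perl -pi.bak -e 's/foo/bar/g' "$f"

cat "$f"       # bar and bar
cat "$f.bak"   # foo and foo

rm -f "$f" "$f.bak"
```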

To match any lowercase vowel: /[aeiou]/

To match any lowercase or uppercase vowel: /[aeiouAEIOU]/

To match any single digit: /[0123456789]/

The same thing: /[0-9]/

Any single digit or minus: /[0-9\-]/

Any lowercase letter: /[a-z]/
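
These classes can be tried with grep(1). This sketch assumes a grep with the -o option (GNU and BSD grep both have it) and sets LC_ALL=C so that a range like a-z means only the 26 ASCII lowercase letters; the input strings are made up.

```shell
# Use the C locale so ranges such as a-z cover exactly the ASCII letters.
export LC_ALL=C

# grep -o prints each match on its own line; tr -d '\n' joins them for easy reading.
echo 'Regular Expression' | grep -o '[aeiou]'      | tr -d '\n'; echo   # euaeio
echo 'Regular Expression' | grep -o '[aeiouAEIOU]' | tr -d '\n'; echo   # euaEeio
echo 'Room 101, floor 3'  | grep -o '[0-9]'        | tr -d '\n'; echo   # 1013
echo 'Room 101, floor 3'  | grep -o '[a-z]'        | tr -d '\n'; echo   # oomfloor
```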

A ^ placed immediately after the opening [ negates the pattern:

To match anything except a lowercase letter: /[^a-z]/

To match anything except a lowercase or uppercase letter, digit or underscore: /[^a-zA-Z0-9_]/
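
A negated class is handy for stripping or replacing unwanted characters. A sketch using sed(1) on a made-up string, again in the C locale so the ranges mean plain ASCII:

```shell
export LC_ALL=C

# Delete every character that is NOT a lowercase letter.
echo 'Hello, World 42!' | sed 's/[^a-z]//g'          # elloorld
# Replace every character that is not a letter, digit or underscore with "_".
echo 'Hello, World 42!' | sed 's/[^a-zA-Z0-9_]/_/g'  # Hello__World_42_
```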

These can be used with * too, so:

/[0-9]*/

matches any number of digits, including no digits.
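
Because [0-9]* can match zero digits, it succeeds on every line, digits or not. A sketch showing this with grep -c (which counts matching lines) and with sed, where & in the replacement stands for the matched text:

```shell
# [0-9]* matches even "abc", using zero digits, so both lines count.
printf '%s\n' 'abc' 'x42' | grep -c '[0-9]*'        # 2
# One mandatory digit followed by [0-9]* means "one or more digits".
printf '%s\n' 'abc' 'x42' | grep -c '[0-9][0-9]*'   # 1

# sed shows what was matched: & in the replacement is the matched text.
echo 'abc'   | sed 's/[0-9]*/<&>/'   # <>abc   (an empty match at the start)
echo '42abc' | sed 's/[0-9]*/<&>/'   # <42>abc
```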

Note: These apply to Perl regular expressions. They will most likely work in other regex engines such as sed, but there may be subtle differences.

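To match any digit: /[\d]/ (equivalent to /[0-9]/)

To match any 'word' character: /[\w]/ (equivalent to /[a-zA-Z0-9_]/)

To match any space character: /[\s]/ (equivalent to /[ \r\t\n\f]/)

\D, \W and \S are the negated versions of \d, \w and \s:

/[\D]/ is equivalent to /[^0-9]/

/[\W]/ is equivalent to /[^a-zA-Z0-9_]/

/[\S]/ is equivalent to /[^ \r\t\n\f]/

These equivalences are for plain ASCII text: since Perl 5.18, \s also matches the vertical tab, and on Unicode strings \d, \w and \s also match non-ASCII digits, letters and whitespace.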

As mentioned before, Perl uses an extended regular expression syntax, explained on the perlre(1) man page.

That said, here are some hints.

DON'T interpret variables as a regular expression

If we have

$text = 'abc[c]defc'; $search = '[c]';

then

$text =~ s/\Q$search\E/XX/g;

would replace the substring '[c]' in $text with "XX", because \Q...\E escapes every special character in $search, while

$text =~ s/$search/XX/g;

would replace every occurrence of the character 'c' with "XX", because [c] is read as a character class.


Last edited on Thursday, July 6, 2006 10:47:58 pm by AlastairPorter