View Source: RegularExpression - Waikato Linux Users Group

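A RegularExpression is a way of describing search patterns.  Letters and digits match themselves (case-sensitively), and "." matches any single character.  "*" matches the previous character zero or more times.  "\" prevents the next character from having special meaning.  So, when the pattern has to match a whole line:
 a*b..e
(any number of a's, followed by a b, any two characters and an e)
will match:
 aaaaabcde
and:
 bcde

but not:
 axbcde
Tools such as grep normally look for the pattern anywhere in a line, so there "axbcde" would still be found through its "bcde" part unless the pattern is anchored with ^ and $.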
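As a rough sketch of how this pattern behaves in practice, the snippet below runs it through grep twice, once with -x (the whole line must match) and once without.  The variable names and sample lines are only illustrative, and a POSIX-style grep is assumed.

```shell
# Sample lines, one candidate per line (illustrative data).
lines='aaaaabcde
bcde
axbcde'

# -x: the pattern has to match the whole line.
whole=$(printf '%s\n' "$lines" | grep -x 'a*b..e')

# Without -x, grep accepts a match anywhere in the line,
# so "axbcde" is found too (through its substring "bcde").
anywhere=$(printf '%s\n' "$lines" | grep 'a*b..e')

printf 'whole-line matches:\n%s\n' "$whole"
printf 'matches anywhere:\n%s\n' "$anywhere"
```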

Quick cheatsheet
|^__Character__|^__Matches__
| . | Any single character
| ^ | Beginning of line
| $ | End of line
| \''any character'' | Match ''any character'' exactly (even if it's a special character)
| [[''character group''] | Any single character in the group

Things that alter the previous expression

|^__Character__|^__Alteration__
| ? | Match the previous expression exactly zero or one times
| * | Match the previous expression zero or more times
| + | Match the previous expression one or more times
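The three modifiers above can be compared side by side with a small sketch.  It assumes a grep that supports -E (extended regexes, where ? and + have their special meaning) and -x (whole-line match); the word list and variable names are made up for illustration.

```shell
# Illustrative word list: zero, one and two "u" characters.
words='color
colour
colouur'

# ? : the "u" may appear zero or one times.
optional=$(printf '%s\n' "$words" | grep -E -x 'colou?r')

# * : the "u" may appear zero or more times.
star=$(printf '%s\n' "$words" | grep -E -x 'colou*r')

# + : the "u" must appear one or more times.
plus=$(printf '%s\n' "$words" | grep -E -x 'colou+r')

printf '?: %s\n' "$(echo $optional)"
printf '*: %s\n' "$(echo $star)"
printf '+: %s\n' "$(echo $plus)"
```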


regex(7) explains all the neat things you can do with [RegularExpression]s and the different types of regex.  perlre(1) explains Perl's extended regexes.
-----
grep(1) is a command to look for a regex in a file, e.g.:
 grep 'foo' /tmp/baz.txt
will look for the string "foo" in /tmp/baz.txt.  More usefully:
 grep 'wlug\.linuxcare\.co\.nz' *
will search for every occurrence of "wlug.linuxcare.co.nz" in all the files in the current directory.
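To see why the dots are escaped, here is a small sketch that counts matching lines with and without the backslashes.  The input text and variable names are invented for the example; grep -c prints the number of matching lines.

```shell
# Two illustrative lines: the real host name and a look-alike.
input='see wlug.linuxcare.co.nz
see wlugXlinuxcareYcoZnz'

# Escaped dots match only a literal "." character.
escaped=$(printf '%s\n' "$input" | grep -c 'wlug\.linuxcare\.co\.nz')

# Unescaped dots match any character, so the look-alike matches too.
unescaped=$(printf '%s\n' "$input" | grep -c 'wlug.linuxcare.co.nz')

printf 'escaped: %s line(s), unescaped: %s line(s)\n' "$escaped" "$unescaped"
```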
-----
sed(1) is a "__s__tream __ed__itor" which uses regexes.  sed is usually used for its search and replace capability.  A simple example:
 sed 's/foo/baz/g' <a.txt >b.txt
will replace every "foo" in a.txt with "baz" and write the result to b.txt.
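The trailing g in the sed command matters.  A minimal sketch of the difference, using an invented one-line input and hypothetical variable names:

```shell
# Without g, only the first match on each line is replaced.
first_only=$(printf 'foo foo foo\n' | sed 's/foo/baz/')

# With g ("global"), every match on the line is replaced.
every=$(printf 'foo foo foo\n' | sed 's/foo/baz/g')

printf 'without g: %s\n' "$first_only"
printf 'with g:    %s\n' "$every"
```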
-----
perl(1) can also be used for in-place substitutions:
 perl -pi -e 's/foo/bar/g' a.txt
will replace all instances of "foo" with "bar" in a.txt, overwriting the file.

See also:
* perlrun(1)
-----
awk(1) is a tool for processing record-oriented files.  It allows you to specify different actions to perform based on regexes.
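A minimal sketch of the awk idea of attaching actions to regexes follows.  The records (a name and a number per line) and the variable names are invented for illustration.

```shell
# Illustrative records: one "name number" pair per line.
data='alice 30
bob 25
alan 41'

# For records whose first field starts with "a", print the second field.
a_numbers=$(printf '%s\n' "$data" | awk '$1 ~ /^a/ { print $2 }')

# A bare /regex/ pattern is tested against the whole record;
# here each matching record is counted.
b_count=$(printf '%s\n' "$data" | awk '/^b/ { n++ } END { print n + 0 }')

printf 'numbers for a-names: %s\n' "$(echo $a_numbers)"
printf 'records starting with b: %s\n' "$b_count"
```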
-----
See also: File [Glob]s

Tricks and Traps:
* When specifying regexes on the command line, surround them with single quotes (') so that the shell does not interpret characters such as *, $ and \ before the command sees them.
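The quoting trap can be seen with a small sketch: an unquoted pattern containing * is expanded by the shell as a file glob whenever a matching file name exists.  The temporary directory, the file name abbb and the variable names are hypothetical, and mktemp -d is assumed to be available.

```shell
# A throwaway directory containing one file whose name fits the glob ab*.
dir=$(mktemp -d)
touch "$dir/abbb"

# Unquoted: the shell replaces ab* with the matching file name.
unquoted=$(cd "$dir" && echo ab*)

# Single-quoted: the pattern reaches the command unchanged.
quoted=$(cd "$dir" && echo 'ab*')

rm -rf "$dir"
printf 'unquoted argument: %s\nquoted argument:   %s\n' "$unquoted" "$quoted"
```

A command such as grep would receive "abbb" instead of the intended regex in the unquoted case.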
-----
!!Examples of single-character expressions


To match any lowercase vowel:
/[[aeiou]/

To match any lowercase or uppercase vowel:
/[[aeiouAEIOU]/

To match any single digit:
/[[0123456789]/

The same thing:
/[[0-9]/

Any single digit or minus:
/[[0-9\-]/

Any lowercase letter:
/[[a-z]/

A ^ as the first character inside the brackets negates the group:

To match anything __except__ a lowercase letter:
/[[^a-z]/

To match anything __except__ a lowercase or uppercase letter, digit or underscore:
/[[^a-zA-Z0-9_]/

These can be used with * too, so:

/[[0-9]*/

matches any number of digits, including no digits.
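The groups above can be tried out with sed and grep, as in the following sketch.  The sample text and variable names are made up, and the groups are typed as they would be on a command line.

```shell
text='Hello World 42'

# [aeiouAEIOU]: delete every lowercase or uppercase vowel.
no_vowels=$(printf '%s\n' "$text" | sed 's/[aeiouAEIOU]//g')

# [^0-9]: delete everything that is NOT a digit.
digits_only=$(printf '%s\n' "$text" | sed 's/[^0-9]//g')

# [0-9]: replace each digit with "#".
masked=$(printf '%s\n' "$text" | sed 's/[0-9]/#/g')

# [0-9]* also matches "no digits", so a line without any digit still counts.
empty_ok=$(printf 'abc\n' | grep -c '[0-9]*')

printf '%s\n%s\n%s\n%s\n' "$no_vowels" "$digits_only" "$masked" "$empty_ok"
```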

!!Character abbreviations:
Note: These apply to Perl regular expressions.  Support in other regex parsers varies: GNU sed and grep, for example, understand \w and \s but not \d, so [[0-9] is the portable way to match a digit.

To match any digit:
/[[\d]/
(Equivalent to /[[0-9]/)

To match any 'word' character:
/[[\w]/
(Equivalent to /[[a-zA-Z0-9_]/)

To match any space character:
/[[\s]/
(Equivalent to /[[ \r\t\n\f]/)

\D, \W and \S are the negated versions of \d, \w and \s:

/[[\D]/ is equivalent to /[[^0-9]/

/[[\W]/ is equivalent to /[[^a-zA-Z0-9_]/

/[[\S]/ is equivalent to /[[^ \r\t\n\f]/
----
!Perl

As mentioned before, perl uses an extended regular expression syntax explained on the perlre(1) man page.

Having said that though, here are some hints.

__DON'T__ interpret variables as a regular expression unless you mean to.

If we have
 $text='abc[[c]defc';
 $search='[[c]';
then
 $text =~ s/\Q$search\E/XX/ ;
would replace the substring '[[c]' in $text with "XX", while
 $text =~ s/$search/XX/ ;
would instead treat the contents of $search as a character group and replace the first 'c' in $text with "XX" (or every 'c', if the /g flag were added).
Last edited on Thursday, July 6, 2006 10:47:58 pm by AlastairPorter