Diff: RegularExpression - Waikato Linux Users Group

Differences between version 2 and predecessor to the previous major change of RegularExpression.


Newer page:	version 2	Last edited on Monday, March 10, 2003 2:19:00 am	by AristotlePagaltzis
Older page:	version 1	Last edited on Sunday, March 9, 2003 10:22:04 pm	by JohnMcPherson

@@ -1 +1,87 @@

-~~See RegularExpressions~~ for ~~description~~ and ~~examples~~ .

+A RegularExpression is a way of describing search patterns. The letters A-Z and the digits 0-9 match themselves (case-sensitively), and "." matches any single character. "*" matches the previous character zero or more times. "\" prevents the next character from having its special meaning. So:

+ a*b..e

+(any number of a's, including none, followed by a b, any two characters and an e)

+will match:

+ aaaaabcde

+and, since a match may start anywhere in the text, the "bcde" in:

+ bbcde

+

+but not:

+ axbde
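
+As a rough check of the pattern above, the following sketch feeds three candidate lines to grep(1), which prints only the lines that contain a match. The input "axbde" is a made-up non-matching string chosen for this sketch, and the variable name is arbitrary.

```shell
# Three candidate lines; "axbde" is a made-up input that should not match,
# because its only "b" is not followed by two characters and then an "e".
matches=$(printf '%s\n' aaaaabcde bbcde axbde | grep 'a*b..e')

# Print whatever grep let through.
printf '%s\n' "$matches"
```

+grep searches for the pattern anywhere in each line, which is why a line can match even when the pattern does not describe the line from its first character.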

+

+regex(7) explains all the neat things you can do with [RegularExpression]s and the different types of regex. perlre(1) explains Perl's extended regexes.

+-----

+grep(1) is a command that looks for a regex in a file, e.g.:

+ grep 'foo' /tmp/baz.txt

+will look for the string "foo" in /tmp/baz.txt. More usefully:

+ grep 'wlug\.linuxcare\.co\.nz' *

+will search for every occurrence of "wlug.linuxcare.co.nz" in all the files in this directory. The backslashes make each "." match a literal dot instead of any character.
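
+To see what escaping the dots buys you, here is a small sketch that counts matching lines with and without the backslashes. The second sample line is an invented look-alike with "X" where the dots should be; the variable names are arbitrary.

```shell
# Two sample lines: the real hostname and a made-up look-alike.
sample=$(printf '%s\n' 'wlug.linuxcare.co.nz' 'wlugXlinuxcareXcoXnz')

# Unescaped dots match any character, so both lines are counted.
unescaped=$(printf '%s\n' "$sample" | grep -c 'wlug.linuxcare.co.nz')

# Escaped dots match only a literal ".", so only the first line is counted.
escaped=$(printf '%s\n' "$sample" | grep -c 'wlug\.linuxcare\.co\.nz')

echo "unescaped: $unescaped, escaped: $escaped"
```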

+-----

+sed(1) is a "__s__tream __ed__itor" which uses regexes. sed is usually used for its search and replace capability. A simple example:

+ sed 's/foo/baz/g' <a.txt >b.txt

+will read a.txt, replace every occurrence of "foo" with "baz", and write the result to b.txt. a.txt itself is left unchanged.
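
+The sed command above can be tried safely in a scratch directory. This sketch assumes mktemp(1) is available; the file contents are invented for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)

# Invented sample input: "foo" appears twice on the first line.
printf '%s\n' 'foo bar foo' 'no match here' > "$tmp/a.txt"

# The trailing "g" replaces every "foo" on a line, not just the first.
sed 's/foo/baz/g' < "$tmp/a.txt" > "$tmp/b.txt"

result=$(cat "$tmp/b.txt")
original=$(cat "$tmp/a.txt")
printf '%s\n' "$result"

rm -r "$tmp"
```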

+-----

+awk(1) is a tool for processing record-oriented files. It allows you to specify different actions to perform depending on which regexes a record matches.
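
+A minimal sketch of the idea, using invented records of a name and a number. Each awk rule pairs a regex with an action; the "^" used here anchors the regex to the start of the record.

```shell
# Invented sample records: one "name number" pair per line.
records=$(printf '%s\n' 'alice 30' 'bob 25' 'anna 41')

# Records starting with "a" get one action, records starting with "b" another.
out=$(printf '%s\n' "$records" |
  awk '/^a/ { print $1 " is " $2 } /^b/ { print "skipping " $1 }')

printf '%s\n' "$out"
```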

+-----

+See also: File [Glob]s

+

+Tricks and Traps:

+* When specifying regexes on the command line, surround them in single quotes ('). This stops the shell from interpreting characters such as "*" and "\" before the command sees them.
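
+A quick way to see why the quotes matter is to print a pattern fragment with and without them. In this sketch the shell removes the backslash from the unquoted version before the command ever receives it.

```shell
# Unquoted: the shell treats "\." as an escaped "." and drops the backslash.
unquoted=$(printf '%s' wlug\.linuxcare)

# Single-quoted: the text reaches the command exactly as typed.
quoted=$(printf '%s' 'wlug\.linuxcare')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```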

+-----

+!!Examples of single-character expressions

+

+To match any lowercase vowel:

+/[[aeiou]/

+

+To match any lowercase or uppercase vowel:

+/[[aeiouAEIOU]/

+

+To match any single digit:

+/[[0123456789]/

+

+The same thing:

+/[[0-9]/

+

+Any single digit or minus:

+/[[0-9\-]/

+

+Any lowercase letter:

+/[[a-z]/

+

+The ^ character, placed first inside the brackets, negates a [] pattern:

+

+To match anything __except__ a lowercase letter:

+/[[^a-z]/

+

+To match anything __except__ a lowercase or uppercase letter, digit or underscore:

+/[[^a-zA-Z0-9_]/
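
+The bracket expressions above can be tried with grep. This sketch uses four invented sample lines and sets LC_ALL=C as a precaution, since in some locales a range such as a-z may also cover uppercase letters.

```shell
# Use the C locale so that a-z means exactly the 26 lowercase ASCII letters.
export LC_ALL=C

# Invented sample lines.
sample=$(printf '%s\n' sky tree 42 X_9)

# Lines containing a lowercase vowel.
has_vowel=$(printf '%s\n' "$sample" | grep '[aeiou]')

# Lines containing a digit.
has_digit=$(printf '%s\n' "$sample" | grep '[0-9]')

# Lines containing at least one character that is not a lowercase letter.
has_non_lower=$(printf '%s\n' "$sample" | grep '[^a-z]')

printf 'vowel: %s\n' "$has_vowel"
```

+Note that a negated class still has to match a character: it selects lines containing something other than a lowercase letter, not lines without lowercase letters.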

+

+These can be used with * too, so:

+

+/[[0-9]*/

+

+matches any number of digits, including no digits.
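
+Because "no digits" counts as a match, a digit class followed by * on its own matches every line. The sketch below compares it with the common idiom of repeating the class once before the *, which requires at least one digit. The sample lines are invented.

```shell
# Invented sample lines: no digits, one digit, all digits.
sample=$(printf '%s\n' abc a1c 2024)

# Zero or more digits: every line matches, even "abc".
zero_or_more=$(printf '%s\n' "$sample" | grep -c '[0-9]*')

# One digit followed by zero or more digits: only lines with a digit match.
one_or_more=$(printf '%s\n' "$sample" | grep -c '[0-9][0-9]*')

echo "zero or more: $zero_or_more, one or more: $one_or_more"
```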

+

+!!Character abbreviations:

+Note: These apply to Perl regular expressions. Support in other tools varies: GNU grep and sed, for example, understand \w and \s but not \d, so check the documentation of the tool you are using.

+

+To match any digit:

+/[[\d]/

+(Equivalent to /[[0-9]/)

+

+To match any 'word' character:

+/[[\w]/

+(Equivalent to /[[a-zA-Z0-9_]/)

+

+To match any space character:

+/[[\s]/

+(Equivalent to /[[ \r\t\n\f]/)

+

+\D, \W and \S are the negated versions of \d, \w and \s:

+

+/[[\D]/ is equivalent to /[[^0-9]/

+

+/[[\W]/ is equivalent to /[[^a-zA-Z0-9_]/

+

+/[[\S]/ is equivalent to /[[^ \r\t\n\f]/
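
+Since these abbreviations are Perl's, the simplest way to try them from the shell is through perl itself. This sketch assumes perl is installed and uses invented sample lines; the -l switch strips each line's trailing newline so that \s does not match it by accident.

```shell
# Invented sample lines.
sample=$(printf '%s\n' 'abc' 'a1c' 'a c' '---')

# Lines containing a digit.
with_digit=$(printf '%s\n' "$sample" | perl -lne 'print if /\d/')

# Lines containing a space character.
with_space=$(printf '%s\n' "$sample" | perl -lne 'print if /\s/')

# Lines containing a non-word character (the space and the dashes qualify).
with_non_word=$(printf '%s\n' "$sample" | perl -lne 'print if /\W/')

printf 'digit: %s\n' "$with_digit"
```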