Blame: RegularExpression
Annotated edit history of RegularExpression version 11, including all changes.
Rev Author # Line
5 JohnMcPherson 1 A RegularExpression is a way of describing search patterns. The letters A-Z and the digits 0-9 match themselves (case-sensitively), and "." matches any single character. "*" matches the previous character zero or more times. "\" prevents the next character from having special meaning. So:
2 a*b..e
3 (any number of a's, followed by a b, two characters and an e)
4 will match:
5 aaaaabcde
6 and:
7 bbcde
8
9 but not (no b is followed by two characters and then an e):
10 axbce
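
The example pattern can be tried out with a short Python sketch. It assumes Python's `re` module, which treats `.`, `*` and literal characters as described here; `re.search` looks for the pattern anywhere in the string, much as grep does. The string `axbce` is an illustrative non-matching input chosen for this sketch.

```python
import re

pattern = re.compile(r"a*b..e")

# re.search finds the pattern anywhere in the string, like grep does.
first = pattern.search("aaaaabcde")
second = pattern.search("bbcde")
third = pattern.search("axbce")   # illustrative string chosen for this sketch

print(first.group())    # aaaaabcde (the whole string is matched)
print(second.group())   # bcde (the match starts at the second "b")
print(third)            # None (no "b" is followed by two characters and an "e")
```

Note that a search only needs part of the string to fit the pattern, which is why "bbcde" counts as a match.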
10 PerryLorier 11
12 Quick cheatsheet
11 AlastairPorter 13 |^__Character__|^__Matches__
14 | . | Any single character
10 PerryLorier 15 | ^ | Beginning of line
16 | $ | End of line
11 AlastairPorter 17 | \''any character'' | Match ''any character'' exactly (even if it's a special character)
18 | [[''character group''] | Any single character in the group
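
As a rough illustration of the cheatsheet entries, the following Python sketch filters a small, made-up word list with each construct. The word list and variable names are assumptions for the example only.

```python
import re

lines = ["cat", "concat", "catalog", "c.t", "cut"]

# "^" anchors at the beginning of the line, "$" at the end.
starts_with_cat = [s for s in lines if re.search(r"^cat", s)]
ends_with_cat = [s for s in lines if re.search(r"cat$", s)]

# "." matches any single character; "\." matches only a literal dot.
any_middle = [s for s in lines if re.search(r"^c.t$", s)]
literal_dot = [s for s in lines if re.search(r"^c\.t$", s)]

# A character group matches any one of the listed characters.
a_or_u = [s for s in lines if re.search(r"^c[au]t$", s)]

print(starts_with_cat)  # ['cat', 'catalog']
print(ends_with_cat)    # ['cat', 'concat']
print(any_middle)       # ['cat', 'c.t', 'cut']
print(literal_dot)      # ['c.t']
print(a_or_u)           # ['cat', 'cut']
```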
10 PerryLorier 19
20 Things that alter the previous expression
21
11 AlastairPorter 22 |^__Character__|^__Alteration__
10 PerryLorier 23 | ? | Match the previous expression exactly zero or one times
24 | * | Match the previous expression zero or more times
25 | + | Match the previous expression one or more times
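
The three modifiers can be compared side by side in a small Python sketch (the word list is made up for illustration). `re.fullmatch` is used so that the whole word has to fit the pattern. Be aware that grep and sed without the -E option use basic regular expressions, in which an unescaped ? or + is an ordinary character.

```python
import re

words = ["color", "colour", "colouur"]

# "?" : the "u" may appear zero or one times.
optional_u = [w for w in words if re.fullmatch(r"colou?r", w)]

# "*" : the "u" may appear zero or more times.
any_u = [w for w in words if re.fullmatch(r"colou*r", w)]

# "+" : the "u" must appear one or more times.
some_u = [w for w in words if re.fullmatch(r"colou+r", w)]

print(optional_u)  # ['color', 'colour']
print(any_u)       # ['color', 'colour', 'colouur']
print(some_u)      # ['colour', 'colouur']
```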
26
5 JohnMcPherson 27
28 regex(7) explains all the neat things you can do with [RegularExpression]s and the different types. perlre(1) explains Perl's extended regexes.
29 -----
30 grep(1) is a command that looks for a regex in a file. For example:
31 grep 'foo' /tmp/baz.txt
32 will look for the string "foo" in /tmp/baz.txt. More usefully:
33 grep 'wlug\.linuxcare\.co\.nz' *
34 will search for every occurrence of "wlug.linuxcare.co.nz" in all the files in the current directory. The dots are escaped so that they match only a literal "." rather than any character.
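
To see why the dots are escaped with a backslash, here is a small Python sketch that applies the escaped and unescaped patterns to a few made-up lines (the sample lines are assumptions for illustration, not real file contents).

```python
import re

lines = [
    "see http://wlug.linuxcare.co.nz/ for details",
    "wlugXlinuxcareYcoZnz is not a host name",
    "nothing relevant here",
]

escaped = re.compile(r"wlug\.linuxcare\.co\.nz")
unescaped = re.compile(r"wlug.linuxcare.co.nz")

# Keep the lines in which each pattern is found, as grep would print them.
escaped_hits = [line for line in lines if escaped.search(line)]
unescaped_hits = [line for line in lines if unescaped.search(line)]

print(len(escaped_hits))    # 1: only the real host name
print(len(unescaped_hits))  # 2: "." also matched X, Y and Z
```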
35 -----
36 sed(1) is a "__s__tream __ed__itor" which uses regexes. sed is usually used for its search and replace capability. A simple example:
37 sed 's/foo/baz/g' <a.txt >b.txt
38 will replace every occurrence of "foo" in a.txt with "baz" and write the result to b.txt.
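
For comparison, the same substitution can be sketched in Python with `re.sub`. The sample text is made up, and the second half is only an approximation of how sed behaves when the trailing "g" flag is left off.

```python
import re

text = "foo fighters eat food\nno match on this line\n"

# Equivalent in spirit to: sed 's/foo/baz/g'
replaced = re.sub(r"foo", "baz", text)

# Without the "g" flag sed replaces only the first match on each line;
# count=1 applied line by line imitates that.
first_only = "".join(
    re.sub(r"foo", "baz", line, count=1)
    for line in text.splitlines(keepends=True)
)

print(replaced, end="")
print(first_only, end="")
```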
6 FelipeAlmeida 39 -----
8 MattBrown 40 perl(1) can also be used for in-place substitutions, like so:
9 MattBrown 41 perl -pi -e 's/foo/bar/g' a.txt
8 MattBrown 42 will replace all instances of "foo" with "bar" in a.txt
6 FelipeAlmeida 43
8 MattBrown 44 See also:
45 * perlrun(1)
5 JohnMcPherson 46 -----
47 awk(1) is a tool for processing record-oriented files. It allows you to specify different actions to perform on each record, depending on which regexes the record matches.
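
The pattern/action idea behind awk can be sketched in Python as a list of (regex, function) pairs applied to every record. This is only an imitation of the concept, not of awk's syntax; the records, field layout and function names are hypothetical.

```python
import re

# Hypothetical records: whitespace-separated "user status size" fields.
records = [
    "alice ok 120",
    "bob error 7",
    "carol ok 33",
    "dave error 250",
]

totals = {"ok": 0, "error": 0}
error_users = []

def count_ok(fields):
    totals["ok"] += int(fields[2])

def note_error(fields):
    totals["error"] += int(fields[2])
    error_users.append(fields[0])

# (pattern, action) pairs, in the spirit of awk's  /regex/ { action }
rules = [
    (re.compile(r" ok "), count_ok),
    (re.compile(r" error "), note_error),
]

for record in records:
    fields = record.split()          # roughly awk's $1, $2, $3
    for pattern, action in rules:
        if pattern.search(record):
            action(fields)

print(totals)       # {'ok': 153, 'error': 257}
print(error_users)  # ['bob', 'dave']
```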
48 -----
49 See also: File [Glob]s
50
51 Tricks and Traps:
52 * When specifying regexes on the command line, surround them with single quotes ('). This stops the shell from interpreting characters such as *, $ and \ before the command sees them.
53 -----
54 !!Examples of single-character expressions
55
56
57 To match any lowercase vowel:
58 /[[aeiou]/
59
60 To match any lowercase or uppercase vowel:
61 /[[aeiouAEIOU]/
62
63 To match any single digit:
64 /[[0123456789]/
65
66 The same thing:
67 /[[0-9]/
68
69 Any single digit or minus:
70 /[[0-9\-]/
71
72 Any lowercase letter:
73 /[[a-z]/
74
75 The ^ character can be used to negate a [] pattern:
76
77 To match anything __except__ a lowercase letter:
78 /[[^a-z]/
79
80 To match anything __except__ a lowercase or uppercase letter, digit or underscore:
81 /[[^a-zA-Z0-9_]/
82
83 These can be used with * too, so:
84
85 /[[0-9]*/
86
87 matches any number of digits, including no digits.
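
The character groups above can be exercised with Python's `re` module, which uses the same bracket syntax. The sample strings below are made up for illustration.

```python
import re

text = "Order 66, gate -7, Zone B_2"

# Lowercase vowels only; the capital "O" is not in the group.
vowels = re.findall(r"[aeiou]", text)

# Digits or a minus sign.
digits_or_minus = re.findall(r"[0-9\-]", text)

# "^" at the start of the group negates it: anything except a-z.
not_lowercase = re.findall(r"[^a-z]", "ab1 C")

# "[0-9]*" also succeeds when there are no digits, matching the empty string.
leading_digits = re.match(r"[0-9]*", "2024abc").group()
no_digits = re.match(r"[0-9]*", "abc").group()

print(vowels)           # ['e', 'a', 'e', 'o', 'e']
print(digits_or_minus)  # ['6', '6', '-', '7', '2']
print(not_lowercase)    # ['1', ' ', 'C']
print(repr(leading_digits), repr(no_digits))  # '2024' ''
```

Because `*` allows zero repetitions, a pattern such as `[0-9]*` never fails on its own; use `+` when at least one digit is required.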
88
89 !!Character abbreviations:
90 Note: These apply to Perl regular expressions. Support elsewhere varies: GNU grep and sed understand \w and \s but not \d, so check the documentation of the tool you are using.
91
92 To match any digit:
93 /[[\d]/
94 (Equivalent to /[[0-9]/)
95
96 To match any 'word' character:
97 /[[\w]/
98 (Equivalent to /[[a-zA-Z0-9_]/)
99
100 To match any space character:
101 /[[\s]/
102 (Equivalent to /[[ \r\t\n\f]/)
103
104 \D, \W and \S are the negated versions of \d, \w and \s:
105
106 /[[\D]/ is equivalent to /[[^0-9]/
107
108 /[[\W]/ is equivalent to /[[^a-zA-Z0-9_]/
109
110 /[[\S]/ is equivalent to /[[^ \r\t\n\f]/
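
The equivalences listed above can be checked mechanically. The sketch below uses Python rather than Perl, so two assumptions apply: the `re.ASCII` flag is needed to restrict \d, \w and \s to ASCII, and Python's \s also accepts a vertical tab, which the sample string avoids.

```python
import re

sample = "id_42 = x9;\tnext\n"

# Each abbreviation paired with the spelled-out group it stands for.
pairs = [
    (r"\d", r"[0-9]"),
    (r"\w", r"[a-zA-Z0-9_]"),
    (r"\s", r"[ \r\t\n\f]"),
    (r"\D", r"[^0-9]"),
    (r"\W", r"[^a-zA-Z0-9_]"),
    (r"\S", r"[^ \r\t\n\f]"),
]

# True where the abbreviation and the group find exactly the same characters.
results = {
    short: re.findall(short, sample, re.ASCII) == re.findall(spelled_out, sample)
    for short, spelled_out in pairs
}

print(results)
```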
111 ----
112 !Perl
113
114 As mentioned before, perl uses an extended regular expression syntax explained on the perlre(1) man page.
115
116 Having said that though, here are some hints.
117
118 __DON'T__ interpret variables as regular expressions by accident
119
120 If we have
121 $text='abc[[c]defc';
122 $search='[[c]';
123 then
124 $text =~ s/\Q$search\E/XX/ ;
125 would replace the substring '[[c]' in $text with "XX", while
126 $text =~ s/$search/XX/ ;
127 would replace only the first occurrence of the character 'c' with "XX" (or every occurrence, if the /g modifier were added).
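
The same trap exists in other languages. As a sketch in Python, `re.escape` plays roughly the role of Perl's \Q...\E. The strings are written here with a single "[", on the assumption that the doubled bracket in the listing above is wiki escaping; `count=1` mirrors a Perl substitution without /g.

```python
import re

text = "abc[c]defc"
search = "[c]"

# re.escape escapes every special character in the variable, so the
# pattern matches the literal text "[c]".
literal = re.sub(re.escape(search), "XX", text, count=1)

# Used unescaped, "[c]" is a character group matching the letter "c".
as_regex = re.sub(search, "XX", text, count=1)

print(literal)   # abcXXdefc
print(as_regex)  # abXX[c]defc
```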