A RegularExpression is a way of describing search patterns. The letters A-Z and a-z and the digits 0-9 match themselves (case-sensitively), and "." matches any single character. "*" matches the previous character zero or more times. "\" prevents the next character from having special meaning. So:
a*b..e
(any number of a's, followed by a b, any two characters and an e)
will match the whole of:
aaaaabcde
and:
bcde

but not the whole of:
axbcde
(nothing in the pattern allows an "x" between the a's and the b). Tools such as grep(1) look for the pattern anywhere in a line, so they would still report "axbcde" because it contains "bcde"; anchor the pattern with ^ and $ if the whole line has to match.
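Whether a string counts as a match depends on whether the whole string or only part of it is tested. The following is a minimal sketch of that difference using Python's re module, which is an assumption made for illustration (the tools discussed on this page are grep, sed, perl and awk). The helper names matches_whole and found_anywhere are hypothetical.

```python
import re

# The pattern from the text: any number of a's, a b, any two characters, an e.
PATTERN = re.compile(r"a*b..e")

def matches_whole(text):
    """True if the entire string matches the pattern."""
    return PATTERN.fullmatch(text) is not None

def found_anywhere(text):
    """True if the pattern occurs somewhere inside the string (grep-style)."""
    return PATTERN.search(text) is not None

for candidate in ["aaaaabcde", "bcde", "axbcde"]:
    print(candidate, matches_whole(candidate), found_anywhere(candidate))
```

Only the first two candidates match as whole strings, while all three contain a matching substring.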

Quick cheatsheet
|^__Character__|^__Matches__
| . | Any single character
| ^ | Beginning of line
| $ | End of line
| \''any character'' | Match ''any character'' exactly (even if it's a special character)
| [[''character group''] | Any single character in the group
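The rows of the cheatsheet can be tried out one at a time. This sketch uses Python's re module as a stand-in (an assumption, since other tools may differ in detail); the sample patterns and strings in CASES and the helper check are made up for illustration.

```python
import re

# Each entry: (pattern, text, whether a match is expected somewhere in text).
CASES = [
    (r"c.t", "cut", True),        # . matches any single character
    (r"^foo", "foobar", True),    # ^ anchors at the beginning of the line
    (r"^foo", "barfoo", False),
    (r"bar$", "foobar", True),    # $ anchors at the end of the line
    (r"bar$", "barfoo", False),
    (r"a\.b", "a.b", True),       # \. matches a literal dot only
    (r"a\.b", "axb", False),
    (r"gr[ae]y", "grey", True),   # a group matches one character from the group
    (r"gr[ae]y", "groy", False),
]

def check(pattern, text):
    """Return True if pattern is found somewhere in text."""
    return re.search(pattern, text) is not None

for pattern, text, expected in CASES:
    print(pattern, repr(text), check(pattern, text) == expected)
```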

Things that alter the previous expression

|^__Character__|^__Alteration__
| ? | Match the previous expression exactly zero or one times
| * | Match the previous expression zero or more times
| + | Match the previous expression one or more times
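The three alterations differ only in how many repetitions they allow. This is a small sketch in Python's re module (chosen as an assumption for illustration); the helper whole and the sample words are hypothetical.

```python
import re

def whole(pattern, text):
    """Return True if the entire text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# ? allows zero or one "u"
optional = [whole(r"colou?r", s) for s in ("color", "colour", "colouur")]
# * allows zero or more "b"
star = [whole(r"ab*c", s) for s in ("ac", "abc", "abbbc")]
# + allows one or more "b"
plus = [whole(r"ab+c", s) for s in ("ac", "abc", "abbbc")]

print(optional)  # [True, True, False]
print(star)      # [True, True, True]
print(plus)      # [False, True, True]
```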


regex(7) explains all the neat things you can do with [RegularExpression]s and the different types of regex syntax. perlre(1) explains Perl's extended regexes.
-----
grep(1) is a command that looks for a regex in a file, e.g.:
grep 'foo' /tmp/baz.txt
will look for the string "foo" in /tmp/baz.txt. More usefully:
grep 'wlug\.linuxcare\.co\.nz' *
will print every line containing "wlug.linuxcare.co.nz" in all the files in the current directory. The dots are escaped so that they match only a literal ".".
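The backslashes before the dots matter: an unescaped "." matches any character, so the pattern would also accept strings that are not the hostname. The sketch below mimics a line-by-line search in Python's re module (an assumption for illustration, not how grep itself is implemented); the sample lines are invented.

```python
import re

lines = [
    "see wlug.linuxcare.co.nz for details",
    "wlugXlinuxcareYcoZnz is not a hostname",
]

escaped = re.compile(r"wlug\.linuxcare\.co\.nz")   # dots match only "."
unescaped = re.compile(r"wlug.linuxcare.co.nz")    # dots match any character

escaped_hits = [line for line in lines if escaped.search(line)]
unescaped_hits = [line for line in lines if unescaped.search(line)]

print(len(escaped_hits), len(unescaped_hits))  # 1 2
```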
-----
sed(1) is a "__s__tream __ed__itor" which uses regexes. sed is usually used for its search and replace capability. A simple example:
sed 's/foo/baz/g' <a.txt >b.txt
will replace every occurrence of "foo" in a.txt with "baz" and write the result to b.txt.
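The trailing g in the sed command asks for every occurrence on a line to be replaced; without it, sed replaces only the first occurrence on each line. Here is a rough Python sketch of that behaviour, assuming the re module as a stand-in; the function sed_substitute and its parameters are hypothetical names.

```python
import re

def sed_substitute(lines, pattern, replacement, global_flag=True):
    """Replace pattern on each line: all matches, or only the first per line."""
    count = 0 if global_flag else 1  # count=0 means "replace all" in re.sub
    return [re.sub(pattern, replacement, line, count=count) for line in lines]

sample = ["foo foo", "food"]
print(sed_substitute(sample, "foo", "baz"))                     # like s/foo/baz/g
print(sed_substitute(sample, "foo", "baz", global_flag=False))  # like s/foo/baz/
```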
-----
perl(1) can also be used for in-place substitutions, like so:
perl -pi -e 's/foo/bar/g' a.txt
This replaces all instances of "foo" with "bar" in a.txt itself.

See also:
* perlrun(1)
-----
awk(1) is a tool for processing record-oriented files. It allows you to specify different actions to perform based on regexes.
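The idea of pairing regexes with actions can be sketched outside awk as well. The following Python fragment is only an illustration of the pattern/action model, not a description of how awk works internally; process, RULES and RECORDS are hypothetical names and data.

```python
import re

def process(records, rules):
    """Run the action of every rule whose regex matches a record."""
    output = []
    for record in records:
        fields = record.split()  # whitespace-separated fields, as awk does by default
        for pattern, action in rules:
            if re.search(pattern, record):
                output.append(action(fields))
    return output

# Each rule pairs a regex with an action on the record's fields.
RULES = [
    (r"^root", lambda f: "superuser: " + f[0]),
    (r"bash$", lambda f: "bash user: " + f[0]),
]
RECORDS = ["root 0 /bin/bash", "alice 1000 /bin/bash", "daemon 1 /usr/sbin/nologin"]

results = process(RECORDS, RULES)
print(results)
```

A record that matches several rules triggers each of their actions in turn, and a record that matches none produces no output.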
-----
See also: File [Glob]s

Tricks and Traps:
* When specifying regexes on the command line, surround them with single quotes (') so that the shell does not interpret characters such as *, $ and \ before the command sees them.
-----
!!Examples of single-character expressions


To match any lowercase vowel:
/[[aeiou]/

To match any lowercase or uppercase vowel:
/[[aeiouAEIOU]/

To match any single digit:
/[[0123456789]/

The same thing:
/[[0-9]/

Any single digit or minus:
/[[0-9\-]/

Any lowercase letter:
/[[a-z]/
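To see what each of these groups picks out of a piece of text, they can be applied one at a time to the same string. This sketch assumes Python's re module, and the sample string is made up.

```python
import re

sample = "Regular Expression, tel 555-0199"

lower_vowels = re.findall(r"[aeiou]", sample)
any_vowels = re.findall(r"[aeiouAEIOU]", sample)
digits_long = re.findall(r"[0123456789]", sample)
digits_range = re.findall(r"[0-9]", sample)       # same set, written as a range
digits_or_minus = re.findall(r"[0-9\-]", sample)  # escaped "-" is a literal minus
lowercase = re.findall(r"[a-z]", sample)

print("".join(lower_vowels))     # euaeioe
print("".join(any_vowels))       # euaEeioe
print("".join(digits_or_minus))  # 555-0199
```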

The ^ character, placed first inside the brackets, negates a [] pattern:

To match anything __except__ a lowercase letter:
/[[^a-z]/

To match anything __except__ a lowercase or uppercase letter, digit or underscore:
/[[^a-zA-Z0-9_]/
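A negated group matches exactly the characters that the plain group would skip. Here is a short sketch in Python's re module (assumed here for illustration), using an invented sample string.

```python
import re

sample = "foo_bar Baz-1!"

# Everything that is not a lowercase letter.
not_lower = re.findall(r"[^a-z]", sample)
# Everything that is not a letter, digit or underscore.
not_word = re.findall(r"[^a-zA-Z0-9_]", sample)

print(not_lower)  # ['_', ' ', 'B', '-', '1', '!']
print(not_word)   # [' ', '-', '!']
```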

These can be used with * too, so:

/[[0-9]*/

matches any number of digits, including no digits.
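Because "no digits" is allowed, a starred group succeeds even on text that contains no digit at all; it simply matches the empty string. This can be checked with Python's re module (an assumption for illustration), alongside the + variant for contrast.

```python
import re

# * accepts zero digits, so the match succeeds with an empty result.
empty_match = re.match(r"[0-9]*", "abc")
# With leading digits, * takes as many as it can.
digit_match = re.match(r"[0-9]*", "123abc")
# + needs at least one digit, so this does not match at the start of "abc".
plus_match = re.match(r"[0-9]+", "abc")

print(repr(empty_match.group()))  # ''
print(repr(digit_match.group()))  # '123'
print(plus_match)                 # None
```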

!!Character abbreviations:
Note: These apply to Perl regular expressions. Support in other regex parsers such as sed varies (POSIX regular expressions, for example, define none of them), and there may be subtle differences, so check the documentation for your tool.

To match any digit:
/[[\d]/
(Equivalent to /[[0-9]/)

To match any 'word' character:
/[[\w]/
(Equivalent to /[[a-zA-Z0-9_]/)

To match any space character:
/[[\s]/
(Equivalent to /[[ \r\t\n\f]/)

\D, \W and \S are the negated versions of \d, \w and \s:

/[[\D]/ is equivalent to /[[^0-9]/

/[[\W]/ is equivalent to /[[^a-zA-Z0-9_]/

/[[\S]/ is equivalent to /[[^ \r\t\n\f]/
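The equivalences above can be checked mechanically. The sketch below uses Python's re module rather than Perl, which is an assumption: Python's abbreviations are Unicode-aware by default, so the re.ASCII flag is passed to restrict them to the ASCII sets listed here. The sample text is invented and contains no vertical tab, a character that Python's \s also matches.

```python
import re

text = "Room 42, 3rd_floor\tEast\n"

digits = re.findall(r"\d", text, re.ASCII)
words = re.findall(r"\w+", text, re.ASCII)
spaces = re.findall(r"\s", text, re.ASCII)

# Each abbreviation should agree with its spelled-out group on this text.
same_d = digits == re.findall(r"[0-9]", text)
same_w = re.findall(r"\w", text, re.ASCII) == re.findall(r"[a-zA-Z0-9_]", text)
same_s = spaces == re.findall(r"[ \r\t\n\f]", text)

# The upper-case forms agree with the negated groups.
same_D = re.findall(r"\D", text, re.ASCII) == re.findall(r"[^0-9]", text)
same_W = re.findall(r"\W", text, re.ASCII) == re.findall(r"[^a-zA-Z0-9_]", text)
same_S = re.findall(r"\S", text, re.ASCII) == re.findall(r"[^ \r\t\n\f]", text)

print(digits, words, spaces)
```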
----
!Perl

As mentioned before, Perl uses an extended regular expression syntax, explained on the perlre(1) man page.

Having said that though, here are some hints.

__DON'T__ interpret variables as regular expressions by accident

If we have
$text='abc[[c]defc';
$search='[[c]';
then
$text =~ s/\Q$search\E/XX/ ;
would replace the substring '[[c]' in $text with "XX", while
$text =~ s/$search/XX/ ;
would treat $search as a character class and replace the first 'c' in $text with "XX" (or every 'c' if the /g modifier were added).
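The same trap exists in other regex libraries. Here is a rough Python analogue, assuming re.escape as the counterpart of Perl's \Q...\E and count=1 to mirror a substitution without /g; the variable names follow the Perl example.

```python
import re

text = "abc[c]defc"
search = "[c]"

# Escaped: the search string is taken literally, brackets included.
literal = re.sub(re.escape(search), "XX", text, count=1)
# Unescaped: the brackets form a character class that matches a bare "c".
as_regex = re.sub(search, "XX", text, count=1)

print(literal)   # abcXXdefc
print(as_regex)  # abXX[c]defc
```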