Rev | Author | # | Line |
---|---|---|---|
1 | perry | 1 | GLOB |
2 | !!!GLOB | ||
3 | NAME | ||
4 | DESCRIPTION | ||
5 | WILDCARD MATCHING | ||
6 | PATHNAMES | ||
7 | EMPTY LISTS | ||
8 | NOTES | ||
9 | SEE ALSO | ||
10 | ---- | ||
11 | !!NAME | ||
12 | |||
13 | |||
14 | glob - Globbing pathnames | ||
15 | !!DESCRIPTION | ||
16 | |||
17 | |||
18 | Long ago, in Unix V6, there was a program ''/etc/glob'' | ||
19 | that would expand wildcard patterns. Soon afterwards this | ||
20 | became a shell built-in. | ||
21 | |||
22 | |||
23 | These days there is also a library routine glob(3) | ||
24 | that will perform this function for a user | ||
25 | program. | ||
26 | |||
27 | |||
28 | The rules are as follows (POSIX 1003.2, 3.13). | ||
29 | !!WILDCARD MATCHING | ||
30 | |||
31 | |||
32 | A string is a wildcard pattern if it contains one of the | ||
33 | characters `?', `*' or `[['. Globbing is the operation that | ||
34 | expands a wildcard pattern into the list of pathnames | ||
35 | matching the pattern. Matching is defined by: | ||
36 | |||
37 | |||
38 | A `?' (not between brackets) matches any single | ||
39 | character. | ||
40 | |||
41 | |||
42 | A `*' (not between brackets) matches any string, including | ||
43 | the empty string. | ||
44 | |||
45 | |||
46 | __Character classes__ | ||
47 | |||
48 | |||
49 | An expression `[[...]' where the first character after the | ||
50 | leading `[[' is not an `!' matches a single character, namely | ||
51 | any of the characters enclosed by the brackets. The string | ||
52 | enclosed by the brackets cannot be empty; therefore `]' can | ||
53 | be allowed between the brackets, provided that it is the | ||
54 | first character. (Thus, `[[][[!]' matches the three characters | ||
55 | `[[', `]' and `!'.) | ||
56 | |||
57 | |||
58 | __Ranges__ | ||
59 | |||
60 | |||
61 | There is one special convention: two characters separated by | ||
62 | `-' denote a range. (Thus, `[[A-Fa-f0-9]' is equivalent to | ||
63 | `[[ABCDEFabcdef0123456789]'.) One may include `-' in its | ||
64 | literal meaning by making it the first or last character | ||
65 | between the brackets. (Thus, `[[]-]' matches just the two | ||
66 | characters `]' and `-', and `[[--/]' matches the three | ||
67 | characters `-', `.', `/'.) | ||
68 | |||
69 | |||
70 | __Complementation__ | ||
71 | |||
72 | |||
73 | An expression `[[!...]' matches a single character, namely | ||
74 | any character that is not matched by the expression obtained | ||
75 | by removing the first `!' from it. (Thus, `[[!]a-]' matches | ||
76 | any single character except `]', `a' and `-'.) | ||
77 | |||
78 | |||
79 | One can remove the special meaning of `?', `*' and `[[' by | ||
80 | preceding them by a backslash, or, in case this is part of a | ||
81 | shell command line, enclosing them in quotes. Between | ||
82 | brackets these characters stand for themselves. Thus, | ||
83 | `[[[[?*\]' matches the four characters `[[', `?', `*' and | ||
84 | `\'. | ||
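The bracket rules in this section can be tried out with a short sketch. It uses Python's `fnmatch.fnmatchcase`, which is not the POSIX implementation but follows the same conventions for the patterns quoted above. The helper `matched` and the candidate list are illustrative assumptions, not part of any standard API. Unlike pathname globbing, this function treats `/` as an ordinary character.

```python
from fnmatch import fnmatchcase

# Hypothetical helper: which single-character candidates does a pattern match?
def matched(pattern, candidates):
    return [c for c in candidates if fnmatchcase(c, pattern)]

# Sample characters chosen to cover the examples in the text (an assumption).
CANDIDATES = ["[", "]", "!", "-", ".", "/", "a", "b", "?", "*", "\\"]

# Patterns quoted in the text, written without the wiki's doubled '['.
for pattern in ["[][!]", "[]-]", "[--/]", "[!]a-]", "[[?*\\]"]:
    print(pattern, "->", matched(pattern, CANDIDATES))

# '?' needs exactly one character; '*' also accepts the empty string.
print(fnmatchcase("", "?"), fnmatchcase("x", "?"), fnmatchcase("", "*"))
```

Each printed list should agree with the character sets named in the text, for instance `[]-]` selecting only `]` and `-`.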
85 | !!PATHNAMES | ||
86 | |||
87 | |||
88 | Globbing is applied on each of the components of a pathname | ||
89 | separately. A `/' in a pathname cannot be matched by a `?' | ||
90 | or `*' wildcard, or by a range like `[[.-0]'. A range cannot | ||
91 | contain an explicit `/' character; this would lead to a | ||
92 | syntax error. | ||
93 | |||
94 | |||
95 | If a filename starts with a `.', this character must be | ||
96 | matched explicitly. (Thus, `rm *' will not remove .profile, | ||
97 | and `tar c *' will not archive all your files; `tar c .' is | ||
98 | better.) | ||
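A minimal sketch of the two pathname rules, assuming Python's `glob` module as a stand-in for a shell (it applies the same per-component and leading-dot conventions). The function name `expand_in_sample_tree` and the file names are hypothetical and exist only for this illustration.

```python
import glob
import os
import tempfile

def expand_in_sample_tree(patterns):
    """Build a small throwaway tree and expand each pattern inside it."""
    results = {}
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in (".profile", "notes.txt", os.path.join("sub", "inner.txt")):
            open(os.path.join(root, rel), "w").close()
        for pattern in patterns:
            hits = glob.glob(os.path.join(root, pattern))
            # Report names relative to the tree root, in sorted order.
            results[pattern] = sorted(os.path.relpath(h, root) for h in hits)
    return results

for pattern, names in expand_in_sample_tree(["*", ".*", "*.txt", "*/*.txt"]).items():
    print(pattern, "->", names)
```

Here `*` skips `.profile`, which only `.*` picks up, and `*.txt` does not reach into `sub`; the file below it needs the two-component pattern `*/*.txt`.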
99 | !!EMPTY LISTS | ||
100 | |||
101 | |||
102 | The nice and simple rule given above: `expand a wildcard | ||
103 | pattern into the list of matching pathnames' was the | ||
104 | original Unix definition. It allowed one to have patterns | ||
105 | that expand into an empty list, as in | ||
106 | |||
107 | |||
108 | xv -wait 0 *.gif *.jpg | ||
109 | where perhaps no *.gif files are present (and this is not an error). However, POSIX requires that a wildcard pattern be left unchanged when it is syntactically incorrect, or when the list of matching pathnames is empty. With ''bash'' one can force the classical behaviour by setting ''allow_null_glob_expansion=true'' (in bash 2.0 and later, by running ''shopt -s nullglob''). | ||
110 | |||
111 | |||
112 | (Similar problems occur elsewhere. E.g., where old scripts | ||
113 | have | ||
114 | |||
115 | |||
116 | rm `find . -name | ||
117 | new scripts require | ||
118 | |||
119 | |||
120 | rm -f nosuchfile `find . -name | ||
121 | to avoid error messages from ''rm'' called with an empty argument list.) | ||
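The difference between the two behaviours can be sketched as follows. Python's `glob.glob` happens to follow the classical rule and returns an empty list when nothing matches. The wrapper `posix_expand` is a hypothetical helper that imitates the POSIX rule by handing the pattern back unchanged. It does not model the "syntactically incorrect" case.

```python
import glob
import os
import tempfile

def classical_expand(pattern):
    """Original Unix rule: no match gives an empty list."""
    return sorted(glob.glob(pattern))

def posix_expand(pattern):
    """POSIX rule as described above: no match leaves the pattern unchanged."""
    return classical_expand(pattern) or [pattern]

with tempfile.TemporaryDirectory() as root:
    # Only a .jpg file exists, so the *.gif pattern has nothing to match.
    open(os.path.join(root, "photo.jpg"), "w").close()
    gif_pattern = os.path.join(root, "*.gif")
    classical = classical_expand(gif_pattern)
    posix = posix_expand(gif_pattern)
    jpgs = posix_expand(os.path.join(root, "*.jpg"))

print(classical)               # the empty list
print(posix == [gif_pattern])  # the literal pattern survives
print([os.path.basename(p) for p in jpgs])
```

A command that receives the literal `*.gif` will typically complain about a missing file, which is why scripts written for the classical rule can behave differently under the POSIX one.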
122 | !!NOTES | ||
123 | |||
124 | |||
125 | __Regular expressions__ | ||
126 | |||
127 | |||
128 | Note that wildcard patterns are not regular expressions, | ||
129 | although they are a bit similar. First of all, they match | ||
130 | filenames, rather than text, and secondly, the conventions | ||
131 | are not the same: e.g., in a regular expression `*' means | ||
132 | zero or more copies of the preceding thing. | ||
133 | |||
134 | |||
135 | Now that regular expressions have bracket expressions where | ||
136 | the negation is indicated by a `^', POSIX has declared the | ||
137 | effect of a wildcard pattern `[[^...]' to be | ||
138 | undefined. | ||
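To make the contrast concrete, the sketch below reads the same string `a*` once as a wildcard pattern and once as a regular expression. It assumes Python's `fnmatch` and `re` modules as stand-ins for the two notations; the helper names `as_wildcard` and `as_regex` are hypothetical.

```python
import re
from fnmatch import fnmatchcase

def as_wildcard(pattern, name):
    """Interpret pattern as a wildcard pattern matched against the whole name."""
    return fnmatchcase(name, pattern)

def as_regex(pattern, name):
    """Interpret pattern as a regular expression matched against the whole name."""
    return re.fullmatch(pattern, name) is not None

# Columns: name, result as wildcard, result as regular expression.
for name in ["abc", "aaa", ""]:
    print(repr(name), as_wildcard("a*", name), as_regex("a*", name))
```

As a wildcard, `a*` means "an `a` followed by anything", so it accepts `abc` but not the empty string; as a regular expression it means "zero or more `a`s", so it accepts the empty string but not `abc`.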
139 | |||
140 | |||
141 | __Character classes and | ||
142 | Internationalization__ | ||
143 | |||
144 | |||
145 | Of course ranges were originally meant to be ASCII ranges, | ||
146 | so that `[[ -%]' stands for `[[ ! | ||
147 | |||
148 | |||
149 | (iii) Ranges X-Y comprise all characters that fall between X | ||
150 | and Y (inclusive) in the current collating sequence as | ||
151 | defined by the LC_COLLATE category in the current | ||
152 | locale. | ||
153 | |||
154 | |||
155 | (iv) Named character classes, like | ||
156 | |||
157 | |||
158 | [[:alnum:] [[:alpha:] [[:blank:] [[:cntrl:] | ||
159 | [[:digit:] [[:graph:] [[:lower:] [[:print:] | ||
160 | [[:punct:] [[:space:] [[:upper:] [[:xdigit:] | ||
161 | so that one can say `[[[[:lower:]]' instead of `[[a-z]', and have things work in Denmark, too, where there are three letters past `z' in the alphabet. These character classes are defined by the LC_CTYPE category in the current locale. | ||
162 | |||
163 | |||
164 | (v) Collating symbols, like `[[.ch.]' or `[[.a-acute.]', where | ||
165 | the string between `[[.' and `.]' is a collating element | ||
166 | defined for the current locale. Note that this may be a | ||
167 | multi-character element. | ||
168 | |||
169 | |||
170 | (vi) Equivalence class expressions, like `[[=a=]', where the | ||
171 | string between `[[=' and `=]' is any collating element from | ||
172 | its equivalence class, as defined for the current locale. | ||
173 | For example, `[[[[=a=]]' might be equivalent to | ||
174 | `[[a | ||
175 | !!SEE ALSO | ||
176 | |||
177 | |||
178 | sh(1), glob(3), fnmatch(3), | ||
179 | locale(7), regex(7) | ||
180 | ---- |