GLOB(B) Linux Programmer's Manual GLOB(B) NAME glob - Globbing pathnames DESCRIPTION Long ago, in Unix V6, there was a program /etc/glob that would expand wildcard patterns. Soon afterwards this became a shell built-in. These days there is also a library routine glob(b) that will perform this function for a user program. The rules are as follows (POSIX 1003.2, 3.13). WILDCARD MATCHING A string is a wildcard pattern if it contains one of the characters `?', `*' or `['. Globbing is the operation that expands a wildcard pattern into the list of pathnames matching the pattern. Matching is defined by: A `?' (not between brackets) matches any single character. A `*' (not between brackets) matches any string, including the empty string. Character classes An expression `[...]' where the first character after the leading `[' is not an `!' matches a single character, namely any of the characters enclosed by the brackets. The string enclosed by the brackets cannot be empty; therefore `]' can be allowed between the brackets, pro- vided that it is the first character. (Thus, `[][!]' matches the three characters `[', `]' and `!'.) Ranges There is one special convention: two characters separated by `-' denote a range. (Thus, `[A-Fa-f0-9]' is equivalent to `[ABCDEFabcdef0123456789]'.) One may include `-' in its literal meaning by making it the first or last charac- ter between the brackets. (Thus, `[]-]' matches just the two characters `]' and `-', and `[--/]' matches the three characters `-', `.', `/'.) Complementation An expression `[!...]' matches a single character, namely any character that is not matched by the expression obtained by removing the first `!' from it. (Thus, `[!]a-]' matches any single character except `]', `a' and `-'.) One can remove the special meaning of `?', `*' and `[' by preceding them by a backslash, or, in case this is part of a shell command line, enclosing them in quotes. Between brackets these characters stand for themselves. Thus, `[[?*\]' matches the four characters `[', `?', `*' and `\'. PATHNAMES Globbing is applied on each of the components of a pathname separately. A `/' in a pathname cannot be matched by a `?' or `*' wildcard, or by a range like `[.-0]'. A range cannot contain an explicit `/' character; this would lead to a syntax error. If a filename starts with a `.', this character must be matched explicitly. (Thus, `rm *' will not remove .pro- file, and `tar c *' will not archive all your files; `tar c .' is better.) EMPTY LISTS The nice and simple rule given above: `expand a wildcard pattern into the list of matching pathnames' was the orig- inal Unix definition. It allowed one to have patterns that expand into an empty list, as in xv -wait 0 *.gif *.jpg where perhaps no *.gif files are present (and this is not an error). However, POSIX requires that a wildcard pat- tern is left unchanged when it is syntactically incorrect, or the list of matching pathnames is empty. With bash one can force the classical behaviour by setting allow_null_glob_expansion=true. (Similar problems occur elsewhere. E.g., where old scripts have rm `find . -name "*~"` new scripts require rm -f nosuchfile `find . -name "*~"` to avoid error messages from rm called with an empty argu- ment list.) NOTES Regular expressions Note that wildcard patterns are not regular expressions, although they are a bit similar. First of all, they match filenames, rather than text, and secondly, the conventions are not the same: e.g., in a regular expression `*' means zero or more copies of the preceding thing. Now that regular expressions have bracket expressions where the negation is indicated by a `^', POSIX has declared the effect of a wildcard pattern `[^...]' to be undefined. Character classes and Internationalization Of course ranges were originally meant to be ASCII ranges, so that `[ -%]' stands for `[ !"#$%]' and `[a-z]' stands for "any lowercase letter". Some Unix implementations generalized this so that a range X-Y stands for the set of characters with code between the codes for X and for Y. However, this requires the user to know the character cod- ing in use on the local system, and moreover, is not con- venient if the collating sequence for the local alphabet differs from the ordering of the character codes. There- fore, POSIX extended the bracket notation greatly, both for wildcard patterns and for regular expressions. In the above we saw three types of item that can occur in a bracket expression: namely (i) the negation, (ii) explicit single characters, and (iii) ranges. POSIX specifies ranges in an internationally more useful way and adds three more types: (iii) Ranges X-Y comprise all characters that fall between X and Y (inclusive) in the currect collating sequence as defined by the LC_COLLATE category in the current locale. (iv) Named character classes, like [:alnum:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:] so that one can say `[[:lower:]]' instead of `[a-z]', and have things work in Denmark, too, where there are three letters past `z' in the alphabet. These character classes are defined by the LC_CTYPE category in the current locale. (v) Collating symbols, like `[.ch.]' or `[.a-acute.]', where the string between `[.' and `.]' is a collating ele- ment defined for the current locale. Note that this may be a multi-character element. (vi) Equivalence class expressions, like `[=a=]', where the string between `[=' and `=]' is any collating element from its equivalence class, as defined for the current locale. For example, `[[=a=]]' might be equivalent to `[a]' (warning: Latin-1 here), that is, to `[a[.a- acute.][.a-grave.][.a-umlaut.][.a-circumflex.]]'. SEE ALSO sh(h), glob(b), fnmatch(h), locale(e), regex(x) Unix 1998-06-12 GLOB(B)