Penguin
Annotated edit history of locatedb(5) version 1, including all changes. View license author blame.
Rev Author # Line
1 perry 1 LOCATEDB
2 !!!LOCATEDB
3 NAME
4 DESCRIPTION
5 EXAMPLE
6 SEE ALSO
7 ----
8 !!NAME
9
10
11 locatedb - front-compressed file name database
12 !!DESCRIPTION
13
14
15 This manual page documents the format of file name databases
16 for the GNU version of __locate__. The file name
17 databases contain lists of files that were in particular
18 directory trees when the databases were last
19 updated.
20
21
22 There can be multiple databases. Users can select which
23 databases __locate__ searches using an environment
24 variable or command line option; see __locate__(1L). The
25 system administrator can choose the file name of the default
26 database, the frequency with which the databases are
27 updated, and the directories for which they contain entries.
28 Normally, file name databases are updated by running the
29 __updatedb__ program periodically, typically nightly; see
30 __updatedb__(1L).
31
32
33 __updatedb__ runs a program called __frcode__ to
34 compress the list of file names using front-compression,
35 which reduces the database size by a factor of 4 to 5.
36 Front-compression (also known as incremental encoding) works
37 as follows.
38
39
40 The database entries are a sorted list (case-insensitively,
41 for users' convenience). Since the list is sorted, each
42 entry is likely to share a prefix (initial string) with the
43 previous entry. Each database entry begins with an
44 offset-differential count byte, which is the additional
45 number of characters of prefix of the preceding entry to use
46 beyond the number that the preceding entry is using of its
47 predecessor. (The counts can be negative.) Following the
48 count is a null-terminated ASCII remainder -- the part of
49 the name that follows the shared prefix.
50
51
52 If the offset-differential count is larger than can be
53 stored in a byte (+/-127), the byte has the value 0x80 and
54 the count follows in a 2-byte word, with the high byte first
55 (network byte order).
56
57
58 Every database begins with a dummy entry for a file called
59 `LOCATE02', which __locate__ checks for to ensure that
60 the database file has the correct format; it ignores the
61 entry in doing the search.
62
63
64 Databases can not be concatenated together, even if the
65 first (dummy) entry is trimmed from all but the first
66 database. This is because the offset-differential count in
67 the first entry of the second and following databases will
68 be wrong.
69
70
71 There is also an old database format, used by Unix
72 __locate__ and __find__ programs and earlier releases
73 of the GNU ones. __updatedb__ runs programs called
74 __bigram__ and __code__ to produce old-format
75 databases. The old format differs from the above description
76 in the following ways. Instead of each entry starting with
77 an offset-differential count byte and ending with a null,
78 byte values from 0 through 28 indicate offset-differential
79 counts from -14 through 14. The byte value indicating that a
80 long offset-differential count follows is 0x1e (30), not
81 0x80. The long counts are stored in host byte order, which
82 is not necessarily network byte order, and host integer word
83 size, which is usually 4 bytes. They also represent a count
84 14 less than their value. The database lines have no
85 termination byte; the start of the next line is indicated by
86 its first byte having a value __
87
88
89 In addition, instead of starting with a dummy entry, the old
90 database format starts with a 256 byte table containing the
91 128 most common bigrams in the file list. A bigram is a pair
92 of adjacent bytes. Bytes in the database that have the high
93 bit set are indexes (with the high bit cleared) into the
94 bigram table. The bigram and offset-differential count
95 coding makes these databases 20-25% smaller than the new
96 format, but makes them not 8-bit clean. Any byte in a file
97 name that is in the ranges used for the special codes is
98 replaced in the database by a question mark, which not
99 coincidentally is the shell wildcard to match a single
100 character.
101 !!EXAMPLE
102
103
104 Input to __frcode__:
105 /usr/src
106 /usr/src/cmd/aardvark.c
107 /usr/src/cmd/armadillo.c
108 /usr/tmp/zoo
109 Length of the longest prefix of the preceding entry to share:
110 0 /usr/src
111 8 /cmd/aardvark.c
112 14 rmadillo.c
113 5 tmp/zoo
114 Output from __frcode__, with trailing nulls changed to newlines and count bytes made printable:
115
116
117 0 LOCATE02
118 0 /usr/src
119 8 /cmd/aardvark.c
120 6 rmadillo.c
121 -9 tmp/zoo
122 (6 = 14 - 8, and -9 = 5 - 14)
123 !!SEE ALSO
124
125
126 __find__(1L), __locate__(1L), __locatedb__(5L),
127 __xargs__(1L) __Finding Files__ (on-line in Info, or
128 printed)
129 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.