Blame: locatedb(5) - Waikato Linux Users Group

Annotated edit history of locatedb(5) version 1, including all changes. View license author blame.

Rev	Author	#	Line
1	perry	1	`LOCATEDB`
		2	`!!!LOCATEDB`
		3	`NAME`
		4	`DESCRIPTION`
		5	`EXAMPLE`
		6	`SEE ALSO`
		7	`----`
		8	`!!NAME`
		9
		10
		11	`locatedb - front-compressed file name database`
		12	`!!DESCRIPTION`
		13
		14
		15	`This manual page documents the format of file name databases`
		16	`for the GNU version of __locate__. The file name`
		17	`databases contain lists of files that were in particular`
		18	`directory trees when the databases were last`
		19	`updated.`
		20
		21
		22	`There can be multiple databases. Users can select which`
		23	`databases __locate__ searches using an environment`
		24	`variable or command line option; see __locate__(1L). The`
		25	`system administrator can choose the file name of the default`
		26	`database, the frequency with which the databases are`
		27	`updated, and the directories for which they contain entries.`
		28	`Normally, file name databases are updated by running the`
		29	`__updatedb__ program periodically, typically nightly; see`
		30	`__updatedb__(1L).`
		31
		32
		33	`__updatedb__ runs a program called __frcode__ to`
		34	`compress the list of file names using front-compression,`
		35	`which reduces the database size by a factor of 4 to 5.`
		36	`Front-compression (also known as incremental encoding) works`
		37	`as follows.`
		38
		39
		40	`The database entries are a sorted list (case-insensitively,`
		41	`for users' convenience). Since the list is sorted, each`
		42	`entry is likely to share a prefix (initial string) with the`
		43	`previous entry. Each database entry begins with an`
		44	`offset-differential count byte, which is the additional`
		45	`number of characters of prefix of the preceding entry to use`
		46	`beyond the number that the preceding entry is using of its`
		47	`predecessor. (The counts can be negative.) Following the`
		48	`count is a null-terminated ASCII remainder -- the part of`
		49	`the name that follows the shared prefix.`
		50
		51
		52	`If the offset-differential count is larger than can be`
		53	`stored in a byte (+/-127), the byte has the value 0x80 and`
		54	`the count follows in a 2-byte word, with the high byte first`
		55	`(network byte order).`
		56
		57
		58	`Every database begins with a dummy entry for a file called`
		59	`LOCATE02', which __locate__ checks for to ensure that
		60	`the database file has the correct format; it ignores the`
		61	`entry in doing the search.`
		62
		63
		64	`Databases can not be concatenated together, even if the`
		65	`first (dummy) entry is trimmed from all but the first`
		66	`database. This is because the offset-differential count in`
		67	`the first entry of the second and following databases will`
		68	`be wrong.`
		69
		70
		71	`There is also an old database format, used by Unix`
		72	`__locate__ and __find__ programs and earlier releases`
		73	`of the GNU ones. __updatedb__ runs programs called`
		74	`__bigram__ and __code__ to produce old-format`
		75	`databases. The old format differs from the above description`
		76	`in the following ways. Instead of each entry starting with`
		77	`an offset-differential count byte and ending with a null,`
		78	`byte values from 0 through 28 indicate offset-differential`
		79	`counts from -14 through 14. The byte value indicating that a`
		80	`long offset-differential count follows is 0x1e (30), not`
		81	`0x80. The long counts are stored in host byte order, which`
		82	`is not necessarily network byte order, and host integer word`
		83	`size, which is usually 4 bytes. They also represent a count`
		84	`14 less than their value. The database lines have no`
		85	`termination byte; the start of the next line is indicated by`
		86	`its first byte having a value __`
		87
		88
		89	`In addition, instead of starting with a dummy entry, the old`
		90	`database format starts with a 256 byte table containing the`
		91	`128 most common bigrams in the file list. A bigram is a pair`
		92	`of adjacent bytes. Bytes in the database that have the high`
		93	`bit set are indexes (with the high bit cleared) into the`
		94	`bigram table. The bigram and offset-differential count`
		95	`coding makes these databases 20-25% smaller than the new`
		96	`format, but makes them not 8-bit clean. Any byte in a file`
		97	`name that is in the ranges used for the special codes is`
		98	`replaced in the database by a question mark, which not`
		99	`coincidentally is the shell wildcard to match a single`
		100	`character.`
		101	`!!EXAMPLE`
		102
		103
		104	`Input to __frcode__:`
		105	`/usr/src`
		106	`/usr/src/cmd/aardvark.c`
		107	`/usr/src/cmd/armadillo.c`
		108	`/usr/tmp/zoo`
		109	`Length of the longest prefix of the preceding entry to share:`
		110	`0 /usr/src`
		111	`8 /cmd/aardvark.c`
		112	`14 rmadillo.c`
		113	`5 tmp/zoo`
		114	`Output from __frcode__, with trailing nulls changed to newlines and count bytes made printable:`
		115
		116
		117	`0 LOCATE02`
		118	`0 /usr/src`
		119	`8 /cmd/aardvark.c`
		120	`6 rmadillo.c`
		121	`-9 tmp/zoo`
		122	`(6 = 14 - 8, and -9 = 5 - 14)`
		123	`!!SEE ALSO`
		124
		125
		126	`__find__(1L), __locate__(1L), __locatedb__(5L),`
		127	`__xargs__(1L) __Finding Files__ (on-line in Info, or`
		128	`printed)`
		129	`----`

This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.

Last edited on Monday, June 3, 2002 6:55:58 pm by "perry"

Edit PageHistory Diff Info LikePages