DICTD
!!!DICTD
NAME
SYNOPSIS
DESCRIPTION
BACKGROUND
OPTIONS
CONFIGURATION FILE
DETERMINATION OF ACCESS LEVEL
SEARCH ALGORITHMS
DATABASE FORMAT
ACKNOWLEDGEMENTS
COPYING
BUGS
FILES
SEE ALSO
----
!!NAME

dictd - a dictionary database server
!!SYNOPSIS

__dictd__ ''[[options]''
!!DESCRIPTION

__dictd__ is a server for the Dictionary Server Protocol
(DICT), a TCP transaction based query/response protocol that
allows a client to access dictionary definitions from a set
of natural language dictionary databases.

For security reasons, dictd drops root permissions after
startup. If user __dictd__ exists on the system, the
daemon will run as that user, group __dictd__, otherwise
it will run as user __nobody__, group __nogroup__.

Since startup time is significant, the server is designed to
run continuously, and should ''not'' be run from
inetd(8). (However, with a fast processor, it is
feasible to do so.)

Databases are distributed separately from the server.
!!BACKGROUND

For many years, the Internet community has relied on the
webster protocol.

Fortunately, several freely-distributable dictionaries and
lexicons have recently become available on the Internet.
However, these freely-distributable databases are not
accessible via a uniform interface, and are not accessible
from a single site. They are often small and incomplete
individually, but would collectively provide an interesting
and useful database of English words. Examples include the
Jargon file, the !WordNet database, MICRA's version of the
1913 Webster's Revised Unabridged Dictionary, and the Free
Online Dictionary of Computing. (See the DICT protocol
specification (RFC) for references.) Translating and
non-English dictionaries are also becoming available (for
example, the FOLDOC dictionary is being translated into
Spanish).

The webster protocol is not suitable for providing access to
a large number of separate dictionary databases, and
extensions to the current webster protocol were not felt to
be a clean solution to the dictionary database problem.

The DICT protocol is designed to provide access to multiple
databases. Word definitions can be requested, the word index
can be searched (using an easily extended set of
algorithms), information about the server can be provided
(e.g., which index search strategies are supported, or which
databases are available), and information about a database
can be provided (e.g., copyright, citation, or distribution
information). Further, the DICT protocol has hooks that can
be used to restrict access to some or all of the databases.
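
The request types listed above map onto a handful of text commands. The following is a minimal sketch, assuming the command spellings of RFC 2229 (DEFINE, MATCH) and its numeric status lines; the function names are hypothetical and nothing here talks to a real server.

```python
from typing import Tuple


def quote(word: str) -> str:
    """Wrap an argument in double quotes, escaping backslashes and quotes."""
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def define(database: str, word: str) -> str:
    """Build a DEFINE request line for one database (or "*" for all)."""
    return f"DEFINE {database} {quote(word)}\r\n"


def match(database: str, strategy: str, word: str) -> str:
    """Build a MATCH request line that searches the word index."""
    return f"MATCH {database} {strategy} {quote(word)}\r\n"


def parse_status(line: str) -> Tuple[int, str]:
    """Split a status line into its three-digit code and the remaining text."""
    code, _, text = line.rstrip("\r\n").partition(" ")
    return int(code), text
```

A client would write the string returned by define() to a TCP connection on port 2628 and pass the first line of the reply to parse_status().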

dictd(8) is a server that implements the DICT
protocol. Bret Martin implemented another server, and
several people (including Bret and myself) have implemented
clients in a variety of languages.
!!OPTIONS

__-V__ or __--version__

Display version information.

__--license__

Display copyright and license information.

__-h__ or __--help__

Display help information.

__-v__ or __--verbose__ or __-d verbose__

Be verbose.

__-c__ ''file'' or __--config__ ''file''

Specify configuration file. The default is
''/etc/dictd.conf'', but may be changed in the
''dictd.h'' file at compile time (DICT_CONFIG_FILE).

__-p__ ''service'' or __--port__ ''service''

Specifies the port (e.g., 2628) or service (e.g., dict) for
connections. The default is 2628, as specified in the DICT
Protocol RFC, but may be changed in the ''dictd.h'' file
at compile time (DICT_DEFAULT_SERVICE).

__-i__ or __--inetd__

Communicate on standard input/output, suitable for use from
inetd. Although, due to its rather large startup time, this
daemon was not intended to run from inetd, with a fast
processor it is feasible to do so.

__--depth__ ''length''

Specify the queue length for listen(2), that is, the
number of pending socket connections which are queued by the
operating system. Some operating systems may silently limit
this value to 5 (older BSD systems) or 128 (Linux). The
default is 10 but may be changed in the ''dictd.h'' file
at compile time (DICT_QUEUE_DEPTH).

__--delay__ ''seconds''

Specifies the number of seconds a client may be idle before
the server will close the connection. Idle time is defined
to be the time the server is waiting for input and does not
include the time the server spends searching the database.
Connections are closed without warning since no provision
for premature connection termination is specified in the
DICT protocol RFC. The default is 600 seconds (10 minutes),
but may be changed in the ''dictd.h'' file at compile
time (DICT_DEFAULT_DELAY).

__--facility__ ''facility''

Specifies the syslog facility to use. The use of this option
sets the -s option. The available facilities are those
listed in ''syslog.conf(5)''. (Note that keywords such as
__local1__ are used, not the variables such as
__LOG_LOCAL1__ described in ''syslog(3)''.) The
default facility is __user__.

The default syslog configuration adds all logs to
/var/log/syslog. Refer to ''syslog.conf(5)'' if you wish
to assign a log file name for a previously unused facility,
or if you desire to avoid cluttering ''/var/log/syslog''
with dictd logging messages.

__-f__ or __--force__

Force the daemon to start even if an instance of the daemon
is already running. (This is of little value unless a
non-default port is specified with -p, since, if one
instance is bound to a port, the second one fails when it
cannot bind to the port.)

__--limit__ ''children''

Specifies the number of daemons that may be running
simultaneously. Each daemon services a single connection. If
the limit is exceeded, a (serialized) connection will be
made by the server process, and a response code 420 (server
temporarily unavailable) will be sent to the client. This
parameter should be adjusted to prevent the server machine
from being overloaded by dict clients, but should not be set
so low that many clients are denied useful connections. The
default is 100, but may be changed in the ''dictd.h''
file at compile time (DICT_DAEMON_LIMIT).

__-l__ ''option'' or __--log__ ''option''

Specify a logging option. (This is effective only if logging
has been enabled with the -s or -L option.) Only one option
may be set with each invocation of this option; however,
multiple invocations of this option may be made in one dictd
command line. For instance:

__dictd -s --log__ ''stats'' __--log__ ''found'' __--log__ ''notfound''

is a valid command line, and sets three logging options.

Some of the more verbose options are used primarily for
debugging the server code, and are not practical for normal
use.

__server__

Log server diagnostics. This is extremely verbose.

__connect__

Log all connections.

__stats__

Log all children terminations.

__command__

Log all commands. This is extremely verbose.

__client__

Log results of CLIENT command.

__found__

Log all words found in the databases.

__notfound__

Log all words not found in the databases.

__timestamp__

When logging to a file, use a full timestamp like that which
syslog would produce. Otherwise, no timestamp is made,
making the files shorter.

__host__

Log name of foreign host.

__min__

Set the following options: found, notfound, stats, and
client. If logging is activated (to a file, or via syslog),
and no options are set, then this minimal set of options
will be used.

__all__

Set all of the options.

__none__

Clear all of the options.

To facilitate location of interesting information in the log
file, entries are marked with initial letters indicating the
class of the line being logged:

__I__

Information about the server, connections, or termination
statistics. These lines are generally not designed to be
parsed automatically.

__E__

Error messages.

__C__

CLIENT command information.

__D__

Definitions found in the databases searched.

__M__

Matches found in the database searched.

__N__

Matches which were not found in the databases searched.

__T__

Trace of exact line sent by client.
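
The class letters above lend themselves to a simple lookup when filtering a log. The following is a rough sketch, assuming the class letter is the first character of the entry text once any timestamp or syslog prefix has been removed; that position is an assumption, as this page does not show a sample log line, and the names below are hypothetical.

```python
# Class letters and their meanings, as listed in this section.
LOG_CLASSES = {
    "I": "server, connection, or termination information",
    "E": "error message",
    "C": "CLIENT command information",
    "D": "definition found",
    "M": "match found",
    "N": "match not found",
    "T": "trace of line sent by client",
}


def classify(entry: str) -> str:
    """Return the class description for an entry, or "unknown".

    Assumes (hypothetically) that the class letter is the first character.
    """
    return LOG_CLASSES.get(entry[:1], "unknown")
```

For example, a filter that keeps only entries for which classify() returns "error message" would isolate the __E__ lines.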

To preserve anonymity of the client, do ''not'' use the
__connect__ or __host__ options. Clients may or may
not send host information using the CLIENT command, but this
should be an option that is selectable on the client side.

__-s__

Log using the syslog(3) facility.

__-L__ ''file'' or __--logfile__ ''file''

Specify the file for logging.

__NOTE:__ If dictd does not have write permission for
this file, it will silently fail.

__-m__ ''minutes'' or __--mark__ ''minutes''

How often a timestamp should be logged. (This is effective
only if logging has been enabled with the -s or -L option.)

__-d__ ''option''

Activate a debugging option. There are several, all of which
are only useful to developers. They are documented here for
completeness. A list can be obtained interactively by using
__-d__ with an illegal option.

__verbose__

The same as __-v__ or __--verbose__. Adds verbosity to
other options.

__scan__

Debug the scanner for the configuration file.

__parse__

Debug the parser for the configuration file.

__search__

Debug the character folding and binary search routines.

__init__

Report database initialization.

__port__

Log client-side port number to the log file.

__lev__

Debug Levenshtein search algorithm.

__auth__

Debug the authorization routines.

__nodetach__

Do not detach as a background process. Implies that a copy
of the log file will appear on the standard output.

__nofork__

Do not fork daemons to service requests. Be a
single-threaded server. This option implies __nodetach__,
and is most useful for using a debugger to find the point at
which daemon processes are dumping core.

__alt__

Debugs __altcompare__ in ''index.c''.
!!CONFIGURATION FILE

The configuration file defaults to ''/etc/dictd.conf'',
but can be specified on the command line with the __-c__
option (see above). The configuration file has four distinct
sections. At this time, each section must appear in the
specified order, although only the Database section is
required.

__Syntax__

The following keywords are valid in a configuration file:
access, allow, deny, group, database, data, index, filter,
prefilter, postfilter, name, include, user, authonly, site.
Keywords are case sensitive. String arguments that contain
spaces should be surrounded by double quotes. Without
quoting, strings may contain alphanumeric characters and _,
-, ., and *, but not spaces. Strings must be on a single
line and cannot be continued between lines. Comments start
with # and extend to the end of the line.

__Access Specification__

Access specifications may occur in the Access Section or in
the Database Section. The access specification will be
described here.

For allow, deny, and authonly, a star (*) may be used as a
wild card that matches any number of characters. A question
mark (?) may be used as a wildcard that matches a single
character. For example, 10.0.0.* and *.edu are valid
strings.
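
The two wildcards behave like shell-style globbing. As an illustration only, Python's fnmatch module can imitate them; this is not dictd's own matcher, and fnmatch additionally treats square brackets specially, which this page does not say dictd does. The helper name is hypothetical.

```python
from fnmatch import fnmatchcase


def pattern_matches(pattern: str, client: str) -> bool:
    """Return True if a client host name or IP address matches the pattern.

    '*' matches any run of characters and '?' matches exactly one.
    """
    return fnmatchcase(client, pattern)
```

With this helper, 10.0.0.* matches 10.0.0.17, while 10.0.0.? matches 10.0.0.7 but not 10.0.0.17, because ? stands for exactly one character.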

The syntax is as follows:

__allow__ ''string''

The string specifies a domain name or IP address which is
allowed access to the server (in the Access Section) or to a
database (in the Database Section).

__deny__ ''string''

The string specifies a domain name or IP address which is
denied access to the server (in the Access Section) or to a
database (in the Database Section). Note that if reverse DNS
is not working, then only the IP number will be checked.
Therefore, it is essential to deny networks based on IP
number, since a denial based on domain name may not always
be checked.

__authonly__ ''string''

This form is only useful in the Access Section. The string
specifies a domain name or IP address which is allowed
access to the server but not to any of the databases. All
commands are valid except DEFINE, MATCH, and SHOW DB. More
specifically, AUTH is a valid command, and commands which
access the databases are not allowed.

__user__ ''string''

This form is only useful in the Database Section. The string
specifies a username that is allowed to access this database
after a successful AUTH command is executed.

__site__ ''string''

Used to specify the filename for the site information file,
a flat text file which will be displayed in response to the
SHOW SERVER command. This section, if present, must be
first.

__access {__ ''access specification'' __}__

This section, the second if the Site Section is present,
contains access restrictions for the server and all of the
databases collectively. Per-database control is specified in
the Database Section.

__database__ ''string'' __{__ ''database specification'' __}__

This section is required. The string specifies the name of
the database (e.g., wn or web1913). The database
specification describes the database:

__NOTE__: If the files specified in the database
specification do not exist on the system, dictd will
silently fail.

__data__ ''string''

Specifies the filename for the flat text database.

__index__ ''string''

Specifies the filename for the index file.

__prefilter__ ''string''

Specifies the prefilter command. When a chunk of the
compressed database is read, it will be filtered with this
filter before being decompressed. This may be used to
provide some additional compression that knows about the
data and can provide better compression than the LZ77
algorithm used by zlib.

__postfilter__ ''string''

Specifies the postfilter command. When a chunk of the
compressed database is read, it will be filtered with this
filter before the offset and length for the entry are used
to access data. This is provided for symmetry with the
prefilter command, and may also be useful for providing
additional database compression.

__filter__ ''string''

Specifies the filter command. After the entry is extracted
from the database, it will be filtered with this filter.
This may be used to provide formatting for the entry (e.g.,
for html). __Warning:__ This is not currently
implemented.

__name__ ''string''

Specifies the short name of the database (e.g., ...). The
default may be changed in the ''dictd.h'' file at compile time
(DICT_SHORT_ENTRY_NAME).

__access {__ ''access specification'' __}__

Used to restrict access to this particular database.

__include__ ''filename''

The text of the file ''filename'' (usually a database
specification) will be read as if it appeared at this
location in the configuration file.

__Note for Debian Systems:__
On Debian Systems, a configuration script that creates a
database specification in /var/lib/dictd/db.list is run
whenever any dictionary database is installed or removed.
This makes it unnecessary for the user to edit the Database
section of the configuration file.

__user__ ''string'' ''string''

The first string specifies the username, and the second
string specifies the shared secret for this username. When
the AUTH command is used, the client will provide the
username and a hashed version of the shared secret. If the
shared secret matches, the user is said to have
authenticated, and will have access to databases whose
access specifications allow that user (by name, or by
wildcard). If present, this section must appear last in the
configuration file. There may be many user entries. The
shared secret should be kept secret, as anyone who has
access to it can access the shared databases (assuming
access is not denied by domain name).
!!DETERMINATION OF ACCESS LEVEL

When a client connects, the global access specification is
scanned, in order, until a specification matches. If no
access specification exists, all access is allowed (i.e.,
the action is the same as if __allow *__ had been
specified). For example, consider the following access
specification:

allow 10.42.* authonly *.edu deny *

With this specification, all clients in the 10.42 network
will be allowed access to unrestricted databases; all
clients from *.edu sites will be allowed to authenticate,
but will be denied access to all databases, even those which
are otherwise unrestricted; and all other clients will have
their connection terminated immediately. The 10.42 network
clients can send an AUTH command and gain access to
restricted databases. The *.edu clients must send an AUTH
command to gain access to any databases, restricted or
unrestricted.
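
The in-order, first-match scan can be sketched as follows. This is an illustrative model rather than dictd's code: the function name and the list-of-pairs representation are hypothetical, fnmatch stands in for dictd's wildcard matcher, and the result when no entry matches is left as None because this page does not state it.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def access_level(spec: List[Tuple[str, str]], host: str, ip: str) -> Optional[str]:
    """Return the action of the first entry whose pattern matches the client.

    spec is an ordered list of (action, pattern) pairs, where action is
    "allow", "deny", or "authonly". host is the reverse-DNS name, or ""
    when the lookup failed, in which case only the IP address is checked.
    """
    if not spec:
        return "allow"  # no access specification: all access is allowed
    for action, pattern in spec:
        if fnmatchcase(ip, pattern) or (host and fnmatchcase(host, pattern)):
            return action
    return None  # behaviour for an unmatched client is not described here


EXAMPLE_SPEC = [("allow", "10.42.*"), ("authonly", "*.edu"), ("deny", "*")]
```

With EXAMPLE_SPEC, a client at 10.42.1.1 gets "allow" and cs.example.edu gets "authonly". If the reverse lookup for that same .edu client fails, it falls through to "deny", which shows why name-based rules are only as reliable as reverse DNS.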

When the AUTH command is sent, the access list for each
database is scanned, in order, just as the global access
list is scanned. However, after authentication, the client
has an associated username. For example, consider the
following access specification:

user u1 deny *.com user u2 allow *

If the client authenticated as u1, then the client will have
access to this database, even if the client comes from a
*.com site. In contrast, if the client authenticated as u2,
the client will only have access if it does not come from a
*.com site. In this case, the

__Warning:__ Checks are performed for domain names and
for IP addresses. However, if reverse DNS for a specific
site is not working, it is possible that a domain name may
not be available for checking. Make sure that all denials
use IP addresses. (And consider a future enhancement: if a
domain name is not available, should denials that depend on
a domain name match anything? This is the more conservative
viewpoint, but it is not currently implemented.)
!!SEARCH ALGORITHMS

The DICT standard specifies a few search algorithms that
must be implemented, and permits others to be supported on a
server-dependent basis. The following search strategies are
supported by this server. Note that ''all'' strategies
are case insensitive. Most ignore non-alphanumeric,
non-whitespace characters.

__exact__

An exact match. This algorithm uses a binary search and is
one of the fastest search algorithms available.

__prefix__

Prefix match. This algorithm also uses a binary search and
is very fast.

__substring__

Match a substring anywhere in the headword. This search
strategy uses a modified Boyer-Moore-Horspool algorithm.
Since it must search the whole index file, it is not as fast
as the exact and prefix matches.

__suffix__

Suffix match. This search strategy also uses a modified
Boyer-Moore-Horspool algorithm, and is as fast as the
substring search.

__re__

POSIX 1003.2 (modern) regular expression search. Modern
regular expressions are the ones used by egrep(1).
These regular expressions allow predefined character classes
(e.g., [[[[:alnum:]], [[[[:alpha:]], [[[[:digit:]], and
[[[[:xdigit:]] are useful for this application); uses * to
match a sequence of 0 or more matches of the previous atom;
uses + to match a sequence of 1 or more matches of the
previous atom; uses ? to match a sequence of 0 or 1 matches
of the previous atom; uses ^ to match the beginning of a
word, uses $ to match the end of a word, and allows nested
subexpression and alternation with () and |. For example,

__Warning:__
Regular expression matches can take 10 to 300 times longer
than substring matches. On a busy server, with many
databases, this can require more than 5 minutes of waiting
time, depending on the complexity of the regular
expression.

__regexp__

Old (basic) regular expressions. These regular expressions
don't support |, +, or ?. Groups use escaped parentheses.
While modern regular expressions are generally easier to
use, basic regular expressions have a back reference
feature. This can be used to match a second occurrence of
something that was already matched. For example, the
following expression finds all words that begin and end with
the same three letters:

^\\(...\\).*\\1$

Note the use of the double backslashes to escape the special
characters. This is required by the DICT protocol string
specification (a single backslash quotes the next character
-- we use two to get a single backslash through to the
regular expression engine).

__Warning:__ Note that back references, which require
backtracking, are even slower than general regular
expressions.
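
To see the back reference at work outside the server, the same idea can be tried in Python. This is only an approximation: Python's re module uses bare parentheses for groups, unlike the escaped parentheses of POSIX basic regular expressions, and the helper names are hypothetical. The second function shows the backslash doubling needed before a pattern is placed in a DICT string.

```python
import re

# Python spelling of the pattern: groups use bare parentheses here.
SAME_ENDS = re.compile(r"^(...).*\1$")

# The basic-RE form, as the regular expression engine should receive it.
BASIC_RE = r"^\(...\).*\1$"


def begins_and_ends_alike(word: str) -> bool:
    """True if the word begins and ends with the same three characters."""
    return SAME_ENDS.match(word) is not None


def dict_escape(pattern: str) -> str:
    """Double each backslash so that one survives DICT string quoting."""
    return pattern.replace("\\", "\\\\")
```

Here begins_and_ends_alike("entertainment") is true because the captured "ent" recurs at the end, and dict_escape(BASIC_RE) yields the doubled-backslash form that a client sends.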

__soundex__

The Soundex algorithm, a classic algorithm for finding words
that sound similar to each other. The algorithm encodes each
word using the first letter of the word and up to three
digits. Since the first letter is known, this search is
relatively fast, and is sometimes good for correcting
spelling errors when the Levenshtein algorithm doesn't
help.
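
For readers unfamiliar with the encoding, here is a sketch of the commonly described American Soundex rules (first letter plus up to three digits, padded with zeros). It is not taken from the dictd sources, whose implementation may differ in details such as the treatment of h and w.

```python
# Consonant groups of the classic Soundex code; vowels, h, w, y have no digit.
_GROUPS = {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}
CODES = {ch: digit for digit, letters in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of a word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        if ch not in "hw":  # h and w do not separate equal codes
            prev = code
    return result.ljust(4, "0")
```

Words with the same code are candidates for "sounds alike": Robert and Rupert both encode to R163.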

__lev__

The Levenshtein algorithm (string edit distance of one).
This algorithm searches for all words which are within an
edit distance of one from the target word. An
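
A test for "within an edit distance of one" can be sketched as below. It assumes the classic Levenshtein operations (one insertion, deletion, or substitution); whether dictd also counts a transposition of adjacent letters as a single edit is not stated in the text above, and the function name is hypothetical.

```python
def within_one_edit(a: str, b: str) -> bool:
    """True if b equals a or differs by one insertion, deletion, or substitution."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra character in b at position i
```

Applied to an index, the search keeps every headword w for which within_one_edit(target, w) holds.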
!!DATABASE FORMAT

Databases for __dictd__ are distributed separately. A
database consists of two files. One is a flat text file, the
other is the index.

The flat text file contains dictionary entries (or any other
suitable data), and the index contains tab-delimited tuples
consisting of the headword, the byte offset at which this
entry begins in the flat text file, and the length of the
entry in bytes. The offset and length are encoded using base
64 encoding using the 64-character subset of International
Alphabet IA5 discussed in RFC 1421 (printable encoding) and
RFC 1522 (base64 MIME). Encoding the offsets in base 64
saves considerable space when compared with the usual base
10 encoding, while still permitting tab characters (ASCII 9)
to be used for delimiting fields in a record. Each record
ends with a newline (ASCII 10), so the index file is human
readable.
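
An index record can be decoded along the following lines. The sketch assumes that each number is written as a sequence of base-64 digits, most significant first and without padding, using the RFC 1421 alphabet; that reading of "base 64 encoding" is an assumption, and the function names are hypothetical.

```python
from typing import Tuple

# The 64-character alphabet of RFC 1421: A-Z, a-z, 0-9, '+', '/'.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
B64_VALUE = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def b64_number(text: str) -> int:
    """Decode a number written as base-64 digits, most significant first."""
    value = 0
    for ch in text:
        value = value * 64 + B64_VALUE[ch]
    return value


def parse_index_line(line: str) -> Tuple[str, int, int]:
    """Split one index record into (headword, byte offset, byte length)."""
    headword, offset, length = line.rstrip("\n").split("\t")
    return headword, b64_number(offset), b64_number(length)


def read_entry(data: bytes, offset: int, length: int) -> bytes:
    """Slice one entry out of the uncompressed flat text file contents."""
    return data[offset:offset + length]
```

Under this assumption the two-character field BA decodes to 64, which is why the encoding is more compact than decimal for large offsets.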

The flat text file may be compressed using gzip(1)
(not recommended) or dictzip(1) (highly recommended).
Optimal speed will be obtained using an uncompressed file.
However, the __gzip__ compression algorithm works very
well on plain text, and can result in space savings
typically between 60 and 80%. Using a file compressed with
gzip(1) is not recommended, however, because random
access on the file can only be accomplished by serially
decompressing the whole file, a process which is
prohibitively slow. dictzip(1) uses the same
compression algorithm and file format as does
gzip(1), but provides a table that can be used to
randomly access compressed blocks in the file. The use of
50-64kB blocks for compression typically degrades
compression by less than 10%, while maintaining acceptable
random access capabilities for all data in the file. As an
added benefit, files compressed with dictzip(1) can
be decompressed with gzip(1) or zcat(1).
(Note: recompressing a __dictzip__'d file using, for
example, znew(1) will destroy the random access
characteristics of the file. Always compress data files
using dictzip(1).)
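
The benefit of block-wise compression is that only the blocks overlapping an entry need to be decompressed. The arithmetic can be sketched as follows, assuming fixed-size blocks of uncompressed data; the function name is hypothetical, and the layout of the table inside the file is not covered here.

```python
def blocks_needed(offset: int, length: int, block_size: int) -> range:
    """Return the indices of the blocks that hold bytes offset..offset+length-1.

    offset and length refer to the uncompressed data; block_size is the
    amount of uncompressed data per block.
    """
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)
```

With 60000-byte blocks, a 20-byte entry at offset 59990 straddles blocks 0 and 1, so two blocks are decompressed instead of the whole file.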
!!ACKNOWLEDGEMENTS

Special thanks to Jean-loup Gailly and Mark Adler for
writing the zlib general purpose data compression library.
The version contained with __dictd__ is not necessarily
an original version and __may have been modified__,
although any modifications are probably trivial. The key
features of the __dictzip__ random-access compression
algorithm utilize a documented extension of the gzip format,
and do not require any modifications to zlib. For more
information on zlib, please see the zlib home page at
''http://quest.jpl.nasa.gov/zlib/''

Special thanks to Henry Spencer for his regex package. The
package contained with __dictd__ is not necessarily an
original version and __may have been modified.__ For more
information on regex, please see
''ftp://zoo.toronto.edu/pub/regex.shar''
!!COPYING

The main source files for the __dictd__ server and the
__dictzip__ compression program were written by Rik Faith
(faith@dict.org) and are distributed under the terms of the
GNU General Public License. If you need to distribute under
other terms, write to the author.

The main libraries used by these programs (zlib, regex,
libmaa) are distributed under different terms, so you may be
able to use the libraries for applications which are
incompatible with the GPL -- please see the copyright
notices and license information that come with the libraries
for more information, and consult with your attorney to
resolve these issues.
!!BUGS

The regular expression searches do not ignore
non-whitespace, non-alphanumeric characters as do the other
searches. In practice, this isn't much of a problem.

The databases are memory mapped and cannot be updated while
the server is running.

There is no way to get a running server to re-read the
configuration file, so databases cannot be added or deleted
on the fly.
!!FILES

''/etc/dictd.conf''
''/usr/sbin/dictd''
!!SEE ALSO

dict(1), dictzip(1), gunzip(1),
zcat(1), webster(1), __RFC 2229__
----