dictd(8)
DICTD !!!DICTD NAME SYNOPSIS DESCRIPTION BACKGROUND OPTIONS CONFIGURATION FILE DETERMINATION OF ACCESS LEVEL SEARCH ALGORITHMS DATABASE FORMAT ACKNOWLEDGEMENTS COPYING BUGS FILES SEE ALSO ---- !!NAME dictd - a dictionary database server !!SYNOPSIS __dictd__ ''[[options] '' !!DESCRIPTION __dictd__ is a server for the Dictionary Server Protocol (DICT), a TCP transaction based query/response protocol that allows a client to access dictionary definitions from a set of natural language dictionary databases. For security reasons, dictd drops root permissions after startup. If user __dictd__ exists on the system, the daemon will run as that user, group __dictd__, otherwise it will run as user __nobody__, group __nogroup__. Since startup time is significant, the server is designed to run continuously, and should ''not'' be run from inetd(8). (However, with a fast processor, it is feasible to do so.) Databases are distributed separately from the server. !!BACKGROUND For many years, the Internet community has relied on the "webster" protocol for access to natural language definitions. The webster protocol supports access to a single dictionary and (optionally) to a single thesaurus. In recent years, the number of publicly available webster servers on the Internet has dramatically decreased. Fortunately, several freely-distributable dictionaries and lexicons have recently become available on the Internet. However, these freely-distributable databases are not accessible via a uniform interface, and are not accessible from a single site. They are often small and incomplete individually, but would collectively provide an interesting and useful database of English words. Examples include the Jargon file, the !WordNet database, MICRA's version of the 1913 Webster's Revised Unabridged Dictionary, and the Free Online Dictionary of Computing. (See the DICT protocol specification (RFC) for references.) Translating and non-English dictionaries are also becoming available (for example, the FOLDOC dictionary is being translated into Spanish). 
The webster protocol is not suitable for providing access to a large number of separate dictionary databases, and extensions to the current webster protocol were not felt to be a clean solution to the dictionary database problem. The DICT protocol is designed to provide access to multiple databases. Word definitions can be requested, the word index can be searched (using an easily extended set of algorithms), information about the server can be provided (e.g., which index search strategies are supported, or which databases are available), and information about a database can be provided (e.g., copyright, citation, or distribution information). Further, the DICT protocol has hooks that can be used to restrict access to some or all of the databases. dictd(8) is a server that implements the DICT protocol. Bret Martin implemented another server, and several people (including Bret and myself) have implemented clients in a variety of languages. !!OPTIONS __-V__ or __--version__ Display version information. __--license__ Display copyright and license information. __-h__ or __--help__ Display help information. __-v__ or __--verbose__ or __-d verbose__ Be verbose. __-c__ ''file'' or __--config__ ''file'' Specify configuration file. The default is ''/etc/dictd.conf'', but may be changed in the ''dictd.h'' file at compile time (DICT_CONFIG_FILE). __-p__ ''service'' or __--port__ ''service'' Specifies the port (e.g., 2628) or service (e.g., dict) for connections. The default is 2628, as specified in the DICT Protocol RFC, but may be changed in the ''dictd.h'' file at compile time (DICT_DEFAULT_SERVICE). __-i__ or __--inetd__ Communicate on standard input/output, suitable for use from inetd. Although, due to its rather large startup time, this daemon was not intended to run from inetd, with a fast processor it is feasible to do so. __--depth__ ''length'' Specify the queue length for listen(2). 
Specifies the number of pending socket connections which are queued by the operating system. Some operating systems may silently limit this value to 5 (older BSD systems) or 128 (Linux). The default is 10 but may be changed in the ''dictd.h'' file at compile time (DICT_QUEUE_DEPTH). __--delay__ ''seconds'' Specifies the number of seconds a client may be idle before the server will close the connection. Idle time is defined to be the time the server is waiting for input and does not include the time the server spends searching the database. Connections are closed without warning since no provision for premature connection termination is specified in the DICT protocol RFC. The default is 600 seconds (10 minutes), but may be changed in the ''dictd.h'' file at compile time (DICT_DEFAULT_DELAY). __--facility__ ''facility'' Specifies the syslog facility to use. The use of this option sets the -s option. The available facilities are those listed in ''syslog.conf(5)''. (Note that keywords such as __local1__ are used, not the variables such as __LOG_LOCAL1__ described in ''syslog(3)''.) The default facility is __user__. The default syslog configuration adds all logs to /var/log/syslog. Refer to ''syslog.conf(5)'' if you wish to assign a log file name for a previously unused facility, or if you desire to avoid cluttering ''/var/log/syslog'' with dictd logging messages. __-f__ or __--force__ Force the daemon to start even if an instance of the daemon is already running. (This is of little value unless a non-default port is specified with -p, since, if one instance is bound to a port, the second one fails when it can not bind to the port.) __--limit__ ''children'' Specifies the number of daemons that may be running simultaneously. Each daemon services a single connection. If the limit is exceeded, a (serialized) connection will be made by the server process, and a response code 420 (server temporarily unavailable) will be sent to the client. 
This parameter should be adjusted to prevent the server machine from being overloaded by dict clients, but should not be set so low that many clients are denied useful connections. The default is 100, but may be changed in the ''dictd.h'' file at compile time (DICT_DAEMON_LIMIT). __-l__ ''option'' or __--log__ ''option'' Specify a logging option. (This is effective only if logging has been enabled with the -s or -L option.) Only one option may be set with each invocation of this option; however, multiple invocations of this option may be made in one dictd command line. For instance: __dictd -s --log__ ''stats'' __--log__ ''found'' __--log__ ''notfound'' is a valid command line, and sets three logging options. Some of the more verbose options are used primarily for debugging the server code, and are not practical for normal use. __server__ Log server diagnostics. This is extremely verbose. __connect__ Log all connections. __stats__ Log all children terminations. __command__ Log all commands. This is extremely verbose. __client__ Log results of CLIENT command. __found__ Log all words found in the databases. __notfound__ Log all words not found in the databases. __timestamp__ When logging to a file, use a full timestamp like that which syslog would produce. Otherwise, no timestamp is made, making the files shorter. __host__ Log name of foreign host. __min__ Set the following options: found, notfound, stats, and client. If logging is activated (to a file, or via syslog), and no options are set, then this minimal set of options will be used. __all__ Set all of the options. __none__ Clear all of the options. To facilitate location of interesting information in the log file, entries are marked with initial letters indicating the class of the line being logged: __I__ Information about the server, connections, or termination statistics. These lines are generally not designed to be parsed automatically. __E__ Error messages. __C__ CLIENT command information. 
__D__ Definitions found in the databases searched. __M__ Matches found in the database searched. __N__ Matches which were not found in the databases searched. __T__ Trace of exact line sent by client. To preserve anonymity of the client, do ''not'' use the __connect__ or __host__ options. Clients may or may not send host information using the CLIENT command, but this should be an option that is selectable on the client side. __-s__ Log using the syslog(3) facility. __-L__ ''file'' or __--logfile__ ''file'' Specify the file for logging. __NOTE:__ If dictd does not have write permission for this file, it will silently fail. __-m__ ''minutes'' or __--mark__ ''minutes'' How often a timestamp should be logged. (This is effective only if logging has been enabled with the -s or -L option.) __-d__ ''option'' Activate a debugging option. There are several, all of which are only useful to developers. They are documented here for completeness. A list can be obtained interactively by using __-d__ with an illegal option. __verbose__ The same as __-v__ or __--verbose__. Adds verbosity to other options. __scan__ Debug the scanner for the configuration file. __parse__ Debug the parser for the configuration file. __search__ Debug the character folding and binary search routines. __init__ Report database initialization. __port__ Log client-side port number to the log file. __lev__ Debug Levenshtein search algorithm. __auth__ Debug the authorization routines. __nodetach__ Do not detach as a background process. Implies that a copy of the log file will appear on the standard output. __nofork__ Do not fork daemons to service requests. Be a single-threaded server. This option implies __nodetach__, and is most useful for using a debugger to find the point at which daemon processes are dumping core. __alt__ Debugs __altcompare__ in ''index.c''. 
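As a rough illustration of the logging options described above, the following Python sketch assembles a dictd command line in which each logging option gets its own __--log__ flag. The helper name build_dictd_argv and the LOG_OPTIONS set are hypothetical names introduced for this example; only the flags and option names come from the text above.

```python
from typing import List, Optional

# Logging option names as listed in the OPTIONS section above.
LOG_OPTIONS = {
    "server", "connect", "stats", "command", "client", "found",
    "notfound", "timestamp", "host", "min", "all", "none",
}


def build_dictd_argv(log_options: List[str],
                     logfile: Optional[str] = None) -> List[str]:
    """Hypothetical helper: build an argument list for dictd.

    Logging is enabled with -s (syslog) unless a log file is given,
    in which case -L file is used. Because only one logging option
    may be set per --log, every option is emitted with its own flag.
    """
    argv = ["dictd"]
    argv += ["-L", logfile] if logfile else ["-s"]
    for option in log_options:
        if option not in LOG_OPTIONS:
            raise ValueError(f"unknown logging option: {option}")
        argv += ["--log", option]
    return argv


# Reproduces the example command line given in the text.
print(" ".join(build_dictd_argv(["stats", "found", "notfound"])))
```

The resulting list can be handed to a process launcher as-is; the sketch only builds the arguments and does not start the daemon.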
!!CONFIGURATION FILE The configuration file defaults to ''/etc/dictd.conf'', but can be specified on the command line with the __-c__ option (see above). The configuration file has four distinct sections. At this time, each section must appear in the specified order, although only the Database section is required. __Syntax__ The following keywords are valid in a configuration file: access, allow, deny, group, database, data, index, filter, prefilter, postfilter, name, include, user, authonly, site. Keywords are case sensitive. String arguments that contain spaces should be surrounded by double quotes. Without quoting, strings may contain alphanumeric characters and _, -, ., and *, but not spaces. Strings must be on a single line and cannot be continued between lines. Comments start with # and extend to the end of the line. __Access Specification__ Access specifications may occur in the Access Section or in the Database Section. The access specification will be described here. For allow, deny, and authonly, a star (*) may be used as a wild card that matches any number of characters. A question mark (?) may be used as a wildcard that matches a single character. For example, 10.0.0.* and *.edu are valid strings. The syntax is as follows: __allow__ ''string'' The string specifies a domain name or IP address which is allowed access to the server (in the Access Section) or to a database (in the Database Section). __deny__ ''string'' The string specifies a domain name or IP address which is denied access to the server (in the Access Section) or to a database (in the Database Section). Note that if reverse DNS is not working, then only the IP number will be checked. Therefore, it is essential to deny networks based on IP number, since a denial based on domain name may not always be checked. __authonly__ ''string'' This form is only useful in the Access Section. 
The string specifies a domain name or IP address which is allowed access to the server but not to any of the databases. All commands are valid except DEFINE, MATCH, and SHOW DB. More specifically AUTH is a valid command, and commands which access the databases are not allowed. __user__ ''string'' This form is only useful in the Database Section. The string specifies a username that is allowed to access this database after a successful AUTH command is executed. __site__ ''string'' Used to specify the filename for the site information file, a flat text file which will be displayed in response to the SHOW SERVER command. This section, if present, must be first. __access {__ ''access specification'' __}__ This section, the second if the Site Section is present, contains access restrictions for the server and all of the databases collectively. Per-database control is specified in the Database Section. __database__ ''string'' __{__ ''database specification'' __}__ This section is required. The string specifies the name of the database (e.g., wn or web1913). The database specification describes the database: __NOTE__: If the files specified in the database specification do not exist on the system, dictd will silently fail. __data__ ''string'' Specifies the filename for the flat text database. __index__ ''string'' Specifies the filename for the index file. __prefilter__ ''string'' Specifies the prefilter command. When a chunk of the compressed database is read, it will be filtered with this filter before being decompressed. This may be used to provide some additional compression that knows about the data and can provide better compression than the LZ77 algorithm used by zlib. __postfilter__ ''string'' Specifies the postfilter command. When a chunk of the compressed database is read, it will be filtered with this filter before the offset and length for the entry are used to access data. 
This is provided for symmetry with the prefilter command, and may also be useful for providing additional database compression. __filter__ ''string'' Specifies the filter command. After the entry is extracted from the database, it will be filtered with this filter. This may be used to provide formatting for the entry (e.g., for html). __Warning:__ This is not currently implemented. __name__ ''string'' Specifies the short name of the database (e.g., dictd.h'' file at compile time (DICT_SHORT_ENTRY_NAME). __access {__ ''access specification'' __}__ Used to restrict access to this particular database. __include__ ''filename'' The text of the file ''filename'' (usually a database specification) will be read as if it appeared at this location in the configuration file. __Note for Debian Systems:__ On Debian systems, a configuration script that creates a database specification in /var/lib/dictd/db.list is run whenever any dictionary database is installed or removed. This makes it unnecessary for the user to edit the Database section of the configuration file. __user__ ''string'' ''string'' The first string specifies the username, and the second string specifies the shared secret for this username. When the AUTH command is used, the client will provide the username and a hashed version of the shared secret. If the shared secret matches, the user is said to have authenticated, and will have access to databases whose access specifications allow that user (by name, or by wildcard). If present, this section must appear last in the configuration file. There may be many user entries. The shared secret should be kept secret, as anyone who has access to it can access the shared databases (assuming access is not denied by domain name). !!DETERMINATION OF ACCESS LEVEL When a client connects, the global access specification is scanned, in order, until a specification matches. 
If no access specification exists, all access is allowed (e.g., the action is the same as if allow 10.42.* authonly *.edu deny * With this specification, all clients in the 10.42 network will be allowed access to unrestricted databases; all clients from *.edu sites will be allowed to authenticate, but will be denied access to all databases, even those which are otherwise unrestricted; and all other clients will have their connection terminated immediately. The 10.42 network clients can send an AUTH command and gain access to restricted databases. The *.edu clients must send an AUTH command to gain access to any databases, restricted or unrestricted. When the AUTH command is sent, the access list for each database is scanned, in order, just as the global access list is scanned. However, after authentication, the client has an associated username. For example, consider the following access specification: user u1 deny *.com user u2 allow * If the client authenticated as u1, then the client will have access to this database, even if the client comes from a *.com site. In contrast, if the client authenticated as u2, the client will only have access if it does not come from a *.com site. In this case, the __Warning:__ Checks are performed for domain names and for IP addresses. However, if reverse DNS for a specific site is not working, it is possible that a domain name may not be available for checking. Make sure that all denials use IP addresses. (And consider a future enhancement: if a domain name is not available, should denials that depend on a domain name match anything? This is the more conservative viewpoint, but it is not currently implemented.) !!SEARCH ALGORITHMS The DICT standard specifies a few search algorithms that must be implemented, and permits others to be supported on a server-dependent basis. The following search strategies are supported by this server. Note that ''all'' strategies are case insensitive. 
Most ignore non-alphanumeric, non-whitespace characters. __exact__ An exact match. This algorithm uses a binary search and is one of the fastest search algorithms available. __prefix__ Prefix match. This algorithm also uses a binary search and is very fast. __substring__ Match a substring anywhere in the headword. This search strategy uses a modified Boyer-Moore-Horspool algorithm. Since it must search the whole index file, it is not as fast as the exact and prefix matches. __suffix__ Suffix match. This search strategy also uses a modified Boyer-Moore-Horspool algorithm, and is as fast as the substring search. __re__ POSIX 1003.2 (modern) regular expression search. Modern regular expressions are the ones used by egrep(1). These regular expressions allow predefined character classes (e.g., [[[[:alnum:]], [[[[:alpha:]], [[[[:digit:]], and [[[[:xdigit:]] are useful for this application); use * to match a sequence of 0 or more matches of the previous atom; use + to match a sequence of 1 or more matches of the previous atom; use ? to match a sequence of 0 or 1 matches of the previous atom; use ^ to match the beginning of a word; use $ to match the end of a word; and allow nested subexpressions and alternation with () and |. For example, __Warning:__ Regular expression matches can take 10 to 300 times longer than substring matches. On a busy server, with many databases, this can require more than 5 minutes of waiting time, depending on the complexity of the regular expression. __regexp__ Old (basic) regular expressions. These regular expressions don't support |, +, or ?. Groups use escaped parentheses. While modern regular expressions are generally easier to use, basic regular expressions have a back reference feature. This can be used to match a second occurrence of something that was already matched. 
For example, the following expression finds all words that begin and end with the same three letters: ^\\(...\\).*\\1$ Note the use of the double backslashes to escape the special characters. This is required by the DICT protocol string specification (a single backslash quotes the next character -- we use two to get a single backslash through to the regular expression engine). __Warning:__ Note that the use of backtracking is even slower than the use of general regular expressions. __soundex__ The Soundex algorithm, a classic algorithm for finding words that sound similar to each other. The algorithm encodes each word using the first letter of the word and up to three digits. Since the first letter is known, this search is relatively fast, and it is sometimes good for correcting spelling errors when the Levenshtein algorithm doesn't help. __lev__ The Levenshtein algorithm (string edit distance of one). This algorithm searches for all words which are within an edit distance of one from the target word. An !!DATABASE FORMAT Databases for __dictd__ are distributed separately. A database consists of two files. One is a flat text file, the other is the index. The flat text file contains dictionary entries (or any other suitable data), and the index contains tab-delimited tuples consisting of the headword, the byte offset at which this entry begins in the flat text file, and the length of the entry in bytes. The offset and length are encoded in base 64, using the 64-character subset of International Alphabet IA5 discussed in RFC 1421 (printable encoding) and RFC 1522 (base64 MIME). Encoding the offsets in base 64 saves considerable space when compared with the usual base 10 encoding, while still permitting tab characters (ASCII 9) to be used for delimiting fields in a record. Each record ends with a newline (ASCII 10), so the index file is human readable. The flat text file may be compressed using gzip(1) (not recommended) or dictzip(1) (highly recommended). 
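The index record layout described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the server's own code: it assumes that each base 64 field is a plain positional number written most significant digit first in the RFC 1421 alphabet (which differs from the byte-oriented decoding done by Python's base64 module), and that the flat text file is uncompressed and already in memory. The function names and the sample data are hypothetical.

```python
# The 64-character alphabet from RFC 1421 (printable encoding).
B64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)
B64_VALUE = {ch: i for i, ch in enumerate(B64_ALPHABET)}


def b64_to_int(text):
    """Decode a base 64 number (assumed most significant digit first)."""
    value = 0
    for ch in text:
        value = value * 64 + B64_VALUE[ch]
    return value


def parse_index_line(line):
    """Split one tab-delimited index record into (headword, offset, length)."""
    headword, offset, length = line.rstrip("\n").split("\t")[:3]
    return headword, b64_to_int(offset), b64_to_int(length)


def read_entry(data, index_line):
    """Slice the entry named by an index record out of the flat text data."""
    _, offset, length = parse_index_line(index_line)
    return data[offset:offset + length]


# Hypothetical two-entry database held in memory as bytes.
data = b"apple\n  a fruit\nbanana\n  another fruit\n"
index = ["apple\tA\tQ\n", "banana\tQ\tX\n"]

for record in index:
    print(parse_index_line(record))
print(read_entry(data, index[1]).decode("ascii"))
```

Here the second record places banana at byte offset 16 with a length of 23 bytes, so a lookup needs only the index line and a single slice of the data file.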
Optimal speed will be obtained using an uncompressed file. However, the __gzip__ compression algorithm works very well on plain text, and can result in space savings typically between 60 and 80%. Using a file compressed with gzip(1) is not recommended, however, because random access on the file can only be accomplished by serially decompressing the whole file, a process which is prohibitively slow. dictzip(1) uses the same compression algorithm and file format as does gzip(1), but provides a table that can be used to randomly access compressed blocks in the file. The use of 50-64kB blocks for compression typically degrades compression by less than 10%, while maintaining acceptable random access capabilities for all data in the file. As an added benefit, files compressed with dictzip(1) can be decompressed with gzip(1) or zcat(1). (Note: recompressing a __dictzip__'d file using, for example, znew(1) will destroy the random access characteristics of the file. Always compress data files using dictzip(1).) !!ACKNOWLEDGEMENTS Special thanks to Jean-loup Gailly and Mark Adler for writing the zlib general purpose data compression library. The version contained with __dictd__ is not necessarily an original version and __may have been modified__, although any modifications are probably trivial. The key features of the __dictzip__ random-access compression algorithm utilize a documented extension of the gzip format, and do not require any modifications to zlib. For more information on zlib, please see the zlib home page at'' http://quest.jpl.nasa.gov/zlib/'' Special thanks to Henry Spencer for his regex package. 
The package contained with __dictd__ is not necessarily an original version and __may have been modified.__ For more information on regex, please see'' ftp://zoo.toronto.edu/pub/regex.shar'' !!COPYING The main source files for the __dictd__ server and the __dictzip__ compression program were written by Rik Faith (faith@dict.org) and are distributed under the terms of the GNU General Public License. If you need to distribute under other terms, write to the author. The main libraries used by these programs (zlib, regex, libmaa) are distributed under different terms, so you may be able to use the libraries for applications which are incompatible with the GPL -- please see the copyright notices and license information that come with the libraries for more information, and consult with your attorney to resolve these issues. !!BUGS The regular expression searches do not ignore non-whitespace, non-alphanumeric characters as do the other searches. In practice, this isn't much of a problem. The databases are memory mapped and cannot be updated while the server is running. There is no way to get a running server to re-read the configuration file, so databases cannot be added or deleted on the fly. !!FILES ''/etc/dictd.conf /usr/sbin/dictd'' !!SEE ALSO dict(1), dictzip(1), gunzip(1), zcat(1), webster(1), __RFC 2229__ ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.