Version 1, including all changes. Rev 1, author: perry.

DICTZIP
!!!DICTZIP
NAME
SYNOPSIS
DESCRIPTION
TRADEOFFS
OPTIONS
CREDITS
SEE ALSO
----
!!NAME

dictzip, dictunzip, dictzcat - compress (or expand) files, allowing random access
!!SYNOPSIS

__dictzip [[__''options''__]__ ''name''
__dictunzip [[__''options''__]__ ''name''
__dictzcat__ ''name''
!!DESCRIPTION

__dictzip__ compresses files using the gzip(1)
algorithm (LZ77) in a manner which is completely compatible
with the __gzip__ file format. An extension to the
__gzip__ file format (Extra Field, described in 2.3.1.1
of RFC 1952) allows extra data to be stored in the header of
a compressed file. Programs like __gzip__ and __zcat__
will ignore this extra data. However, dictd(8), the
DICT protocol dictionary server, will make use of this data
to perform pseudo-random access on the file. Files in the
__dictzip__ format should end in ".dz" so that they can be
distinguished from common __gzip__ files that
do not contain the special header information.

From RFC 1952, the extra field is specified as
follows:

If the FLG.FEXTRA bit is set, an "extra field" is present in
the header, with total length XLEN bytes. It consists of a
series of subfields, each of the form:

+---+---+---+---+==================================+
|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
+---+---+---+---+==================================+

SI1 and SI2 provide a subfield ID, typically two ASCII letters with some mnemonic value. Jean-Loup Gailly is maintaining a registry of subfield IDs.

LEN gives the length of the subfield data, excluding the 4
initial bytes.

The __dictzip__ program uses 'R' for SI1, and 'A' for SI2
(i.e., "Random Access"). After the LEN field, the data is
arranged as follows:

+---+---+---+---+---+---+===============================+
|  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
+---+---+---+---+---+---+===============================+

As per RFC 1952, all data is stored least-significant byte first. For VER 1 of the data, all values are 16 bits long (2 bytes), and are unsigned integers.
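
The layout above can be read back with a few lines of Python. This is a minimal sketch, not the dictzip implementation: the function name `parse_ra_subfield` and the returned dictionary keys are made up for illustration, and it assumes the fixed 10-byte gzip header from RFC 1952 with XLEN immediately after it. What the CHCNT data words mean is left to the caller; treating them as per-chunk compressed sizes would be an assumption.

```python
import struct


def parse_ra_subfield(buf: bytes) -> dict:
    """Locate the 'RA' subfield in a gzip header and unpack its fields.

    Hypothetical helper for illustration; all integers are little-endian.
    """
    id1, id2, _cm, flg = struct.unpack_from("<4B", buf, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip header")
    if not flg & 0x04:  # FLG.FEXTRA is bit 2
        raise ValueError("no extra field present")

    # The fixed header is 10 bytes; XLEN is the next two bytes.
    (xlen,) = struct.unpack_from("<H", buf, 10)
    pos, end = 12, 12 + xlen

    # Walk the subfields: SI1, SI2, LEN, then LEN bytes of data.
    while pos + 4 <= end:
        si1, si2, length = struct.unpack_from("<ccH", buf, pos)
        pos += 4
        if (si1, si2) == (b"R", b"A"):
            ver, chlen, chcnt = struct.unpack_from("<3H", buf, pos)
            words = struct.unpack_from(f"<{chcnt}H", buf, pos + 6)
            return {"ver": ver, "chlen": chlen, "chcnt": chcnt, "words": words}
        pos += length  # skip a subfield we do not understand

    raise ValueError("no RA subfield found")
```

Subfields with other IDs are skipped rather than rejected, which matches the idea that programs unaware of a subfield simply ignore it.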

XLEN (which is specified earlier in the header) is a
two-byte integer, so the extra field can be 0xffff bytes long, 2
bytes of which are used for the subfield ID (SI1 and SI2),
and 2 bytes of which are used for the subfield length (LEN).
This leaves 0xfffb bytes (0x7ffd 2-byte entries or 0x3ffe
4-byte entries). Given that the zip output buffer must be
10% + 12 bytes larger than the input buffer, we can store
58969 bytes per entry, or about 1.8GB if the 2-byte entries
are used. If this becomes a limiting factor, another format
version can be selected and defined for 4-byte
entries.
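
The figures in this paragraph can be checked with integer arithmetic. The sketch below is one reading that reproduces them, not a statement of how dictzip computes its buffers: the derivation of 58969 as 90% of 0xffff minus 12 is an assumption, and the variable names are illustrative. Like the text, it does not subtract the 6 bytes taken by VER, CHLEN and CHCNT.

```python
XLEN_MAX = 0xFFFF          # largest value of the two-byte XLEN field
SUBFIELD_OVERHEAD = 4      # SI1 + SI2 + two-byte LEN

payload = XLEN_MAX - SUBFIELD_OVERHEAD
entries_2byte = payload // 2
entries_4byte = payload // 4

# Assumed derivation: shrink a 0xffff-byte output buffer by 10% + 12 bytes.
chunk_bytes = XLEN_MAX * 9 // 10 - 12

# Largest file addressable with 2-byte entries, in bytes and in GiB.
max_file_bytes = entries_2byte * chunk_bytes
max_file_gib = max_file_bytes / 2**30

print(hex(payload), hex(entries_2byte), hex(entries_4byte))  # 0xfffb 0x7ffd 0x3ffe
print(chunk_bytes, round(max_file_gib, 2))                   # 58969 1.8
```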

For compression, the file is divided up into "chunks" of
data, each less than 64kB in size.

To perform random access on the data, the offset and length
of the data are provided to library routines. These routines
determine the chunk in which the desired data begins, and
decompress that chunk. Consecutive chunks are decompressed
as necessary.
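
The chunk lookup described above can be sketched with Python's `zlib` module. This is an illustration under stated assumptions, not the dictzip library routines: it assumes each chunk is made independently decodable by a `Z_FULL_FLUSH` on a raw deflate stream, it keeps the per-chunk compressed sizes in a plain list instead of a gzip header, and the names `compress_chunked` and `read_range` are hypothetical.

```python
import zlib
from itertools import accumulate


def compress_chunked(data: bytes, chunk_len: int):
    """Compress data as one raw deflate stream, flushing fully after each chunk.

    Returns the stream and the compressed size of each chunk.
    """
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # -15: raw deflate, no header
    pieces = []
    for start in range(0, len(data), chunk_len):
        piece = comp.compress(data[start:start + chunk_len])
        # A full flush ends on a byte boundary and resets the LZ77 window,
        # so the next chunk holds no back-references into this one.
        piece += comp.flush(zlib.Z_FULL_FLUSH)
        pieces.append(piece)
    tail = comp.flush(zlib.Z_FINISH)
    return b"".join(pieces) + tail, [len(p) for p in pieces]


def read_range(stream: bytes, sizes, chunk_len: int, offset: int, size: int) -> bytes:
    """Return size bytes of uncompressed data starting at offset."""
    if size <= 0:
        return b""
    starts = [0, *accumulate(sizes)]          # compressed offset of each chunk
    first = offset // chunk_len               # chunk where the data begins
    last = min((offset + size - 1) // chunk_len, len(sizes) - 1)
    out = bytearray()
    for i in range(first, last + 1):          # consecutive chunks as necessary
        raw = stream[starts[i]:starts[i + 1]]
        out += zlib.decompressobj(-15).decompress(raw)
    skip = offset - first * chunk_len
    return bytes(out[skip:skip + size])
```

Because the flushes only add empty blocks, the whole stream still decompresses in one pass with an ordinary inflater, which is the property that keeps the chunked layout readable by tools that know nothing about the chunks.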
!!TRADEOFFS

__Speed__

True random file access is not realized, since any access,
even for a single byte, requires that a 64kB chunk be read
and decompressed. This is slower than accessing a flat text
file, but is much, much faster than performing serial access
on a fully compressed file.

__Space__

For the textual dictionary databases we are working with,
the use of 64kB chunks and maximal LZ77 compression realizes
a file which is only about 4% larger than the same file
compressed all at once.
!!OPTIONS

__-d__ or __--decompress__

Decompress. This is the default if the executable is called
__dictunzip__.

__-c__ or __--stdout__

Write output on standard output; keep original files
unchanged. This is only available when decompressing
(because parts of the header must be updated after a write
when compressing).

__-f__ or __--force__

Force compression or decompression even if the output file
already exists.

__-h__ or __--help__

Display help.

__-k__ or __--keep__

Do not delete the original file.

__-l__ or __--list__

For each compressed file, list the following
fields:

type: dzip, gzip, or text (includes files in unknown formats)
crc: CRC checksum
date and time: from header
chunks: number of chunks in file
size: size of each uncompressed chunk
compr.: compressed size
uncompr.: uncompressed size
ratio: compression ratio (0.0% if unknown)
name: name of uncompressed file

Unlike __gzip__, the compression method is not
detected.

__-L__ or __--license__

Display the __dictzip__ license and quit.

__-t__ or __--test__

Check the compressed file integrity. This option is not
implemented. Instead, it will list the header
information.

__-v__ or __--verbose__

Verbose. Display extra information during
compression.

__-V__ or __--version__

Version. Display the version number and compilation options
then quit.

__-s__ ''start'' or __--start__ ''start''

Specify the offset to start decompression, using decimal
numbers. The default is at the beginning of the
file.

__-e__ ''size'' or __--size__ ''size''

Specify the size of the portion of the file to decompress,
using decimal numbers. The default is the whole
file.

__-S__ ''start'' or __--Start__ ''start''

Specify the offset to start decompression, using base64
numbers. The default is at the beginning of the
file.

__-E__ ''size'' or __--Size__ ''size''

Specify the size of the portion of the file to decompress,
using base64 numbers. The default is the whole
file.
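
The decimal and base64 option pairs differ only in how the numbers are written. The sketch below converts between the two and builds an argument list for each form. It assumes the digit convention used in DICT index files (the standard base64 alphabet, most significant digit first); the function names are hypothetical, and combining __-d__ and __-c__ with the offset options is an assumption drawn from the descriptions above, not a tested command line.

```python
B64_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_number_encode(value: int) -> str:
    """Write a non-negative integer in base 64, most significant digit first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while True:
        digits.append(B64_DIGITS[value & 0x3F])  # lowest 6 bits
        value >>= 6
        if value == 0:
            break
    return "".join(reversed(digits))


def b64_number_decode(text: str) -> int:
    """Inverse of b64_number_encode."""
    value = 0
    for ch in text:
        value = value * 64 + B64_DIGITS.index(ch)
    return value


def extract_args(path: str, start: int, size: int, base64: bool = False) -> list:
    """Build an argv list that asks for size bytes at offset start on stdout."""
    if base64:
        return ["dictzip", "-d", "-c",
                "-S", b64_number_encode(start),
                "-E", b64_number_encode(size), path]
    return ["dictzip", "-d", "-c", "-s", str(start), "-e", str(size), path]
```

The argument list is meant to be passed to something like `subprocess.run`, so no shell quoting is needed for base64 digits such as `+` and `/`.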

__-p__ ''prefilter'' or __--pre__ ''prefilter''

Specify a shell command to execute as a filter before
compression or decompression of a chunk. The pre- and
post-compression filters can be used to provide additional
compression or output formatting. The filters may not
increase the buffer size significantly. The pre- and
post-compression filters were designed to provide the most
general interface possible.

__-P__ ''postfilter'' or __--post__ ''postfilter''

Specify a shell command to execute as a filter after
compression or decompression.
!!CREDITS

__dictzip__ was written by Rik Faith (faith@cs.unc.edu)
and is distributed under the terms of the GNU General Public
License. If you need to distribute under other terms, write
to the author.

The main libraries used by this program (zlib, regex,
libmaa) are distributed under different terms, so you may be
able to use the libraries for applications which are
incompatible with the GPL -- please see the copyright
notices and license information that come with the libraries
for more information, and consult with your attorney to
resolve these issues.
!!SEE ALSO

dict(1), dictd(8), gzip(1), gunzip(1), zcat(1)
----