!!!DICTZIP
NAME
SYNOPSIS
DESCRIPTION
TRADEOFFS
OPTIONS
CREDITS
SEE ALSO
----
!!NAME

dictzip, dictunzip, dictzcat - compress (or expand) files, allowing random access
!!SYNOPSIS

__dictzip [[__''options''__]__ ''name''
__dictunzip [[__''options''__]__ ''name''
__dictzcat__ ''name''
!!DESCRIPTION

__dictzip__ compresses files using the gzip(1)
algorithm (LZ77) in a manner which is completely compatible
with the __gzip__ file format. An extension to the
__gzip__ file format (Extra Field, described in 2.3.1.1
of RFC 1952) allows extra data to be stored in the header of
a compressed file. Programs like __gzip__ and __zcat__
will ignore this extra data. However, dictd(8), the
DICT protocol dictionary server, will make use of this data
to perform pseudo-random access on the file. Files in the
__dictzip__ format should end in ".dz" so that they can be
distinguished from common __gzip__ files that
do not contain the special header information.

From RFC 1952, the extra field is specified as
follows:

If the FLG.FEXTRA bit is set, an "extra field" is present in
the header, with total length XLEN bytes. It consists of a
series of subfields, each of the form:

+---+---+---+---+==================================+
|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
+---+---+---+---+==================================+

SI1 and SI2 provide a subfield ID, typically two ASCII letters with some mnemonic value. Jean-Loup Gailly maintains a registry of subfield IDs.

LEN gives the length of the subfield data, excluding the 4
initial bytes.
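
The subfield layout above can be walked with a few lines of code. The following is a minimal sketch in Python, not the code used by __dictzip__ or dictd(8); the function names and the sample header are hypothetical. It assumes the fixed 10-byte gzip header of RFC 1952, followed by the two-byte XLEN when FLG.FEXTRA (bit 2) is set.

```python
import struct

FEXTRA = 0x04  # FLG bit 2 in RFC 1952


def read_extra_field(header: bytes) -> bytes:
    """Return the raw extra field of a gzip member, or b"" if absent."""
    id1, id2, _cm, flg = struct.unpack_from("<BBBB", header, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip header")
    if not flg & FEXTRA:
        return b""
    # The 10-byte fixed header is followed by XLEN (2 bytes, LSB first).
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    if len(extra) != xlen:
        raise ValueError("truncated extra field")
    return extra


def iter_subfields(extra: bytes):
    """Yield (subfield ID, subfield data) pairs: SI1, SI2, LEN, data."""
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, length = struct.unpack_from("<BBH", extra, pos)
        pos += 4
        data = extra[pos:pos + length]
        if len(data) != length:
            raise ValueError("truncated subfield")
        yield bytes([si1, si2]), data
        pos += length


# Hypothetical header carrying two subfields, "RA" and "zz".
payload = (b"RA" + struct.pack("<H", 2) + b"\x01\x00"
           + b"zz" + struct.pack("<H", 3) + b"abc")
header = (b"\x1f\x8b\x08" + bytes([FEXTRA]) + b"\x00" * 6
          + struct.pack("<H", len(payload)) + payload)
subfields = list(iter_subfields(read_extra_field(header)))
```

A reader that does not recognize a subfield ID can skip it, since LEN gives the distance to the next subfield.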

The __dictzip__ program uses 'R' for SI1, and 'A' for SI2
(i.e., "Random Access"). After the LEN field, the data is
arranged as follows:

+---+---+---+---+---+---+===============================+
|  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
+---+---+---+---+---+---+===============================+

As per RFC 1952, all data is stored least-significant byte first. For VER 1 of the data, all values are 16 bits long (2 bytes) and are unsigned integers.
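
As an illustration, the 'RA' subfield data can be unpacked as little-endian 16-bit values. This is a sketch only: the function name and sample values are hypothetical, and it assumes that the CHCNT words hold the compressed size of each chunk, which the text above does not state explicitly.

```python
import struct


def parse_ra(data: bytes):
    """Split 'RA' subfield data into (CHLEN, list of CHCNT words)."""
    ver, chlen, chcnt = struct.unpack_from("<HHH", data, 0)
    if ver != 1:
        raise ValueError("only VER 1 (16-bit entries) is handled here")
    # CHCNT unsigned 16-bit words follow the three header fields.
    words = struct.unpack_from(f"<{chcnt}H", data, 6)
    return chlen, list(words)


# Hypothetical subfield: VER 1, CHLEN 58315, three chunk entries.
sample = struct.pack("<HHH3H", 1, 58315, 3, 100, 200, 50)
chlen, sizes = parse_ra(sample)
```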

XLEN (which is specified earlier in the header) is a two-byte
integer, so the extra field can be 0xffff bytes long, 2
bytes of which are used for the subfield ID (SI1 and SI2),
and 2 bytes of which are used for the subfield length (LEN).
This leaves 0xfffb bytes (0x7ffd 2-byte entries or 0x3ffe
4-byte entries). Given that the zip output buffer must be
10% + 12 bytes larger than the input buffer, we can store
58969 bytes per entry, or about 1.8GB if the 2-byte entries
are used. If this becomes a limiting factor, another format
version can be selected and defined for 4-byte
entries.
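
The figures in the paragraph above can be reproduced with a short calculation. The variable names are illustrative, and the 58969 bytes per entry is taken from the text rather than derived.

```python
XLEN_MAX = 0xFFFF                 # largest value of a two-byte XLEN
available = XLEN_MAX - 2 - 2      # minus SI1/SI2 and LEN
entries_2byte = available // 2    # number of 2-byte entries that fit
entries_4byte = available // 4    # number of 4-byte entries that fit

bytes_per_entry = 58969           # figure quoted in the text
total_bytes = entries_2byte * bytes_per_entry

print(hex(available), hex(entries_2byte), hex(entries_4byte))
print(f"{total_bytes} bytes, about {total_bytes / 2**30:.1f} GB")
```

The "about 1.8GB" in the text corresponds to counting a gigabyte as 2^30 bytes.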

For compression, the file is divided up into "chunks" of
data; each chunk is less than 64kB.

To perform random access on the data, the offset and length
of the data are provided to library routines. These routines
determine the chunk in which the desired data begins, and
decompress that chunk. Consecutive chunks are decompressed
as necessary.
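
The lookup described above can be sketched with Python's zlib module. This is not the dictd(8) library code. It assumes that each chunk is written as raw deflate data ending in a full flush, so that a chunk can be inflated without the chunks before it. The function names, the chunk length, and the in-memory list of chunks are hypothetical; a real file would locate each chunk by summing the compressed sizes stored in the header.

```python
import zlib


def compress_chunks(data: bytes, chlen: int):
    """Compress data in chlen-byte chunks, flushing fully after each one."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate stream
    chunks = []
    for start in range(0, len(data), chlen):
        piece = comp.compress(data[start:start + chlen])
        piece += comp.flush(zlib.Z_FULL_FLUSH)  # resets the LZ77 history
        chunks.append(piece)
    return chunks


def read_range(chunks, chlen: int, offset: int, size: int) -> bytes:
    """Return size bytes at offset, inflating only the chunks needed."""
    if size <= 0:
        return b""
    first = offset // chlen               # chunk where the data begins
    last = (offset + size - 1) // chlen   # chunk where the data ends
    plain = b"".join(
        zlib.decompressobj(-15).decompress(chunk)
        for chunk in chunks[first:last + 1]
    )
    skip = offset - first * chlen
    return plain[skip:skip + size]


# Hypothetical dictionary-like text and a small chunk length for the demo.
text = b"".join(b"entry %05d: definition text\n" % i for i in range(5000))
chunks = compress_chunks(text, 4096)
piece = read_range(chunks, 4096, 10000, 300)
```

Reading 300 bytes here inflates only the one or two chunks that contain them, never the whole file.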
!!TRADEOFFS

__Speed__

True random file access is not realized, since any access,
even for a single byte, requires that a 64kB chunk be read
and decompressed. This is slower than accessing a flat text
file, but is much faster than performing serial access
on a fully compressed file.

__Space__

For the textual dictionary databases we are working with,
the use of 64kB chunks and maximal LZ77 compression yields
a file which is only about 4% larger than the same file
compressed all at once.
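
The space overhead can be measured for any given input. The sketch below compares raw deflate output for a whole buffer against the same buffer compressed chunk by chunk with a full flush after each chunk. The sample data, function names, and chunk length are assumptions; the overhead printed for this synthetic input is not expected to match the 4% figure, which refers to real dictionary databases.

```python
import zlib


def whole_size(data: bytes) -> int:
    """Size of data compressed all at once (raw deflate, level 9)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return len(comp.compress(data) + comp.flush())


def chunked_size(data: bytes, chlen: int) -> int:
    """Size of data compressed in chunks with a full flush after each."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    total = 0
    for start in range(0, len(data), chlen):
        total += len(comp.compress(data[start:start + chlen]))
        total += len(comp.flush(zlib.Z_FULL_FLUSH))
    total += len(comp.flush())  # terminate the deflate stream
    return total


# Synthetic stand-in for a dictionary database.
data = b"".join(
    b"headword %05d: a short sample definition\n" % i for i in range(20000)
)
whole = whole_size(data)
chunked = chunked_size(data, 60000)
overhead = 100.0 * (chunked - whole) / whole
print(f"whole: {whole}, chunked: {chunked}, overhead: {overhead:.1f}%")
```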
!!OPTIONS

__-d__ or __--decompress__

Decompress. This is the default if the executable is called
__dictunzip__.

__-c__ or __--stdout__

Write output on standard output; keep original files
unchanged. This is only available when decompressing
(because parts of the header must be updated after a write
when compressing).

__-f__ or __--force__

Force compression or decompression even if the output file
already exists.

__-h__ or __--help__

Display help.

__-k__ or __--keep__

Do not delete the original file.

__-l__ or __--list__

For each compressed file, list the following
fields:

type: dzip, gzip, or text (includes files in unknown formats)
crc: CRC checksum
date and time: from header
chunks: number of chunks in file
size: size of each uncompressed chunk
compr.: compressed size
uncompr.: uncompressed size
ratio: compression ratio (0.0% if unknown)
name: name of uncompressed file

Unlike __gzip__, the compression method is not
detected.

__-L__ or __--license__

Display the __dictzip__ license and quit.

__-t__ or __--test__

Check the compressed file integrity. This option is not
implemented. Instead, it will list the header
information.

__-v__ or __--verbose__

Verbose. Display extra information during
compression.

__-V__ or __--version__

Version. Display the version number and compilation options
then quit.

__-s__ ''start'' or __--start__
''start''

Specify the offset to start decompression, using decimal
numbers. The default is at the beginning of the
file.

__-e__ ''size'' or __--size__
''size''

Specify the size of the portion of the file to decompress,
using decimal numbers. The default is the whole
file.

__-S__ ''start'' or __--Start__
''start''

Specify the offset to start decompression, using base64
numbers. The default is at the beginning of the
file.

__-E__ ''size'' or __--Size__
''size''

Specify the size of the portion of the file to decompress,
using base64 numbers. The default is the whole
file.
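
The text does not define the "base64 numbers" accepted by __-S__ and __-E__. The sketch below assumes the convention used for offsets and sizes in dictd(8) index files: digits from the standard base64 alphabet, most significant digit first, with no padding. That reading is an assumption, as are the function names.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64_number_decode(text: str) -> int:
    """Decode a base64 number, most significant digit first."""
    value = 0
    for ch in text:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 64 + ALPHABET.index(ch)
    return value


def b64_number_encode(value: int) -> str:
    """Encode a non-negative integer as a base64 number."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value > 0:
        value, digit = divmod(value, 64)
        digits.append(ALPHABET[digit])
    return "".join(reversed(digits))
```

Under this assumption, "B" stands for 1 and "BA" for 64, so an offset and size copied from an index file could be passed to __-S__ and __-E__ unchanged.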

__-p__ ''prefilter'' or __--pre__
''prefilter''

Specify a shell command to execute as a filter before
compression or decompression of a chunk. The pre- and
post-compression filters can be used to provide additional
compression or output formatting. The filters may not
increase the buffer size significantly. The pre- and
post-compression filters were designed to provide the most
general interface possible.

__-P__ ''postfilter'' or __--post__
''postfilter''

Specify a shell command to execute as a filter after
compression or decompression.
!!CREDITS

__dictzip__ was written by Rik Faith (faith@cs.unc.edu)
and is distributed under the terms of the GNU General Public
License. If you need to distribute under other terms, write
to the author.

The main libraries used by this program (zlib, regex,
libmaa) are distributed under different terms, so you may be
able to use the libraries for applications which are
incompatible with the GPL -- please see the copyright
notices and license information that come with the libraries
for more information, and consult with your attorney to
resolve these issues.
!!SEE ALSO

dict(1), dictd(8), gzip(1),
gunzip(1), zcat(1)
----