Penguin
Blame: dictunzip(1)
Annotated edit history of dictunzip(1) version 1, including all changes.
Rev Author # Line
1 perry 1 DICTZIP
2 !!!DICTZIP
3 NAME
4 SYNOPSIS
5 DESCRIPTION
6 TRADEOFFS
7 OPTIONS
8 CREDITS
9 SEE ALSO
10 ----
11 !!NAME
12
13
14 dictzip, dictunzip, dictzcat - compress (or expand) files, allowing random access
15 !!SYNOPSIS
16
17
18 __dictzip [[__''options''__]__ ''name
19 ''__dictunzip [[__''options''__]__ ''name
20 ''__dictzcat__ ''name
21 ''
22 !!DESCRIPTION
23
24
25 __dictzip__ compresses files using the gzip(1)
26 algorithm (LZ77) in a manner which is completely compatible
27 with the __gzip__ file format. An extension to the
28 __gzip__ file format (Extra Field, described in 2.3.1.1
29 of RFC 1952) allows extra data to be stored in the header of
30 a compressed file. Programs like __gzip__ and __zcat__
31 will ignore this extra data. However, dictd(8), the
32 DICT protocol dictionary server, will make use of this data
33 to perform pseudo-random access on the file. Files in the
34 __dictzip__ format should end in ".dz" so that they may be
35 distinguished from common __gzip__ files that
36 do not contain the special header information.
37
38
39 From RFC 1952, the extra field is specified as
40 follows:
41
42
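43 If the FLG.FEXTRA bit is set, an "extra field" is present in the header, with total length XLEN bytes. It consists of a series of subfields, each of the form: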
44
45
46 +---+---+---+---+==================================+
47 |SI1|SI2| LEN |... LEN bytes of subfield data ...|
48 +---+---+---+---+==================================+
49 SI1 and SI2 provide a subfield ID, typically two ASCII letters with some mnemonic value. Jean-Loup Gailly is maintaining a registry of subfield IDs.
50
51
52 LEN gives the length of the subfield data, excluding the 4
53 initial bytes.
54
55
56 The __dictzip__ program uses 'R' for SI1, and 'A' for SI2
57 (i.e., "Random Access"). After the LEN field, the data is
58 arranged as follows:
59
60
61 +---+---+---+---+---+---+===============================+
62 | VER | CHLEN | CHCNT | ... CHCNT words of data ... |
63 +---+---+---+---+---+---+===============================+
64 As per RFC 1952, all data is stored least-significant byte first. For VER 1 of the data, all values are unsigned integers, 16 bits (2 bytes) long.
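
The two layouts above can be read back with a few lines of Python. This is a minimal sketch, not the code used by __dictzip__ or dictd(8): the function name `parse_dictzip_extra` is hypothetical, and treating the CHCNT words as one size per compressed chunk is an assumption that the text above does not spell out.

```python
import struct

FEXTRA = 0x04  # FLG bit that announces an extra field (RFC 1952)


def parse_dictzip_extra(header: bytes):
    """Return (ver, chlen, words) from the 'RA' subfield, or None if absent.

    `words` holds the CHCNT 16-bit values that follow CHCNT; interpreting
    them as per-chunk compressed sizes is an assumption of this sketch.
    """
    # Fixed gzip header: ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes.
    id1, id2, _cm, flg = struct.unpack_from("<BBBB", header, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip header")
    if not flg & FEXTRA:
        return None
    # XLEN follows the fixed header, least-significant byte first.
    (xlen,) = struct.unpack_from("<H", header, 10)
    pos, end = 12, 12 + xlen
    while pos + 4 <= end:
        # Each subfield: SI1, SI2, LEN, then LEN bytes of data.
        si1, si2, length = struct.unpack_from("<ccH", header, pos)
        data = header[pos + 4 : pos + 4 + length]
        if si1 == b"R" and si2 == b"A":
            ver, chlen, chcnt = struct.unpack_from("<HHH", data, 0)
            words = struct.unpack_from("<%dH" % chcnt, data, 6)
            return ver, chlen, list(words)
        pos += 4 + length
    return None
```

Subfields with other IDs are skipped, which mirrors how __gzip__ and __zcat__ ignore data they do not understand.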
65
66
67 XLEN (which is specified earlier in the header) is a two
68 byte integer, so the extra field can be 0xffff bytes long, 2
69 bytes of which are used for the subfield ID (SI1 and SI2),
70 and 2 bytes of which are used for the subfield length (LEN).
71 This leaves 0xfffb bytes (0x7ffd 2-byte entries or 0x3ffe
72 4-byte entries). Given that the zip output buffer must be
73 10% + 12 bytes larger than the input buffer, we can store
74 58969 bytes per entry, or about 1.8GB if the 2-byte entries
75 are used. If this becomes a limiting factor, another format
76 version can be selected and defined for 4-byte
77 entries.
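
The arithmetic in the paragraph above can be checked directly. The sketch below only reproduces the figures quoted there; the function name `extra_field_capacity` is hypothetical, and 58969 is taken from the text rather than derived.

```python
def extra_field_capacity(chunk_bytes: int = 58969):
    """Recompute the extra-field limits quoted in the text."""
    xlen_max = 0xFFFF              # XLEN is a two-byte integer
    usable = xlen_max - 2 - 2      # minus subfield ID and LEN
    entries_2 = usable // 2        # number of 2-byte entries
    entries_4 = usable // 4        # number of 4-byte entries
    total_bytes = entries_2 * chunk_bytes
    return usable, entries_2, entries_4, total_bytes


usable, entries_2, entries_4, total_bytes = extra_field_capacity()
print(hex(usable), hex(entries_2), hex(entries_4))
print("%.2f GiB" % (total_bytes / 2**30))
```

With 2-byte entries this gives 0x7ffd chunks of 58969 bytes each, which is where the figure of about 1.8GB comes from.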
78
79
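80 For compression, the file is divided up into "chunks" of data, each less than 64kB, so that any chunk can be decompressed without the chunks before it.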
81
82
83 To perform random access on the data, the offset and length
84 of the data are provided to library routines. These routines
85 determine the chunk in which the desired data begins, and
86 decompress that chunk. Consecutive chunks are decompressed
87 as necessary.
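
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the library routines themselves: `build` and `read_range` are hypothetical names, and each chunk is compressed here as an independent zlib stream, which is a simplification and not the on-disk __dictzip__ format.

```python
import zlib
from itertools import accumulate


def build(data: bytes, chlen: int):
    """Compress data in chunks of chlen bytes; return (blob, sizes)."""
    parts = [zlib.compress(data[i:i + chlen], 9)
             for i in range(0, len(data), chlen)]
    return b"".join(parts), [len(p) for p in parts]


def read_range(blob: bytes, sizes, chlen: int, start: int, size: int) -> bytes:
    """Return `size` uncompressed bytes beginning at offset `start`."""
    if size <= 0:
        return b""
    # Position of each compressed chunk inside the blob.
    offsets = [0] + list(accumulate(sizes))
    first = start // chlen                         # chunk where the data begins
    last = min((start + size - 1) // chlen, len(sizes) - 1)
    out = bytearray()
    for k in range(first, last + 1):               # consecutive chunks as needed
        out += zlib.decompress(blob[offsets[k]:offsets[k + 1]])
    skip = start - first * chlen                   # offset inside the first chunk
    return bytes(out[skip:skip + size])
```

Only the chunks that overlap the requested range are decompressed, so the cost of a read depends on the chunk length and not on the size of the whole file.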
88 !!TRADEOFFS
89
90
91 __Speed__
92
93
94 True random file access is not realized, since any access,
95 even for a single byte, requires that a 64kB chunk be read
96 and decompressed. This is slower than accessing a flat text
97 file, but is much, much faster than performing serial access
98 on a fully compressed file.
99
100
101 __Space__
102
103
104 For the textual dictionary databases we are working with,
105 the use of 64kB chunks and maximal LZ77 compression realizes
106 a file which is only about 4% larger than the same file
107 compressed all at once.
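
A rough way to measure this overhead on your own data is sketched below. It is an approximation only: the function name `chunk_overhead` is hypothetical, each chunk is compressed as an independent zlib stream (which adds a few bytes per chunk that __dictzip__ itself may not add), and the result depends heavily on the input.

```python
import zlib


def chunk_overhead(data: bytes, chlen: int = 58969) -> float:
    """Relative size increase of chunked compression over whole-file compression."""
    whole = len(zlib.compress(data, 9))
    chunked = sum(len(zlib.compress(data[i:i + chlen], 9))
                  for i in range(0, len(data), chlen))
    return chunked / whole - 1.0
```

Calling `chunk_overhead(open("database.dict", "rb").read())` on a dictionary file (the file name is illustrative) returns a fraction such as 0.04 for a 4% increase.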
108 !!OPTIONS
109
110
111 __-d__ or __--decompress__
112
113
114 Decompress. This is the default if the executable is called
115 __dictunzip__.
116
117
118 __-c__ or __--stdout__
119
120
121 Write output on standard output; keep original files
122 unchanged. This is only available when decompressing
123 (because parts of the header must be updated after a write
124 when compressing).
125
126
127 __-f__ or __--force__
128
129
130 Force compression or decompression even if the output file
131 already exists.
132
133
134 __-h__ or __--help__
135
136
137 Display help.
138
139
140 __-k__ or __--keep__
141
142
143 Do not delete the original file.
144
145
146 __-l__ or __--list__
147
148
149 For each compressed file, list the following
150 fields:
151
152
153 type: dzip, gzip, or text (includes files in unknown
154 formats); crc: CRC checksum; date and time: from header;
155 chunks: number of chunks in file; size: size of each
156 uncompressed chunk; compr.: compressed size; uncompr.:
157 uncompressed size; ratio: compression ratio (0.0% if unknown);
158 name: name of uncompressed file
159
160
161 Unlike __gzip__, the compression method is not
162 detected.
163
164
165 __-L__ or __--license__
166
167
168 Display the __dictzip__ license and quit.
169
170
171 __-t__ or __--test__
172
173
174 Check the compressed file integrity. This option is not
175 implemented. Instead, it will list the header
176 information.
177
178
179 __-v__ or __--verbose__
180
181
182 Verbose. Display extra information during
183 compression.
184
185
186 __-V__ or __--version__
187
188
189 Version. Display the version number and compilation options
190 then quit.
191
192
193 __-s__ ''start'' or __--start__
194 ''start''
195
196
197 Specify the offset to start decompression, using decimal
198 numbers. The default is at the beginning of the
199 file.
200
201
202 __-e__ ''size'' or __--size__
203 ''size''
204
205
206 Specify the size of the portion of the file to decompress,
207 using decimal numbers. The default is the whole
208 file.
209
210
211 __-S__ ''start'' or __--Start__
212 ''start''
213
214
215 Specify the offset to start decompression, using base64
216 numbers. The default is at the beginning of the
217 file.
218
219
220 __-E__ ''size'' or __--Size__
221 ''size''
222
223
224 Specify the size of the portion of the file to decompress,
225 using base64 numbers. The default is the whole
226 file.
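
The page does not define "base64 numbers". The sketch below assumes the positional encoding used in DICT index files: the standard base64 alphabet, one digit per character, most significant digit first. Both function names are hypothetical, and the encoding itself should be checked against the dictd sources before relying on it.

```python
# Assumed digit alphabet: 'A' is 0, 'B' is 1, ... '/' is 63.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_to_int(text: str) -> int:
    """Decode a base64 number, most significant digit first."""
    value = 0
    for ch in text:
        value = value * 64 + B64.index(ch)  # raises ValueError on a bad digit
    return value


def int_to_b64(n: int) -> str:
    """Encode a non-negative integer as a base64 number."""
    if n < 0:
        raise ValueError("negative values are not representable")
    digits = ""
    while True:
        digits = B64[n % 64] + digits
        n //= 64
        if n == 0:
            return digits
```

Under this assumption, a start and size read from an index line could be converted with `b64_to_int` and passed to __-s__ and __-e__ as decimal numbers, or passed unchanged to __-S__ and __-E__.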
227
228
229 __-p__ ''prefilter'' or __--pre__
230 ''prefilter''
231
232
233 Specify a shell command to execute as a filter before
234 compression or decompression of a chunk. The pre- and
235 post-compression filters can be used to provide additional
236 compression or output formatting. The filters may not
237 increase the buffer size significantly. The pre- and
238 post-compression filters were designed to provide the most
239 general interface possible.
240
241
242 __-P__ ''postfilter'' or __--post__
243 ''postfilter''
244
245
246 Specify a shell command to execute as a filter after
247 compression or decompression.
248 !!CREDITS
249
250
251 __dictzip__ was written by Rik Faith (faith@cs.unc.edu)
252 and is distributed under the terms of the GNU General Public
253 License. If you need to distribute under other terms, write
254 to the author.
255
256
257 The main libraries used by this program (zlib, regex,
258 libmaa) are distributed under different terms, so you may be
259 able to use the libraries for applications which are
260 incompatible with the GPL -- please see the copyright
261 notices and license information that come with the libraries
262 for more information, and consult with your attorney to
263 resolve these issues.
264 !!SEE ALSO
265
266
267 dict(1), dictd(8), gzip(1),
268 gunzip(1), zcat(1)
269 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.