View Source: dictunzip(1) - Waikato Linux Users Group

!!!DICTZIP
NAME
SYNOPSIS
DESCRIPTION
TRADEOFFS
OPTIONS
CREDITS
SEE ALSO
----
!!NAME


dictzip, dictunzip, dictzcat - compress (or expand) files, allowing random access
!!SYNOPSIS


__dictzip [[__''options''__]__ ''name''
__dictunzip [[__''options''__]__ ''name''
__dictzcat__ ''name''
!!DESCRIPTION


__dictzip__ compresses files using the gzip(1)
algorithm (LZ77) in a manner which is completely compatible
with the __gzip__ file format. An extension to the
__gzip__ file format (Extra Field, described in 2.3.1.1
of RFC 1952) allows extra data to be stored in the header of
a compressed file. Programs like __gzip__ and __zcat__
will ignore this extra data. However, dictd(8), the
DICT protocol dictionary server, will make use of this data
to perform pseudo-random access on the file. Files in the
__dictzip__ format should end in ".dz" so that they may be
distinguished from common __gzip__ files that
do not contain the special header information.


From RFC 1952, the extra field is specified as
follows:


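If the FLG.FEXTRA bit is set, an "extra field" is present in
the header, with total length XLEN bytes. It consists of a
series of subfields, each of the form: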


+---+---+---+---+==================================+
|SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
+---+---+---+---+==================================+
SI1 and SI2 provide a subfield ID, typically two ASCII letters with some mnemonic value. Jean-Loup Gailly is maintaining a registry of subfield IDs.


LEN gives the length of the subfield data, excluding the 4
initial bytes.


The __dictzip__ program uses 'R' for SI1, and 'A' for SI2
(i.e., "Random Access"). After the LEN field, the data is
arranged as follows:


+---+---+---+---+---+---+===============================+
|  VER  | CHLEN | CHCNT |  ... CHCNT words of data ...  |
+---+---+---+---+---+---+===============================+
As per RFC 1952, all data is stored least-significant byte first. For VER 1 of the data, all values are unsigned 16-bit (2-byte) integers.
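
The layout above can be read back with a few lines of code. The following is a minimal sketch, not the code used by __dictzip__ or dictd(8): it walks the subfields of the gzip extra field, finds the 'RA' subfield, and unpacks VER, CHLEN, CHCNT and the CHCNT 2-byte entries. The function name parse_dictzip_header is hypothetical, and the sketch assumes the whole header is already in memory.

```python
import struct

def parse_dictzip_header(data: bytes):
    """Return (ver, chlen, entries) from the 'RA' subfield of a gzip header.

    Hypothetical helper for illustration only.
    """
    # Fixed part of the gzip header: ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes.
    id1, id2, _cm, flg = struct.unpack_from("<4B", data, 0)
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip file")
    if not flg & 0x04:  # FLG.FEXTRA is bit 2
        raise ValueError("no extra field present")

    # XLEN follows the fixed header; all values are least-significant byte first.
    (xlen,) = struct.unpack_from("<H", data, 10)
    pos, end = 12, 12 + xlen

    # Walk the subfields: SI1 SI2 LEN(2) then LEN bytes of data.
    while pos + 4 <= end:
        subfield_id = data[pos:pos + 2]
        (length,) = struct.unpack_from("<H", data, pos + 2)
        body = pos + 4
        if subfield_id == b"RA":
            ver, chlen, chcnt = struct.unpack_from("<3H", data, body)
            if ver != 1:
                raise ValueError("unsupported RA version %d" % ver)
            entries = struct.unpack_from("<%dH" % chcnt, data, body + 6)
            return ver, chlen, list(entries)
        pos = body + length  # skip subfields written by other programs

    raise ValueError("no RA subfield found")
```

Skipping unknown subfields by LEN, rather than assuming 'RA' comes first, keeps the sketch tolerant of headers that carry other subfields as well.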


XLEN (which is specified earlier in the header) is a two
byte integer, so the extra field can be 0xffff bytes long, 2
bytes of which are used for the subfield ID (SI1 and SI2),
and 2 bytes of which are used for the subfield length (LEN).
This leaves 0xfffb bytes (0x7ffd 2-byte entries or 0x3ffe
4-byte entries). Given that the zip output buffer must be
10% + 12 bytes larger than the input buffer, we can store
58969 bytes per entry, or about 1.8GB if the 2-byte entries
are used. If this becomes a limiting factor, another format
version can be selected and defined for 4-byte
entries.
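
The arithmetic in the preceding paragraph can be checked directly. This is only a worked calculation using the figures quoted in the text (the constant names are made up for the example). Like the text, it does not subtract the 6 bytes taken by VER, CHLEN and CHCNT.

```python
XLEN_MAX = 0xFFFF          # largest value of the two-byte XLEN field
SUBFIELD_OVERHEAD = 4      # SI1, SI2 and the two-byte LEN
BYTES_PER_ENTRY = 58969    # uncompressed bytes per entry, as quoted in the text

available = XLEN_MAX - SUBFIELD_OVERHEAD   # bytes left for subfield data
entries_2byte = available // 2             # number of 2-byte entries
entries_4byte = available // 4             # number of 4-byte entries
max_bytes = entries_2byte * BYTES_PER_ENTRY

print(hex(available), hex(entries_2byte), hex(entries_4byte))
# prints: 0xfffb 0x7ffd 0x3ffe
print(max_bytes, round(max_bytes / 2**30, 2))
# prints: 1932119285 1.8
```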


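For compression, the file is divided up into chunks of data,
each less than 64kB in size. CHLEN gives the uncompressed
length of a chunk, CHCNT gives the number of chunks, and each
of the CHCNT entries that follow gives the compressed length
of one chunk.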


To perform random access on the data, the offset and length
of the data are provided to library routines. These routines
determine the chunk in which the desired data begins, and
decompress that chunk. Consecutive chunks are decompressed
as necessary.
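
The lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the library routines themselves: the function names compress_chunks and read_range are hypothetical, the compressed chunks are held in memory with no gzip header, and each chunk is assumed to be raw deflate data ended with a full flush so that it can be inflated on its own.

```python
import zlib

def compress_chunks(data: bytes, chlen: int):
    """Compress data in independent chunks; return (blob, compressed_sizes)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)  # raw deflate, no header
    parts = []
    for i in range(0, len(data), chlen):
        # A full flush byte-aligns the output and resets the LZ77 window,
        # so no chunk refers back to data in an earlier chunk.
        parts.append(comp.compress(data[i:i + chlen])
                     + comp.flush(zlib.Z_FULL_FLUSH))
    return b"".join(parts), [len(p) for p in parts]

def read_range(blob: bytes, sizes, chlen: int, start: int, size: int) -> bytes:
    """Return size uncompressed bytes beginning at uncompressed offset start."""
    if size <= 0:
        return b""
    first = start // chlen               # chunk in which the data begins
    last = (start + size - 1) // chlen   # chunk in which the data ends
    offset = sum(sizes[:first])          # compressed offset of the first chunk
    out = []
    for idx in range(first, last + 1):
        chunk = blob[offset:offset + sizes[idx]]
        out.append(zlib.decompressobj(-15).decompress(chunk))
        offset += sizes[idx]
    skip = start - first * chlen         # bytes to drop from the first chunk
    return b"".join(out)[skip:skip + size]
```

For example, with a chunk length of 1000, a request for 1800 bytes at offset 2500 touches chunks 2, 3 and 4 only; the chunks before and after them are never decompressed.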
!!TRADEOFFS


__Speed__


True random file access is not realized, since any access,
even for a single byte, requires that a 64kB chunk be read
and decompressed. This is slower than accessing a flat text
file, but is much, much faster than performing serial access
on a fully compressed file.


__Space__


For the textual dictionary databases we are working with,
the use of 64kB chunks and maximal LZ77 compression realizes
a file which is only about 4% larger than the same file
compressed all at once.
!!OPTIONS


__-d__ or __--decompress__


Decompress. This is the default if the executable is called
__dictunzip__.


__-c__ or __--stdout__


Write output on standard output; keep original files
unchanged. This is only available when decompressing
(because parts of the header must be updated after a write
when compressing).


__-f__ or __--force__


Force compression or decompression even if the output file
already exists.


__-h__ or __--help__


Display help.


__-k__ or __--keep__


Do not delete the original file.


__-l__ or __--list__


For each compressed file, list the following
fields:


type: dzip, gzip, or text (includes files in unknown formats)
crc: CRC checksum
date and time: from header
chunks: number of chunks in file
size: size of each uncompressed chunk
compr.: compressed size
uncompr.: uncompressed size
ratio: compression ratio (0.0% if unknown)
name: name of uncompressed file


Unlike __gzip__, the compression method is not
detected.


__-L__ or __--license__


Display the __dictzip__ license and quit.


__-t__ or __--test__


Check the compressed file integrity. This option is not
implemented. Instead, it will list the header
information.


__-v__ or __--verbose__


Verbose. Display extra information during
compression.


__-V__ or __--version__


Version. Display the version number and compilation options
then quit.


__-s__ ''start'' or __--start__
''start''


Specify the offset at which to start decompression, using
decimal numbers. The default is at the beginning of the
file.


__-e__ ''size'' or __--size__
''size''


Specify the size of the portion of the file to decompress,
using decimal numbers. The default is the whole
file.


__-S__ ''start'' or __--Start__
''start''


Specify the offset at which to start decompression, using
base64 numbers. The default is at the beginning of the
file.


__-E__ ''size'' or __--Size__
''size''


Specify the size of the portion of the file to decompress,
using base64 numbers. The default is the whole
file.


__-p__ ''prefilter'' or __--pre__
''prefilter''


Specify a shell command to execute as a filter before
compression or decompression of a chunk. The pre- and
post-compression filters can be used to provide additional
compression or output formatting. The filters may not
increase the buffer size significantly. The pre- and
post-compression filters were designed to provide the most
general interface possible.


__-P__ ''postfilter'' or __--post__
''postfilter''


Specify a shell command to execute as a filter after
compression or decompression.
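
The "base64 numbers" accepted by __-S__ and __-E__ above are not defined on this page. The sketch below assumes they use the same encoding as the offsets and sizes in dictd(8) index files: an integer written in base 64, most significant digit first, with the digits A-Z, a-z, 0-9, + and /. That assumption, and the function names, are not taken from this page.

```python
# Digit alphabet assumed to match the one used in dictd index files.
B64_DIGITS = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "abcdefghijklmnopqrstuvwxyz"
              "0123456789+/")

def b64_number_decode(text: str) -> int:
    """Decode a base64 number into an integer (hypothetical helper)."""
    value = 0
    for ch in text:
        value = value * 64 + B64_DIGITS.index(ch)  # ValueError on a bad digit
    return value

def b64_number_encode(value: int) -> str:
    """Encode a non-negative integer as a base64 number (hypothetical helper)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = ""
    while True:
        digits = B64_DIGITS[value % 64] + digits
        value //= 64
        if value == 0:
            return digits
```

Under this assumption, "Kr" decodes to 10 * 64 + 43 = 683, so a start of "Kr" would correspond to the decimal start 683.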
!!CREDITS


__dictzip__ was written by Rik Faith (faith@cs.unc.edu)
and is distributed under the terms of the GNU General Public
License. If you need to distribute under other terms, write
to the author.


The main libraries used by this program (zlib, regex,
libmaa) are distributed under different terms, so you may be
able to use the libraries for applications which are
incompatible with the GPL -- please see the copyright
notices and license information that come with the libraries
for more information, and consult with your attorney to
resolve these issues.
!!SEE ALSO


dict(1), dictd(8), gzip(1),
gunzip(1), zcat(1)
----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.
Last edited on Tuesday, June 4, 2002 12:21:55 am by "perry"