!!!bzip2

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
MEMORY MANAGEMENT
RECOVERING DATA FROM DAMAGED FILES
PERFORMANCE NOTES
CAVEATS
AUTHOR

----
!!NAME


bzip2, bunzip2 - a block-sorting file compressor, v1.0.2

bzcat - decompresses files to stdout

bzip2recover - recovers data from damaged bzip2 files
!!SYNOPSIS


__bzip2__ [[ __-cdfkqstvzVL123456789__ ] [[ ''filenames ...'' ]

__bunzip2__ [[ __-fkvsVL__ ] [[ ''filenames ...'' ]

__bzcat__ [[ __-s__ ] [[ ''filenames ...'' ]

__bzip2recover__ ''filename''
!!DESCRIPTION


''bzip2'' compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding. Compression is
generally considerably better than that achieved by more
conventional LZ77/LZ78-based compressors, and approaches the
performance of the PPM family of statistical compressors.

The command-line options are deliberately very similar to those of
''GNU gzip'', but they are not identical.

''bzip2'' expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed version
of itself, with the name ''originalname.bz2''.

''bzip2'' and ''bunzip2'' will by default not overwrite existing
files. If you want this to happen, specify the -f flag.

If no file names are specified, ''bzip2'' compresses from standard
input to standard output. In this case, ''bzip2'' will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.
''bunzip2'' (or ''bzip2 -d'') decompresses all specified files.
Files which were not created by ''bzip2'' will be detected and
ignored, and a warning issued. ''bzip2'' attempts to guess the
filename for the decompressed file from that of the compressed
file as follows:

 filename.bz2    becomes  filename
 filename.bz     becomes  filename
 filename.tbz2   becomes  filename.tar
 filename.tbz    becomes  filename.tar
 anyothername    becomes  anyothername.out

If the file does not end in one of the recognised endings,
''.bz2'', ''.bz'', ''.tbz2'' or ''.tbz'', ''bzip2'' complains that
it cannot guess the name of the original file, and uses the
original name with ''.out'' appended.
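The renaming rules above can be tried directly. A minimal sketch,
assuming ''bzip2'' and ''bunzip2'' are installed and using a
hypothetical file name:

```shell
# Default compress/decompress renaming, assuming bzip2 and bunzip2
# are on the PATH. The file name is only an illustration.
cd "$(mktemp -d)"
printf 'hello\n' > notes.txt
bzip2 notes.txt          # replaces notes.txt with notes.txt.bz2
bunzip2 notes.txt.bz2    # replaces notes.txt.bz2 with notes.txt
cat notes.txt
```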
As with compression, supplying no filenames causes decompression
from standard input to standard output.

''bunzip2'' will correctly decompress a file which is the
concatenation of two or more compressed files. The result is the
concatenation of the corresponding uncompressed files. Integrity
testing (-t) of concatenated compressed files is also supported.
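A sketch of the concatenation behaviour, assuming ''bzip2'' and
''bunzip2'' are installed (file names are hypothetical):

```shell
# Concatenated .bz2 streams decompress to the concatenated originals.
cd "$(mktemp -d)"
printf 'part one\n' > a.txt
printf 'part two\n' > b.txt
bzip2 a.txt b.txt                   # makes a.txt.bz2 and b.txt.bz2
cat a.txt.bz2 b.txt.bz2 > both.bz2  # concatenate the compressed files
bzip2 -t both.bz2                   # -t works on the whole stream
bunzip2 -c both.bz2 > both.txt      # both parts, in order
```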
You can also compress or decompress files to the standard output
by giving the -c flag. Multiple files may be compressed and
decompressed like this. The resulting outputs are fed sequentially
to stdout. Compression of multiple files in this manner generates
a stream containing multiple compressed file representations. Such
a stream can be decompressed correctly only by ''bzip2'' version
0.9.0 or later. Earlier versions of ''bzip2'' will stop after
decompressing the first file in the stream.
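For example, a multi-stream file built with -c (a sketch, assuming
''bzip2'' is installed and version 0.9.0 or later is used to
decompress):

```shell
# Compressing several files to stdout yields one file holding
# multiple compressed streams, back to back.
cd "$(mktemp -d)"
printf 'first\n'  > f1
printf 'second\n' > f2
bzip2 -c f1 f2 > all.bz2    # two compressed representations
bunzip2 -c all.bz2          # 0.9.0+ decompresses both in sequence
```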
''bzcat'' (or ''bzip2 -dc'') decompresses all specified files to
the standard output.
''bzip2'' will read arguments from the environment variables
''BZIP2'' and ''BZIP'', in that order, and will process them
before any arguments read from the command line. This gives a
convenient way to supply default arguments.
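A sketch of supplying a default argument this way, assuming
''bzip2'' is installed:

```shell
# -9 is picked up from the BZIP2 environment variable and processed
# before the command-line flags; -v comes from the command line.
cd "$(mktemp -d)"
printf 'some data\n' > file.txt
BZIP2=-9 bzip2 -v file.txt
```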
Compression is always performed, even if the compressed file is
slightly larger than the original. Files of less than about one
hundred bytes tend to get larger, since the compression mechanism
has a constant overhead in the region of 50 bytes. Random data
(including the output of most file compressors) is coded at about
8.05 bits per byte, giving an expansion of around 0.5%.
As a self-check for your protection, ''bzip2'' uses 32-bit CRCs to
make sure that the decompressed version of a file is identical to
the original. This guards against corruption of the compressed
data, and against undetected bugs in ''bzip2'' (hopefully very
unlikely). The chances of data corruption going undetected are
microscopic, about one chance in four billion for each file
processed. Be aware, though, that the check occurs upon
decompression, so it can only tell you that something is wrong. It
can't help you recover the original uncompressed data. You can use
''bzip2recover'' to try to recover data from damaged files.
Return values: 0 for a normal exit, 1 for environmental problems
(file not found, invalid flags, I/O errors, etc.), 2 to indicate a
corrupt compressed file, 3 for an internal consistency error (eg,
bug) which caused ''bzip2'' to panic.
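The return value is what a calling script should inspect; a sketch
of the environmental-problem case, assuming ''bzip2'' is installed:

```shell
# A missing input file is an environmental problem, so bzip2
# returns 1 (the error message itself goes to stderr).
bzip2 definitely-missing-input.txt 2>/dev/null
status=$?
echo "$status"
```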
!!OPTIONS


__-c --stdout__

Compress or decompress to standard output.

__-d --decompress__

Force decompression. ''bzip2'', ''bunzip2'' and ''bzcat'' are
really the same program, and the decision about what actions to
take is done on the basis of which name is used. This flag
overrides that mechanism, and forces ''bzip2'' to decompress.

__-z --compress__

The complement to -d: forces compression, regardless of the
invocation name.

__-t --test__

Check integrity of the specified file(s), but don't decompress
them. This really performs a trial decompression and throws away
the result.
__-f --force__

Force overwrite of output files. Normally, ''bzip2'' will not
overwrite existing output files. Also forces ''bzip2'' to break
hard links to files, which it otherwise wouldn't do.

bzip2 normally declines to decompress files which don't have the
correct magic header bytes. If forced (-f), however, it will pass
such files through unmodified. This is how GNU gzip behaves.

__-k --keep__

Keep (don't delete) input files during compression or
decompression.

__-s --small__

Reduce memory usage, for compression, decompression and testing.
Files are decompressed and tested using a modified algorithm which
only requires 2.5 bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about half the normal
speed.

During compression, -s selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your
compression ratio. In short, if your machine is low on memory (8
megabytes or less), use -s for everything. See MEMORY MANAGEMENT
below.
__-q --quiet__

Suppress non-essential warning messages. Messages pertaining to
I/O errors and other critical events will not be suppressed.

__-v --verbose__

Verbose mode -- show the compression ratio for each file
processed. Further -v's increase the verbosity level, spewing out
lots of information which is primarily of interest for diagnostic
purposes.

__-L --license -V --version__

Display the software version, license terms and conditions.

__-1 (or --fast) to -9 (or --best)__

Set the block size to 100 k, 200 k .. 900 k when compressing. Has
no effect when decompressing. See MEMORY MANAGEMENT below. The
--fast and --best aliases are primarily for GNU gzip
compatibility. In particular, --fast doesn't make things
significantly faster. And --best merely selects the default
behaviour.

__--__

Treats all subsequent arguments as file names, even if they start
with a dash. This is so you can handle files with names beginning
with a dash, for example: bzip2 -- -myfilename.

__--repetitive-fast --repetitive-best__

These flags are redundant in versions 0.9.5 and above. They
provided some coarse control over the behaviour of the sorting
algorithm in earlier versions, which was sometimes useful. 0.9.5
and above have an improved algorithm which renders these flags
irrelevant.
!!MEMORY MANAGEMENT


''bzip2'' compresses large files in blocks. The block size affects
both the compression ratio achieved, and the amount of memory
needed for compression and decompression. The flags -1 through -9
specify the block size to be 100,000 bytes through 900,000 bytes
(the default) respectively. At decompression time, the block size
used for compression is read from the header of the compressed
file, and ''bunzip2'' then allocates itself just enough memory to
decompress the file. Since block sizes are stored in compressed
files, it follows that the flags -1 to -9 are irrelevant to and so
ignored during decompression.
Compression and decompression requirements, in bytes, can be
estimated as:

 Compression: 400k + ( 8 x block size )

 Decompression: 100k + ( 4 x block size ), or
                100k + ( 2.5 x block size )
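Plugging the default 900k block size into these formulas
reproduces the -9 row of the table below:

```shell
# Estimated peak memory (in kbytes) for the default 900k block size,
# using the formulas above. 2.5 x block is computed as 5 x block / 2
# to stay in integer arithmetic.
block=900   # kbytes
echo "compression:      $(( 400 + 8 * block ))k"
echo "decompression:    $(( 100 + 4 * block ))k"
echo "decompression -s: $(( 100 + 5 * block / 2 ))k"
```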
Larger block sizes give rapidly diminishing marginal returns. Most
of the compression comes from the first two or three hundred k of
block size, a fact worth bearing in mind when using ''bzip2'' on
small machines. It is also important to appreciate that the
decompression memory requirement is set at compression time by the
choice of block size.

For files compressed with the default 900k block size,
''bunzip2'' will require about 3700 kbytes to decompress. To
support decompression of any file on a 4 megabyte machine,
''bunzip2'' has an option to decompress using approximately half
this amount of memory, about 2300 kbytes. Decompression speed is
also halved, so you should use this option only where necessary.
The relevant flag is -s.
In general, try and use the largest block size memory constraints
allow, since that maximises the compression achieved. Compression
and decompression speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single
block -- that means most files you'd encounter using a large block
size. The amount of real memory touched is proportional to the
size of the file, since the file is smaller than a block. For
example, compressing a file 20,000 bytes long with the flag -9
will cause the compressor to allocate around 7600k of memory, but
only touch 400k + 20000 * 8 = 560 kbytes of it. Similarly, the
decompressor will allocate 3700k but only touch 100k + 20000 * 4 =
180 kbytes.
Here is a table which summarises the maximum memory usage for
different block sizes. Also recorded is the total compressed size
for 14 files of the Calgary Text Compression Corpus totalling
3,141,622 bytes. This column gives some feel for how compression
varies with block size. These figures tend to understate the
advantage of larger block sizes for larger files, since the Corpus
is dominated by smaller files.

        Compress   Decompress   Decompress   Corpus
 Flag     usage      usage       -s usage     Size

  -1      1200k       500k         350k      914704
  -2      2000k       900k         600k      877703
  -3      2800k      1300k         850k      860338
  -4      3600k      1700k        1100k      846899
  -5      4400k      2100k        1350k      845160
  -6      5200k      2500k        1600k      838626
  -7      6100k      2900k        1850k      834096
  -8      6800k      3300k        2100k      828642
  -9      7600k      3700k        2350k      828642
!!RECOVERING DATA FROM DAMAGED FILES


''bzip2'' compresses files in blocks, usually 900kbytes long. Each
block is handled independently. If a media or transmission error
causes a multi-block .bz2 file to become damaged, it may be
possible to recover data from the undamaged blocks in the file.
The compressed representation of each block is delimited by a
48-bit pattern, which makes it possible to find the block
boundaries with reasonable certainty. Each block also carries its
own 32-bit CRC, so damaged blocks can be distinguished from
undamaged ones.
''bzip2recover'' is a simple program whose purpose is to search
for blocks in .bz2 files, and write each block out into its own
.bz2 file. You can then use ''bzip2'' -t to test the integrity of
the resulting files, and decompress those which are undamaged.

''bzip2recover'' takes a single argument, the name of the damaged
file, and writes a number of files ''rec00001file.bz2'',
''rec00002file.bz2'', etc, containing the extracted blocks. The
output filenames are designed so that the use of wildcards in
subsequent processing -- for example, bzip2 -dc rec*file.bz2 >
recovered_data -- processes the files in the correct order.
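A sketch of the recovery workflow, assuming ''bzip2'' and
''bzip2recover'' are installed (the input file here is small and
undamaged, so every extracted block passes the integrity test):

```shell
# Extract each block into its own rec*...bz2 file, then test and
# decompress the undamaged ones.
cd "$(mktemp -d)"
printf 'recoverable data\n' > data.txt
bzip2 -k data.txt               # -k keeps the original around
bzip2recover data.txt.bz2       # writes one rec*data.txt.bz2 per block
for f in rec*data.txt.bz2; do
  bzip2 -t "$f" && bunzip2 -c "$f"   # print blocks that pass -t
done
```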
''bzip2recover'' should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly futile to
use it on damaged single-block files, since a damaged block cannot
be recovered. If you wish to minimise any potential data loss
through media or transmission errors, you might consider
compressing with a smaller block size.
!!PERFORMANCE NOTES


The sorting phase of compression gathers together similar strings
in the file. Because of this, files containing very long runs of
repeated symbols, like "aabaabaabaab ..." (repeated several
hundred times) may compress more slowly than normal. Versions
0.9.5 and above fare much better than previous versions in this
respect. The ratio between worst-case and average-case compression
time is in the region of 10:1. For previous versions, this figure
was more like 100:1. You can use the -vvvv option to monitor
progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.
''bzip2'' usually allocates several megabytes of memory to operate
in, and then charges all over it in a fairly random fashion. This
means that performance, both for compressing and decompressing, is
largely determined by the speed at which your machine can service
cache misses. Because of this, small changes to the code to reduce
the miss rate have been observed to give disproportionately large
performance improvements. I imagine ''bzip2'' will perform best on
machines with very large caches.
!!CAVEATS


I/O error messages are not as helpful as they could be. ''bzip2''
tries hard to detect I/O errors and exit cleanly, but the details
of what the problem is sometimes seem rather misleading.
This manual page pertains to version 1.0.2 of ''bzip2''.
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1, but with the following
exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files. 0.1pl2 cannot do this; it will stop
after decompressing just the first file in the stream.
''bzip2recover'' versions prior to this one, 1.0.2, used 32-bit
integers to represent bit positions in compressed files, so they
could not handle compressed files more than 512 megabytes long.
Version 1.0.2 and above uses 64-bit ints on some platforms which
support them (GNU supported targets, and Windows). To establish
whether or not bzip2recover was built with such a limitation, run
it without arguments. In any event you can build yourself an
unlimited version if you can recompile it with MaybeUInt64 set to
be an unsigned 64-bit integer.
!!AUTHOR


Julian Seward, jseward@acm.org.

http://sources.redhat.com/bzip2

The ideas embodied in ''bzip2'' are due to (at least) the
following people: Michael Burrows and David Wheeler (for the block
sorting transformation), David Wheeler (again, for the Huffman
coder), Peter Fenwick (for the structured coding model in the
original ''bzip'', and many refinements), and Alistair Moffat,
Radford Neal and Ian Witten (for the arithmetic coder in the
original ''bzip''). I am much indebted for their help, support and
advice. See the manual in the source distribution for pointers to
sources of documentation. Christian von Roques encouraged me to
look for faster sorting algorithms, so as to speed up compression.
Bela Lubkin encouraged me to improve the worst-case compression
performance. The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped with portability problems, lent
machines, gave advice and were generally helpful.
----