Blame: LZW - Waikato Linux Users Group

Annotated edit history of LZW version 12, including all changes. View license author blame.

Rev	Author	#	Line
12	AristotlePagaltzis	1	`An [Acronym] for __L__empel, __Z__iv, __W__elch.`
		2
		3	`----`
		4
9	AristotlePagaltzis	5	`!!! A brief history of LZ Compression`
7	CraigBox	6
1	CraigBox	7	`Back in the 70's, the amount of data that people wanted to store started outstripping the media that people had to store it on.`
		8
9	AristotlePagaltzis	9	`Two computer scientists named A. __L__empel and J. __Z__iv published a paper called "A Universal Algorithm for Sequential Data Compression" in the IEEE Transactions on Information Theory, May 1977. This documented an [Algorithm] that was referred to subsequently as [LZ77 \| http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz77.html].`
1	CraigBox	10
7	CraigBox	11	`The basis for the LZ series of algorithms is the building of a dictionary of common phrases in a file. Instead of outputting characters, phrase numbers in this dictionary can be output, achieving compression when you can output the number 1832 (10 bits) instead of the string "foobarbaz" (72 bits).`
1	CraigBox	12
		13	`The decompressor needs nothing more than this sequence of output and a knowledge of how the dictionary is built to be able to built the dictionary itself, and therefore decompress the file.`
		14
7	CraigBox	15	`LZ77 is a window based system: for each string that it compresses, it outputs an index back into to the input (go back N characters), a window size (copy these characters), and then first mismatching character.`
1	CraigBox	16
9	AristotlePagaltzis	17	`Then, in 1978, they came up with a better algorithm (dubbed [LZ78 \| http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz78.html], that only required the compressor to output a phrase number and a mismatching character. The phrase extended with the mismatching character would then be added to the dictionary.`
1	CraigBox	18
7	CraigBox	19	LZ77/78 encoding remained the de-facto standard for compression of data (along with some cool stuff like HuffmanCoding, that's how PKZIP works) until 1984, where Terry __W__elch published "A Technique for High-Performance Data Compression" in Computer magazine. He realised that with a dictionary that had all the possible characters in it already, you wouldn't have to output the mismatching character like in LZ78; you could output a phrase number that matched it. This algorithm is called __LZW encoding__ (Lempel, Ziv, Welch).
10	JohnMcPherson	20
		21	`LZW is (or was) covered by patents held by both Unisys and [IBM]; see the [Unisys] page for more on this. (IBM never made any attempt to enforce their "defensive patent".)`
		22
1	CraigBox	23
9	AristotlePagaltzis	24	`!!! How it works`
1	CraigBox	25
		26	`The algorithm is really quite simple, and will be described here.`
6	CraigBox	27
9	AristotlePagaltzis	28	`!! Compression`
2	CraigBox	29
		30	`* Start with a dictionary that contains all possible characters.`
		31	`* P refers to the previous phrase (initially empty).`
		32	`* Read a character, C.`
		33
		34	`* Is the string P+C in the dictionary? (It is - P is empty so P+C = C)`
7	CraigBox	35	`* Output the code number for string C (C.num) to the LZW file`
2	CraigBox	36	`* Set the previous string P = C.`
		37
		38	`* While there are still character remaining`
5	CraigBox	39	`** Read a character`
		40	`** Is the string P+C present in the dictionary?`
2	CraigBox	41
7	CraigBox	42	`*** If yes, set P to P+C and keep going (to make a longer string of matches)`
5	CraigBox	43	`*** If no,`
7	CraigBox	44	`#### output the code number for P (P.num) to the LZW file`
5	CraigBox	45	`#### add the string P+C to the dictionary`
7	CraigBox	46	`#### Set P = C`
2	CraigBox	47
7	CraigBox	48	`* (we have run out of characters) output the code word for the string P (P.num) to the LZW file and end.`
2	CraigBox	49
6	CraigBox	50	`You'll note that the first stage can be built into the while loop; I put it up the top to make it absolutely clear what to do.`
		51
9	AristotlePagaltzis	52	`!! Decompression`
6	CraigBox	53
		54	`* Add all the possible characters to the dictionary as in the compressor.`
7	CraigBox	55	`* Read a code C.num from the LZW file. As this is the first character that was compressed, it is one of the 'root' characters that were added to the dictionary in the first step.`
6	CraigBox	56	`* Output the character C.`
		57	`* Set P (previous string) to C.`
		58
		59	`* While there are codes remaining in the LZW file`
		60	`** Read a code C.num`
7	CraigBox	61	`** Is there a string C matching C.num in the dictionary?`
6	CraigBox	62	`*** If yes:`
7	CraigBox	63	`#### Output the string C to the LZW file`
6	CraigBox	64	`#### Add the string P+the first character of __C__ to the dictionary`
		65	`*** If no:`
		66	`#### Add the string P+the first character of __P__ to the dictionary`
7	CraigBox	67	`#### Output the string you just added to the LZW file`
1	CraigBox	68
6	CraigBox	69	`The second case looks a bit strange; it is basically the case where you have a repeating phrase in your input file, so the code number you output is the newest code you add to the dictionary. Therefore if you receive a number you haven't got yet, the next character must be the first character of the previous stream.`
1	CraigBox	70
6	CraigBox	71	`The easiest way to understand LZW is by looking at some examples. I won't reproduce them here, but you can find some (and a really good description of the algorithm) at http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html.`
1	CraigBox	72
9	AristotlePagaltzis	73	`----`
		74	`Part of CategoryAlgorithm`
11	PhilHarper	75
		76	`anyone able to explain how LZMA ,used by 7z, compares? http://7-zip.org/7z.html`

Last edited on Saturday, December 20, 2003 2:43:07 am by AristotlePagaltzis

Edit PageHistory Diff Info LikePages