Diff: LZW - Waikato Linux Users Group

Differences between version 9 and previous revision of LZW.

Other diffs: Previous Major Revision, Previous Author, or view the Annotated Edit History

Newer page:	version 9	Last edited on Thursday, December 18, 2003 11:35:21 pm	by AristotlePagaltzis	Revert
Older page:	version 8	Last edited on Tuesday, September 10, 2002 8:37:15 pm	by CraigBox	Revert

@@ -1,25 +1,25 @@

-!!A ~~History~~ of LZ Compression

+! !! A brief history of LZ Compression

Back in the 70's, the amount of data that people wanted to store started outstripping the media that people had to store it on.

-Two computer scientists named A. __L__empel and J. __Z__iv published a paper called "A Universal Algorithm for Sequential Data Compression" in the IEEE Transactions on Information Theory, May 1977. This documented an [Algorithm] that was referred to subsequently as [LZ77|http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz77.html].

+Two computer scientists named A. __L__empel and J. __Z__iv published a paper called "A Universal Algorithm for Sequential Data Compression" in the IEEE Transactions on Information Theory, May 1977. This documented an [Algorithm] that was referred to subsequently as [LZ77 | http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz77.html].

The basis for the LZ series of algorithms is the building of a dictionary of common phrases in a file. Instead of outputting characters, phrase numbers in this dictionary can be output, achieving compression when you can output the number 1832 (10 bits) instead of the string "foobarbaz" (72 bits).

The decompressor needs nothing more than this sequence of output and a knowledge of how the dictionary is built to be able to built the dictionary itself, and therefore decompress the file.

LZ77 is a window based system: for each string that it compresses, it outputs an index back into to the input (go back N characters), a window size (copy these characters), and then first mismatching character.

-Then, in 1978, they came up with a better algorithm (dubbed [LZ78|http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz78.html], that only required the compressor to output a phrase number and a mismatching character. The phrase extended with the mismatching character would then be added to the dictionary.

+Then, in 1978, they came up with a better algorithm (dubbed [LZ78 | http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz78.html], that only required the compressor to output a phrase number and a mismatching character. The phrase extended with the mismatching character would then be added to the dictionary.

LZ77/78 encoding remained the de-facto standard for compression of data (along with some cool stuff like HuffmanCoding, that's how PKZIP works) until 1984, where Terry __W__elch published "A Technique for High-Performance Data Compression" in Computer magazine. He realised that with a dictionary that had all the possible characters in it already, you wouldn't have to output the mismatching character like in LZ78; you could output a phrase number that matched it. This algorithm is called __LZW encoding__ (Lempel, Ziv, Welch).

-!!How it works

+! !! How it works

The algorithm is really quite simple, and will be described here.

-!Compression

+! ! Compression

* Start with a dictionary that contains all possible characters.

* P refers to the previous phrase (initially empty).

* Read a character, C.

@@ -41,9 +41,9 @@

* (we have run out of characters) output the code word for the string P (P.num) to the LZW file and end.

You'll note that the first stage can be built into the while loop; I put it up the top to make it absolutely clear what to do.

-!Decompression

+! ! Decompression

* Add all the possible characters to the dictionary as in the compressor.

* Read a code C.num from the LZW file. As this is the first character that was compressed, it is one of the 'root' characters that were added to the dictionary in the first step.

* Output the character C.

@@ -62,10 +62,6 @@

The second case looks a bit strange; it is basically the case where you have a repeating phrase in your input file, so the code number you output is the newest code you add to the dictionary. Therefore if you receive a number you haven't got yet, the next character must be the first character of the previous stream.

The easiest way to understand LZW is by looking at some examples. I won't reproduce them here, but you can find some (and a really good description of the algorithm) at http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html.

-~~!!Why we only talk about it in hushed terms~~

-

-Something else that is important to understand is http://www.unisys.com/about__unisys/lzw/ - Unisys own it. I believe they used to allow free usage of it until they realised "Hey! There's a lot of people using [GIF], so lets assert our ownership of the patent and demand license fees from their asses."

-

--- ----

-Part of CategoryAlgorithm~~. (I just made it up.)~~

+----

+Part of CategoryAlgorithm