Diff: LZW - Waikato Linux Users Group

Differences between version 6 and previous revision of LZW.

Other diffs: Previous Major Revision, Previous Author, or view the Annotated Edit History

Newer page:	version 6	Last edited on Tuesday, September 10, 2002 8:32:33 pm	by CraigBox	Revert
Older page:	version 5	Last edited on Tuesday, September 10, 2002 8:24:09 pm	by CraigBox	Revert

@@ -11,38 +11,59 @@

Then, in 1978, they came up with a better algorithm (dubbed [LZ78|http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz78.html], that only required the compressor to output a phrase number and a mismatching character. The phrase extended with the mismatching character would then be added to the dictionary.

LZ77/78 encoding remained the de-facto standard for compression of data (along with some cool stuff like HuffmanCoding) until 1984, where Terry __W__elch published "A Technique for High-Performance Data Compression" in Computer magazine. He realised that with a dictionary that had all the possible characters in it already, you wouldn't have to output the mismatching character like in LZ78; you could output a phrase number that matched it. This algorithm is called __LZW encoding__ (Lempel, Ziv, Welch).

-!How it works

+! !How it works

The algorithm is really quite simple, and will be described here.

+

+!Compression

* Start with a dictionary that contains all possible characters.

* P refers to the previous phrase (initially empty).

* Read a character, C.

* Is the string P+C in the dictionary? (It is - P is empty so P+C = C)

-* Output the code number for string C to the codestream

+* Output the code number for string C (C.num) to the codestream

* Set the previous string P = C.

* While there are still character remaining

** Read a character

** Is the string P+C present in the dictionary?

*** If yes, set P to P+C

*** If no,

-#### output the code ~~word~~ for P

+#### output the code number for P (P.num)

#### add the string P+C to the dictionary

#### Set P to C

* (we have run out of characters) output the code word for the string P and end.

+You'll note that the first stage can be built into the while loop; I put it up the top to make it absolutely clear what to do.

+

+!Decompression

+

+* Add all the possible characters to the dictionary as in the compressor.

+* Read a code C.num. As this is the first character that was compressed, it is one of the 'root' characters that were added to the dictionary in the first step.

+* Output the character C.

+* Set P (previous string) to C.

+

+* While there are codes remaining in the LZW file

+** Read a code C.num

+** Is there a string C for C.num in the dictionary?

+*** If yes:

+#### Output the string C

+#### Add the string P+the first character of __C__ to the dictionary

+*** If no:

+#### Add the string P+the first character of __P__ to the dictionary

+#### Output the string you just added to the dictionary

+The second case looks a bit strange; it is basically the case where you have a repeating phrase in your input file, so the code number you output is the newest code you add to the dictionary. Therefore if you receive a number you haven't got yet, the next character must be the first character of the previous stream.

-~~A really good link~~ to LZW is http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html.

+The easiest way to understand LZW is by looking at some examples. I won't reproduce them here, but you can find some (and a really good description of the algorithm) at http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html.

!Why we only talk about it in hushed terms

Something else that is important to understand is http://www.unisys.com/about__unisys/lzw/ - Unisys own it. I believe they used to allow free usage of it until they realised "Hey! There's a lot of people using [GIF], so lets assert our ownership of the patent and demand license fees from their asses."

------

Part of CategoryAlgorithm. (I just made it up.)