Differences between version 6 and previous revision of LZW.
Other diffs: Previous Major Revision, Previous Author, or view the Annotated Edit History
Newer page: | version 6 | Last edited on Tuesday, September 10, 2002 8:32:33 pm | by CraigBox | Revert |
Older page: | version 5 | Last edited on Tuesday, September 10, 2002 8:24:09 pm | by CraigBox | Revert |
@@ -11,38 +11,59 @@
Then, in 1978, they came up with a better algorithm (dubbed [LZ78|http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz78.html], that only required the compressor to output a phrase number and a mismatching character. The phrase extended with the mismatching character would then be added to the dictionary.
LZ77/78 encoding remained the de-facto standard for compression of data (along with some cool stuff like HuffmanCoding) until 1984, where Terry __W__elch published "A Technique for High-Performance Data Compression" in Computer magazine. He realised that with a dictionary that had all the possible characters in it already, you wouldn't have to output the mismatching character like in LZ78; you could output a phrase number that matched it. This algorithm is called __LZW encoding__ (Lempel, Ziv, Welch).
-!How it works
+!
!How it works
The algorithm is really quite simple, and will be described here.
+
+!Compression
* Start with a dictionary that contains all possible characters.
* P refers to the previous phrase (initially empty).
* Read a character, C.
* Is the string P+C in the dictionary? (It is - P is empty so P+C = C)
-* Output the code number for string C to the codestream
+* Output the code number for string C (C.num)
to the codestream
* Set the previous string P = C.
* While there are still character remaining
** Read a character
** Is the string P+C present in the dictionary?
*** If yes, set P to P+C
*** If no,
-#### output the code word
for P
+#### output the code number
for P (P.num)
#### add the string P+C to the dictionary
#### Set P to C
* (we have run out of characters) output the code word for the string P and end.
+You'll note that the first stage can be built into the while loop; I put it up the top to make it absolutely clear what to do.
+
+!Decompression
+
+* Add all the possible characters to the dictionary as in the compressor.
+* Read a code C.num. As this is the first character that was compressed, it is one of the 'root' characters that were added to the dictionary in the first step.
+* Output the character C.
+* Set P (previous string) to C.
+
+* While there are codes remaining in the LZW file
+** Read a code C.num
+** Is there a string C for C.num in the dictionary?
+*** If yes:
+#### Output the string C
+#### Add the string P+the first character of __C__ to the dictionary
+*** If no:
+#### Add the string P+the first character of __P__ to the dictionary
+#### Output the string you just added to the dictionary
+The second case looks a bit strange; it is basically the case where you have a repeating phrase in your input file, so the code number you output is the newest code you add to the dictionary. Therefore if you receive a number you haven't got yet, the next character must be the first character of the previous stream.
-A really good link
to LZW is http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html.
+The easiest way
to understand
LZW is by looking at some examples. I won't reproduce them here, but you can find some (and a really good description of the algorithm) at
http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html.
!Why we only talk about it in hushed terms
Something else that is important to understand is http://www.unisys.com/about__unisys/lzw/ - Unisys own it. I believe they used to allow free usage of it until they realised "Hey! There's a lot of people using [GIF], so lets assert our ownership of the patent and demand license fees from their asses."
------
Part of CategoryAlgorithm. (I just made it up.)