Penguin

Differences between version 6 and previous revision of LZW.

Other diffs: Previous Major Revision, Previous Author, or view the Annotated Edit History

Newer page: version 6 Last edited on Tuesday, September 10, 2002 8:32:33 pm by CraigBox Revert
Older page: version 5 Last edited on Tuesday, September 10, 2002 8:24:09 pm by CraigBox Revert
@@ -11,38 +11,59 @@
 Then, in 1978, they came up with a better algorithm (dubbed [LZ78|http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lz78.html], that only required the compressor to output a phrase number and a mismatching character. The phrase extended with the mismatching character would then be added to the dictionary. 
  
 LZ77/78 encoding remained the de-facto standard for compression of data (along with some cool stuff like HuffmanCoding) until 1984, where Terry __W__elch published "A Technique for High-Performance Data Compression" in Computer magazine. He realised that with a dictionary that had all the possible characters in it already, you wouldn't have to output the mismatching character like in LZ78; you could output a phrase number that matched it. This algorithm is called __LZW encoding__ (Lempel, Ziv, Welch). 
  
-!How it works 
+! !How it works 
  
 The algorithm is really quite simple, and will be described here. 
+  
+!Compression  
  
 * Start with a dictionary that contains all possible characters. 
 * P refers to the previous phrase (initially empty). 
 * Read a character, C. 
  
 * Is the string P+C in the dictionary? (It is - P is empty so P+C = C) 
-* Output the code number for string C to the codestream 
+* Output the code number for string C (C.num) to the codestream 
 * Set the previous string P = C. 
  
 * While there are still character remaining 
 ** Read a character 
 ** Is the string P+C present in the dictionary? 
  
 *** If yes, set P to P+C 
 *** If no, 
-#### output the code word for P 
+#### output the code number for P (P.num)  
 #### add the string P+C to the dictionary 
 #### Set P to C 
  
 * (we have run out of characters) output the code word for the string P and end. 
  
+You'll note that the first stage can be built into the while loop; I put it up the top to make it absolutely clear what to do.  
+  
+!Decompression  
+  
+* Add all the possible characters to the dictionary as in the compressor.  
+* Read a code C.num. As this is the first character that was compressed, it is one of the 'root' characters that were added to the dictionary in the first step.  
+* Output the character C.  
+* Set P (previous string) to C.  
+  
+* While there are codes remaining in the LZW file  
+** Read a code C.num  
+** Is there a string C for C.num in the dictionary?  
+*** If yes:  
+#### Output the string C  
+#### Add the string P+the first character of __C__ to the dictionary  
+*** If no:  
+#### Add the string P+the first character of __P__ to the dictionary  
+#### Output the string you just added to the dictionary  
  
+The second case looks a bit strange; it is basically the case where you have a repeating phrase in your input file, so the code number you output is the newest code you add to the dictionary. Therefore if you receive a number you haven't got yet, the next character must be the first character of the previous stream.  
  
-A really good link to LZW is http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html. 
+The easiest way to understand LZW is by looking at some examples. I won't reproduce them here, but you can find some (and a really good description of the algorithm) at http://www.rasip.fer.hr/research/compress/algorithms/fund/lz/lzw.html. 
  
 !Why we only talk about it in hushed terms 
  
 Something else that is important to understand is http://www.unisys.com/about__unisys/lzw/ - Unisys own it. I believe they used to allow free usage of it until they realised "Hey! There's a lot of people using [GIF], so lets assert our ownership of the patent and demand license fees from their asses." 
  
 ------ 
 Part of CategoryAlgorithm. (I just made it up.)