This is a patched version of zlib modified to use Pentium-opti-
This  is  a patched version of zlib modified to use Pentium-opti-
mized  assembly  code  in  the  deflation  algorithm.  The  files
changed/added by this patch are:

README.586 match.S

The  effectiveness  of  these modifications is a bit marginal, as
the the program's bottleneck seems to  be  mostly  L1-cache  con-
tention,  for  which  there is no real way to work around without
rewriting the basic algorithm. The speedup on average  is  around
5-10%  (which  is  generally less than the amount of variance be-
tween subsequent executions).  However, when used at level 9 com-
pression,  the  cache contention can drop enough for the assembly
version to achieve 10-20% speedup (and sometimes more,  depending
on  the  amount  of  overall redundancy in the files). Even here,
though, cache contention can still be the  limiting  factor,  de-
pending  on  the  nature  of  the program using the zlib library.
This may also mean that better improvements will  be  seen  on  a
Pentium  with  MMX,  which  suffers  much less from L1-cache con-
tention, but I have not yet verified this.

Note that this code has been tailored for the Pentium in particu-
lar, and will not perform well on the Pentium Pro (due to the use
of a partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter breadbox@muppetlabs.com April, 1998


Added for zlib 1.1.3:

The  patches  come from http://www.muppetlabs.com/~breadbox/soft-
ware/assembly.html

To compile zlib with this asm file, copy match.S to the zlib  di-
rectory then do:

CFLAGS="-O3 -DASMV" ./configure make OBJA=match.o