
Parallel compression utilities on Linux -- lbzip2, pbzip2 and pigz

Thank you for visiting this page; it has been updated at another link: Parallel compression utilities on Linux


In the article Compression tools on Linux, I compared 4 different compression utilities that use different compression algorithms. Here I'll compare 3 popular parallel compression utilities: lbzip2, pbzip2 and pigz. lbzip2 and pbzip2 use the same compression algorithm as bzip2, while pigz uses zlib, the same library gzip uses.
Here are the home pages of the three utilities:
http://lbzip2.org/
http://compression.ca/pbzip2/
http://zlib.net/pigz/

Test condition

The test bed was a machine with:
2 sockets, 8 Intel(R) Core(TM) i7 870 CPUs @ 2.93GHz
16GB memory, 2 sets of mirrored local SATA disks
Linux kernel 2.6.32-358.18.1.el6.x86_64
lbzip2-2.2-1.el6.x86_64
pbzip2-1.1.6-1.el6.x86_64
pigz-2.2.5-1.el6.x86_64

Even though there was no disk bottleneck, I loaded the file into memory before each test, so that there was no I/O concern at all.
Each utility supports compression levels; I tested levels 1 (fast), 6 (default) and 9 (best). The ratio is the percentage reduction for each compressed file (compatible with gzip -v).
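One timed run can be sketched roughly as follows. This is a hypothetical reconstruction, not the author's exact script: `cat` into /dev/null stands in for the author's ./to_mem helper to warm the page cache, and gzip stands in for any of the three tools (all of them accept the same -1/-6/-9 levels). The ratio line shows how the "percentage reduction" figure is computed.

```shell
#!/bin/sh
# Sketch of one timed compression run (assumed workflow).
f=testfile
dd if=/dev/zero of="$f" bs=1M count=8 2>/dev/null   # placeholder test data

cat "$f" > /dev/null                 # pull the file into the page cache first
orig=$(stat -c%s "$f")
gzip -9 -k -f "$f"                   # compress at level 9 (best), keep the original
comp=$(stat -c%s "$f.gz")

# ratio = percentage reduction, matching what gzip -v reports
awk -v o="$orig" -v c="$comp" 'BEGIN { printf "ratio: %.2f%%\n", (1 - c/o) * 100 }'
```

For the real tests, gzip would be replaced by lbzip2, pbzip2 or pigz, and the run wrapped in `time`.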

Performance test

         fast 1                          default 6                       best 9
         ratio   compress   decompress   ratio   compress   decompress   ratio   compress   decompress
lbzip2   68.08%  1m20.834s  0m59.871s    70.50%  1m18.459s  1m1.109s     70.87%  1m27.518s  1m0.194s
pbzip2   68.05%  2m1.905s   0m59.859s    70.24%  1m59.790s  1m2.317s     70.82%  2m10.272s  1m1.615s
pigz     63.48%  1m59.500s  1m0.215s     67.11%  0m50.527s  1m0.092s     67.70%  3m0.009s   1m0.482s

Reliability test

The second test compares their stability. For each compression utility, I ran 100 rounds of compress and decompress with the following simple script, checking md5sum and file size each round.
  echo "$tool compress start"
  for i in $(seq 1 100); do
      echo "test No. $i"
      cp -f ${testfile}.orig $testfile
      rm -f ${testfile}.bz2
      ./to_mem $testfile                         # load the file into memory first
      time -p $tool -v -9 $testfile
      ls -l ${testfile}.bz2 >>${tool}.chk        # record compressed file size
      md5sum -b ${testfile}.bz2 >>${tool}.chk    # record compressed file checksum
      time -p $tool -v -d ${testfile}.bz2
      md5sum -b $testfile >>${tool}.chk          # verify restored file checksum
      sleep 3
  done                                           # note: for pigz the suffix is .gz, not .bz2
  echo "$tool tests end"

Test results from the 100-round tests (times in seconds):
         ratio   compress time   uncompress time
lbzip2   70.87%  83.6321         72.5878
pbzip2   70.82%  130.576         69.3038
pigz     67.70%  181.315         71.1547

Compatibility test

lbzip2 and pbzip2 use the same compression algorithm, so I also cross-tested the format compatibility of bzip2, pbzip2 and lbzip2. They are all compatible; every combination passed the md5sum file check.
As for pigz, there is one interesting thing: every time it yields a different compressed file -- same file size, but a different md5sum -- yet it can still recover the compressed file back to the original file with the correct md5sum (likely because the gzip header records the input file's modification time, which changes when the file is re-copied each round). When I tested with small files, I did not see this type of 'issue'.
As for compatibility with gzip, I didn't see any issues; all good.
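One round of the cross test can be sketched as a small helper: compress with one tool, decompress with another, and compare md5sums. This is a hypothetical script, not the one used above; bzip2 plays both roles below because it is installed almost everywhere -- substitute lbzip2 or pbzip2 for either argument to run the actual cross checks.

```shell
#!/bin/sh
# Sketch of one cross-compatibility check (assumed workflow):
# compress with $1, decompress with $2, and verify the round trip by md5sum.
cross_check() {
    comp_tool=$1; decomp_tool=$2; f=$3
    before=$(md5sum < "$f")
    "$comp_tool" -9 -f "$f"              # produces $f.bz2, removes $f
    "$decomp_tool" -d -f "$f.bz2"        # restores $f
    after=$(md5sum < "$f")
    if [ "$before" = "$after" ]; then
        echo "$comp_tool -> $decomp_tool: OK"
    else
        echo "$comp_tool -> $decomp_tool: MISMATCH"
    fi
}

printf 'hello compatibility test\n' > sample.txt
cross_check bzip2 bzip2 sample.txt
```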

Note: I don't have any parallel compression utility that uses the LZMA algorithm; xz looks to be heading in that direction.
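xz did later gain threaded compression: the -T option, added in xz 5.2, where -T0 means "use as many threads as there are cores". A quick sketch, assuming an xz new enough to support it:

```shell
#!/bin/sh
# Threaded LZMA compression with xz -T (requires xz 5.2 or later).
dd if=/dev/zero of=big.dat bs=1M count=8 2>/dev/null   # placeholder test data
xz -T0 -6 -k -f big.dat        # parallel compress at level 6, keep the original
xz -d -f big.dat.xz            # decompress, overwriting big.dat and removing the .xz
```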

