
Compression tools on Linux -- gzip vs bzip2 vs lzma vs compress

Thank you for visiting this page. An updated version of this page is available at another link: Compression tools on Linux, gzip vs bzip2 vs lzma vs compress

There are numerous compression tools available on Linux, and more for other platforms, and you can find many comprehensive comparison articles too. I'm not going to do the same thing again. Rather, I'd like to compare only 4 of them,
because behind these compression tools there are actually only a few libraries. In other words, most of them share the same core. In this article, I'll give a simple introduction to the 4 tools and their libraries, followed by test measurements.


Compress is a Unix compression utility based on the LZC compression method, which is an LZW implementation using variable size pointers as in LZ78.

The uncompress utility will restore files to their original state after they have been compressed using the compress utility. If no files are specified, the standard input will be uncompressed to the standard output.
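The dictionary-building idea behind LZW can be sketched in a few lines of Python. This is only an illustration of the basic algorithm: it emits a plain list of integer codes and does not reproduce the variable-size pointers, table reset or file format of the real compress utility. The function names are made up for this example.

```python
def lzw_compress(data: bytes) -> list:
    """Encode bytes as a list of LZW dictionary codes (illustrative only)."""
    # Start with one table entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep extending the match while the table knows the sequence.
            current = candidate
        else:
            # Emit the longest known sequence and learn the new one.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    """Rebuild the original bytes from the codes produced by lzw_compress."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Special case: the code refers to the entry being built right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code: %d" % code)
        out.append(entry)
        # Mirror the table growth performed by the compressor.
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_compress(sample)
    print(len(sample), "bytes ->", len(codes), "codes")
    print(lzw_decompress(codes) == sample)
```

Repeated sequences are replaced by single codes, so the more repetition the input has, the fewer codes are emitted.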

More detail about LZW, see the following links


gzip (GNU zip) is a compression utility designed to be a replacement for compress, and it uses the DEFLATE compression algorithm. Its main advantages over compress are much better compression and freedom from patented algorithms. It has been adopted by the GNU project and is now relatively popular on the Internet. gzip was written by Jean-loup Gailly, with Mark Adler writing the decompression code.
More detail in
Like zip, gzip uses the DEFLATE algorithm, which is also implemented by the zlib library.
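The relationship between the gzip format and zlib can be seen from Python, whose standard gzip and zlib modules are both built on the zlib library. The sketch below is only an illustration: it compresses the same data with both modules and then lets zlib decode the gzip output. The helper name deflate_roundtrip is made up for this example.

```python
import gzip
import zlib


def deflate_roundtrip(data: bytes, level: int = 6) -> dict:
    """Compress data as a gzip stream and as a zlib stream, then decode both."""
    gz_stream = gzip.compress(data, compresslevel=level)
    zlib_stream = zlib.compress(data, level)

    # A wbits value of 16 + MAX_WBITS tells zlib to expect a gzip header
    # and trailer around the DEFLATE data.
    from_gzip = zlib.decompress(gz_stream, 16 + zlib.MAX_WBITS)
    from_zlib = zlib.decompress(zlib_stream)

    return {
        "gzip_size": len(gz_stream),
        "zlib_size": len(zlib_stream),
        "gzip_magic": gz_stream[:2],  # gzip files start with bytes 0x1f 0x8b
        "roundtrip_ok": from_gzip == data and from_zlib == data,
    }


if __name__ == "__main__":
    text = b"compression tools on linux " * 1000
    print(deflate_roundtrip(text))
```

The two outputs differ only by their small container headers and trailers; the compressed payload in both cases is DEFLATE data.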


bzip2 is a freely available, patent-free, high-quality data compressor that uses the Burrows–Wheeler algorithm. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), whilst being around twice as fast at compression and six times faster at decompression. bzip2's command line flags are similar to those of GNU gzip, so if you know how to use gzip, you know how to use bzip2.
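To give a feel for the Burrows–Wheeler transform, here is a naive Python sketch. It sorts all rotations of the input, which is far too slow for real data and is not how bzip2 implements the transform; bzip2 also adds further stages (such as Huffman coding) that are not shown. The function names and the use of a NUL character as end marker are choices made for this example.

```python
def bwt(text: str, end: str = "\0") -> str:
    """Return the Burrows-Wheeler transform of text (naive rotation sort)."""
    if end in text:
        raise ValueError("input must not contain the end marker")
    s = text + end
    # Sort every rotation of the string and keep the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, end: str = "\0") -> str:
    """Invert bwt() by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    # The original string is the row that ends with the end marker.
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(repr(transformed))
    print(inverse_bwt(transformed))
```

The transform itself does not shrink anything. It tends to group identical characters together, which makes the later coding stages more effective.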

More detail in


The Lempel–Ziv–Markov chain algorithm (LZMA) is an algorithm used to perform lossless data compression. It has been under development since 1998. The SDK history file states that it was in development from 1996, and first used in 7-Zip 2001-08-30. LZMA uses a dictionary compression algorithm (a variant of LZ77 with huge dictionary sizes and special support for repeatedly used match distances), whose output is then encoded with a range encoder, using a complex model to make a probability prediction of each bit.
LZMA features:
    Compression speed: 2 MB/s on 2 GHz dual-core CPU.
    Decompression speed:
        20-30 MB/s on 2 GHz Intel Core2 or AMD Athlon 64.
        1-2 MB/s on 200 MHz ARM, MIPS, PowerPC or other simple RISC CPU.
    Small memory requirements for decompression: 8-32 KB + DictionarySize
    Small code size for decompression: 2-8 KB (depending on speed optimizations)
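The role of the dictionary size can be illustrated with Python's standard lzma module. This is a sketch under stated assumptions: it uses the LZMA2 filter inside the .xz container (the successor of the original LZMA format, chosen here because the Python documentation shows this filter configuration), and the test data is an artificial random block repeated twice. The helper names are made up for this example.

```python
import lzma
import random


def xz_with_dictionary(data: bytes, dict_size: int) -> bytes:
    """Compress data with the LZMA2 filter using an explicit dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)


def repeated_random_block(block_size: int = 256 * 1024, seed: int = 0) -> bytes:
    """Build incompressible data whose second half repeats the first half."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    return block + block


if __name__ == "__main__":
    data = repeated_random_block()
    for dict_size in (64 * 1024, 1024 * 1024):
        packed = xz_with_dictionary(data, dict_size)
        assert lzma.decompress(packed) == data
        print("dictionary %8d bytes -> %d compressed bytes" % (dict_size, len(packed)))
```

With the small dictionary the encoder cannot look back far enough to notice that the second half repeats the first, so the output stays close to the input size. With the larger dictionary the repeat is found, at the price of more memory on both the compressing and the decompressing side.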
More details in following link


Choosing a sample file is also a tricky task. The principle is that I don't want to use files which are already compressed, for example image, audio or video files. Even .doc or PDF files cannot simply be used as test samples, because they may contain images. Generally speaking, binary and text files should be good and fair for all compression algorithms. In the test below, I chose a data file (like a text file, but without spaces, tabs, newlines, etc.) generated from PostgreSQL log files. I made one large 5 GiB file out of many small archive files for this test.
Test bed:
The machine has 2 sockets, 8 Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz,
16GB memory, 2 sets of mirrored local SATA disks.
Linux kernel 2.6.32-358.18.1.el6.x86_64
gzip version 1.3.12
bzip2 version 1.0.5
lzma version xz (XZ Utils) 4.999.9beta
compress version compress 4.2.4
Even though there is no disk bottleneck, I loaded the file into memory before each test, so that I/O is not a concern at all.

Each tool has compression levels; I tested levels 1 (fast), 6 (default) and 9 (best). The ratio is the percentage reduction in size for each compressed file (compatible with gzip -v).
         level 1 (fast)                    level 6 (default)                 level 9 (best)
tool     ratio    compress    decompress   ratio    compress    decompress   ratio    compress    decompress
gzip     63.40%   1m33.634s   1m29.541s    67.10%   3m31.250s   1m1.039s     67.70%   13m2.217s   1m30.315s
bzip2    68.10%   9m3.227s    3m23.074s    70.49%   8m45.153s   3m23.402s    70.86%   8m51.783s   3m33.458s
lzma     72.26%   6m35.570s   1m50.126s    78.68%   48m36.579s  1m46.419s    79.51%   63m38.849s  1m54.913s

compress (single run, no level): ratio 47.65%, compress 2m13.842s, decompress 1m28.769s
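For readers who want to repeat a similar measurement on their own data, the procedure can be sketched in Python. This is only an approximation of the test above: it calls the gzip, bz2 and lzma modules of the Python standard library in-process instead of running the command-line tools, it leaves out compress because the standard library has no LZW codec, and the function names are made up for this example. The data is passed in as bytes, so it is already in memory before timing starts.

```python
import bz2
import gzip
import lzma
import time


def reduction_percent(original_size: int, compressed_size: int) -> float:
    """Percentage reduction in size, the same notion of ratio as gzip -v."""
    return (1 - compressed_size / original_size) * 100


# Each entry maps a tool name to (compress(data, level), decompress(data)).
CODECS = {
    "gzip": (lambda d, lvl: gzip.compress(d, compresslevel=lvl), gzip.decompress),
    "bzip2": (lambda d, lvl: bz2.compress(d, compresslevel=lvl), bz2.decompress),
    "lzma": (
        lambda d, lvl: lzma.compress(d, format=lzma.FORMAT_ALONE, preset=lvl),
        lzma.decompress,
    ),
}


def benchmark(data: bytes, levels=(1, 6, 9)) -> list:
    """Time compression and decompression of in-memory data for each codec."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        for level in levels:
            start = time.perf_counter()
            packed = compress(data, level)
            compress_seconds = time.perf_counter() - start

            start = time.perf_counter()
            restored = decompress(packed)
            decompress_seconds = time.perf_counter() - start

            if restored != data:
                raise RuntimeError("round trip failed for %s -%d" % (name, level))
            rows.append({
                "tool": name,
                "level": level,
                "ratio": reduction_percent(len(data), len(packed)),
                "compress_seconds": compress_seconds,
                "decompress_seconds": decompress_seconds,
            })
    return rows


if __name__ == "__main__":
    sample = b"2013-10-01 12:00:00 UTC LOG: statement: SELECT 1;\n" * 20000
    for row in benchmark(sample):
        print("%-6s -%d  %6.2f%%  %8.3fs  %8.3fs" % (
            row["tool"], row["level"], row["ratio"],
            row["compress_seconds"], row["decompress_seconds"]))
```

The sample line in the usage part is a made-up log line, so its numbers say nothing about real data; replace it with the contents of a representative file.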

As you can see, there is no single winner: you trade off either time or space. Generally speaking, gzip is better than compress in both time and space savings, while lzma saves the most space but takes horribly long to compress. If your use case needs to save space, or you compress once and then decompress many times, lzma is the best option, because its decompression is fast.
So it is hard to say which one is the best; it depends on your application's needs.

Note1: There are also other compression algorithms; I only tested the most popular ones.

Note2: Another way of doing compression is to split the process into two steps, as rzip does. rzip is huge-scale data compression software designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform (BWT) and entropy coding (Huffman) on 900 kB output chunks. lrzip is its improved version, but it is not compatible with rzip.

Note3: Many other compression tools support more than one format. For example, 7za supports the 7z, ZIP, CAB, ARJ, GZIP, BZIP2, TAR, CPIO, RPM and DEB formats.
Note4: Some tools support multi-threaded compression, such as lbzip2, pigz and pbzip2. I'll cover these in another article.
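As a rough illustration of the multi-threaded idea in Note4, the sketch below splits the input into chunks, compresses each chunk as its own gzip member in a thread pool, and concatenates the results. It is a simplification and makes no claim about how lbzip2, pigz or pbzip2 work internally. It relies on two facts: a sequence of gzip members is itself a valid gzip stream, and Python's zlib bindings release the interpreter lock while compressing. The function name and default values are made up for this example.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, chunk_size: int = 1 << 20,
                  workers: int = 4, level: int = 6) -> bytes:
    """Compress fixed-size chunks in parallel and join them as gzip members."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    if not chunks:
        chunks = [b""]  # still produce one valid (empty) gzip member
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the chunks in their original order.
        members = pool.map(lambda c: gzip.compress(c, compresslevel=level), chunks)
        return b"".join(members)


if __name__ == "__main__":
    payload = b"compress once, decompress a lot\n" * 200000
    packed = parallel_gzip(payload)
    print(len(payload), "->", len(packed))
    print(gzip.decompress(packed) == payload)
```

Because each chunk is compressed independently, matches cannot cross chunk boundaries, so the result is usually slightly larger than a single-threaded run over the whole input.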