
Speed up file zipping on Linux by using multiple processes

Quite a lot of articles compare different zipping tools on Linux: some tools are good for speed, while others are good for saving space or for particular types of files. In general, though, the biggest factor affecting zipping time is CPU resources, not I/O or the tool itself. Even with a single tool there are options that speed up zipping, but they trade off something else, such as compression ratio.
What I do instead is let the zipping tool use more CPU resources.

Multiple zipping processes

Here is an example; details are in Control and run multiple processes in bash.

nice /usr/bin/find /home/backups/archivelogs -type f -not -name "*.bz2" -print0 | xargs -0 -n 1 -P 5 bzip2

In the case above, there are usually thousands of small log files that need to be zipped before they are archived. By using 5 parallel processes, I easily got the zipping done about 5 times faster.
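
The pipeline above can be wrapped in a small helper. This is a minimal sketch, not the original setup: the function name compress_tree and its parameters are hypothetical, it assumes GNU find and xargs, and it defaults the process count to the CPU count reported by nproc rather than the fixed 5 used above.

```shell
# compress_tree DIR [JOBS] [TOOL] [SUFFIX]
# Hypothetical helper: compress every file under DIR that does not already
# carry SUFFIX, running up to JOBS compressor processes at the same time.
compress_tree() {
    local dir="$1"
    local jobs="${2:-$(nproc)}"    # default: one process per available CPU
    local tool="${3:-bzip2}"       # any compressor that takes a file name
    local suffix="${4:-.bz2}"      # suffix that marks already-compressed files

    # -print0 / -0 keep file names with spaces or newlines intact;
    # -r skips running the tool when find returns nothing (GNU xargs).
    find "$dir" -type f -not -name "*${suffix}" -print0 |
        xargs -0 -r -n 1 -P "$jobs" "$tool"
}

# Usage, mirroring the command above:
#   compress_tree /home/backups/archivelogs 5
```

With many small files, one file per process (-n 1) keeps all workers busy until the list runs out; a larger -n would reduce process start-up overhead at the cost of less even load.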

Multiple zipping threads

What about a single big file?
There is a multi-threaded zipping tool called pbzip2. It is available in many Linux distributions.
Here is an example. The file was loaded into memory before the following tests.
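
One common way to load a file into memory first is simply to read it once, so that the kernel keeps it in the page cache. The sketch below assumes that is what was done here (the page does not say how), and warm_cache is a hypothetical helper name.

```shell
# warm_cache FILE...
# Read each file once and discard the data, so that later reads are likely
# served from the page cache (if enough memory is free) instead of from disk.
warm_cache() {
    cat -- "$@" > /dev/null
}

# Usage (M is the test file used on this page):
#   warm_cache M && time bzip2 M
```

Warming the cache this way keeps disk read time out of the measurements, so the timings mostly reflect CPU work.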

Bzip2 and time cost

$ time bzip2 M
real    0m49.041s
user    0m48.709s
sys    0m0.268s

bzip2 -d and time cost

$ time bzip2 -d M.bz2
real    0m16.593s
user    0m16.055s
sys    0m0.524s

pbzip2 and time cost

$ time pbzip2 M
real    0m11.787s
user    1m29.173s
sys    0m1.445s

pbzip2 -d and time cost

$ time pbzip2 -d M.bz2
real    0m3.469s
user    0m25.821s
sys    0m1.021s
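
To put the numbers above in perspective, the speedup is the ratio of the real (wall-clock) times. A small sketch using awk for the floating-point division; speedup is a hypothetical helper name.

```shell
# speedup BASELINE_SECONDS PARALLEL_SECONDS
# Print how many times faster the parallel run was, to two decimals.
speedup() {
    LC_ALL=C awk -v base="$1" -v par="$2" 'BEGIN { printf "%.2f\n", base / par }'
}

speedup 49.041 11.787   # compression:   bzip2 vs pbzip2,       prints 4.16
speedup 16.593 3.469    # decompression: bzip2 -d vs pbzip2 -d, prints 4.78
```

Note that pbzip2's user time (1m29s) is higher than bzip2's (48.7s): the work is spread across cores rather than reduced, so total CPU time goes up while elapsed time goes down.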

More examples for pbzip2

Example: pbzip2 -b15vk myfile.tar
Example: pbzip2 -p4 -r -5 myfile.tar second*.txt
Example: tar cf myfile.tar.bz2 --use-compress-prog=pbzip2 dir_to_compress/
Example: pbzip2 -d -m500 myfile.tar.bz2
Example: pbzip2 -dc myfile.tar.bz2 | tar x
Example: pbzip2 -c < myfile.txt > myfile.txt.bz2
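
The tar examples above can be combined into a round trip through pipes. This is a sketch under assumptions: pick_bz2, pack_dir and unpack_archive are hypothetical helper names, and the sketch falls back to plain bzip2 when pbzip2 is not installed (both produce the same .bz2 format).

```shell
# Print the name of a bzip2-compatible compressor: pbzip2 if installed, else bzip2.
pick_bz2() {
    if command -v pbzip2 > /dev/null 2>&1; then
        echo pbzip2
    else
        echo bzip2
    fi
}

# pack_dir SRC_DIR ARCHIVE [COMPRESSOR]
# Tar SRC_DIR (stored under its base name) and compress the stream into ARCHIVE.
pack_dir() {
    local src="$1" archive="$2" comp="${3:-$(pick_bz2)}"
    tar cf - -C "$(dirname "$src")" "$(basename "$src")" | "$comp" -c > "$archive"
}

# unpack_archive ARCHIVE DEST_DIR [COMPRESSOR]
# Decompress ARCHIVE to stdout and extract it into the existing DEST_DIR.
unpack_archive() {
    local archive="$1" dest="$2" comp="${3:-$(pick_bz2)}"
    "$comp" -dc "$archive" | tar xf - -C "$dest"
}

# Usage (directory and file names are illustrative):
#   pack_dir dir_to_compress myfile.tar.bz2
#   mkdir -p restored && unpack_archive myfile.tar.bz2 restored
```

Piping through the compressor keeps tar single-threaded for the archiving step and leaves only the CPU-heavy compression to the parallel tool.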

Another tool is pigz, a parallel implementation of gzip. It is even faster, but saves a little less space.

$ time pigz M
real    0m3.552s
user    0m26.017s
sys    0m0.403s

$ time pigz -d M.gz
real    0m1.224s
user    0m1.890s
sys    0m0.172s

File size comparison

original filesize
-rw-r--r-- 1 test test 363407360 Dec 19 11:24 M
pbzip2 filesize
-rw-r--r-- 1 test test 142629350 Dec 19 11:24 M.bz2
pigz filesize
-rw-r--r-- 1 test test 152936211 Dec 19 11:24 M.gz
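
The sizes above translate into compression ratios as follows. A small sketch using awk for the arithmetic; ratio is a hypothetical helper name.

```shell
# ratio COMPRESSED_BYTES ORIGINAL_BYTES
# Print the compressed size as a percentage of the original size.
ratio() {
    LC_ALL=C awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f%%\n", 100 * c / o }'
}

ratio 142629350 363407360   # pbzip2: prints 39.2%
ratio 152936211 363407360   # pigz:   prints 42.1%
```

So for this file, pigz's output is about 3 percentage points larger than pbzip2's.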