Squash 0.5 and the Squash Benchmark

A while back I started working on a project called Squash, and today I’m pleased to announce the first release, version 0.5.

Squash is an abstraction layer for general-purpose data compression (zlib, LZMA, LZ4, etc.).  It is based on dynamically loaded plugins, and there are a lot of them: 25 plugins supporting 42 different codecs at the moment (though 2 plugins are disabled pending bug fixes in their respective compression libraries), covering a wide range of compression codecs with vastly different performance characteristics.

The API isn’t final yet (hence version 0.5 instead of 1.0), but I don’t think it will change much.  I’m rolling out a release now in the hope that it encourages people to give it a try, since I don’t want to commit to API stability until a few people have actually used it.  There is currently support for C and Vala, but I’m hopeful more languages will be added soon.

So, why should you be interested in Squash?  Well, because it allows you to support a lot of different compression codecs without changing your code, which lets you swap codecs with virtually no effort.  Different algorithms perform very differently with different data and on different platforms, and make different trade-offs between compression speed, decompression speed, compression ratio, memory usage, etc.
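To make the idea of swapping codecs concrete, here is a minimal sketch in Python.  It is not Squash's API: the `CODECS` registry and the `compress`/`decompress` helpers are hypothetical names invented for this illustration, and the codecs come from Python's standard library rather than from Squash plugins.  It only shows the general pattern of selecting a codec by name behind one interface.

```python
import bz2
import lzma
import zlib

# Hypothetical registry mapping a codec name to (compress, decompress).
# These are Python standard-library codecs, not Squash plugins.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress(codec_name: str, data: bytes) -> bytes:
    """Compress data with the codec registered under codec_name."""
    compress_fn, _ = CODECS[codec_name]
    return compress_fn(data)


def decompress(codec_name: str, data: bytes) -> bytes:
    """Decompress data with the codec registered under codec_name."""
    _, decompress_fn = CODECS[codec_name]
    return decompress_fn(data)


if __name__ == "__main__":
    payload = b"the same calling code works for every codec " * 100
    # Only the name changes; it could just as well come from a config file.
    for name in CODECS:
        packed = compress(name, payload)
        assert decompress(name, packed) == payload
        print(f"{name}: {len(payload)} -> {len(packed)} bytes")
```

Because the calling code only ever sees a codec name, trying a different codec is a one-string change, which is the property the paragraph above describes.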

One of the coolest things about Squash is that it makes it very easy to benchmark tons of different codecs and configurations with your data, on whatever platform you’re running.  To give you an idea of what settings might be interesting to you I also created the Squash Benchmark, which tests lots of standard datasets with every codec Squash supports (except those which are disabled right now) at every preset level on a bunch of different machines.  Currently that is 28 datasets with 39 codecs in 178 different configurations on 8 different machines, for a total of 39,872 different data points (28 × 178 × 8).  This will grow as more machines are added (some are already in progress) and more plugins are added to Squash.
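A rough sketch of what benchmarking several codec configurations against your own data can look like, again in Python with standard-library codecs.  This is not the Squash Benchmark's code: the `CONFIGURATIONS` list, the chosen levels, and the `benchmark` function are assumptions made up for the example, and a single timed run per configuration is far less rigorous than a real benchmark.

```python
import bz2
import lzma
import time
import zlib

# Hypothetical configurations: (codec name, level, compress function).
# The levels are arbitrary picks for illustration, not Squash presets.
CONFIGURATIONS = (
    [("zlib", lvl, lambda d, l=lvl: zlib.compress(d, l)) for lvl in (1, 6, 9)]
    + [("bz2", lvl, lambda d, l=lvl: bz2.compress(d, l)) for lvl in (1, 9)]
    + [("lzma", lvl, lambda d, l=lvl: lzma.compress(d, preset=l)) for lvl in (0, 6)]
)


def benchmark(data: bytes) -> list:
    """Compress data once per configuration, recording ratio and wall time."""
    results = []
    for name, level, compress_fn in CONFIGURATIONS:
        start = time.perf_counter()
        compressed = compress_fn(data)
        elapsed = time.perf_counter() - start
        results.append({
            "codec": name,
            "level": level,
            "ratio": len(data) / len(compressed),  # higher means smaller output
            "seconds": elapsed,
        })
    return results


if __name__ == "__main__":
    sample = b"replace this with a file that resembles your real data\n" * 5000
    for row in benchmark(sample):
        print(f'{row["codec"]:5} level {row["level"]}: '
              f'ratio {row["ratio"]:.1f}, {row["seconds"] * 1000:.2f} ms')
```

Each (codec, level) pair plays the role of one configuration; running every configuration over every dataset on every machine is what multiplies out to the number of data points quoted above.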

There is a complete list of plugins on the Squash web site, but even with the benchmark there is a pretty decent amount of data to sift through, so here are some of the plugins I think are interesting (in alphabetical order):

bsc
libbsc targets very high compression ratios, achieving ratios similar to ZPAQ at medium levels, but it is much faster than ZPAQ. If you mostly care about compression ratio, libbsc could be a great choice for you.

DENSITY
DENSITY is fast. For text on x86_64 it is much faster than anything else at both compression and decompression. For binary data, decompression speed is similar to LZ4, but compression is faster. That said, the compression ratio is relatively low. If you are on x86_64 and mostly care about speed, DENSITY could be a great choice, especially if you’re working with text.

LZ4
You have probably heard of LZ4, and for good reason. It has a pretty good compression ratio, fast compression, and very fast decompression. It’s a very strong codec if you mostly care about speed, but still want decent compression.

LZHAM
LZHAM compresses similarly to LZMA, both in terms of ratio and speed, but with faster decompression.

Snappy
Snappy is another codec you’ve probably heard of. Overall, performance is pretty similar to LZ4: it seems to be a bit faster at compressing than LZ4 on ARM, but a bit slower on x86_64. For compressing small pieces of data (like fields.c from the benchmark) nothing really comes close to it. Decompression speed isn’t as strong, but it’s still pretty good. If you have a write-heavy application, especially on ARM or with small pieces of data, Snappy may be the way to go.