Portability is Reliability

Anyone familiar with my open source work knows that portability is a major focus of mine, but it recently occurred to me that a lot of people probably don’t know why.

For example, some of my portability-focused projects include:

  • Hedley — a C/C++ header to let you take advantage of features which may not be available on all platforms (older standards, C vs. C++, different compilers, older compilers, etc.) without creating a hard dependency.
  • Portable Snippets — a collection of loosely related modules designed to provide relatively portable access to different features. For example, the builtin module contains portable implementations of compiler-specific builtins/intrinsics, such as __builtin_ffs and _BitScanForward (see the sketch below for the general pattern).
  • SIMDe — implementations of SIMD APIs for targets which don’t natively support them (e.g., compiling code written for SSE on NEON), and also irons out a lot of minor differences across compilers.
  • Salieri — a wrapper for Microsoft’s Source Annotation Language (SAL) which lets you use SAL without creating a hard dependency on Microsoft’s compiler.
  • TinyCThread — a library I maintain (though didn’t originally create) which implements the C11 threads API and works well as a portable abstraction layer over the POSIX and Windows threading APIs.
  • I have published scripts for installing the Intel C/C++ Compiler (back when you needed to deal with license keys, before oneAPI made installation trivial and free), the NVIDIA HPC SDK (formerly the PGI compilers), and TI compilers on CI platforms.

I could keep going, but hopefully you get the point: I have spent, and continue to spend, a lot of time and energy making software portable. That’s something that isn’t always, or even usually, easy to do.
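
To give you an idea of what that kind of work looks like in practice, here is a minimal sketch of the general pattern a module like Portable Snippets’ builtin module follows (this is an illustration written for this post, not the library’s actual implementation, and example_ffs32 is a made-up name): use the compiler-specific intrinsic when one is available, and fall back to plain C otherwise.

#include <stdint.h>

#if defined(_MSC_VER)
#  include <intrin.h>
#endif

/* Find-first-set: returns the 1-based index of the least significant set
 * bit, or 0 if no bits are set. */
int example_ffs32(uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_ffsll((long long) v);
#elif defined(_MSC_VER)
  unsigned long idx;
  return _BitScanForward(&idx, v) ? ((int) idx) + 1 : 0;
#else
  int pos = 1;
  if (v == 0)
    return 0;
  while ((v & 1) == 0) {
    v >>= 1;
    pos++;
  }
  return pos;
#endif
}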

Here’s the punchline: I don’t care deeply about portability. At least not portability across compilers; I do care to varying degrees about portability across a limited set of architectures (currently mostly x86_64, AArch64, WebAssembly, POWER, and to an extent RISC-V and s390x). Sure, I write open source code so people can use it, and support for, e.g., MSVC means a wider potential user base for my code, which is great; but that’s not the primary reason I try to support MSVC, and it’s certainly not enough of a reason to put up with MSVC’s crap.

When building code for production purposes, I’m really not going to use anything other than clang and GCC. I don’t use Windows, I hate Visual Studio, and I find it difficult to express my opinion of MSVC in polite company (though finally adding C11 support has greatly improved things). Supporting MSVC is a huge annoyance for me, so why bother?

Portability across compilers is a means to an end, not an end in and of itself. What I really care about is writing reliable software. And, since I mostly write software in C, writing reliable software is a non-trivial task. Actually, “non-trivial” is an understatement: it’s somewhere between extremely difficult and impossible.

Compilers and Static Analyzers

Luckily, tools can help. A lot. Pretty much everyone knows (or at least should know) that cranking up compiler warnings during development is a basic necessity; if you’re not using at least -Wall, you’re doing it wrong. -Wextra (GCC) and -Weverything (clang) are better, though you do end up having to deal with a fair number of false positives… learn to do that, it’s worth it.
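
To make that concrete, here is a tiny hypothetical example (count_x is made up, not code from any of my projects) which GCC compiles silently at -Wall when building C, but which -Wextra flags via -Wsign-compare; Clang’s -Wextra and -Weverything catch it too:

#include <string.h>

/* Count the 'x' characters in a string. */
int count_x(const char* s) {
  int n = 0;
  /* `i` is a signed int but strlen() returns an unsigned size_t; the
   * comparison is flagged by -Wsign-compare (part of -Wextra for C). */
  for (int i = 0; i < strlen(s); i++) {
    if (s[i] == 'x')
      n++;
  }
  return n;
}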

Clang’s diagnostics tend to cover most of what GCC’s do, and more, but GCC also catches some things Clang doesn’t; think of a Venn diagram where one circle is bigger and mostly overlaps the smaller one. It’s a good idea to test with both. I strongly suggest you do this in CI to make sure every commit runs through both compilers with your desired warnings enabled, and add -Werror to turn those warnings into errors. You should also add various sanitizers, plus scan-build on Clang and -fanalyzer on GCC.
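
As a rough sketch of what the analyzers and sanitizers add on top of warnings, consider a hypothetical function like this (read_config is made up). Neither -Wall nor -Wextra has anything to say about it, but GCC’s -fanalyzer reports the leak on the error path at compile time, and LeakSanitizer (enabled as part of -fsanitize=address on Linux) reports it at run time if your tests exercise that path:

#include <stdio.h>
#include <stdlib.h>

int read_config(const char* path) {
  char* buf = malloc(4096);
  if (buf == NULL)
    return -1;

  FILE* f = fopen(path, "r");
  if (f == NULL)
    return -1; /* oops: buf is leaked on this path */

  /* ... read and parse the file into buf ... */

  fclose(f);
  free(buf);
  return 0;
}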

Fixing all the warnings Clang and GCC emit is a great start, but there are still more bugs to be caught! Just like GCC and Clang support different diagnostics, MSVC does too. /W4 on MSVC is roughly analogous to -Wextra on GCC or -Weverything on Clang, and it can catch a lot of issues that GCC and Clang don’t. If you can run your code through GCC, Clang, and MSVC you’ll be able to find and fix more bugs before they reach your users.

If that’s not enough reason to bother with MSVC, you should know that it also includes a fantastic static analyzer which is analogous to scan-build or -fanalyzer. IMHO it’s easily the best part of their compiler. Sure, that may not be a particularly high bar, but I promise it’s really good… in my experience, it’s better than Clang and GCC’s static analyzers though not as good as something like Coverity.

Sadly, porting to MSVC tends to be a lot more work than porting between GCC and clang. Hedley can help a lot, and Portable Snippets can help too, and there are lots of little abstraction libraries I didn’t write which can be very helpful, but odds are pretty good that you’re going to end up with some #ifdefs no matter what you do.

What about other compilers? Well, they all catch different issues. There is a lot of overlap, and most of the time one of the popular compilers will also catch the same issue, but not always. For example, Oracle Developer Studio and (at least some) TI compilers include code which can check for MISRA C violations.

Unfortunately, in order to take advantage of all this great tooling, your code needs to work on the relevant compiler(s). If your code doesn’t compile on MSVC, good luck getting anything out of their static analyzer. Similarly, if you can’t compile your code on Clang then scan-build isn’t going to work.

It’s not just static analysis, either. Often just compiling and running your code somewhere else can uncover issues which could otherwise lie dormant. For example, in SIMDe almost every function falls back, in the worst case, on a simple loop. Adding together two vectors of double-precision floating point values might look like this (simplified somewhat for clarity):

for (size_t i = 0 ; i < (sizeof(r.f64) / sizeof(r.f64[0])) ; i++) {
  r.f64[i] = a.f64[i] + b.f64[i];
}

A few days ago, I messed up and did something like this:

for (size_t i = 0 ; i < (sizeof(r.f32) / sizeof(r.f32[0])) ; i++) {
  r.f64[i] = a.f64[i] + b.f64[i];
}

In SIMDe’s (rather extensive) CI tests, several compilers hit this case. They happily compiled the code, and it ran just fine. Then MSVC failed. I reviewed the log, which pointed me to the relevant location in the code, and I quickly fixed the issue before it even hit the default branch. It happened to work on the other compilers in my setup, but there is a good chance that someone calling that function from somewhere else would end up with a crash or (worse) silently incorrect data.

This is by no means a unique example; I regularly write code which works fine locally, and passes most configurations on CI, only for other CI configurations to catch the issue. Usually it’s my mistake, but I also find compiler bugs with alarming regularity.

Architectures

Just like other compilers can help you catch different bugs, other architectures can do the same.

A great example of this is identifying aliasing violations. I’m not going to explain aliasing here; if you’re not already familiar with the issue I remember Understanding Strict Aliasing being informative. What is the Strict Aliasing Rule and Why do we care? and Strict Aliasing Rule in C with Examples also look good, at least based on a very quick skim.

x86_64 is extremely tolerant of aliasing violations (because it’s extremely tolerant of unaligned access), and MSVC is even more so. That means there is a lot of code out there which, often unintentionally, relies on aliasing violations that can easily come back to bite you later. Just because your code works on one compiler for a specific target with certain compiler flags doesn’t mean it will continue to work if you change any of those things.

If you want to get rid of potential aliasing bugs, a great way to do it is to run your code on armv7. The armv7 architecture is relatively picky about misaligned data, so code which works fine on AArch64 or x86 will often crash on armv7 due to aliasing violations. Even if your code will never run on armv7 in practice, including it in your CI setup is very much worthwhile. Drone offers armv7 and aarch64 hardware, and if your code is open source you can use it for free, but even just cross-compiling to armv7 and running your test suite in QEMU can uncover a lot of issues.
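
To make it concrete, here is a sketch of the kind of code that bites you (read_u64_risky is a made-up helper, but the pattern shows up constantly in real code). It usually appears to work on x86_64, but the cast both violates strict aliasing and performs a load the target may require to be aligned, so on armv7 it can crash:

#include <stdint.h>
#include <stddef.h>

/* Read a 64-bit value from an arbitrary offset in a byte buffer by
 * reinterpreting the bytes as a uint64_t.  Undefined behavior: the
 * access violates strict aliasing, and buf + offset is usually not
 * suitably aligned for a uint64_t load. */
uint64_t read_u64_risky(const unsigned char* buf, size_t offset) {
  return *(const uint64_t*) (buf + offset);
}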

Now, you may be telling yourself that you’ll never run your code on armv7, so why bother? Everything seems okay on x86/x86_64 and AArch64, and that’s all you’re interested in, so why go looking for trouble?

Let me tell you a story. This wasn’t a major incident in the grand scheme of things, but it’s a really good example. I consider it a rather formative moment in my own development as a programmer, and hopefully others can learn from it as well.

In 2015 I was doing a lot of work on data compression (for Squash), and I had noticed a crash when using LZ4. After some testing, I realized that the crash only occurred on GCC 5 (and not earlier versions), and only at -O3 (or when -ftree-loop-vectorize and -fvect-cost-model, both of which -O3 enables, were passed). LZ4 was pretty well tested at the time, and I assumed the problem was a bug in GCC. After all, earlier versions worked, and the code was the same, as was the hardware, the OS, and everything else. The only difference between crashing and not crashing was the compiler version.

I filed a bug against GCC, and minutes later some GCC developers started looking at it (side note: GCC developers, in my experience, are extremely helpful, responsive, and professional). Turns out the bug was in LZ4, where there was an aliasing violation which triggered a misaligned access, which resulted in a crash.

You can read the bug report if you want; if you don’t really understand why aliasing violations are a problem it’s a pretty straightforward introduction. That’s an important lesson in itself, but to me it drove home a much more important one: just because your code works with one version of a compiler when targeting a specific architecture doesn’t mean it will continue to work with the next version. When you rely on undefined behavior, all bets are off. People like to say that undefined behavior could format your hard drive, and obviously that’s hyperbole (though technically true), but this is a good, real example of what can actually happen.

One important thing compilers can (and do) assume about undefined behavior is that your code doesn’t rely on it. In other words, it can assume that the undefined behavior is unreachable. In this case, since the alignment requirement (_Alignof(int64_t)) for a 64-bit integer on x86 is 8 bytes, the compiler can assume that you would never attempt to access data which isn’t aligned to an 8-byte boundary by dereferencing a pointer to a 64-bit integer. From the compiler’s perspective, this means it is safe to emit faster code which assumes that your pointer to a 64-bit integer is aligned.
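
The usual fix is to stop lying to the compiler about types and alignment. Continuing the made-up read_u64_risky sketch from earlier, a memcpy-based load expresses the same intent without undefined behavior, and modern compilers typically turn it into a single (unaligned) load on targets that support one:

#include <stdint.h>
#include <string.h>

/* Read a 64-bit value from an arbitrary offset in a byte buffer without
 * violating aliasing or alignment rules. */
uint64_t read_u64_safe(const unsigned char* buf, size_t offset) {
  uint64_t v;
  memcpy(&v, buf + offset, sizeof v);
  return v;
}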

As you probably know, compilers add new optimizations all the time. This is a good thing; your code tends to magically get faster without you having to do any work beyond simply recompiling. In this case, that new optimization “broke” code which was working. Of course, the code was already broken, but since it worked before that was hard to know.

In this case, I would argue that one of the best possible outcomes occurred: the code crashed. Yes, crashes are good. With a crash you know something went wrong. That’s a much better outcome than your data silently getting corrupted and you receiving an incorrect result without ever knowing it’s incorrect. The only thing better than a crash is a compile-time error.

While updating the compiler is what caught the bug here, switching architectures can also catch many bugs. I found a ton of aliasing violations in SIMDe when I started testing on armv7, and when I fixed them other architectures magically started crashing less often, too, especially at higher optimization levels. The fact that SIMDe runs, and is tested on, armv7 means that the code is more reliable on all architectures, including x86_64, AArch64, POWER, s390x, and others.

Another great trick for catching bugs is a big endian architecture such as s390x (Arm and PPC also support big endian, but little endian is much more common). If you manipulate data using the wrong types, running your test suite on a big endian machine can often make the issue quite apparent as the result will likely be garbage. If you don’t have access to s390x (who does?), QEMU’s s390x implementation is excellent.
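
For example (another made-up sketch, not code from a real project), parsing a little-endian 32-bit field from a file format by casting “works” on x86_64, but on s390x it silently returns byte-swapped garbage, which your test suite will notice immediately. Assembling the value from individual bytes is correct everywhere:

#include <stdint.h>

/* Wrong on big endian (and an aliasing/alignment problem besides):
 * interprets the bytes in host order, not the format's order. */
uint32_t parse_u32le_risky(const unsigned char* p) {
  return *(const uint32_t*) p;
}

/* Correct on any architecture: explicitly little-endian. */
uint32_t parse_u32le_portable(const unsigned char* p) {
  return (uint32_t) p[0]
       | ((uint32_t) p[1] << 8)
       | ((uint32_t) p[2] << 16)
       | ((uint32_t) p[3] << 24);
}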

WebAssembly, in addition to being an increasingly important target in its own right, tends to be great at catching out-of-bounds access. I’ve had code which triggers a crash in d8 even when AddressSanitizer is completely silent.

Conclusion

There has been a lot of focus lately on replacing C and C++ with safer languages like Go and Rust. I’m not opposed to that; C and C++ are usually not the right choice when starting a new project today. That said, there is a lot of code out there right now written in C/C++, and it’s not going away any time soon. Part of the solution is definitely to transition away from C/C++, but another part is finding ways to improve C/C++. You can call it “putting lipstick on a pig”, “turning lemons into lemonade”, or just “a necessary evil”, but it is necessary.

Eric Raymond’s “Linus’s Law” famously states that “given enough eyeballs, all bugs are shallow”. While history hasn’t necessarily been kind to this assertion, I think it’s pretty clear that if we consider compilers, static analyzers, integration testing, and other tools to be (metaphorical) eyeballs it becomes much easier to accept the veracity of this statement; maybe not all bugs are shallow, but a substantial portion become a lot less deep.

Writing reliable software, especially in languages like C and C++, is a very hard problem. You need all the help you can get, especially the kind which can be automated so it just runs quietly in the background until it finds a problem. That kind of help is a lot more reliable, cheaper, and more scalable than the human kind.

Unfortunately, no one tool is perfect. The best option is defense in depth: use as many tools as you can to catch as many issues as possible, as early as possible. If you write a bug, hopefully your compiler will catch it. If your compiler misses it, hopefully another compiler (or an older compiler, or a newer compiler) will catch it. If other compilers miss it, hopefully a static analyzer will catch it. If static analyzers miss it, hopefully a sanitizer will catch it. If the sanitizers miss it, hopefully other hardware will catch it.

Portability is not the answer; I don’t think there is a single answer. Portability is, however, an important tool to help write reliable software which a lot of people overlook because they treat it as an end instead of a means. Sometimes portability is the end goal, but it’s also a means to a more important end: reliability.

Using as many tools as possible means making sure your code works in as many places as possible. In other words, portability is reliability.

Edit: there is some discussion on Twitter about this which might be interesting to some.

Squash 0.5 and the Squash Benchmark

A while back I started working on a project called Squash, and today I’m pleased to announce the first release, version 0.5.

Squash is an abstraction layer for general-purpose data compression (zlib, LZMA, LZ4, etc.).  It is based on dynamically loaded plugins, and there are a lot of them (currently 25 plugins supporting 42 different codecs, though 2 plugins are disabled pending bug fixes from their respective compression libraries), covering a wide range of compression codecs with vastly different performance characteristics.

The API isn’t final yet (hence version 0.5 instead of 1.0), but I don’t think it will change much.  I’m rolling out a release now in the hope that it encourages people to give it a try, since I don’t want to commit to API stability until a few people have done so. There is currently support for C and Vala, but I’m hopeful more languages will be added soon.

So, why should you be interested in Squash?  Well, because it allows you to support a lot of different compression codecs without changing your code, which lets you swap codecs with virtually no effort.  Different algorithms perform very differently with different data and on different platforms, and make different trade-offs between compression speed, decompression speed, compression ratio, memory usage, etc.

One of the coolest things about Squash is that it makes it very easy to benchmark tons of different codecs and configurations with your data, on whatever platform you’re running.  To give you an idea of what settings might be interesting to you I also created the Squash Benchmark, which tests lots of standard datasets with every codec Squash supports (except those which are disabled right now) at every preset level on a bunch of different machines.  Currently that is 28 datasets with 39 codecs in 178 different configurations on 8 different machines (and I’m adding more soon), for a total of 39,872 different data points. This will grow as more machines are added (some are already in progress) and more plugins are added to Squash.

There is a complete list of plugins on the Squash web site, but even with the benchmark there is a pretty decent amount of data to sift through, so here are some of the plugins I think are interesting (in alphabetical order):

bsc
libbsc targets very high compression ratios, achieving ratios similar to ZPAQ at medium levels, but it is much faster than ZPAQ. If you mostly care about compression ratio, libbsc could be a great choice for you.

DENSITY
DENSITY is fast. For text on x86_64 it is much faster than anything else at both compression and decompression. For binary data decompression speed is similar to LZ4, but compression is faster. That said, the compression ratio is relatively low. If you are on x86_64 and mostly care about speed DENSITY could be a great choice, especially if you’re working with text.

LZ4
You have probably heard of LZ4, and for good reason. It has a pretty good compression ratio, fast compression, and very fast decompression. It’s a very strong codec if you mostly care about speed, but still want decent compression.

LZHAM
LZHAM compresses similarly to LZMA, both in terms of ratio and speed, but with faster decompression.

Snappy
Snappy is another codec you’ve probably heard of. Overall, performance is pretty similar to LZ4—it seems to be a bit faster at compressing than LZ4 on ARM, but a bit slower on x86_64. For compressing small pieces of data (like fields.c from the benchmark) nothing really comes close. Decompression speed isn’t as strong, but it’s still pretty good. If you have a write-heavy application, especially on ARM or with small pieces of data, Snappy may be the way to go.

Making CMake more user-friendly

If you’re like me, when you download a project and want to build it the first thing you do is look for a configure script (or maybe ./autogen.sh if you are building from git).  Lots of times I don’t bother reading the INSTALL file, or even the README.  Most of the time this works out well, but sometimes there is no such file. When that happens, more often than not there is a CMakeLists.txt, which means the project uses CMake for its build system.

The realization that the project uses CMake is, at least for me, quickly followed by a sense of disappointment.  It’s not that I mind that a project is using CMake instead of Autotools; they both suck, as do all the other build systems I’m aware of.  Mostly it’s just that CMake is different and, for someone who just wants to build the project, not in a good way.

First you have to remember what arguments to pass to CMake. For people who haven’t built many projects with CMake before this often involves having to actually RTFM (the horrors!), or a consultation with Google. Of course, the project may or may not have good documentation, and there is much less consistency regarding which flags you need to pass to CMake than with Autotools, so this step can be a bit more cumbersome than one might expect, even for those familiar with CMake.

After you figure out what arguments you need to type, you need to actually type them. CMake has you define variables using -DVAR=VAL for everything, so you end up with things like -DCMAKE_INSTALL_PREFIX=/opt/gnome instead of --prefix=/opt/gnome. Sure, it’s not the worst thing imaginable, but let’s be honest—it’s ugly, and awkward to type.

Enter configure-cmake, a bash script that you drop into your project (as configure) which takes most of the arguments configure scripts typically accept, converts them to CMake’s particular style of insanity, and invokes CMake for you.  For example,

./configure --prefix=/opt/gnome CC=clang CFLAGS="-fno-omit-frame-pointer -fsanitize=address"

will be converted to

cmake . -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=/opt/gnome -DCMAKE_INSTALL_LIBDIR=/opt/gnome/lib -DCMAKE_C_COMPILER=clang -DCMAKE_C_FLAGS="-fno-omit-frame-pointer -fsanitize=address"

Note that it assumes you’re including the GNUInstallDirs module (which ships with CMake, and you should probably be using).  Other than that, the only thing which may be somewhat contentious is that it adds -DCMAKE_BUILD_TYPE=Debug—Autotools usually builds with debugging symbols enabled and lets the package manager take care of stripping them, but CMake doesn’t.  Unfortunately some projects use the build type to determine other things (like defining NDEBUG), so you can get configure-cmake to pass “Release” for the build type by passing it --disable-debug, one of two arguments that don’t mirror something from Autotools.

Sometimes you’ll want to be able to pass non-standard arguments to CMake, which is where the other argument that doesn’t mirror something from Autotools comes in: --pass-thru (--pass-through, --passthru, and --passthrough also work), which just tells configure-cmake to pass all subsequent arguments to CMake untouched.  For example:

./configure --prefix=/opt/gnome --pass-thru -DENABLE_AWESOMENESS=yes

Of course none of this replaces anything CMake is doing, so people who want to keep calling cmake directly can.

So, if you maintain a CMake project, please consider dropping the configure script from configure-cmake into your project.  Or write your own, or hack what I’ve done into pieces and use that, or really anything other than asking people to type those horrible CMake invocations manually.

Announcing Planet Vala

I am pleased to announce a blog aggregator for the Vala community, which will be hosted at http://planet.vala-project.org/.

The Venus configuration information is in the nemequ/planet-vala repository on GitHub, so if you know of anyone who blogs about Vala please file an issue or pull request to add the blog.  As with most Planets I’m aware of, you don’t need to blog exclusively about the planet’s subject (i.e., Vala) to be included.

The content is currently quite limited since I don’t know of very many Vala bloggers, but hopefully this will encourage more people to blog about Vala more often.  At the very least I know that I am more motivated to write a couple of posts I’ve been thinking about for a while.