Anyone familiar with my open source work knows that portability is a major focus of mine, but it recently occurred to me that a lot of people probably don’t know why.
For example, some of my portability-focused projects include:
- Hedley — a C/C++ header to let you take advantage of features which may not be available on all platforms (older standards, C vs. C++, different compilers, older compilers, etc.) without creating a hard dependency.
- Portable Snippets — a collection of loosely related modules designed to provide relatively portable access to different features. For example, the builtin module contains portable implementations of compiler-specific builtins/intrinsics, such as __builtin_ffs and _BitScanForward.
- SIMDe — implementations of SIMD APIs for targets which don’t natively support them (e.g., compiling code written for SSE on NEON), which also irons out a lot of minor differences across compilers.
- Salieri — a wrapper for Microsoft’s Source Annotation Language (SAL) which lets you use SAL without creating a hard dependency on Microsoft’s compiler.
- TinyCThread — a library I maintain (though didn’t originally create) which implements the C11 threads API portably; it works well as an abstraction layer over the POSIX and Windows threads APIs.
- I have published scripts for installing the Intel C/C++ Compiler (back when you needed to deal with license keys, before oneAPI made installation trivial and free), the NVIDIA HPC compilers (formerly PGI), and TI compilers on CI platforms.
I could keep going, but hopefully you get the point: I have spent, and continue to spend, a lot of time and energy making software portable. That’s something that isn’t always, or even usually, easy to do.
Here’s the punchline: I don’t care deeply about portability. At least not portability across compilers; I do care to varying degrees about portability across a limited set of architectures (currently mostly x86_64, AArch64, WebAssembly, POWER, and to an extent RISC-V and s390x). Sure, I write open source code so people can use it, and support for, e.g., MSVC means a wider potential user base for my code, which is great, but it’s not the primary reason I try to support MSVC, and it’s certainly not enough of a reason to put up with MSVC’s crap.
When building code for production purposes, I’m really not going to use anything other than clang and GCC. I don’t use Windows, I hate Visual Studio, and I find it difficult to express my opinion of MSVC in polite company (though finally adding C11 support has greatly improved things). Supporting MSVC is a huge annoyance for me, so why bother?
Portability across compilers is a means to an end, not an end in and of itself. What I really care about is writing reliable software. And, since I mostly write software in C, writing reliable software is a non-trivial task. Actually, “non-trivial” is an understatement: it’s somewhere between extremely difficult and impossible.
Compilers and Static Analyzers
Luckily, tools can help. A lot. Pretty much everyone knows (or at least should know) that cranking up compiler warnings during development is a basic necessity; if you’re not using at least -Wall, you’re doing it wrong. -Wextra (GCC) and -Weverything (clang) are better, though you do end up having to deal with a fair amount of false positives… learn to do that, it’s worth it.
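To make the -Wall vs. -Wextra difference concrete, here’s a minimal sketch (the function and its name are made up for illustration): GCC compiles it silently with -Wall alone, but adding -Wextra enables -Wunused-parameter and -Wsign-compare, both of which fire.

#include <stddef.h>
#include <stdio.h>

/* Hypothetical example: -Wall alone is silent here, but -Wall -Wextra
 * warns about the unused parameter and the signed/unsigned comparison. */
static int count_nonzero(const int *values, size_t n, int unused_flag) {
  int count = 0;
  for (int i = 0; i < n; i++) { /* signed i compared against unsigned n */
    if (values[i] != 0)
      count++;
  }
  return count;
}

int main(void) {
  int data[] = { 1, 0, 2, 0, 3 };
  printf("%d\n", count_nonzero(data, sizeof(data) / sizeof(data[0]), 0));
  return 0;
}

Neither issue is necessarily a bug here, but both are exactly the kind of thing that quietly turns into one later.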
Clang’s diagnostics tend to catch most of what GCC’s do and then some, but GCC also catches some things Clang doesn’t; think of a Venn diagram where one circle is bigger and mostly overlaps the smaller circle. It’s a good idea to test both. I strongly suggest you do this in CI to make sure every commit runs through both compilers with your desired warnings enabled, and add -Werror to turn those warnings into errors. You should also add the various sanitizers, plus scan-build on Clang and -fanalyzer on GCC.
Fixing all the warnings Clang and GCC emit is a great start, but there are still more bugs to be caught! Just like GCC and Clang support different diagnostics, MSVC does too. /W4 on MSVC is roughly analogous to -Wextra on GCC or -Weverything on Clang, and it can catch a lot of issues that GCC and Clang don’t. If you can run your code through GCC, Clang, and MSVC you’ll be able to find and fix more bugs before they reach your users.
If that’s not enough reason to bother with MSVC, you should know that it also includes a fantastic static analyzer which is analogous to scan-build or -fanalyzer. IMHO it’s easily the best part of their compiler. Sure, that may not be a particularly high bar, but I promise it’s really good… in my experience, it’s better than Clang and GCC’s static analyzers, though not as good as something like Coverity.
Sadly, porting to MSVC tends to be a lot more work than porting between GCC and clang. Hedley can help a lot, Portable Snippets can help too, and there are lots of little abstraction libraries I didn’t write which can be very helpful, but odds are pretty good that you’re going to end up with some #ifdefs no matter what you do.
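To give a flavor of what those #ifdefs look like, here’s a rough sketch of the kind of compiler-detection boilerplate you end up maintaining by hand if you don’t use something like Hedley (the MY_ macros are made-up names, not Hedley’s actual API):

/* Pick compiler-specific spellings of a couple of common features. */
#if defined(_MSC_VER)
  #define MY_FORCE_INLINE __forceinline
  #define MY_UNREACHABLE() __assume(0)
#elif defined(__GNUC__) /* covers both GCC and Clang */
  #define MY_FORCE_INLINE inline __attribute__((__always_inline__))
  #define MY_UNREACHABLE() __builtin_unreachable()
#else
  #define MY_FORCE_INLINE inline
  #define MY_UNREACHABLE() ((void) 0)
#endif

Hedley’s whole purpose is to wrap up many cases like this, including version checks for compilers which only partially support a feature, so each project doesn’t have to reinvent (and debug) them.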
What about other compilers? Well, they all catch different issues. There is a lot of overlap, and most of the time one of the popular compilers will also catch the same issue, but not always. For example, Oracle Developer Studio and (at least some) TI compilers can check for MISRA C violations.
Unfortunately, in order to take advantage of all this great tooling, your code needs to work on the relevant compiler(s). If your code doesn’t compile on MSVC, good luck getting anything out of their static analyzer. Similarly, if you can’t compile your code on Clang then scan-build isn’t going to work.
It’s not just static analysis, either. Often just compiling and running your code somewhere else can uncover issues which could otherwise lay dormant. For example, in SIMDe almost every function falls back in the worst case on a simple loop. Adding together two vectors of double-precision floating point values might look like this (simplified somewhat for clarity):
for (size_t i = 0 ; i < (sizeof(r.f64) / sizeof(r.f64[0])) ; i++) {
  r.f64[i] = a.f64[i] + b.f64[i];
}
A few days ago, I messed up and did something like this:
for (size_t i = 0 ; i < (sizeof(r.f32) / sizeof(r.f32[0])) ; i++) { /* bound computed from f32… */
  r.f64[i] = a.f64[i] + b.f64[i]; /* …but the body indexes f64, so i can run past the end */
}
In SIMDe’s (rather extensive) CI tests, several compilers hit this case. They happily compiled the code, and it ran just fine. Then MSVC failed. I reviewed the log, which pointed me to the relevant location in the code, and I quickly fixed the issue without the code even hitting the default branch. It happened to work on other compilers in my setup, but there is a good chance that someone calling that function from somewhere else would end up with a crash or (worse) silently incorrect data.
This is by no means a unique example; I regularly write code which works fine locally, and passes most configurations on CI, only for other CI configurations to catch the issue. Usually it’s my mistake, but I also find compiler bugs with alarming regularity.
Architectures
Just like other compilers can help you catch different bugs, other architectures can do the same.
A great example of this is identifying aliasing violations. I’m not going to explain aliasing here; if you’re not already familiar with the issue I remember Understanding Strict Aliasing being informative. What is the Strict Aliasing Rule and Why do we care? and Strict Aliasing Rule in C with Examples also look good, at least based on a very quick skim.
x86_64 is extremely tolerant of aliasing violations (because it’s extremely tolerant of unaligned access). MSVC even more so. That means that there is a lot of code out there which, often unintentionally, relies on aliasing which can easily come back to bite you later. Just because your code works on one compiler for a specific target with certain compiler flags doesn’t mean it will continue to work if you change any of those things.
If you want to get rid of potential aliasing bugs, a great way to do it is to run your code on armv7. The armv7 architecture is relatively picky about misaligned data, so code which works fine on AArch64 or x86 will often crash on armv7 due to aliasing violations. Even if your code will never run on armv7 in practice, including it in your CI setup is very much worthwhile. Drone offers armv7 and aarch64 hardware, and if your code is open source you can use it for free, but even just cross-compiling to armv7 and running your test suite in QEMU can uncover a lot of issues.
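As a contrived sketch of the pattern that tends to bite (the function names are invented for illustration), consider pulling a 32-bit value out of a byte buffer:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Dereferencing the cast below is a strict aliasing violation, and when p
 * isn't 4-byte aligned it's also a misaligned access. It usually "works"
 * on x86_64, but on armv7 it can fault, and on any target the optimizer is
 * allowed to assume it never happens. */
static uint32_t read_u32_wrong(const unsigned char *p) {
  return *(const uint32_t *) p; /* undefined behavior */
}

/* memcpy has none of these problems, and compilers turn it into a plain
 * load on targets where that's safe. */
static uint32_t read_u32_right(const unsigned char *p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

int main(void) {
  unsigned char buf[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };
  /* buf + 1 is deliberately misaligned. */
  printf("0x%08" PRIx32 "\n", read_u32_right(buf + 1));
  printf("0x%08" PRIx32 "\n", read_u32_wrong(buf + 1)); /* may crash on armv7 */
  return 0;
}

On x86_64 both calls will almost certainly print the same value, which is exactly why the bug survives; an armv7 CI job gives the wrong version a chance to fail loudly.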
Now, you may be telling yourself that you’ll never run your code on armv7, so why bother? Everything seems okay on x86/x86_64 and AArch64, and that’s all you’re interested in, so why go looking for trouble?
Let me tell you a story. This wasn’t a major incident in the grand scheme of things, but it’s a really good example. I consider it a rather formative moment in my own development as a programmer, and hopefully others can learn from it as well.
In 2015 I was doing a lot of work on data compression (for Squash), and I had noticed a crash when using LZ4. After some testing, I realized that the crash only occurred on GCC 5 (and not earlier versions), and only at -O3 (or when -ftree-loop-vectorize and -fvect-cost-model, which are included in -O3, were passed). LZ4 was pretty well tested at the time, and I assumed the problem was a bug in GCC. After all, earlier versions worked, and the code was the same, as was the hardware, the OS, and everything else. The only difference between crashing and not crashing was the compiler version.
I filed a bug against GCC, and minutes later some GCC developers started looking at it (side note: GCC developers, in my experience, are extremely helpful, responsive, and professional). Turns out the bug was in LZ4, where there was an aliasing violation which triggered a misaligned access, which resulted in a crash.
You can read the bug report if you want; if you don’t really understand why aliasing violations are a problem it’s a pretty straightforward introduction. That’s an important lesson, but to me it really drove home a much more important lesson: just because your code works in one version of a compiler when targeting a specific architecture doesn’t mean it will continue to work in the next version. When you rely on undefined behavior, all bets are off. People like to say that it could format your hard drive, and obviously that’s hyperbole (though technically true) but this is a good, real example of what can actually happen.
One important thing compilers can (and do) assume about undefined behavior is that your code doesn’t rely on it. In other words, they can assume that the undefined behavior is unreachable. In this case, since the alignment requirement (_Alignof(int64_t)) for a 64-bit integer on x86_64 is 8 bytes, the compiler can assume that you would never attempt to access data which isn’t aligned to an 8-byte boundary by dereferencing a pointer to a 64-bit integer. From the compiler’s perspective, this means it is safe to emit faster code which assumes that your pointer to a 64-bit integer is aligned.
As you probably know, compilers add new optimizations all the time. This is a good thing; your code tends to magically get faster without you having to do any work beyond simply recompiling. In this case, that new optimization “broke” code which was working. Of course, the code was already broken, but since it worked before that was hard to know.
In this case, I would argue that one of the best possible outcomes occurred: the code crashed. Yes, crashes are good. With a crash you know something went wrong. That’s a much better outcome than your data being silently corrupted and you getting an incorrect result without ever knowing it. The only thing better than a crash is a compile-time error.
While updating the compiler is what caught the bug here, switching architectures can also catch many bugs. I found a ton of aliasing violations in SIMDe when I started testing on armv7, and when I fixed them other architectures magically started crashing less often, too, especially at higher optimization levels. The fact that SIMDe runs, and is tested on, armv7 means that the code is more reliable on all architectures, including x86_64, AArch64, POWER, s390x, and others.
Another great trick for catching bugs is a big endian architecture such as s390x (Arm and PPC also support big endian, but little endian is much more common). If you manipulate data using the wrong types, running your test suite on a big endian machine can often make the issue quite apparent as the result will likely be garbage. If you don’t have access to s390x (who does?), QEMU’s s390x implementation is excellent.
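Here’s a contrived sketch of the kind of assumption a big-endian CI job catches immediately: code that quietly assumes the first byte of a uint32_t in memory is its least significant byte.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  uint32_t value = 0x11223344;
  unsigned char first_byte;

  /* Inspect the in-memory representation: on x86_64 and (typically)
   * AArch64 this prints 0x44; on s390x it prints 0x11. A test that
   * expects 0x44 passes everywhere little endian and fails loudly on
   * a big-endian machine. */
  memcpy(&first_byte, &value, 1);
  printf("first byte: 0x%02x\n", (unsigned int) first_byte);
  return 0;
}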
WebAssembly, in addition to being an increasingly important target in its own right, tends to be great at catching out-of-bounds access. I’ve had code which triggers a crash in d8 even when AddressSanitizer is completely silent.
Conclusion
There has been a lot of focus lately on replacing C and C++ with safer languages like Go and Rust. I’m not opposed to that; C and C++ are usually not the right choice when starting a new project today. That said, there is a lot of code out there right now written in C/C++, and it’s not going away any time soon. Part of the solution is definitely to transition away from C/C++, but another part is finding ways to improve C/C++. You can call it “putting lipstick on a pig”, “turning lemons into lemonade”, or just “a necessary evil”, but it is necessary.
Linus’s Law famously holds that, “given enough eyeballs, all bugs are shallow”. While history hasn’t necessarily been kind to this assertion, I think it’s pretty clear that if we consider compilers, static analyzers, integration testing, and other tools to be (metaphorical) eyeballs it becomes much easier to accept the veracity of this statement; maybe not all bugs are shallow, but a substantial portion become a lot less deep.
Writing reliable software, especially in languages like C and C++, is a very hard problem. You need all the help you can get, especially the kind which can be automated so it just runs quietly in the background until it finds a problem. That kind of help is a lot more reliable, cheaper, and more scalable than the human kind.
Unfortunately, no one tool is perfect. The best option is defense in depth; using as many tools as you can to catch as many issues as you can as early as possible. If you write a bug, hopefully your compiler will catch it. If your compiler misses it, hopefully another compiler (or an older compiler, or a newer compiler) will catch it. If other compilers miss it, hopefully a static analyzer will catch it. If static analyzers miss it, hopefully a sanitizer will catch it. If the sanitizers miss it, hopefully other hardware will catch it.
Portability is not the answer; I don’t think there is a single answer. Portability is, however, an important tool to help write reliable software which a lot of people overlook because they treat it as an end instead of a means. Sometimes portability is the end goal, but it’s also a means to a more important end: reliability.
Using as many tools as possible means making sure your code works in as many places as possible. In other words, portability is reliability.
Edit: there is some discussion on Twitter about this which might be interesting to some.
Well said! Whenever I had the need/opportunity to make my C/C++ code portable, it has without fail benefited the version running on the original platform: Bugs removed, functions streamlined, readability gotchas smoothed out…
Short:
1: Time = Relative / 2: Quantity = Quality & 3: Happiness = success, but success is not happiness.
Portability is Reliability
–> ‘Architectures: Just like other compilers can help you catch different bugs, other architectures can do the same.’
*So you just build a sandbox for everything?
It’s not really quantity that is important, but *diversity*. For example, if your normal target is clang on Linux, you’re going to see a lot more benefit from porting to Visual Studio on Windows than you would see from clang on Windows. Similarly, if your normal target is x86_64 you’re probably going to get much more benefit from running it on Arm than i686. Think of it like fuzzing; feeding the same inputs into your code over and over isn’t going to help, but a good fuzzer will explore different code paths to try to trigger different results.
I’m not really sure what you mean by 1 and 3 above, at least in this context. I will say that this approach definitely provides less benefit relative to the amount of effort required than, say, using Asan/UBSan/TSan, turning on warnings, etc. Tools like those should absolutely be your first line of defense, but this provides another layer of defense. It’s like the Venn diagram I mentioned in the post; yes, there is going to be a lot of overlap, but depending on how important reliability is to you it may be worthwhile to invest in portability even if your code will never run in production on other targets to catch those issues which *don’t* overlap.
As for the sandbox thing, cross-compilation and emulation (i.e., QEMU) are pretty straightforward, and I use that approach extensively (Debian has great packaging for that stuff, so these days I tend to do a lot of this in a Docker container running Debian). Like I said in the post, Drone.io has Arm machines available, or you could use Graviton on AWS, or just buy a Raspberry Pi to use as a build server.
You also seem to be very focused on other architectures, but portability also encompasses other *compilers*, even on the same architecture. The “Compilers and Static Analyzers” section isn’t an introduction, it’s on equal footing with the “Architectures” section. Porting your code to other compilers means you get access to their compile-time diagnostics and static analysis tools. That is usually more important than porting to other architectures, but one of the overlooked benefits of porting to other architectures is that, if you have a good test suite, they can also act a bit like static analyzers and sometimes catch issues that other tools don’t.