Testing libc++ with -fsanitize=undefined

[This article is re-posted in a slightly expanded form from Marshall's blog]

After my last article, Testing libc++ with Address Sanitizer, I thought "what other tests can I run?"

Address Sanitizer (ASan) is not the only "sanitizer" that clang offers. There are also "Thread Sanitizer" (TSan), "Undefined Behavior Sanitizer" (UBSan), and others. An integer overflow checker called IOC is coming in the 3.3 release of clang. The documentation for UBSan can be found on the LLVM site.

I have been looking at the results of running the libc++ test suite with UBSan enabled. Even if you're not interested in libc++ specifically, this post can serve as an introduction to Clang's bug detectors, and it shows several classes of problems they can find.

The mechanics

Like ASan, UBSan is a compiler pass and a custom runtime library. You enable it by passing -fsanitize=undefined to the compiler and linker. I ran the libc++ test suite like this:

cd $LLVM/libcxx/test
CC=/path/to/tot/clang OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=undefined" ./testit

Unfortunately, this failed. Because I was working with an unreleased compiler and libraries, I needed updated versions of both libc++.dylib and libc++abi.dylib. So I built those from source, and then used DYLD_LIBRARY_PATH to make sure that the test programs used the libraries that I'd just built. (I didn't want to replace the ones in /usr/lib, because lots of things in the system depend on them.)

cd $LLVM/libcxx/test
DYLD_LIBRARY_PATH=$LLVM/libcxx/lib:$LLVM/libcxxabi/lib CC=/path/to/tot/clang OPTIONS="-std=c++11 -stdlib=libc++ -fsanitize=undefined -L $LLVM/libcxxabi/lib -lc++abi" ./testit

where, as before, "/path/to/tot/clang" is the clang that I just built from source, and $LLVM is where I've checked out the various parts of LLVM from Subversion.

The results

And the tests were off and running. In the last article, I noted that these tests take about 30 minutes to run on my MacBook Pro. The ASan tests took about 90 minutes. I was pleasantly surprised when the UBSan tests finished in about 42 minutes, or about 40% slower than the baseline tests. There were 12 tests (out of more than 4800) that failed under normal circumstances. Using UBSan, 49 tests failed (37 more than the baseline), and UBSan reported 48,463 runtime errors.

The failing tests

Of the 37 additional tests that failed under UBSan, 34 were aborted because of an uncaught exception of type XXXX, where XXXX was from the standard library (std::out_of_range, for example). This is caused by a mismatch between libc++ and libc++abi, specifically by the fact that both my custom-built libc++ and my custom-built libc++abi contained typeinfo records for some of the standard exception classes. Getting this right and getting all the bits of the test infrastructure to use the right libraries turned into a big mess very quickly, and I still don't have a good solution here.

However, I was able to convince myself that these failures were not the result of a bug in libc++, the test suite, or UBSan.

The other three failures were in the std::thread test suite. When I investigated, it turned out that there was a race condition in some of the thread tests. A race condition? In threading code? Inconceivable!

Apparently the runtime environment under UBSan was different enough to trigger the (latent) race condition in these three tests. Looking at the test suite, I found the same race condition in 10 other tests as well. I committed revision 178029 to fix this in all 13 tests.

The error messages

48K errors! I can't look at 48K error messages, so I decided to bin them.

There were 37,675 messages of the form: 0x000106ae3fff: runtime error: value inf is outside the range of representable values of type 'xxxx', where "xxxx" could be "double" or "float" (this count also includes "-inf")

and 10,693 messages of the form: 0x000101a8f244: runtime error: value nan is outside the range of representable values of type 'xxxx', where "xxxx" could be "double" or "float".

There were 52 messages of the form:

what.pass.cpp:24:9: runtime error: member call on address 0x7fff5e8f48d0 which does not point to an object of type 'std::logic_error'.

There were 29 messages like this: eval.pass.cpp:180:14: runtime error: division by zero

There were 6 messages like this:

/Sources/LLVM/libcxx/include/memory:3163:25: runtime error: load of misaligned address 0x7fff569a85c6 for type 'const unsigned long', which requires 8 byte alignment

There were 5 messages like this:

0x0001037a329e: runtime error: load of value 4294967294, which is not a valid value for type 'std::regex_constants::match_flag_type'

There were 2 messages like this: /Sources/LLVM/libcxx/include/locale:3361:48: runtime error: index 40 out of bounds for type 'char_type [10]'

There was one message like this: runtime error: load of value 64, which is not a valid value for type 'bool'

The first thing that I noticed is that sometimes UBSan gives you a file and line number, and at other times just a hex address. The file and line number are incredibly useful for tracking stuff down.

The analysis

Working from the bottom up:

The load of value 64, which is not a valid value for type 'bool' message came out of one of the atomics tests, where it is trying to clear and set an atomic flag that has been default constructed. I don't know what the correct behavior is here; I'm still looking at this one.
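One relevant detail: in C++11, the initial state of a std::atomic_flag is unspecified unless the flag is initialized with ATOMIC_FLAG_INIT. The following is a minimal sketch of the well-defined pattern, not the libc++ test itself; the function names are hypothetical, and whether the test ought to use this pattern is a separate question.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: exercises a flag that starts in a known (clear) state.
// Returns true when every observed value matches what C++11 guarantees.
inline bool initialized_flag_behaves() {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;              // known state: clear
    const bool first = flag.test_and_set();                // previous value: false
    const bool second = flag.test_and_set();               // previous value: true
    flag.clear();
    const bool after_clear = flag.test_and_set();          // previous value: false
    return !first && second && !after_clear;
}

// Usage: a quick self-check.
inline void check_initialized_flag() {
    assert(initialized_flag_behaves());
}
```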

The index 40 out of bounds for type 'char_type [10]' errors came from the money formatting tests in libc++, and were failing only on "wide string" versions of the tests; i.e., with two (or four) byte characters. The offending line turned out to be:

*__nc = __src[find(__atoms, __atoms+sizeof(__atoms), *__w) - __atoms];

and the problem was that sizeof(__atoms) was assumed to be the same as the number of entries in that array. Perfectly fine for character arrays, not so fine for wide character arrays. Fixed in revision 177694.
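To make the sizeof problem concrete, here is a small sketch of the byte-count versus element-count distinction. The tables and helper names (narrow_atoms, wide_atoms, array_count, index_of) are hypothetical stand-ins, not the libc++ code or its actual fix. With a four-byte wchar_t, sizeof of a ten-element array is 40, which would be consistent with the index in the diagnostic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the lookup table; libc++'s real __atoms differs.
const char    narrow_atoms[10] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
const wchar_t wide_atoms[10]   = {L'0', L'1', L'2', L'3', L'4',
                                  L'5', L'6', L'7', L'8', L'9'};

// Number of elements in a built-in array, independent of the element size.
template <class T, std::size_t N>
constexpr std::size_t array_count(const T (&)[N]) {
    return N;
}

// Index of c in atoms, or N when c is absent. The search range ends at
// atoms + N (element count), never at atoms + sizeof(atoms) (byte count).
template <class CharT, std::size_t N>
std::size_t index_of(const CharT (&atoms)[N], CharT c) {
    return static_cast<std::size_t>(std::find(atoms, atoms + N, c) - atoms);
}

// Usage: a quick self-check of the claims above.
inline void check_atom_lookup() {
    assert(sizeof(narrow_atoms) == array_count(narrow_atoms));  // char: bytes == elements
    assert(sizeof(wide_atoms) == 10u * sizeof(wchar_t));        // wchar_t: bytes scale with width
    assert(index_of(wide_atoms, L'7') == 7u);
    assert(index_of(wide_atoms, L'x') == 10u);                  // absent: one past the end
}
```

Taking the array by reference lets the compiler deduce the element count, so the same code is correct for any character width.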

The load of value 4294967294, which is not a valid value for type 'std::regex_constants::match_flag_type' errors turned out to be simple to fix as well, once we decided what the right fix was.

Deciding on the fix turned out to be complicated, because it involved a close reading of the standard. The problem was that match_flag_type was an enum emulating a bitmask. The type also had an operator~(), which flipped all the bits of the underlying integer type that the enum is represented as, including bits that do not correspond to any enumerator. This led to values that UBSan didn't like. A large discussion followed, with sentiments like "does it matter" and "can any code actually tell", and so on. Eventually, I just changed operator~ to flip only the bits that are valid in the enumeration. Fixed in revision 177693.
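The idea of a complement restricted to the valid bits can be sketched as follows. The enum flag_type, its enumerators, and flag_mask are hypothetical and chosen only for illustration; this is not the libc++ definition of match_flag_type.

```cpp
#include <cassert>

// Hypothetical bitmask enum; names and values are illustrative only.
enum flag_type {
    flag_none = 0,
    flag_a    = 1 << 0,
    flag_b    = 1 << 1,
    flag_c    = 1 << 2,
    flag_mask = flag_a | flag_b | flag_c   // every bit that has a meaning
};

inline flag_type operator|(flag_type x, flag_type y) {
    return static_cast<flag_type>(static_cast<int>(x) | static_cast<int>(y));
}

// Complement limited to the meaningful bits, so the result always stays
// within the range of values the enumeration can represent.
inline flag_type operator~(flag_type x) {
    return static_cast<flag_type>(~static_cast<int>(x) & static_cast<int>(flag_mask));
}

// Usage: a quick self-check.
inline void check_flag_complement() {
    assert(~flag_a == (flag_b | flag_c));
    assert(~flag_none == flag_mask);
    assert(~flag_mask == flag_none);
    assert(~~flag_b == flag_b);
}
```

Masking after the complement keeps the usual bitmask identities (for example, x & ~x is empty) while avoiding values outside the enumeration's range.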

The load of misaligned address 0x7fff569a85c6 for type 'const unsigned long', which requires 8 byte alignment errors were in the hashing code for strings. The misaligned loads are a performance optimization, and I haven't tried to touch them. Whatever changes are made here will have to be done very carefully, since this will affect the performance of all the associative containers.
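For reference, one common way to read a word from an arbitrary address without undefined behavior is to copy the bytes with std::memcpy. The sketch below shows only that general technique; load_word and round_trips_at_offset are hypothetical names, this is not the libc++ hashing code, and whether it matches the performance of the existing code would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: read a std::size_t starting at any byte address.
// std::memcpy has no alignment requirement, so this is well defined even
// when p is not suitably aligned for std::size_t.
inline std::size_t load_word(const void* p) {
    std::size_t word;
    std::memcpy(&word, p, sizeof(word));
    return word;
}

// Writes value at a byte offset inside an aligned buffer and reads it back
// with load_word. A nonzero offset below sizeof(std::size_t) is misaligned.
inline bool round_trips_at_offset(std::size_t offset, std::size_t value) {
    alignas(std::size_t) unsigned char buffer[2 * sizeof(std::size_t)] = {};
    if (offset > sizeof(std::size_t)) return false;  // stay inside the buffer
    std::memcpy(buffer + offset, &value, sizeof(value));
    return load_word(buffer + offset) == value;
}

// Usage: a quick self-check at an aligned and a misaligned offset.
inline void check_load_word() {
    assert(round_trips_at_offset(0, 42u));
    assert(round_trips_at_offset(1, 0x12345678u));
}
```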

The "division by zero" messages came from three different parts of the test suite. There were 3 of them in the numeric limits tests, and they were there on purpose. There were 2 of them in the complex number tests, and they were also on purpose. The other 24 were in the random number test suite, where the tests generate a bunch of random numbers (using various distributions) and check that the mean, variance, standard deviation, skew, etc., are all what the programmer expected. The problem is in the last measurement: skew. It is some calculated value divided by the variance, so if the variance is zero, computing the skew divides by zero. Many of the tests in the random number suite are testing "edge cases" of the random number generators, and some of these edge cases produce a sequence where all the numbers are the same (and thus the variance == 0). We solved this by commenting out the calculation of the skew for these degenerate cases and leaving a comment in the test source file. Howard fixed this in revision 177826.
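The degenerate case can be illustrated with a small sketch that computes the skew only when the variance is nonzero. This is an alternative way to express the same guard, not what the test suite does (there the calculation was commented out); Moments and sample_moments are hypothetical names, and the skew formula used here (third central moment divided by variance to the power 3/2) is the textbook definition, assumed rather than taken from the tests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch.
struct Moments {
    double mean;
    double variance;
    double skew;      // meaningful only when has_skew is true
    bool   has_skew;  // false for empty or constant samples
};

inline Moments sample_moments(const std::vector<double>& xs) {
    Moments m = {0.0, 0.0, 0.0, false};
    if (xs.empty()) return m;
    const double n = static_cast<double>(xs.size());
    for (double x : xs) m.mean += x;
    m.mean /= n;
    double third = 0.0;  // third central moment
    for (double x : xs) {
        const double d = x - m.mean;
        m.variance += d * d;
        third += d * d * d;
    }
    m.variance /= n;
    third /= n;
    // Divide only when the variance is nonzero; a constant sequence has
    // variance == 0, and the division would be a division by zero.
    if (m.variance > 0.0) {
        m.skew = third / (m.variance * std::sqrt(m.variance));
        m.has_skew = true;
    }
    return m;
}

// Usage: a constant sample has no skew; a varying one does.
inline void check_sample_moments() {
    const std::vector<double> constant(5, 3.0);
    assert(!sample_moments(constant).has_skew);
    std::vector<double> varying;
    varying.push_back(1.0);
    varying.push_back(2.0);
    varying.push_back(3.0);
    assert(sample_moments(varying).has_skew);
}
```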

The runtime error: member call on address 0x7fff5e8f48d0 which does not point to an object of type 'std::logic_error' messages, as it turned out, were due to a bug in UBSan.

I'm just getting started on the inf/-inf/nan messages (about 48K of those). Most of these come from the complex number regression tests. Since this is a test suite for a library that implements a bunch of numeric routines, a lot of the tests actually do generate and use nan/inf, so I expect that many of these will be "false positives". Richard Smith has pointed out:

The C++ standard’s treatment of Inf and NaN values is highly underspecified, so for the most part it’s not clear what has defined behavior and what does not.

Anyway… I’m updating UBSan to suppress the diagnostics for conversions of ‘Inf’ and ‘NaN’ between floating-point types, and will probably split out a separate flag for finite overflow in conversions to floating-point types, so that users can turn it off as needed. I think that’s the right compromise for the time being.

Conclusions

This exercise, while not completed, has already turned up a set of bugs in the libc++ test suite, as well as a bug in libc++ and some undefined behavior in libc++. There's more to look at here, but I think this was a good exercise. There's kind of a mismatch of expectations here, especially in the complex and numeric test suites, because UBSan is looking for nan/inf/-inf and the libc++ test code is deliberately generating them.

Thanks to Howard Hinnant for his patience and explanations about the C++ standard and libc++ and the libc++ test suite, and to Richard Smith for his help with UBSan and interpreting the C++ standard.