LLVM Project News and Details from the Trenches

Friday, March 10, 2017

Devirtualization in LLVM and Clang

This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program. 

This post is from Piotr Padlewski on his work that he presented at the meeting:

This blogpost will show how C++ devirtualization is performed in current (4.0) clang and LLVM and also ongoing work on -fstrict-vtable-pointers features.

Devirtualization done by the frontend

In order to transform a virtual call into a direct call, the frontend must be sure that there are no overrides of vfunction in the program or know the dynamic type of object. Compilation proceeds one translation unit at a time, so, barring LTO, there are only a few cases when the compiler may conclude that there are no overrides:

  • either the class or virtual method is marked as final
  • the class is defined in an anonymous namespace and has no deriving classes in its translation unit

The latter is more tricky for clang, which translates the source code in chunks on the fly (see: ASTProducer and ASTConsumer), so is not able to determine if there are any deriving classes later in the source. This could be dealt with in a couple of ways:
  • give up immediate generation
  • run data flow analysis in LLVM to find all the dynamic types passed to function, which has static linkage
  • hope that every use of the virtual function, which is necessarily in the same translation unit, will be inlined by LLVM -- static linkage increases the chances of inlining

Store to load propagation in LLVM

In order to devirtualize a virtual call we need:
  • value of vptr - which virtual table is pointed by it
  • value of vtable slot - which exact virtual function it is

Because vtables are constant, the latter value is much easier to get when we have the value of vptr. The only thing we need is vtable definition, which can be achieved by using available_externally linkage.

In order to figure out the vptr value, we have to find the store to the same location that defines it. There are 2 analysis responsible for it:

  • MemDep (Memory Dependence Analysis) is a simple linear algorithm that for each quered instruction iterates through all instructions above and stops when first dependency is found. Because queries might be performed for each instruction we end up with a quadratic algorithm. Of course quadratic algorithms are not welcome in compilers, so MemDep can only check certain number of instructions.
  • Memory SSA on the other hand has constant complexity because of caching. To find out more, watch “Memory SSA in 5minutes” (https://www.youtube.com/watch?v=bdxWmryoHak). MemSSA is a pretty new analysis and it doesn’t have all the features MemDep has, therefore MemDep is still widely used.
The LLVM main pass that does store to load propagation is GVN - Global Value Numbering.

Finding vptr store

In order to figure out the vptr value, we need to see store from constructor. To not rely on constructor's availability or inlining, we decided to use the @llvm.assume intrinsic to indicate the value of vptr. Assume is akin to assert - optimizer seeing call to @llvm.assume(i1 %b) can assume that %b is true after it. We can indicate vptr value by comparing it with the vtable and then call the @llvm.assume with the result of this comparison.

call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
 %3 = load {...} %a                  ; Load vptr
 %4 = icmp eq %3, @_ZTV1A      ; compare vptr with vtable
 call void @llvm.assume(i1 %4)

Calling multiple virtual functions

A non-inlined virtual call will clobber the vptr. In other words, optimizer will have to assume that vfunction might change the vptr in passed object. This sounds like something that never happens because vptr is “const”. The truth is that it is actually weaker than C++ const member, because it changes multiple times during construction of an object (every base type constructor or destructor must set vptrs). But vptr can't be changed during a virtual call, right? Well, what about that?

void A::foo() { // virtual
static_assert(sizeof(A) == sizeof(Derived));
new(this) Derived;

This is call of placement new operator - it doesn’t allocate new memory, it just creates a new object in the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived’s vtable. Is this code even legal? C++ Standard says yes.

However it turns out that if someone called foo 2 times (with the same object), the second call would be undefined behavior. Standard pretty much says that call or dereference of a pointer to an object whose lifetime has ended is UB, and because the standard agrees that nuking object from inside ends its lifetime, the second call is UB. Be aware that this is only because a zombie pointer is used for the second call. The pointer returned by placement new is considered alive, so performing calls on that pointer is valid. Note that we also silently used that fact with the use of assume.

(un)clobbering vptr

We need to somehow say that vptr is invariant during its lifetime. We decided to introduce a new metadata for that purpose - !invariant.group. The presence of the invariant.group metadata on the load/store tells the optimizer that every load and store to the same pointer operand within the same invariant group can be assumed to load or store the same value. With -fstrict-vtable-pointers Clang decorates vtable loads with invariant.group metadana coresponding to caller pointer type. 

We can enhance the load of virtual function (second load) by decorating it with !invariant.load, which is equivalent of saying “load from this location is always the same”, which is true because vtables never changes. This way we don’t rely on having the definition of vtable.

Call like:

void g(A *a) {

Will be translated to:

define void @function(%struct.A* %a) {
 %1 = load {...} %a, !invariant.group !0
 %2 = load {...} %1, !invariant.load !1
 call void %2(%struct.A* %a)

 %3 = load {...} %a, !invariant.group !0
 %4 = load {...} %4, !invariant.load !1
 call void %4(%struct.A* %a)
 ret void

!0 = !{!"_ZTS1A"} ; mangled type name of A
!1 = !{}

And now by magic of GVN and MemDep:

define void @function(%struct.A* %a) {
 %1 = load {...} %a, !invariant.group !0
 %2 = load {...} %1, !invariant.load !1
 call void %2(%struct.A* %a)
 call void %2(%struct.A* %a)
 ret void

With this, llvm-4.0 is be able to devirtualize function calls inside loops. 


In order to prevent the middle-end from finding load/store with the same !invariant.group metadata, that would come from construction/destruction of dead dynamic object, @llvm.invariant.group.barrier was introduced. It returns another pointer that aliases its argument but is considered different for the purposes of load/store invariant.group metadata. Optimizer won’t be able to figure out that returned pointer is the same because intrinsics don’t have a definition. Barrier must be inserted in all the places where the dynamic object changes:
  • constructors
  • destructors
  • placement new of dynamic object

Dealing with barriers

Barriers hinder some other optimizations. Some ideas how it could be fixed:

  • stripping invariant.group metadata and barriers just after devirtualization. Currently it is done before codegen. The problem is that most of the devirtualization comes from GVN, which also does most of the optimizations we would miss with barriers. GVN is expensive therefore it is run only once. It also might make less sense if we are in LTO mode, because that would limit the devirtualization in the link phase. 
  • teaching important passes to look through the barrier. This might be very tricky to preserve the semantics of barrier, but e.g. looking for dependency of load without invariant.group by jumping through the barrier to find a store without invariant.group, is likely to do the trick.
  • removing invariant.barrier when its argument comes from alloca and is never used etc.
To find out more details about devirtualization check my talk (http://llvm.org/devmtg/2016-11/#talk6) from LLVM Dev Meeting 2016.

About author

Undergraduate student at University of Warsaw, currently working on C++ static analysis in IIIT.

Monday, March 6, 2017

Some news about apt.llvm.org

apt.llvm.org provides Debian and Ubuntu repositories for every maintained version of these distributions. LLVM, Clang, clang extra tools, compiler-rt, polly, LLDB and LLD packages are generated for the stable, stabilization and development branches.

As it seems that we have more and more users of these packages, I would like to share an update about various recent changes.

New features

First, the cool new stuff : lld is now proposed and built for i386/amd64 on all Debian and Ubuntu supported versions. The test suite is also executed and the coverage results are great.

Then, following the branching for the 4.0 release, I created new repositories to propose this release.
For example, for Debian stable, just add the following in /etc/apt/sources.list.d/llvm.list

deb http://apt.llvm.org/jessie/ llvm-toolchain-jessie-4.0 main
  deb-src http://apt.llvm.org/jessie/ llvm-toolchain-jessie main

Obviously, the trunk is now 5.0. If llvm-defaults is used, clang, lldb and other meta packages will be automatically updated to this version.
As a consequence and also because the branches are dead, 3.7 and 3.8 jobs have been disabled. Please note that both repositories are still available on apt.llvm.org and won't be removed.

Zesty: New Ubuntu
Packages for the next Ubuntu 17.04 (zesty) are also generated for 3.9, 4.0 and 5.0.

It has been implemented a few months ago but not clearly communicated. libfuzzer has also its own packages: libfuzzer-X.Y-dev (example: libfuzzer-3.9-dev, libfuzzer-4.0-dev or libfuzzer-5.0-dev).

Changes in the infrastructure

In order to support the load, I started to use new blades that Google (thanks again to Nick Lewycky) sponsored for an initiative that I was running for Debian and IRILL. The 6 new blades removed all the wait time. With a new salt configuration, I automated the deployment of the slaves. In case the load increases again, we will have access to more blades.

I also took the time to fix some long ongoing issues:
  • all repositories are signed and verified that they are    
  • i386 and amd64 packages are now uploaded at once instead of being uploaded separately. This was causing checksum error when one of the two architectures built correctly and the second was failing (ex: test failing)
Last but not least, the code coverage results are produced in a more reliable manner.

More information about the implementation and services.

As what is shipped on apt.llvm.org is exactly the same as in Debian and Ubuntu, packaging files are stored on the Debian subversion server.

A Jenkins instance is in charge of the orchestration of the whole build infrastructure.

The trunk packages are built twice a day for every Debian and Ubuntu packages. Branches (3.9 and 4.0 currently) are rebuilt only when the - trigger job found a change.

In both case, the Jenkins source job will checkout the Debian SVN branches for their version, checkout/update LLVM/clang/etc repositories and repack everything to create the source tarballs and Debian files (dsc, etc).The completion of job will trigger the binaries job to start. These jobs, thanks to Debian Jenkins glue will create or update Debian/Ubuntu versions.

Then builds are done the usual way through pbuilder for both i386 and amd64. All the test suites are going to be executed. If any LLVM test is failing on i386 or amd64, the whole build will fail. If both builds and the LLVM testsuite are successful, the sync job will start and rsync packages to the LLVM server to be replicated on the CDN. If one or both builds fail, a notification is sent to the administrator.

Some Debian static analysis (lintian) are executed on the packages to prevent some packaging errors. From time to time, some interesting issues are found.

In parallel, some binary builds have some special hooks like Coverity, code coverage or installation of more recent versions of gcc for Ubuntu precise.

Report bugs

Bugs can be reported on the bugzilla of the LLVM project in the product "Packaging" and the component "deb packages".

Common issues

Because packaging quickly moving projects like LLVM or clang, in some cases, this can be challenging to follow the rhythm in particular with regard to tests. For Debian unstable or the latest version of Ubuntu, the matrix is complexified by new versions of the basic pieces of the operating system like gcc/g++ or libtstdc++.

This is also not uncommon that some tests are being ignored in the process.

How to help

Some new comers bugs are available. As an example:
Related to all this, a Google Summer of Code 2017 under the LLVM umbrella has been proposed: Integrate libc++ and OpenMP in apt.llvm.org

Help is also needed to keep track of the new test failures and get them fixed upstream. For example, a few tests have been marked as expected to fail to avoid crashes.