LLVM Project Blog

LLVM Project News and Details from the Trenches

Friday, March 15, 2019

LLVM Numerics Blog

Keywords: Numerics, Clang, LLVM-IR, 2019 LLVM Developers' Meeting, LLVMDevMtg.

The goal of this blog post is to start a discussion about numerics in LLVM: where we are, recent work, and things that remain to be done.  There will be an informal discussion on numerics at the 2019 EuroLLVM conference next month, and this post is meant to refresh everyone's memory on the current state of numerics ahead of that discussion.

In the last year or two there has been a push to allow fine-grained decisions on which optimizations are legitimate for any given piece of IR.  In earlier days there were two main modes of operation: fast-math and precise-math.  When operating under the rules of precise-math, defined by IEEE-754, a significant number of potential optimizations on sequences of arithmetic instructions are not allowed because they could lead to violations of the standard.  

For example: 

The Reassociation optimization pass is generally not allowed under precise code generation because it can change the order of operations, which alters both the precision of the result and the creation of NaN and Inf values that propagate through the expression.  
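To make the effect concrete, here is a minimal C++ sketch. The function names sum_left and sum_right are hypothetical, and it assumes IEEE-754 double arithmetic compiled without any fast-math options. It evaluates the same three-term sum in two association orders.

```cpp
#include <cassert>
#include <cmath>

// Evaluate the sum left to right: (a + b) + c.
double sum_left(double a, double b, double c) { return (a + b) + c; }

// Evaluate the same sum after reassociation: a + (b + c).
double sum_right(double a, double b, double c) { return a + (b + c); }

// Precision: with a = 1.0, b = 1e30, c = -1e30 the small term is absorbed
// when it is added to 1e30 first, but survives when b and c cancel first.
//
// Inf creation: with a = 1e308, b = 1e308, c = -1e308 the intermediate
// a + b overflows to +Inf in one order, while the other order stays finite.
```

Under those assumptions, sum_left(1.0, 1e30, -1e30) yields 0.0 while sum_right yields 1.0, and sum_left(1e308, 1e308, -1e308) overflows to +Inf while sum_right returns 1e308. A compiler that reassociates freely could turn one function into the other.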

Precise code generation is often overly restrictive, so an alternative fast-math mode is commonly used where all possible optimizations are allowed, acknowledging that this impacts the precision of results and possibly IEEE-compliant behavior as well.  In LLVM, this can be enabled by setting the unsafe-math flag at the module level, or by passing -funsafe-math-optimizations to clang, which then sets flags on the IR it generates.  Within this context the compiler often generates shorter sequences of instructions to compute results, and depending on the context this may be acceptable.  Fast-math is often used in computations where loss of precision is acceptable.  For example, when computing the color of a pixel, even relatively low precision is likely to far exceed the perception abilities of the eye, making shorter instruction sequences an attractive trade-off.  In long-running simulations of physical events, however, loss of precision can mean that the simulation drifts from reality, making the trade-off unacceptable.

Several years ago LLVM IR instructions gained the ability to be annotated with flags that can drive optimizations with more granularity than an all-or-nothing decision at the module level.  The IR flags in question are: 

nnan, ninf, nsz, arcp, contract, afn, reassoc, nsw, nuw, exact.  

Their exact meaning is described in the LLVM Language Reference Manual.   When all the flags are enabled, we get the current fast-math behavior.  When these flags are disabled, we get precise math behavior.  There are also several options available between these two models that may be attractive to some applications.  In the past year several members of the LLVM community worked on making IR optimization passes aware of these flags.  When the unsafe-math module flag is not set, these optimization passes work by examining individual flags, allowing fine-grained selection of the optimizations that can be enabled on specific instruction sequences.  This allows vendors/implementors to mix fast and precise computations in the same module, aggressively optimizing some instruction sequences but not others.
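The idea of a pass examining individual flags can be sketched with a toy C++ model. This is not LLVM's API: the FMF enum, the FpInst struct and both helper functions are hypothetical names made up for illustration, and the rule that both instructions must carry reassoc is a simplifying assumption. Only the flag spellings come from the list above.

```cpp
#include <cassert>
#include <cstdint>

// Toy bitmask of the floating-point flags listed above (hypothetical model,
// not the LLVM FastMathFlags class).
enum FMF : std::uint8_t {
  NNan     = 1 << 0,
  NInf     = 1 << 1,
  NSZ      = 1 << 2,
  ARcp     = 1 << 3,
  Contract = 1 << 4,
  AFn      = 1 << 5,
  Reassoc  = 1 << 6,
  Fast     = 0x7F  // all seven flags set
};

// Hypothetical stand-in for a floating-point instruction and its flags.
struct FpInst {
  std::uint8_t flags;
};

// A pass asks only for the flag it needs: in this toy model, reassociating
// across two instructions is permitted only if both carry 'reassoc'.
bool may_reassociate(const FpInst& a, const FpInst& b) {
  return (a.flags & b.flags & Reassoc) != 0;
}

// When a fold merges two instructions into one, keep only the flags that
// both originals carried, so the result never claims more than its inputs.
std::uint8_t intersect_flags(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a & b);
}
```

In this model an instruction with every flag set and one with only reassoc may still be reassociated together, while pairing the first with an nnan-only instruction may not. That is the kind of per-sequence decision that a module-wide switch cannot express.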

We now have good coverage of IR passes in the LLVM codebase, in particular in the following areas:
* Intrinsic and libcall management
* Instruction Combining and Simplification
* Instruction definition
* SDNode definition
* GlobalISel Combining and code generation
* Selection DAG code generation
* DAG Combining
* Machine Instruction definition
* IR Builders (SDNode, Instruction, MachineInstr)
* CSE tracking
* Reassociation
* Bitcode

There are still some areas that need to be reworked for modularity, including vendor specific back-end passes.  

The following are some of the contributions mentioned above from the last 2 years of open source development:

https://reviews.llvm.org/D45781 : MachineInst support mapping SDNode fast math flags for support in Back End code generation 
https://reviews.llvm.org/D46322 : [SelectionDAG] propagate 'afn' and 'reassoc' from IR fast-math-flags
https://reviews.llvm.org/D45710 : Fast Math Flag mapping into SDNode
https://reviews.llvm.org/D46854 : [DAG] propagate FMF for all FPMathOperators
https://reviews.llvm.org/D48180 : updating isNegatibleForFree and GetNegatedExpression with fmf for fadd
https://reviews.llvm.org/D48057: easing the constraint for isNegatibleForFree and GetNegatedExpression
https://reviews.llvm.org/D47954 : Utilize new SDNode flag functionality to expand current support for fdiv
https://reviews.llvm.org/D47918 : Utilize new SDNode flag functionality to expand current support for fma
https://reviews.llvm.org/D47909 : Utilize new SDNode flag functionality to expand current support for fadd
https://reviews.llvm.org/D47910 : Utilize new SDNode flag functionality to expand current support for fsub
https://reviews.llvm.org/D47911 : Utilize new SDNode flag functionality to expand current support for fmul
https://reviews.llvm.org/D48289 : refactor of visitFADD for AllowNewConst cases
https://reviews.llvm.org/D47388 : propagate fast math flags via IR on fma and sub expressions
https://reviews.llvm.org/D47389 : guard fneg with fmf sub flags
https://reviews.llvm.org/D47026 : fold FP binops with undef operands to NaN
https://reviews.llvm.org/D47749 : guard fsqrt with fmf sub flags
https://reviews.llvm.org/D46447 : Mapping SDNode flags to MachineInstr flags
https://reviews.llvm.org/D50195 : extend folding fsub/fadd to fneg for FMF
https://reviews.llvm.org/rL339197 : [NFC] adding tests for Y - (X + Y) --> -X
https://reviews.llvm.org/D50417 : [InstCombine] fold fneg into constant operand of fmul/fdiv
https://reviews.llvm.org/rL339357 : extend folding fsub/fadd to fneg for FMF
https://reviews.llvm.org/D50996 : extend binop folds for selects to include true and false binops flag intersection
https://reviews.llvm.org/rL339938 : add a missed case for binary op FMF propagation under select folds
https://reviews.llvm.org/D51145 : Guard FMF context by excluding some FP operators from FPMathOperator
https://reviews.llvm.org/rL341138 : adding initial intersect test for Node to Instruction association
https://reviews.llvm.org/rL341565 : in preparation for adding nsw, nuw and exact as flags to MI
https://reviews.llvm.org/D51738 : add IR flags to MI
https://reviews.llvm.org/D52006 : Copy utilities updated and added for MI flags
https://reviews.llvm.org/rL342598 : add new flags to a DebugInfo lit test
https://reviews.llvm.org/D53874 : [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
https://reviews.llvm.org/D55668 : Add FMF management to common fp intrinsics in GlobalIsel
https://reviews.llvm.org/rL352396 : [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target…
https://reviews.llvm.org/rL316753 (Fold fma (fneg x), K, y -> fma x, -K, y)
https://reviews.llvm.org/D57630 : Move IR flag handling directly into builder calls for cases translated from Instructions in GlobalIsel
https://reviews.llvm.org/rL332756 : adding baseline fp fold tests for unsafe on and off
https://reviews.llvm.org/rL334035 : NFC: adding baseline fneg case for fmf
https://reviews.llvm.org/rL325832 : [InstrTypes] add frem and fneg with FMF creators
https://reviews.llvm.org/D41342 : [InstCombine] Missed optimization in math expression: simplify calls exp functions
https://reviews.llvm.org/D52087 : [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
https://reviews.llvm.org/D52075 : [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))
https://reviews.llvm.org/rL338059 : [InstCombine] fold udiv with common factor from muls with nuw
Commit: e0ab896a84be9e7beb59874b30f3ac51ba14d025 : [InstCombine] allow more fmul folds with 'reassoc'
Commit: 3e5c120fbac7bdd4b0ff0a3252344ce66d5633f9 : [InstCombine] distribute fmul over fadd/fsub
https://reviews.llvm.org/D37427 : [InstCombine] canonicalize fcmp ord/uno with constants to null constant
https://reviews.llvm.org/D40130 : [InstSimplify] fold and/or of fcmp ord/uno when operand is known nnan
https://reviews.llvm.org/D40150 : [LibCallSimplifier] fix pow(x, 0.5) -> sqrt() transforms
https://reviews.llvm.org/D39642 : [ValueTracking] readnone is a requirement for converting sqrt to llvm.sqrt; nnan is not
https://reviews.llvm.org/D39304 : [IR] redefine 'reassoc' fast-math-flag and add 'trans' fast-math-flag
https://reviews.llvm.org/D41333 : [ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern
https://reviews.llvm.org/D5584 : Optimize square root squared (PR21126)
https://reviews.llvm.org/D42385 : [InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops
https://reviews.llvm.org/D43160 : [InstSimplify] allow exp/log simplifications with only 'reassoc' FMF
https://reviews.llvm.org/D43398 : [InstCombine] allow fdiv folds with less than fully 'fast' ops
https://reviews.llvm.org/D44308 : [ConstantFold] fp_binop AnyConstant, undef --> NaN
https://reviews.llvm.org/D43765 : [InstSimplify] loosen FMF for sqrt(X) * sqrt(X) --> X
https://reviews.llvm.org/D44521 : [InstSimplify] fp_binop X, NaN --> NaN
https://reviews.llvm.org/D47202 : [CodeGen] use nsw negation for abs
https://reviews.llvm.org/D48085 : [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
https://reviews.llvm.org/D48401 : [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
https://reviews.llvm.org/D39669 : DAG: Preserve nuw when reassociating adds
https://reviews.llvm.org/D39417 : InstCombine: Preserve nuw when reassociating nuw ops
https://reviews.llvm.org/D51753 : [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)
https://reviews.llvm.org/D51630 : [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
https://reviews.llvm.org/D53650 : [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
https://reviews.llvm.org/D54001 : [ValueTracking] determine sign of 0.0 from select when matching min/max FP
https://reviews.llvm.org/D51942 : [InstCombine] Fold (C/x)>0 into x>0 if possible
https://llvm.org/svn/llvm-project/llvm/trunk@348016 : [SelectionDAG] fold FP binops with 2 undef operands to undef
http://llvm.org/viewvc/llvm-project?view=revision&revision=346242 : propagate fast-math-flags when folding fcmp+fpext, part 2
http://llvm.org/viewvc/llvm-project?view=revision&revision=346240 : propagate fast-math-flags when folding fcmp+fpext
http://llvm.org/viewvc/llvm-project?view=revision&revision=346238 : [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
http://llvm.org/viewvc/llvm-project?view=revision&revision=346169 : [InstSimplify] fold select (fcmp X, Y), X, Y
http://llvm.org/viewvc/llvm-project?view=revision&revision=346234 : propagate fast-math-flags when folding fcmp+fneg
http://llvm.org/viewvc/llvm-project?view=revision&revision=346147 : [InstCombine] canonicalize -0.0 to +0.0 in fcmp
http://llvm.org/viewvc/llvm-project?view=revision&revision=346143 : [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution
http://llvm.org/viewvc/llvm-project?view=revision&revision=345734 : [InstCombine] refactor fabs+fcmp fold; NFC
http://llvm.org/viewvc/llvm-project?view=revision&revision=345728 : [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative
http://llvm.org/viewvc/llvm-project?view=revision&revision=345727 : [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC


While multiple people have been working on finer-grained control over fast-math optimizations and other relaxed numerics modes, there has also been some initial progress on adding support for more constrained numerics models. There has been considerable progress towards adding and enabling constrained floating-point intrinsics to capture FENV_ACCESS ON and similar semantic models.

These experimental constrained intrinsics prohibit certain transforms that are not safe if the default floating-point environment is not in effect. Historically, LLVM has in practice basically “split the difference” with regard to such transforms; they haven’t been explicitly disallowed, as LLVM doesn’t model the floating-point environment, but they have been disabled when they caused trouble for tests or software projects. The absence of a formal model for licensing these transforms constrains our ability to enable them. Bringing language and backend support for constrained intrinsics across the finish line will allow us to include transforms that we disable as a matter of practicality today, and allow us to give developers an easy escape valve (in the form of FENV_ACCESS ON and similar language controls) when they need more precise control, rather than an ad-hoc set of flags to pass to the driver.
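As a hedged illustration of why a non-default floating-point environment matters, the C++ sketch below performs the same division under two rounding modes. The function name divide_with_rounding is hypothetical. The volatile temporaries are an assumption used to keep the division at run time, and compilers that do not implement the FENV_ACCESS pragma may ignore it, possibly with a warning.

```cpp
#include <cassert>
#include <cfenv>

// Tell the compiler that this code accesses the floating-point environment.
#pragma STDC FENV_ACCESS ON

// Divide num by den under the requested rounding mode (for example FE_UPWARD
// or FE_DOWNWARD), then restore whatever mode was active before the call.
double divide_with_rounding(double num, double den, int mode) {
  const int saved = std::fegetround();
  std::fesetround(mode);
  // volatile forces the operands to be loaded and the quotient to be stored
  // while the requested rounding mode is in effect.
  volatile double n = num;
  volatile double d = den;
  volatile double q = n / d;
  std::fesetround(saved);
  return q;
}
```

Because 1.0 / 3.0 is not exactly representable, the FE_UPWARD result is strictly larger than the FE_DOWNWARD one. A transform that folded the division at compile time under the default mode would erase that difference, which is the kind of optimization the constrained intrinsics are meant to prohibit.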

We should discuss these new intrinsics to make sure that they can capture the right models for all the languages that LLVM supports.


Here are some possible discussion items:

  • Should specialization be applied at the call level for edges in a call graph where the caller has special context to extend into the callee with respect to flags?
  • Should the inliner apply something similar to calls that meet inlining criteria?
  • What other part(s) of the compiler could make use of IR flags that are currently not covered?
  • What work needs to be done regarding code debt in the current areas of implementation?

Thursday, March 7, 2019

FOSDEM 2019 LLVM developer room report


As well as at the LLVM developer meetings, the LLVM community is also present at a number of other events. One of those is FOSDEM, which has had a dedicated LLVM track since 2014.
Earlier this February, the LLVM dev room was back for the 6th time.

FOSDEM is one of the largest open source conferences, attracting over 8000 developers attending over 30 parallel tracks, occupying almost all space of the ULB university campus in Brussels.

In comparison to the LLVM developer meetings, this dev room offers more of an opportunity to meet up with developers from a very wide range of open source projects.

As in previous years, the LLVM dev room program consisted of presentations with a varied target audience, ranging from LLVM developers to LLVM users, including people not yet using LLVM but interested in discovering what can be done with it. 
On the day itself, the room was completely packed for most presentations, often with people waiting outside to be able to enter for the next presentation.
Slides and videos of the presentations are available via the links below.


Finally, I want to express my gratitude to the LLVM Foundation, which sponsored travel expenses for a few presenters who couldn't otherwise have made it to the conference.

Monday, February 11, 2019

EuroLLVM'19 developers' meeting program

The LLVM Foundation is excited to announce the program for the EuroLLVM'19 developers' meeting (April 8-9 in Brussels, Belgium)!

Keynote
Technical talks
Tutorials
Student Research Competition
Lightning talks
BoFs
Posters
If you are interested in any of these talks, you should register to attend EuroLLVM'19. Tickets are limited!

More information about the EuroLLVM'19 is available here

Wednesday, November 14, 2018

30% faster Windows builds with clang-cl and the new /Zc:dllexportInlines- flag

Background

In the course of adding Microsoft Visual C++ (MSVC) compatible Windows support to Clang, we worked hard to make sure the dllexport and dllimport declspecs are handled the same way by Clang as by MSVC.

dllexport and dllimport are used to specify what functions and variables should be externally accessible ("exported") from the currently compiled Dynamic-Link Library (DLL), or should be accessed ("imported") from another DLL. In the class declaration below, S::foo() will be exported when building a DLL:

struct __declspec(dllexport) S {
  void foo() {}
};

and code using that DLL would typically see a declaration like this:

struct __declspec(dllimport) S {
  void foo() {}
};

to indicate that the function is defined in and should be accessed from another DLL.

Often the same declaration is used along with a preprocessor macro to flip between dllexport and dllimport, depending on whether a DLL is being built or consumed.
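A minimal sketch of that macro pattern follows. The names MYLIB_EXPORT and MYLIB_BUILDING_DLL are hypothetical and do not appear in any real project discussed here, and expanding to nothing on non-Windows targets is an assumption added so that the header stays portable.

```cpp
#include <cassert>

// MYLIB_BUILDING_DLL (hypothetical) would be defined by the build system only
// while compiling the DLL itself, so the same header serves both sides.
#if defined(_WIN32)
#  if defined(MYLIB_BUILDING_DLL)
#    define MYLIB_EXPORT __declspec(dllexport)  // building the DLL
#  else
#    define MYLIB_EXPORT __declspec(dllimport)  // consuming the DLL
#  endif
#else
#  define MYLIB_EXPORT  // assumption: no declspec needed off Windows
#endif

// One declaration, shared by the DLL and by the code that uses it.
struct MYLIB_EXPORT S {
  int foo() { return 42; }
};
```

Code that builds the DLL would compile with MYLIB_BUILDING_DLL defined, and everything else would include the same header without it.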

The basic idea of dllexport and dllimport is simple, but the semantics get more complicated as they interact with more facets of the C++ language: templates, inheritance, different kinds of instantiation, redeclarations with different declspecs, and so on. Sometimes the semantics are surprising, but by now we think clang-cl gets most of them right. And as the old maxim goes, once you know the rules well, you can start tactfully breaking them.

One issue with dllexport is that for inline functions such as S::foo() above, the compiler must emit the definition even if it's not used in the translation unit. That's because the DLL must export it, and the compiler cannot know if any other translation unit will provide a definition.

This is very inefficient. A dllexported class with inline members in a header file will cause definitions of those members to be emitted in every translation unit that includes the header, directly or indirectly. And as we know, C++ source files often end up including a lot of headers. This behaviour is also different from non-Windows systems, where inline function definitions are not emitted unless they're used, even in shared objects and dynamic libraries.

/Zc:dllexportInlines-

To address this problem, clang-cl recently gained a new command-line flag, /Zc:dllexportInlines- (MSVC uses the /Zc: prefix for language conformance options). The basic idea is simple: since the definition of an inline function is available along with its declaration, it's not necessary to import or export it from a DLL — the inline definition can be used directly. The effect of the flag is to not apply class-level dllexport/dllimport declspecs to inline member functions. In the two examples above, it means S::foo() would not be dllexported or dllimported, even though the S class is declared as such.

This is very similar to the -fvisibility-inlines-hidden Clang and GCC flag used on non-Windows. For C++ projects with many inline functions, it can significantly reduce the set of exported functions, and thereby the symbol table and file size of the shared object or dynamic library, as well as program load time.

On Windows however, the main benefit is not having to emit the unused inline function definitions. This means the compiler has to do much less work, and reduces object file size which in turn reduces the work for the linker. For Chrome, we saw 30 % faster full builds, 30 % shorter link times for blink_core.dll, and 40 % smaller total .obj file size.

The reduction in .obj file size, combined with the enormous reduction in .lib files allowed by previously switching linkers to lld-link which uses thin archives, means that a typical Chrome build directory is now 60 % smaller than it would have been just a year ago.

(Some of the same benefit can be had without this flag if the dllexport inline function comes from a pre-compiled header (PCH) file. In that case, the definition will be emitted in the object file when building the PCH, and so is not emitted elsewhere unless it's used.)

Compatibility

Using /Zc:dllexportInlines- is "half ABI incompatible". If it's used to build a DLL, inline members will no longer be exported, so any code using the DLL must use the same flag to not dllimport those members. However, the reverse scenario generally works: a DLL compiled without the flag (such as a system DLL built with MSVC) can be referenced from code that uses the flag, meaning that the referencing code will use the inline definitions instead of importing them from the DLL.

Like -fvisibility-inlines-hidden, /Zc:dllexportInlines- breaks the C++ language guarantee that (even an inline) function has a unique address within the program. When using these flags, an inline function will have a different address when used inside the library and outside.

Also, these flags can lead to link errors when inline functions, which would normally be dllimported, refer to internal symbols of a DLL:

void internal();

struct __declspec(dllimport) S {
  void foo() { internal(); }
};

Normally, references to S::foo() would use the definition in the DLL, which also contains the definition of internal(), but when using /Zc:dllexportInlines-, the inline definition of S::foo() is used directly, resulting in a link error since no definition of internal() can be found.

Even worse, if there is an inline definition of internal() containing a static local variable, the program will now refer to a different instance of that variable than in the DLL:

inline int internal() { static int x; return x++; }

struct __declspec(dllimport) S {
  int foo() { return internal(); }
};

This could lead to very subtle bugs. However, since Chrome already uses -fvisibility-inlines-hidden, which has the same potential problem, we believe this is not a common issue.

Summary

/Zc:dllexportInlines- is like -fvisibility-inlines-hidden for DLLs and significantly reduces build times. We're excited that using Clang on Windows allows us to benefit from new features like this.

More information

For more information, see the User's Manual for /Zc:dllexportInlines-.

The flag was added in Clang r346069, which will be part of the Clang 8 release expected in March 2019. It's also available in the Windows Snapshot Build.

Acknowledgements

/Zc:dllexportInlines- was implemented by Takuto Ikuta based on a prototype by Nico Weber.

Tuesday, September 25, 2018

Integration of libc++ and OpenMP packages into llvm-toolchain

A bit more than a year ago, we gave an update about recent changes in apt.llvm.org. Since then, we have noticed a significant increase in the usage of the service. Just last month, we saw more than 16.5TB of data being transferred from our CDN.
Thanks to the Google Summer of Code 2018, and after a number of requests, we decided to focus our energy on bringing great new projects from the LLVM ecosystem into apt.llvm.org.

Starting from version 7, libc++, libc++abi and OpenMP packages are available in the llvm-toolchain packages. This means that, just like clang, lld or lldb, the libc++, libc++abi and OpenMP packages are also built, tested and shipped on https://apt.llvm.org/.

The integration focuses on preserving the current usage of these libraries. The newly merged packages have adopted the llvm-toolchain versioning:

libc++ packages
  • libc++1-7
  • libc++-7-dev
libc++abi packages
  • libc++abi1-7
  • libc++abi-7-dev
OpenMP packages
  • libomp5-7
  • libomp-7-dev
  • libomp-7-doc
These packages are built twice a day for trunk. For version 7, they are rebuilt only when new changes land in the SVN branches.

Integration of libc++* packages

Both libc++ and libc++abi packages are built at the same time using the clang built during the process. The existing libc++ and libc++abi packages present in Debian and Ubuntu repositories will not be affected (they will be removed at some point). Newly integrated libcxx* packages are not co-installable with them.

Symlinks have been provided from the original locations to keep the library usage the same.

Example:  /usr/lib/x86_64-linux-gnu/libc++.so.1.0 -> /usr/lib/llvm-7/lib/libc++.so.1.0

Using libc++ remains super easy:
$ clang++-7 -std=c++11 -stdlib=libc++ foo.cpp
$ ldd ./a.out|grep libc++
  libc++.so.1 => /usr/lib/x86_64-linux-gnu/libc++.so.1 (0x00007f62a1a90000)
  libc++abi.so.1 => /usr/lib/x86_64-linux-gnu/libc++abi.so.1 (0x00007f62a1a59000)
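As a complement to the ldd check, a program can report which standard library it was compiled against. The sketch below is one hypothetical choice of foo.cpp contents, with standard_library_name being an illustrative name. It relies on the _LIBCPP_VERSION macro that libc++ headers define and the __GLIBCXX__ macro that libstdc++ headers define.

```cpp
#include <cassert>
#include <cstddef>  // any standard header pulls in the library's version macros
#include <cstring>

// Report the C++ standard library this translation unit was compiled against.
const char* standard_library_name() {
#if defined(_LIBCPP_VERSION)
  return "libc++";      // built with -stdlib=libc++
#elif defined(__GLIBCXX__)
  return "libstdc++";   // built against GNU libstdc++
#else
  return "unknown";
#endif
}
```

Printing the result of standard_library_name() from main in a binary built with -stdlib=libc++ should show "libc++", which matches what the ldd output indicates at link level.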

In order to test new developments in libc++, we are also building the experimental features.
For example, the following command will work out of the box:

$ clang++-7 -std=c++17 -stdlib=libc++ foo.cpp -lc++experimental -lc++fs

Integration of OpenMP packages

While OpenMP packages have been present in the Debian and Ubuntu archives for a while, only a single version of the package was available.

For now, the newly integrated packages create a symlink from /usr/lib/libomp.so.5 to /usr/lib/llvm-7/lib/libomp.so.5, keeping the current usage the same and making them non co-installable.

It can be used with clang through the -fopenmp flag:
$ clang -fopenmp foo.c
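For illustration, here is a small C++ function that uses an OpenMP reduction. The name parallel_sum is hypothetical, and because this sketch is C++ rather than C it would be built with clang++ instead of the clang invocation above. If the file is compiled without -fopenmp, the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <vector>

// Sum the elements of a vector. With -fopenmp the iterations are split across
// threads and the per-thread partial sums are combined by the reduction
// clause. Without it, this is an ordinary serial loop.
long long parallel_sum(const std::vector<int>& values) {
  long long total = 0;
  const long count = static_cast<long>(values.size());
  #pragma omp parallel for reduction(+ : total)
  for (long i = 0; i < count; ++i) {
    total += values[i];
  }
  return total;
}
```

Either way the function returns the same total, so the flag can be toggled without changing the program's result.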

The dependency packages providing the default libc++* and OpenMP packages are also integrated in llvm-defaults. This means that the following command will install all these new packages at the current version:
$ apt-get install libc++-dev libc++abi-dev libomp-dev

LLVM 7 => 8 transition

In parallel with the libc++ and OpenMP work, https://apt.llvm.org/ has been updated to reflect the branching of 7 from the trunk branches.
Therefore, the platform currently provides:

  • Stable: 6.0
  • Qualification: 7
  • Development: 8


Please note that, from version 7, the packages and libraries are called 7 (and not 7.0).
For the rationale and implementation, see https://reviews.llvm.org/D41869 & https://reviews.llvm.org/D41808.

Stable packages of LLVM toolchain are already officially available in Debian Buster and in Ubuntu Cosmic.

Cosmic support

In order to make sure that the LLVM toolchain does not have too many regressions with this new version, we also support the next Ubuntu version, 18.10, aka Cosmic.

A Note on coinstallability

We tried to make them coinstallable, but in the resulting packages we had no control over the libraries used at runtime. This could lead to many unforeseen issues. Keeping these in mind, we settled on keeping them conflicting with other versions.

Future work
  • Code coverage build fails for newly integrated packages
  • Move to a two-phase build to generate the clang binary using clang

Sources of the project are available on the gitlab instance of Debian: https://salsa.debian.org/pkg-llvm-team/llvm-toolchain/tree/7


Reshabh Sharma & Sylvestre Ledru