LLVM Project Blog

LLVM Project News and Details from the Trenches

Thursday, September 19, 2019

Closing the gap: cross-language LTO between Rust and C/C++

Link time optimization (LTO) is LLVM's way of implementing whole-program optimization. Cross-language LTO is a new feature in the Rust compiler that enables LLVM's link time optimization to be performed across a mixed C/C++/Rust codebase. It is also a feature that beautifully combines two respective strengths of the Rust programming language and the LLVM compiler platform:
  • Rust, with its lack of a language runtime and its low-level reach, has an almost unique ability to seamlessly integrate with an existing C/C++ codebase, and
  • LLVM, as a language agnostic foundation, provides a common ground where the source language a particular piece of code was written in does not matter anymore.
So, what does cross-language LTO do? There are two answers to that:
  • From a technical perspective it allows for codebases to be optimized without regard for implementation language boundaries, making it possible for important optimizations, such as function inlining, to be performed across individual compilation units even if, for example, one of the compilation units is written in Rust while the other is written in C++.
  • From a psychological perspective, which arguably is just as important, it helps to alleviate the nagging feeling of inefficiency that many performance conscious developers might have when working on a piece of software that jumps back and forth a lot between functions implemented in different source languages.
Because Firefox is a large, performance sensitive codebase with substantial parts written in Rust, cross-language LTO has been a long-time favorite wish list item among Firefox developers. As a consequence, we at Mozilla's Low Level Tools team took it upon ourselves to implement it in the Rust compiler.

To explain how cross-language LTO works it is useful to take a step back and review how traditional compilation and "regular" link time optimization work in the LLVM world.


Background - A bird's eye view of the LLVM compilation pipeline

Clang and the Rust compiler both follow a similar compilation workflow which, to some degree, is prescribed by LLVM:
  1. The compiler front-end generates an LLVM bitcode module (.bc) for each compilation unit. In C and C++ each source file will result in a single compilation unit. In Rust each crate is translated into at least one compilation unit.
    
        .c --clang--> .bc
    
        .c --clang--> .bc
    
    
        .rs --+
              |
        .rs --+--rustc--> .bc
              |
        .rs --+
    
    
  2. In the next step, LLVM's optimization pipeline will optimize each LLVM module in isolation:
    
        .c --clang--> .bc --LLVM--> .bc (opt)
    
        .c --clang--> .bc --LLVM--> .bc (opt)
    
    
        .rs --+
              |
        .rs --+--rustc--> .bc --LLVM--> .bc (opt)
              |
        .rs --+
    
    
  3. LLVM then lowers each module into machine code so that we get one object file per module:
    
        .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o
    
        .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o
    
    
        .rs --+
              |
        .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o
              |
        .rs --+
    
    
  4. Finally, the linker will take the set of object files and link them together into a binary:
    
        .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                                 |
        .c --clang--> .bc --LLVM--> .bc (opt) --LLVM--> .o ------+
                                                                 |
                                                                 +--ld--> bin
        .rs --+                                                  |
              |                                                  |
        .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o --+
              |
        .rs --+
    
    
This is the regular compilation workflow if no kind of LTO is involved. As you can see, each compilation unit is optimized in isolation. The optimizer does not know the definition of functions inside of other compilation units and thus cannot inline them or make other kinds of decisions based on what they actually do. To enable inlining and optimizations to happen across compilation unit boundaries, LLVM supports link time optimization.


Link time optimization in LLVM

The basic principle behind LTO is that some of LLVM's optimization passes are pushed back to the linking stage. Why the linking stage? Because that is the point in the pipeline where the entire program (i.e. the whole set of compilation units) is available at once and thus optimizations across compilation unit boundaries become possible. Performing LLVM work at the linking stage is facilitated via a plugin to the linker.

Here is how LTO is concretely implemented:
  • the compiler translates each compilation unit into LLVM bitcode (i.e. it skips lowering to machine code),
     
  • the linker, via the LLVM linker plugin, knows how to read LLVM bitcode modules like regular object files, and
     
  • the linker, again via the LLVM linker plugin, merges all bitcode modules it encounters and then runs LLVM optimization passes before doing the actual linking.
With these capabilities in place a new compilation workflow with LTO enabled for C++ code looks like this:

    .c --clang--> .bc --LLVM--> .bc (opt) ------------------+
                                                            |
    .c --clang--> .bc --LLVM--> .bc (opt) ------------------+
                                                            |
                                                            +-ld+LLVM--> bin
    .rs --+                                                 |
          |                                                 |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) --LLVM--> .o -+
          |
    .rs --+

As you can see our Rust code is still compiled to a regular object file. Therefore, the Rust code is opaque to the optimization taking place at link time. Yet, looking at the diagram it seems like that shouldn't be too hard to change, right?


Cross-language link time optimization

Implementing cross-language LTO is conceptually simple because the feature is built on the shoulders of giants. Since the Rust compiler uses LLVM all the important building blocks are readily available. The final diagram looks very much as you would expect, with rustc emitting optimized LLVM bitcode and the LLVM linker plugin incorporating that into the LTO process with the rest of the modules:

    .c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                                   |
    .c --clang--> .bc --LLVM--> .bc (opt) ---------+
                                                   |
                                                   +-ld+LLVM--> bin
    .rs --+                                        |
          |                                        |
    .rs --+--rustc--> .bc --LLVM--> .bc (opt) -----+
          |
    .rs --+

Nonetheless, achieving a production-ready implementation still turned out to be a significant time investment. After figuring out how everything fits together, the main challenge was to get the Rust compiler to produce LLVM bitcode that was compatible with both the bitcode that Clang produces and with what the linker plugin would accept. Some of the issues we ran into were:
  • The Rust compiler and Clang are both based on LLVM but they might be using different versions of LLVM. This was further complicated by the fact that Rust's LLVM version often does not match a specific LLVM release, but can be an arbitrary revision from LLVM's repository. We learned that all LLVM versions involved really have to be a close match in order for things to work out. The Rust compiler's documentation now offers a compatibility table for the various versions of Rust and Clang.
     
  • The Rust compiler by default performs a special form of LTO, called ThinLTO, on all compilation units of the same crate before passing them on to the linker. We quickly learned, however, that the LLVM linker plugin crashes with a segmentation fault when trying to perform another round of ThinLTO on a module that had already gone through the process. No problem, we thought, and instructed the Rust compiler to disable its own ThinLTO pass when compiling for the cross-language case, and indeed everything was fine -- until the segmentation faults mysteriously returned a few weeks later even though ThinLTO was still disabled.

    We noticed that the problem only occurred in a specific, presumably innocent setting: again, two passes of LTO needed to happen, but this time the first was a regular LTO pass within rustc whose output would then be fed into ThinLTO within the linker plugin. This setup, although computationally expensive, was desirable because it produced faster code and allowed for better dead-code elimination on the Rust side. And in theory it should have worked just fine. Yet somehow rustc produced symbol names that had apparently gone through ThinLTO's mangling even though we checked time and again that ThinLTO was disabled for Rust. We were beginning to seriously question our understanding of LLVM's inner workings as the problem persisted while we slowly ran out of ideas on how to debug this further.

    You can picture the proverbial lightbulb appearing over our heads when we figured out that Rust's pre-compiled standard library would still have ThinLTO enabled, no matter the compiler settings we were using for our tests. The standard library, including its LLVM bitcode representation, is compiled as part of Rust's binary distribution, so it is always compiled with the settings from Rust's build servers. Our local full LTO pass within rustc would then pull this troublesome bitcode into the output module, which in turn would make the linker plugin crash again. Since then, ThinLTO has been turned off for libstd by default.
     
  • After the above fixes, we succeeded in compiling the entirety of Firefox with cross-language LTO enabled. Unfortunately, we discovered that no actual cross-language optimizations were happening. Both Clang and rustc were producing LLVM bitcode and LLD produced functioning Firefox binaries, but when looking at the machine code, not even trivial functions were being inlined across language boundaries. After days of debugging (and unfortunately without being aware of LLVM's optimization remarks at the time) it turned out that Clang was emitting a target-cpu attribute on all functions while rustc didn't, which made LLVM reject inlining opportunities.

    In order to prevent the feature from silently regressing for similar reasons in the future we put quite a bit of effort into extending the Rust compiler's testing framework and CI. It is now able to compile and run a compatible version of Clang and uses that to perform end-to-end tests of cross-language LTO, making sure that small functions will indeed get inlined across language boundaries.
This list could still go on for a while, with each additional target platform holding new surprises to be dealt with. We had to progress carefully by putting in regression tests at every step in order to keep the many moving parts in check. At this point, however, we feel confident in the underlying implementation, with Firefox providing a large, complex, multi-platform test case where things have been working well for several months now.


Using cross-language LTO: a minimal example

The exact build tool invocations differ depending on whether it is rustc or Clang performing the final linking step, and whether Rust code is compiled via Cargo or via rustc directly. Rust's compiler documentation describes the various cases. The simplest of them, where rustc directly produces a static library and Clang does the linking, looks as follows:

    # Compile the Rust static library, called "xyz"
    rustc --crate-type=staticlib -O -C linker-plugin-lto -o libxyz.a lib.rs

    # Compile the C code with "-flto"
    clang -flto -c -O2 main.c

    # Link everything
    clang -flto -O2 main.o -L . -lxyz

The -C linker-plugin-lto option instructs the Rust compiler to emit LLVM bitcode which then can be used for both "full" and "thin" LTO. Getting things set up for the first time can be quite cumbersome because, as already mentioned, all compilers and the linker involved must be compatible versions. In theory, most major linkers will work; in practice LLD seems to be the most reliable one on Linux, with Gold in second place and the BFD linker needing to be at least version 2.32. On Windows and macOS the only linkers properly tested are LLD and ld64 respectively. For ld64 Firefox uses a patched version because the LLVM bitcode that rustc produces likes to trigger a pre-existing issue this linker has with ThinLTO.


Conclusion

Cross-language LTO has been enabled for Firefox release builds on Windows, macOS, and Linux for several months at this point and we at Mozilla's Low Level Tools team are pleased with how it turned out. While we still need to work on making the initial setup of the feature easier, it already enabled removing duplicated logic from Rust components in Firefox because now code can simply call into the equivalent C++ implementation and rely on those calls to be inlined. Having cross-language LTO in place and continuously tested will definitely lower the psychological bar for implementing new components in Rust, even if they are tightly integrated with existing C++ code.

Cross-language LTO is available in the Rust compiler since version 1.34 and works together with Clang 8. Feel free to give it a try and report any problems in the Rust bug tracker.


Acknowledgments

I'd like to thank my Low Level Tools team colleagues David Major, Eric Rahm, and Nathan Froyd for their invaluable help and encouragement, and I'd like to thank Alex Crichton for his tireless reviews on the Rust side.

Wednesday, September 4, 2019

Announcing the program for the 2019 LLVM Developers' Meeting - Bay Area

Announcing the program for the 2019 LLVM Developers' Meeting in San Jose, CA! This program is the largest we have ever had and has over 11 tutorials, 29 technical talks, 24 lightning talks, 2 panels, 3 birds of a feather, 14 posters, and 4 SRC talks. Be sure to register to attend this event and hear some of these great talks.

Keynotes
Technical Talks
Tutorials
Student Research Competition
Panels
Birds of a Feather
Lightning Talks
Posters


Thursday, August 1, 2019

The LLVM Project is Moving to GitHub


After several years of discussion and planning, the LLVM project is getting ready to complete the migration of its source code from SVN to GitHub!  At last year’s developer meeting, many interested community members convened at a series of round tables to lay out a plan to completely migrate LLVM source code from SVN to GitHub by the 2019 U.S. Developer’s Meeting.  We have made great progress over the last nine months and are on track to complete the migration on October 21, 2019.

As part of the migration to GitHub we are maintaining the ‘monorepo’ layout which currently exists in SVN.  This means that there will be a single git repository with one top-level directory for each LLVM sub-project.  This will be a change for those of you who are already using git and accessing the code via the official sub-project git mirrors (e.g. https://git.llvm.org/git/llvm.git) where each sub-project has its own repository.

One of the first questions people ask when they hear about the GitHub plans is: Will the project start using GitHub pull requests and issues?  And the answer to that for now is: no. The current transition plan focuses on migrating only the source code. We will continue to use Phabricator for code reviews, and bugzilla for issue tracking after the migration is complete.  We have not ruled out using pull requests and issues at some point in the future, but these are discussions we still need to have as a community.

The most important takeaway from this post, though, is that if you consume the LLVM source code in any way, you need to take action now to migrate your workflows.  If you manage any continuous integration or other systems that need read-only access to the LLVM source code, you should begin pulling from the official GitHub repository instead of SVN or the current sub-project mirrors.  If you are a developer that needs to commit code, please use the git-llvm script for committing changes.

We have created a status page, if you want to track the current progress of the migration.  We will be posting updates to this page as we get closer to the completion date.  If you run into issues of any kind with GitHub you can file a bug in bugzilla and mark it as a blocker of the github tracking bug.

This entire process has been a large community effort.  Many many people have put in time discussing, planning, and implementing all the steps required to make this happen.  Thank you to everyone who has been involved and let’s keep working to make this migration a success.

Blog post by Tom Stellard.

Friday, May 24, 2019

LLVM and Google Season of Docs

The LLVM Project is pleased to announce that we have been selected to participate in Google’s Season of Docs!

Our project idea list may be found here:

From now until May 29th, technical writers are encouraged to review the proposed project ideas and to ask any questions on our gsdocs@llvm.org mailing list. Other documentation ideas are allowed, but we cannot guarantee that a mentor will be found for the project. You are encouraged to discuss new ideas on the mailing list prior to submitting your technical writer application, in order to start the process of finding a mentor.

When submitting your application for an LLVM documentation project, please consider the following:

  • Include Prior Experience: Do you have prior technical writing experience? We want to see this! Consider including links to prior documentation or attachments of documentation you have written. If you can’t include a link to the actual documentation, please describe in detail what you wrote, who the audience was, and any other important information that can help us gauge your prior experience. Please also include any experience with Sphinx or other documentation generation tools.
  • Take your time writing the proposal: We will be looking closely at your application to see how well it is written. Take the time to proofread and know who your audience is.
  • Propose your plan for our documentation project: We have given a rough idea of what changes or topics we envision for the documentation, but this is just a start. We expect you to take the idea and expand or modify it as you see fit. Review our existing documentation and see how your project would complement or replace other pieces. Optionally, include an overview, document design, or layout plan in your application.
  • Become familiar with our project: We don’t expect you to become a compiler expert, but we do expect you to read up on our project to learn a bit about LLVM.

We look forward to working with some fabulous technical writers and improving our documentation. Again, please email gsdocs@llvm.org with your questions.

Friday, March 15, 2019

LLVM Numerics Blog

Keywords: Numerics, Clang, LLVM-IR, 2019 LLVM Developers' Meeting, LLVMDevMtg.

The goal of this blog post is to start a discussion about numerics in LLVM – where we are, recent work and things that remain to be done.  There will be an informal discussion on numerics at the 2019 EuroLLVM conference next month. One purpose of this blog post is to refresh everyone's memory on where we are on the topic of numerics to restart the discussion.

In the last year or two there has been a push to allow fine-grained decisions on which optimizations are legitimate for any given piece of IR.  In earlier days there were two main modes of operation: fast-math and precise-math.  When operating under the rules of precise-math, defined by IEEE-754, a significant number of potential optimizations on sequences of arithmetic instructions are not allowed because they could lead to violations of the standard.  

For example: 

The Reassociation optimization pass is generally not allowed under precise code generation because it can change the order of operations, altering the creation of NaN and Inf values propagated at the expression level, as well as altering precision.

Precise code generation is often overly restrictive, so an alternative fast-math mode is commonly used where all possible optimizations are allowed, acknowledging that this impacts the precision of results and possibly IEEE compliant behavior as well.  In LLVM, this can be enabled by setting the unsafe-math flag at the module level, or by passing -funsafe-math-optimizations to clang, which then sets flags on the IR it generates.  Within this context the compiler often generates shorter sequences of instructions to compute results, and depending on the context this may be acceptable.  Fast-math is often used in computations where loss of precision is acceptable.  For example, when computing the color of a pixel, even relatively low precision is likely to far exceed the perception abilities of the eye, making shorter instruction sequences an attractive trade-off.  In long-running simulations of physical events, however, loss of precision can mean that the simulation drifts from reality, making the trade-off unacceptable.

Several years ago, LLVM IR instructions gained the ability to be annotated with flags that can drive optimizations with more granularity than an all-or-nothing decision at the module level.  The IR flags in question are:

nnan, ninf, nsz, arcp, contract, afn, reassoc, nsw, nuw, exact.  

Their exact meaning is described in the LLVM Language Reference Manual.  When all the flags are enabled, we get the current fast-math behavior.  When these flags are disabled, we get precise math behavior.  There are also several options available between these two models that may be attractive to some applications.  In the past year, several members of the LLVM community worked on making IR optimization passes aware of these flags.  When the unsafe-math module flag is not set, these optimization passes will work by examining individual flags, allowing fine-grained selection of the optimizations that can be enabled on specific instruction sequences.  This allows vendors/implementors to mix fast and precise computations in the same module, aggressively optimizing some instruction sequences but not others.
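The difference between the two models is easy to observe in the IR itself. A minimal sketch (assuming clang is installed; the file and function names are invented) comparing the same code compiled with and without fast-math:

```shell
cat > fm.c <<'EOF'
double sum3(double a, double b, double c) { return a + b + c; }
EOF

# Precise math: the fadd instructions carry no fast-math flags.
clang -O1 -S -emit-llvm fm.c -o precise.ll
grep 'fadd' precise.ll

# Fast math: the same instructions are tagged "fast", the shorthand
# the IR printer uses when all of the individual flags are set.
clang -O1 -ffast-math -S -emit-llvm fm.c -o fast.ll
grep 'fadd' fast.ll
```

The flags sit on individual instructions, which is what makes the fine-grained, per-sequence decisions described above possible.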

We now have good coverage of IR passes in the LLVM codebase, in particular in the following areas:
* Intrinsic and libcall management
* Instruction Combining and Simplification
* Instruction definition
* SDNode definition
* GlobalIsel Combining and code generation
* Selection DAG code generation
* DAG Combining
* Machine Instruction definition
* IR Builders (SDNode, Instruction, MachineInstr)
* CSE tracking
* Reassociation
* Bitcode

There are still some areas that need to be reworked for modularity, including vendor specific back-end passes.  

The following are some of the contributions mentioned above from the last 2 years of open source development:

https://reviews.llvm.org/D45781 : MachineInst support mapping SDNode fast math flags for support in Back End code generation 
https://reviews.llvm.org/D46322 : [SelectionDAG] propagate 'afn' and 'reassoc' from IR fast-math-flags
https://reviews.llvm.org/D45710 : Fast Math Flag mapping into SDNode
https://reviews.llvm.org/D46854 : [DAG] propagate FMF for all FPMathOperators
https://reviews.llvm.org/D48180 : updating isNegatibleForFree and GetNegatedExpression with fmf for fadd
https://reviews.llvm.org/D48057: easing the constraint for isNegatibleForFree and GetNegatedExpression
https://reviews.llvm.org/D47954 : Utilize new SDNode flag functionality to expand current support for fdiv
https://reviews.llvm.org/D47918 : Utilize new SDNode flag functionality to expand current support for fma
https://reviews.llvm.org/D47909 : Utilize new SDNode flag functionality to expand current support for fadd
https://reviews.llvm.org/D47910 : Utilize new SDNode flag functionality to expand current support for fsub
https://reviews.llvm.org/D47911 : Utilize new SDNode flag functionality to expand current support for fmul
https://reviews.llvm.org/D48289 : refactor of visitFADD for AllowNewConst cases
https://reviews.llvm.org/D47388 : propagate fast math flags via IR on fma and sub expressions
https://reviews.llvm.org/D47389 : guard fneg with fmf sub flags
https://reviews.llvm.org/D47026 : fold FP binops with undef operands to NaN
https://reviews.llvm.org/D47749 : guard fsqrt with fmf sub flags
https://reviews.llvm.org/D46447 : Mapping SDNode flags to MachineInstr flags
https://reviews.llvm.org/D50195 : extend folding fsub/fadd to fneg for FMF
https://reviews.llvm.org/rL339197 : [NFC] adding tests for Y - (X + Y) --> -X
https://reviews.llvm.org/D50417 : [InstCombine] fold fneg into constant operand of fmul/fdiv
https://reviews.llvm.org/rL339357 : extend folding fsub/fadd to fneg for FMF
https://reviews.llvm.org/D50996 : extend binop folds for selects to include true and false binops flag intersection
https://reviews.llvm.org/rL339938 : add a missed case for binary op FMF propagation under select folds
https://reviews.llvm.org/D51145 : Guard FMF context by excluding some FP operators from FPMathOperator
https://reviews.llvm.org/rL341138 : adding initial intersect test for Node to Instruction association
https://reviews.llvm.org/rL341565 : in preparation for adding nsw, nuw and exact as flags to MI
https://reviews.llvm.org/D51738 : add IR flags to MI
https://reviews.llvm.org/D52006 : Copy utilities updated and added for MI flags
https://reviews.llvm.org/rL342598 : add new flags to a DebugInfo lit test
https://reviews.llvm.org/D53874 : [InstSimplify] fold 'fcmp nnan oge X, 0.0' when X is not negative
https://reviews.llvm.org/D55668 : Add FMF management to common fp intrinsics in GlobalIsel
https://reviews.llvm.org/rL352396 : [NFC] TLI query with default(on) behavior wrt DAG combines for fmin/fmax target…
https://reviews.llvm.org/rL316753 (Fold fma (fneg x), K, y -> fma x, -K, y)
https://reviews.llvm.org/D57630 : Move IR flag handling directly into builder calls for cases translated from Instructions in GlobalIsel
https://reviews.llvm.org/rL332756 : adding baseline fp fold tests for unsafe on and off
https://reviews.llvm.org/rL334035 : NFC: adding baseline fneg case for fmf
https://reviews.llvm.org/rL325832 : [InstrTypes] add frem and fneg with FMF creators
https://reviews.llvm.org/D41342 : [InstCombine] Missed optimization in math expression: simplify calls exp functions
https://reviews.llvm.org/D52087 : [IRBuilder] Fixup CreateIntrinsic to allow specifying Types to Mangle.
https://reviews.llvm.org/D52075 : [InstCombine] Support (sub (sext x), (sext y)) --> (sext (sub x, y)) and (sub (zext x), (zext y)) --> (zext (sub x, y))
https://reviews.llvm.org/rL338059 : [InstCombine] fold udiv with common factor from muls with nuw
Commit: e0ab896a84be9e7beb59874b30f3ac51ba14d025 : [InstCombine] allow more fmul folds with 'reassoc'
Commit: 3e5c120fbac7bdd4b0ff0a3252344ce66d5633f9 : [InstCombine] distribute fmul over fadd/fsub
https://reviews.llvm.org/D37427 : [InstCombine] canonicalize fcmp ord/uno with constants to null constant
https://reviews.llvm.org/D40130 : [InstSimplify] fold and/or of fcmp ord/uno when operand is known nnan
https://reviews.llvm.org/D40150 : [LibCallSimplifier] fix pow(x, 0.5) -> sqrt() transforms
https://reviews.llvm.org/D39642 : [ValueTracking] readnone is a requirement for converting sqrt to llvm.sqrt; nnan is not
https://reviews.llvm.org/D39304 : [IR] redefine 'reassoc' fast-math-flag and add 'trans' fast-math-flag
https://reviews.llvm.org/D41333 : [ValueTracking] ignore FP signed-zero when detecting a casted-to-integer fmin/fmax pattern
https://reviews.llvm.org/D5584 : Optimize square root squared (PR21126)
https://reviews.llvm.org/D42385 : [InstSimplify] (X * Y) / Y --> X for relaxed floating-point ops
https://reviews.llvm.org/D43160 : [InstSimplify] allow exp/log simplifications with only 'reassoc' FMF
https://reviews.llvm.org/D43398 : [InstCombine] allow fdiv folds with less than fully 'fast' ops
https://reviews.llvm.org/D44308 : [ConstantFold] fp_binop AnyConstant, undef --> NaN
https://reviews.llvm.org/D43765 : [InstSimplify] loosen FMF for sqrt(X) * sqrt(X) --> X
https://reviews.llvm.org/D44521 : [InstSimplify] fp_binop X, NaN --> NaN
https://reviews.llvm.org/D47202 : [CodeGen] use nsw negation for abs
https://reviews.llvm.org/D48085 : [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros
https://reviews.llvm.org/D48401 : [InstCombine] fold vector select of binops with constant ops to 1 binop (PR37806)
https://reviews.llvm.org/D39669 : DAG: Preserve nuw when reassociating adds
https://reviews.llvm.org/D39417 : InstCombine: Preserve nuw when reassociating nuw ops
https://reviews.llvm.org/D51753 : [DAGCombiner] try to convert pow(x, 1/3) to cbrt(x)
https://reviews.llvm.org/D51630 : [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
https://reviews.llvm.org/D53650 : [FPEnv] Last BinaryOperator::isFNeg(...) to m_FNeg(...) changes
https://reviews.llvm.org/D54001 : [ValueTracking] determine sign of 0.0 from select when matching min/max FP
https://reviews.llvm.org/D51942 : [InstCombine] Fold (C/x)>0 into x>0 if possible
https://llvm.org/svn/llvm-project/llvm/trunk@348016 : [SelectionDAG] fold FP binops with 2 undef operands to undef
http://llvm.org/viewvc/llvm-project?view=revision&revision=346242 : propagate fast-math-flags when folding fcmp+fpext, part 2
http://llvm.org/viewvc/llvm-project?view=revision&revision=346240 : propagate fast-math-flags when folding fcmp+fpext
http://llvm.org/viewvc/llvm-project?view=revision&revision=346238 : [InstCombine] propagate fast-math-flags when folding fcmp+fneg, part 2
http://llvm.org/viewvc/llvm-project?view=revision&revision=346169 : [InstSimplify] fold select (fcmp X, Y), X, Y
http://llvm.org/viewvc/llvm-project?view=revision&revision=346234 : propagate fast-math-flags when folding fcmp+fneg
http://llvm.org/viewvc/llvm-project?view=revision&revision=346147 : [InstCombine] canonicalize -0.0 to +0.0 in fcmp
http://llvm.org/viewvc/llvm-project?view=revision&revision=346143 : [InstCombine] loosen FP 0.0 constraint for fcmp+select substitution
http://llvm.org/viewvc/llvm-project?view=revision&revision=345734 : [InstCombine] refactor fabs+fcmp fold; NFC
http://llvm.org/viewvc/llvm-project?view=revision&revision=345728 : [InstSimplify] fold 'fcmp nnan ult X, 0.0' when X is not negative
http://llvm.org/viewvc/llvm-project?view=revision&revision=345727 : [InstCombine] add assertion that InstSimplify has folded a fabs+fcmp; NFC


While multiple people have been working on finer-grained control over fast-math optimizations and other relaxed numerics modes, there has also been some initial progress on adding support for more constrained numerics models. There has been considerable progress towards adding and enabling constrained floating-point intrinsics to capture FENV_ACCESS ON and similar semantic models.

These experimental constrained intrinsics prohibit certain transforms that are not safe if the default floating-point environment is not in effect. Historically, LLVM has in practice basically “split the difference” with regard to such transforms; they haven’t been explicitly disallowed, as LLVM doesn’t model the floating-point environment, but they have been disabled when they caused trouble for tests or software projects. The absence of a formal model for licensing these transforms constrains our ability to enable them. Bringing language and backend support for constrained intrinsics across the finish line will allow us to include transforms that we disable as a matter of practicality today, and allow us to give developers an easy escape valve (in the form of FENV_ACCESS ON and similar language controls) when they need more precise control, rather than an ad-hoc set of flags to pass to the driver.

We should discuss these new intrinsics to make sure that they can capture the right models for all the languages that LLVM supports.


Here are some possible discussion items:

  • Should specialization be applied at the call level for edges in a call graph where the caller has special context to extend into the callee with regard to flags?
  • Should the inliner apply something similar to calls that meet inlining criteria?
  • What other part(s) of the compiler could make use of IR flags that are currently not covered?
  • What work needs to be done regarding code debt with regard to current areas of implementation?