The LLVM Project Blog

LLVM Project News and Details from the Trenches

LLVM Fortran Levels Up: Goodbye flang-new, Hello flang!

LLVM has included a Fortran compiler “Flang” since LLVM 11 in late 2020. However, until recently the Flang binary was not flang (like clang) but instead flang-new.

LLVM 20 ends the era of flang-new. The community has decided that Flang is worthy of a new name.

The “new” name? You guessed it, flang.

A simple change that represents a major milestone for Flang.

This article covers the almost 10-year journey of Flang: the first concepts, multiple rewrites, the adoption of LLVM’s Multi-Level Intermediate Representation (MLIR), and Flang’s entry into the LLVM Project.

If you want to try flang right now, you can download it or try it in your browser using Compiler Explorer.

Why Fortran?

Fortran was first created in the 1950s, and the name came from “Formula Translation”. Fortran focused on the mathematics use case and freed programmers from writing assembly code that could only run on specific machines.

Instead, they could write code that looked like a formula. You expect this today, but at the time it was revolutionary. This feature led to heavy use in scientific computing: weather modelling, fluid dynamics and computational chemistry, just to name a few.
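
As an illustration (this snippet is purely illustrative and not taken from any particular program), the compound interest formula A = P(1 + r)**n can be written almost exactly as it would appear in a textbook:

program interest
  real :: principal, rate, amount
  integer :: years

  principal = 100.0
  rate = 0.05
  years = 10

  ! the assignment reads much like the formula A = P(1 + r)**n
  amount = principal * (1.0 + rate)**years
  print *, amount
end program interest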

Whilst many alternative programming languages have come and gone, it [Fortran] has regained its popularity for writing high performance codes. Indeed, over 80% of the applications running on ARCHER2, a 750,000 core Cray EX which is the UK national supercomputer, are written in Fortran.

Fortran has had a resurgence in recent years, gaining a package manager, an unofficial standard library and LFortran, a compiler that supports interactive programming (LFortran also uses LLVM).

For the full history of Fortran, IBM has an excellent article on the topic and I encourage you to look at the “Programmer’s Primer for Fortran” if you want to see the early form of Fortran.

If you want to learn the language, fortran-lang.org is a great place to start.

Why Would You Make Another Fortran Compiler?

There are many Fortran compilers. Some are vendor specific such as the Intel Fortran Compiler or NVIDIA’s HPC compilers. Then there are open source options like GFortran, which supports many platforms.

Why build one more?

The two partners in the early days of Flang were the US National Labs and NVIDIA.

For Pat McCormick (Flang project lead at Los Alamos National Laboratory), preserving the utility of Fortran code was imperative:

These [Fortran] codes represent an essential capability that supports many elements of our [The United States’] scientific mission and will continue to do so for the foreseeable future. A fundamental risk facing these codes is the absence of a long-term, non-proprietary support path for Fortran.

GFortran might seem to counter that statement, but remember that a single project is a single point of failure, as well as a single source of incompatibilities and disagreements. Having multiple implementations reduces that risk.

NVIDIA’s Gary Klimowicz laid out their goals for Flang in a presentation to FortranCon in 2020:

  • Use a permissive license like that of LLVM, which is more palatable to commercial users and contributors.
  • Develop an active community of Fortran compiler developers that includes companies and institutions.
  • Support Fortran tool development by basing Flang on existing LLVM frameworks.
  • Support Fortran language experimentation for future language standards proposals.

Intentions echoed by Pat McCormick:

The overarching goal was to establish an open-source, modern implementation and simultaneously grow a community that spanned industry, academia, and federal agencies at both the national and international levels.

Fortran as a language also benefits from having many implementations. For C++ language features, it is common to implement them on top of Clang and GCC, to prove the feature is viable and get feedback.

Implementing the feature multiple times in different compilers uncovers assumptions that may be a problem for certain compilers, or certain groups of compiler users.

In the same way, Flang and GFortran can provide that diversity.

However, even when features are standardised, standards can be ambiguous and implementations do make mistakes. A new compiler is a chance to uncover these.

Jeff Hammond (NVIDIA) is very familiar with this, having tested Flang with many existing applications. They had this to say on the motivations for Flang and how users have reacted to it:

The Fortran language has changed quite a bit over the past 30 years. Modern Fortran deserves a modern compiler ecosystem, that’s not only capable of compiling all the old codes and all the code written for the current standard, but also supports innovation in the future.

Because it’s a huge amount of work to build a feature-complete modern Fortran compiler, it’s useful to leverage the resources of the entire LLVM community for this effort. NVIDIA and ARM play leading roles right now, with important contributions from IBM, Fujitsu and LBNL [Lawrence Berkeley National Laboratory], e.g. related to test suites and coarrays. We hope to see the developer community grow in the future.

Another benefit from the LLVM Fortran compiler is that users are more likely to invest in supporting a new compiler when it has full language support and runs on all the platforms. A broad developer base is critical to support all the platforms.

What I have seen so far interacting with our Fortran users is that they are very excited about LLVM Flang and were willing to commit to supporting it in their build systems and CI systems, which has driven quality improvements in both the Flang compiler and the applications.

Like Clang did with C and C++ codes when it started to become popular, Flang is helping to identify bugs in Fortran code that weren’t noticed before, which is making the Fortran software ecosystem better.

PGI to LLVM: The Flang Timeline

The story of Flang really starts in 2015, but the Portland Group (PGI) collaborated with US National Labs prior to this. PGI would later become part of NVIDIA and be instrumental to the Flang project.

  • 1989 The Portland Group is formed to provide C, Fortran 77 and C++ compilers for the Intel i860 market.
  • 1990 Intel bundles PGI compilers with its iPSC/860 supercomputer.
  • 1996 PGI works with Sandia National Laboratories to provide compilers for the Accelerated Strategic Computing Initiative (ASCI) Option Red supercomputer.
  • December 2000 PGI becomes a wholly owned subsidiary of STMicroelectronics.
  • August 2011 Away from PGI, Bill Wendling starts an LLVM-based Fortran compiler called “Flang” (later known as “Fort”). Bill is joined by several collaborators a few months later.
  • July 2013 PGI is sold to NVIDIA.

In late 2015 there were the first signs of what would become “Classic Flang”. Though at the time it was just “Flang”, I will use “Classic Flang” here for clarity.

Development of what was to become “Fort” continued under the “Flang” name, completely separate from the Classic Flang project.

  • November 2015 NVIDIA joins the US Department of Energy Exascale Computing Project, including a commitment to create an open source Fortran compiler.

    “The U.S. Department of Energy’s National Nuclear Security Administration and its three national labs [Los Alamos, Lawrence Livermore and Sandia] have reached an agreement with NVIDIA’s PGI division to adapt and open-source PGI’s Fortran frontend, and associated Fortran runtime library, for contribution to the LLVM project.”

    (this news is also the first appearance of Flang in an issue of LLVM Weekly)

  • May 2017 The first release of Classic Flang as a separate repository, outside of the LLVM Project. Composed of a PGI compiler frontend and a new backend that generates LLVM Intermediate Representation (LLVM IR).

  • August 2017 The Classic Flang project is announced officially (according to LLVM Weekly’s report, the original mailing list is offline).

During this time, plans were formed to propose moving Classic Flang into the LLVM Project.

  • December 2017 The original “Flang” is renamed to “Fort” so as not to compete with Classic Flang.

  • April 2018 Steve Scalpone (NVIDIA) announces at the European LLVM Developers’ Conference that the frontend of Classic Flang will be rewritten to address feedback from the LLVM community. This new front end became known as “F18”.

  • August 2018 Eric Schweitz (NVIDIA) begins work on what would become “Fortran Intermediate Representation”, otherwise known as “FIR”. This work would later become the fir-dev branch.

  • February 2019 Steve Scalpone proposes contributing F18 to the LLVM Project.

  • April 2019 F18 is approved for migration into the LLVM Project monorepo.

    At this point F18 comprised only the early stages of the compiler; it could not generate code (the later fir-dev work addressed this). Despite that, it moved into flang/ in the monorepo, awaiting the completion of the rest of the work.

  • June 2019 Peter Waller (Arm) proposes adding a Fortran mode to the Clang compiler driver.

  • August 2019 The first appearance of the flang.sh driver wrapper script (more on this later).

  • December 2019 The plan for rewriting the F18 git history to fit into the LLVM project is announced. This effort was led by Arm, with Peter Waller going so far as to write a custom tool to rewrite the history of F18.

    Kiran Chandramohan (Arm) proposes an OpenMP dialect for MLIR, with the intention of using it in Flang (discussion continues on Discourse during the following January).

  • February 2020 The plan for improvements to F18 to meet the standards required for inclusion in the LLVM monorepo is announced by Richard Barton (Arm).

  • April 2020 Upstreaming of F18 into the LLVM monorepo is completed.

At this point what was in the LLVM monorepo was F18, the rewritten frontend of Classic Flang. Classic Flang remained unchanged, still using the PGI based frontend.

Around this time work started in the Classic Flang repo on the fir-dev branch that would enable code generation when using F18.

For the following events, remember that Classic Flang was still in use. The Classic Flang binary is named flang, just like the folder that F18 now occupies in the LLVM Project.

Note: Some LLVM changes referenced below will appear to have skipped an LLVM release. This is because they were done after the release branch was created, but before the first release from that branch was distributed.

  • April 2020 The first attempt at adding a new compiler driver for Flang is posted for review. It used the name flang-tmp. This change was later abandoned in favour of a different approach.

  • September 2020 Flang’s new compiler driver is added as an experimental option. This is the first appearance of the flang-new binary, instead of flang-tmp as proposed before.

    The name was intended as temporary, but not the driver.

    • Andrzej Warzyński (Arm, Flang Driver Maintainer)

  • October 2020 Flang is included in an LLVM release for the first time in LLVM 11.0.0. There is an f18 binary and the previously mentioned script flang.sh.

  • August 2021 flang-new is no longer experimental and replaces the previous Flang compiler driver binary f18.

  • October 2021 LLVM 13.0.0 is the first release to include a flang-new binary (alongside f18).

  • March 2022 LLVM 14.0.0 releases, with flang-new replacing f18 as the Flang compiler driver.

  • April 2022 NVIDIA ceases development of the fir-dev branch in the Classic Flang project. Upstreaming of fir-dev to the LLVM Project begins around this date.

    flang-new can now do code generation if the -flang-experimental-exec option is used. This change used work originally done on the fir-dev branch.

  • May 2022 Kiran Chandramohan announces at the European LLVM Developers’ Meeting that Flang’s OpenMP 1.1 support is close to complete.

    The flang.sh compiler driver script becomes flang-to-external-fc. It allows the user to use flang-new to parse Fortran source code, then write it back to a file to be compiled with an existing Fortran compiler.

    The script can be put in place of an existing compiler to test Flang’s parser on large projects.

  • June 2022 Brad Richardson (Berkeley Lab) changes flang-new to generate code by default, removing the -flang-experimental-exec option.

  • July 2022 Valentin Clément (NVIDIA) announces that upstreaming of fir-dev to the LLVM Project is complete.

  • September 2022 LLVM 15.0.0 releases, including Flang’s experimental code generation option.

  • September 2023 LLVM 17.0.0 releases, with Flang’s code generation enabled by default.

At this point the LLVM Project contained Flang as it is known today, sometimes referred to as “LLVM Flang”.

“LLVM Flang” is the combination of the F18 frontend and MLIR-based code generation from fir-dev, as opposed to “Classic Flang”, which combines a PGI-based frontend and its own custom backend.

The initiative to upstream Classic Flang was in some sense complete. Though with all of the compiler rewritten in the process, what landed in the LLVM Project was very different to Classic Flang.

  • April 2024 The flang-to-external-fc script is removed.

  • September 2024 LLVM 19.1.0 releases. The first release of flang-new as a standalone compiler.

  • October 2024 The community deems that Flang has met the criteria to not be “new” and the name is changed. Goodbye flang-new, hello flang!

  • November 2024 AMD announces its next generation Fortran compiler, based on LLVM Flang.

    Arm releases an experimental version of its new Arm Toolchain for Linux product, which includes LLVM Flang as the Fortran compiler.

  • March 2025 LLVM 20.1.0 releases. The first time the flang binary has been included in a release.

Flang and the Definition of New

Renaming Flang was discussed a few times before the final proposal. It was always contentious, so for the final proposal Brad Richardson decided to use the LLVM proposal process, which is rarely used but specifically designed for situations like this.

After several rounds of back and forth, I thought the discussion was devolving and there wasn’t much chance we’d come to a consensus without some outside perspective.

  • Brad Richardson

That outside perspective included Chris Lattner (co-founder of the LLVM Project), who quickly identified a unique problem:

We have a bit of an unprecedented situation where an LLVM project is taking the name of an already established compiler [Classic Flang]. Everyone seems to want the older flang [Classic Flang] to fade away, but flang-new is not as mature and it isn’t clear when and what the criteria should be for that.

Confusion about the flang name was a key motivation for Brad Richardson too:

Part of my concern was that the name “flang-new” would get common usage before we were able to change it. I think it’s now been demonstrated that that concern was valid, because right now [November 2024] fpm [Fortran Package Manager] recognizes the compiler by that name.

My main goal at that point was just clear goals for when we would make the name change.

No single list of goals was agreed, but some came up many times:

  • Known limitations and supported features should be documented.
  • As much as possible, work that was expected to fix known bugs should be completed, to prevent duplicate bug reports.
  • Unimplemented language features should fail with a message saying that they are unimplemented, rather than failing in a confusing way or producing incorrect code.
  • LLVM Flang should perform relatively well when compared to other Fortran compilers.
  • LLVM Flang must have a reasonable pass rate with a large Fortran language test suite, and results of that must be shown publicly.
  • All reasonable steps should be taken to prevent anyone using a pre-packaged Classic Flang from confusing it with LLVM Flang.

You will see a lot of relative language in those, like “reasonable”. No one could say exactly what that meant, but everyone agreed that it was inevitable that one day it would all be true.

Paul T Robinson summarised the dilemma early in the thread:

the plan is to replace Classic Flang with the new Flang in the future.

I suppose one of the relevant questions here is: Has the future arrived?

After that Steve Scalpone (NVIDIA) gave their perspective that it was not yet time to change the name.

So the community got to work on those goals:

  • Many performance and correctness issues were addressed by the “High Level Fortran Intermediate Representation” (HLFIR) (which this article will explain later).
  • A cross-company team including Arm, Huawei, Linaro, NVIDIA and Qualcomm collaborated to make it possible to build the popular SPEC 2017 benchmark with Flang.
  • Flang gained support for OpenMP up to version 2.5, and was able to compile OpenMP specific benchmarks like SPEC OMP and the NAS Parallel Benchmarks.
  • Linaro showed that the performance of Flang compared favourably with Classic Flang and was not far behind GFortran.
  • The GFortran test suite was added to the LLVM Test Suite, and Flang achieved good results.
  • Fujitsu’s test suite was made public and tested with Flang. The process to make IBM’s Fortran test suite public was started.

With all that done, in October of 2024 flang-new became flang. The future had arrived.

And it’s merged! It’s been a long (and sometimes contentious) process, but thank you to everyone who contributed to the discussion.

The goals the community achieved have certainly been worth it for Flang as a compiler, but did Brad achieve their own goals?

What did I hope to see as a result of the name change? I wanted it to be easier for more people to try it out.

So once you have finished reading this article, download Flang or try it out on Compiler Explorer. You know at least one person will appreciate it!

Fortran Intermediate Representation (FIR)

All compilers that use LLVM as a backend eventually produce code in the form of the LLVM Intermediate Representation (LLVM IR).

A drawback of this is that LLVM IR does not include language-specific information. This means that, for example, it cannot be used to optimise arrays in a way specific to Fortran programs.

One solution to this has been to build a higher level IR that represents the unique features of the language, optimise that, then convert the result into LLVM IR.

Eric Schweitz (NVIDIA) started to do that for Fortran in late 2018:

FIR was originally conceived as a high-level IR that would interoperate with LLVM but have a representation more friendly and amenable to Fortran optimizations.

Naming is hard but Eric did well here:

FIR was a pun of sorts. Fortran IR and meant to be evocative of the trees (Abstract Syntax Trees).

We will not go into detail about this early FIR, because MLIR was revealed soon after Eric started the project and he quickly adopted it.

When MLIR was announced, I quickly switched gears from building data structures for a new “intermediate IR” to porting my IR design to MLIR and using that instead.

I believe FIR was probably the first “serious project” outside of Google to start using MLIR.

The FIR work continued to develop, with Jean Perier (NVIDIA) joining Eric on the project. It became its own public branch fir-dev, which was later contributed to the LLVM Project.

The following sections will go into detail on the intermediate representations that Flang uses today.

MLIR

The journey from Classic Flang to LLVM Flang involved a rewrite of the entire compiler. This provided an opportunity to pick up new things from the LLVM Project, most notably MLIR.

“Multi-Level Intermediate Representation” (MLIR) was first introduced to the LLVM community in 2019, around the time that F18 was approved to move into the LLVM Project.

The problem that MLIR addresses is the same one that Eric Schweitz tackled with FIR: It is difficult to map high level details of programming languages into LLVM IR.

You either have to attach them to the IR as metadata, try to recover the lost details later, or fight an uphill battle to add the details to LLVM IR itself. These details are crucial for producing optimised code in certain languages. (Fortran array optimisations were one use case referenced).

This led languages such as Swift and Rust to create their own IRs that include information relevant to their own optimisations. After that IR has been optimised it is converted into LLVM IR and goes through the normal compilation pipeline.

To implement these IRs, each compiler has to build a lot of infrastructure, none of which can be shared with the others. This is where MLIR comes in.

The MLIR project aims to directly tackle these programming language design and implementation challenges—by making it very cheap to define and introduce new abstraction levels, and provide “in the box” infrastructure to solve common compiler engineering problems.

Flang and MLIR

The same year MLIR debuted, Eric Schweitz gave a talk at the US LLVM Developers’ Meeting later that year, titled “An MLIR Dialect for High-Level Optimization of Fortran”. By that point, FIR was implemented as an MLIR dialect.

That [switching FIR to be based on MLIR] happened very quickly and I never looked back.

MLIR, even in its infancy, was clearly solving many of the exact same problems that we were facing building a new Fortran compiler.

  • Eric Schweitz

The MLIR community were also happy to have Flang on board:

It was fantastic to have very quickly in the early days of MLIR a non-ML [Machine Learning] frontend to exercise features we built in MLIR in anticipation. It led us to course-correct in some cases, and Flang was a motivating factor for many feature requests. It contributed significantly to establishing and validating that MLIR had the right foundations.

  • Mehdi Amini

Flang did not stop there, later adding another dialect, “High Level Fortran Intermediate Representation” (HLFIR), which works at a higher level than FIR. A big target of HLFIR was array optimisations, which were more complex to handle using FIR alone.

FIR was a compromise on both ends to some degree. It wasn’t trying to capture syntactic information from Fortran, and I assumed there would be work done on an Abstract Syntax Tree. That niche would later be filled by “High Level FIR” [HLFIR].

  • Eric Schweitz

IRs All the Way Down

The compilation process starts with Fortran source code.

subroutine example(a, b)
  real :: a(:), b(:)
  a = b
end subroutine

(Compiler Explorer)

The subroutine example assigns array b to array a.

It is tempting to think of the IRs in a “stack” where each one is converted into the next. However, MLIR allows multiple “dialects” of MLIR to exist in the same file.

(The steps shown here are the most important ones for Flang. In reality there are many more between Fortran and LLVM IR.)
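
If you want to inspect these stages yourself, the frontend driver can emit each form directly. The flag names and the file name example.f90 below are assumptions based on a recent Flang build; check flang -fc1 -help on your installation:

$ flang -fc1 -emit-hlfir example.f90   # mixture of HLFIR, FIR and func (the first step below)
$ flang -fc1 -emit-fir example.f90     # after HLFIR has been lowered to FIR
$ flang -fc1 -emit-llvm example.f90    # LLVM IR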

In the first step, Flang produces a file that is a mixture of HLFIR, FIR and the built-in MLIR dialect func (function).

module attributes {<...>} {
  func.func @_QPexample(%arg0: !fir.box<!fir.array<?xf32>> {fir.bindc_name = "a"}, %arg1: !fir.box<!fir.array<?xf32>> {fir.bindc_name = "b"}) {
    %0 = fir.dummy_scope : !fir.dscope
    %1:2 = hlfir.declare %arg0 dummy_scope %0 {uniq_name = "_QFexampleEa"} : (!fir.box<!fir.array<?xf32>>, !fir.dscope) -> (!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
    %2:2 = hlfir.declare %arg1 dummy_scope %0 {uniq_name = "_QFexampleEb"} : (!fir.box<!fir.array<?xf32>>, !fir.dscope) -> (!fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>)
    hlfir.assign %2#0 to %1#0 : !fir.box<!fir.array<?xf32>>, !fir.box<!fir.array<?xf32>>
    return
  }
}

For example, the “dummy arguments” (the arguments of a subroutine) are declared with hlfir.declare but their type is specified with fir.array.

As MLIR allows multiple dialects to exist in the same file, there is no need for HLFIR to have a hlfir.array that duplicates fir.array, unless HLFIR wanted to handle that differently.

The next step is to convert HLFIR into FIR:

module attributes {<...>} {
  func.func @_QPexample(<...>) {
    <...>
    %c3_i32 = arith.constant 3 : i32
    %7 = fir.convert %0 : (!fir.ref<!fir.box<!fir.array<?xf32>>>) -> !fir.ref<!fir.box<none>>
    %8 = fir.convert %5 : (!fir.box<!fir.array<?xf32>>) -> !fir.box<none>
    %9 = fir.convert %6 : (!fir.ref<!fir.char<1,17>>) -> !fir.ref<i8>
    %10 = fir.call @_FortranAAssign(%7, %8, %9, %c3_i32) : (!fir.ref<!fir.box<none>>, !fir.box<none>, !fir.ref<i8>, i32) -> none
    return
  }
<...>
}

Then this bundle of MLIR dialects is converted into LLVM IR:

define void @example_(ptr %0, ptr %1) {
  <...>
  store { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] } %37, ptr %3, align 8
  call void @llvm.memcpy.p0.p0.i32(ptr %5, ptr %4, i32 48, i1 false)
  %38 = call {} @_FortranAAssign(ptr %5, ptr %3, ptr @_QQclX2F6170702F6578616D706C652E66393000, i32 3)
  ret void
}
<...>

This LLVM IR passes through the standard compilation pipeline that clang also uses, eventually being converted into target-specific Machine IR (MIR), then into assembly and finally into a binary program.

  • Fortran
  • MLIR (including HLFIR and FIR)
  • MLIR (including FIR)
  • LLVM IR
  • MIR
  • Assembly
  • Binary

At each stage, the optimisations most suited to that stage are done. For example, while you have HLFIR you could optimise array accesses, because at that point you have the most information about how the Fortran source treats arrays.

If Flang were to do this later on, in LLVM IR, it would be much more difficult. Either the information would be lost or incomplete, or you would be at a stage in the pipeline where you cannot assume that you started with a specific source language.

OpenMP to Everyone

Note: Most of the points made in this section also apply to OpenACC support in Flang. In the interest of brevity, I will only describe OpenMP in this article. You can find more about OpenACC in this presentation.

OpenMP Basics

OpenMP is a standardised API for adding parallelism to C, C++ and Fortran programs.

Programmers mark parts of their code with “directives”. These directives tell the compiler how the work of the program should be distributed. Based on this, the compiler transforms the code and inserts calls to an OpenMP runtime library for certain operations.

This is a Fortran example:

SUBROUTINE SIMPLE(N, A, B)
  INTEGER I, N
  REAL B(N), A(N)
!$OMP PARALLEL DO
  DO I=2,N
    B(I) = (A(I) + A(I-1)) / 2.0
  ENDDO
END SUBROUTINE SIMPLE

(from “OpenMP Application Programming Interface Examples”, Compiler Explorer)

Note: Fortran arrays are one-based by default, so the first element is at index 1. This example reads the previous element as well, so it starts I at 2.

!$OMP PARALLEL DO is a directive in the form of a Fortran comment (Fortran comments start with !). PARALLEL DO starts a parallel “region” that includes the code from DO to ENDDO.

This tells the compiler that the work in the DO loop should be shared amongst all the threads available to the program.

Clang has supported OpenMP for many years now. The equivalent C++ code is:

void simple(int n, float *a, float *b)
{
    int i;

    #pragma omp parallel for
    for (i=1; i<n; i++)
        b[i] = (a[i] + a[i-1]) / 2.0;
}

(Compiler Explorer)

For C++, the directive is in the form of a #pragma and attached to the for loop.

LLVM IR does not know anything about OpenMP specifically, so Clang does all the work of converting the intent of the directives into LLVM IR. The output from Clang looks like this:

define dso_local void @simple(int, float*, float*)
  (i32 noundef %n, ptr noundef %a, ptr noundef %b) <...> {
entry:
<...>
  call void (<...>) @__kmpc_fork_call(@simple <...> (.omp_outlined) <...>)
  ret void
}

define internal void @simple(int, float*, float*) (.omp_outlined)
  (ptr <...> %.global_tid.,
   ptr <...> %.bound_tid.,
   ptr <...> %n,
   ptr <...> %b,
   ptr <...> %a) {
entry:
<...>
  call void @__kmpc_for_static_init_4(<...>)
<...>
omp.inner.for.body.i:
<...>
omp.loop.exit.i:
  call void @__kmpc_for_static_fini(<...>)
<...>
  ret void
}

(output edited for readability)

The body of simple no longer does all the work. Instead it uses __kmpc_fork_call to tell the OpenMP runtime library to run another function, simple (.omp_outlined), to do the work.

This second function is referred to as a “micro task”. The runtime library splits the work across many instances of the micro task and each time the micro task function is called, it gets a different slice of the work.

The number of instances is only known at runtime, and can be controlled with settings such as OMP_NUM_THREADS.
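
For example, assuming a program app.f90 that calls SIMPLE (the file name and thread count here are purely illustrative), the thread count could be controlled like this:

$ flang -fopenmp app.f90 -o app
$ OMP_NUM_THREADS=4 ./app   # the OpenMP runtime splits the loop across 4 threads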

The LLVM IR representation of simple (.omp_outlined) includes labels like omp.loop.exit.i, but these are not specific to OpenMP. They are just normal LLVM IR labels whose name includes omp.

Sharing Clang’s OpenMP Knowledge

Shortly after Flang was approved to join the LLVM Project, it was proposed that Flang should share OpenMP support code with Clang.

This is an RFC for the design of the OpenMP front-ends under the LLVM umbrella. It is necessary to talk about this now as Flang (aka. F18) is maturing at a very promising rate and about to become a sub-project next to Clang.

TLDR; Keep AST nodes and Sema separated but unify LLVM-IR generation for OpenMP constructs based on the (almost) identical OpenMP directive level.

  • “[RFC] Proposed interplay of Clang & Flang & LLVM wrt. OpenMP”, Johannes Doerfert (Lawrence Livermore National Laboratory), May 2019 (only one part of this still exists online, this quote is from a copy of the other part, which was provided to me).

For our purposes, the “TLDR” means that although both compilers have different internal representations of the OpenMP directives, they both have to produce LLVM IR from that representation.

This proposal led to the creation of the LLVMFrontendOpenMP library in llvm. By using the same class OpenMPIRBuilder, there is no need to repeat work in both compilers, at least for this part of the OpenMP pipeline.

As you will see in the following sections, Flang has diverged from Clang for other parts of OpenMP processing.

Bringing OpenMP to MLIR

Early in 2020, Kiran Chandramohan (Arm) proposed an MLIR dialect for OpenMP, for use by Flang.

We started the work for the OpenMP MLIR dialect because of Flang. … So, MLIR has an OpenMP dialect because of Flang.

  • Kiran Chandramohan

This dialect would represent OpenMP specifically, unlike the generic LLVM IR you get from Clang.

If you compile the original Fortran OpenMP example without OpenMP enabled, you get this MLIR:

module attributes {<...>} {
  func.func @_QPsimple(<...> {
    %1:2 = hlfir.declare %arg0 <...> {uniq_name = "_QFsimpleEn"} : <...>
    %3:2 = hlfir.declare %2 <...> {uniq_name = "_QFsimpleEi"} : <...>
    %10:2 = hlfir.declare %arg1(%9) <...> {uniq_name = "_QFsimpleEa"} : <...>
    %17:2 = hlfir.declare %arg2(%16) <...> {uniq_name = "_QFsimpleEb"} : <...>
    %22:2 = fir.do_loop <...> {
      <...>
      hlfir.assign %34 to %37 : f32, !fir.ref<f32>
    }
    fir.store %22#1 to %3#1 : !fir.ref<i32>
    return
  }
}

(output edited for readability)

Notice that the DO loop has been converted into fir.do_loop. Now enable OpenMP and compile again:

module attributes {<...>} {
  func.func @_QPsimple(<...> {
    %1:2 = hlfir.declare %arg0 <...> {uniq_name = "_QFsimpleEn"} : <...>
    %10:2 = hlfir.declare %arg1(%9) <...> {uniq_name = "_QFsimpleEa"} : <...>
    %17:2 = hlfir.declare %arg2(%16) <...> {uniq_name = "_QFsimpleEb"} : <...>
    omp.parallel {
      %19:2 = hlfir.declare %18 {uniq_name = "_QFsimpleEi"} : <...>
      omp.wsloop {
        omp.loop_nest (%arg3) : i32 = (%c2_i32) to (%20) inclusive step (%c1_i32) {
          hlfir.assign %32 to %35 : f32, !fir.ref<f32>
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
}

(output edited for readability)

You will see that instead of fir.do_loop you have omp.parallel, omp.wsloop and omp.loop_nest. omp is an MLIR dialect that describes OpenMP.

This translation of the PARALLEL DO directive is much more literal than the LLVM IR produced by Clang for parallel for.

As the omp dialect is specifically made for OpenMP, it can represent it much more naturally. This makes it easier to understand the code and to write optimisations.
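
To reproduce the two listings above, asking the frontend driver for HLFIR output with and without OpenMP enabled should show the difference (the flag names are assumptions, check your Flang installation):

$ flang -fc1 -emit-hlfir simple.f90            # fir.do_loop version
$ flang -fc1 -emit-hlfir -fopenmp simple.f90   # omp.parallel / omp.wsloop version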

Of course Flang needs to produce LLVM IR eventually, and to do that it uses the same OpenMPIRBuilder class that Clang does. From the MLIR shown previously, OpenMPIRBuilder produces the following LLVM IR:

define void @simple_ <...> {
entry:
  call void (<...>) @__kmpc_fork_call( <...> @simple_..omp_par <...>)
  ret void
}

define internal void @simple_..omp_par <...> {
omp.par.entry:
  call void @__kmpc_for_static_init_4u <...>
omp_loop.exit:
  call void @__kmpc_barrier(<...>)
  ret void
omp_loop.body:
  <...>
}

The LLVM IR produced by Flang and Clang is superficially different, but structurally very similar. Considering the differences in source language and compiler passes, it is not surprising that they are not identical.

ClangIR and the Future

It is surprising that a compiler for a language as old as Fortran got ahead of Clang (the best-known LLVM-based compiler) when it came to adopting MLIR.

This is largely due to timing: MLIR is a recent invention, and Clang existed long before MLIR arrived. Clang also has a legacy to protect, so it is unlikely to migrate to a new technology right away.

The ClangIR project is working to change Clang to use a new MLIR dialect, “Clang Intermediate Representation” (“CIR”). Much like Flang and its HLFIR/FIR dialects, ClangIR will convert C and C++ into the CIR dialect.

Work on OpenMP support for ClangIR has already started, using the omp dialect that was originally added for Flang.

Unfortunately, at the time of writing the parallel directive is not supported by ClangIR. However, if you look at the CIR produced when OpenMP is disabled, you can see the cir.for element that the OpenMP dialect might replace:

module <...> attributes {<...>} {
  cir.func @_Z6simpleiPfS_( <...> {
    %1 = cir.alloca <...> ["a", init] <...>
    %2 = cir.alloca <...> ["b", init] <...>
    %3 = cir.alloca <...> ["i"] <...>
    cir.scope {
      cir.for :
      cond { <...> }
      body {
        <...>
        cir.yield loc(#loc13)
      } step {
        <...>
        cir.yield loc(#loc36)
      } loc(#loc36)
    } loc(#loc36)
    cir.return loc(#loc2)
  } loc(#loc31)
} loc(#loc)

(on Compiler Explorer)

Flang Takes Driving Lessons

Note: This section paraphrases material from “Flang Drivers”. If you want more detail please refer to that document, or Driving Compilers.

“Driver” in a compiler context means the part of the compiler that decides how to handle a set of options. For instance, when you use the option -march=armv8a+memtag, something in Flang knows that you want to compile for Armv8.0-a with the Memory Tagging Extension enabled.

-march= is an example of a “compiler driver” option. These options are what users give to the compiler. There is actually a second driver after this, confusingly called the “frontend” driver, despite being behind the scenes.

In Flang’s case the “compiler driver” is flang and the “frontend driver” is flang -fc1 (they are two separate tools, contained in the same binary).

They are separate tools so that the compiler driver can provide an interface suited to compiler users, with stable options that do not change over time. On the other hand, the frontend driver is suited to compiler developers, exposes internal compiler details and does not have a stable set of options.
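
One simple way to compare the two interfaces is to look at their help text (the exact output will depend on your installation):

$ flang --help       # stable, user-facing compiler driver options
$ flang -fc1 -help   # internal frontend driver options, subject to change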

You can see the differences if you add -### to the compiler command:

$ ./bin/flang /tmp/test.f90 -march=armv8a+memtag -###
 "<...>/flang" "-fc1"
   "-triple" "aarch64-unknown-linux-gnu"
   "-target-feature" "+v8a"
   "-target-feature" "+mte"
 "/usr/bin/ld" \
   "-o" "a.out"
   "-L/usr/lib/gcc/aarch64-linux-gnu/11"

(output edited for readability)

The compiler driver has split the compilation into a job for the frontend (flang -fc1) and the linker (ld). -march= has been converted into many arguments to flang -fc1. This means that if compiler developers decided to change how -march= was converted, existing flang commands would still work.

Another responsibility of the compiler driver is to know where to find libraries and header files. This differs between operating systems and even distributions of the same family of operating systems (for example Linux distributions).

This created a problem when implementing the compiler driver for Flang. All these details would take a long time to get right.

Luckily, by this time Flang was in the LLVM Project alongside Clang. Clang already knew how to handle this and had been tested on all sorts of systems over many years.

The intent is to mirror clang, for both the driver and CompilerInvocation, as much as makes sense to do so. The aim is to avoid re-inventing the wheel and to enable people who have worked with either the clang or flang entry points, drivers, and frontends to easily understand the other.

Flang became the first in-tree project to use Clang’s compiler driver library (clangDriver) to implement its own compiler driver.

This meant that Flang was able to handle all the targets and tools that Clang could, without duplicating large amounts of code.

Reflections on Flang

We are almost 10 years from the first announcement of what would become LLVM Flang. In the LLVM monorepo alone there have been close to 10,000 commits from around 400 different contributors, and undoubtedly there were more in Classic Flang before that.

So it is time to hear from users, contributors, and supporters, past and present, about their experiences with Flang.

Collaborating with NVIDIA and PGI on Classic Flang was crucial in establishing Arm in High Performance Computing. It has been an honour to continue investing in Flang, helping it become an integral part of the LLVM project and a solid foundation for building HPC toolchains.

We are delighted to see the project reach maturity, as this was the last step in allowing us to remove all downstream code from our compiler. Look out for Arm Toolchain for Linux 20, which will be a fully open source, freely available compiler based on LLVM 20, available later this year.

  • Will Lovett, Director Technology Management at Arm.

(the following quote is presented in Japanese and English; in case of differences, the Japanese is the authoritative version)

富士通は、我々の数十年にわたるHPCの経験を通じて培ったテストスイートを用いて、Flangの改善に貢献できたことを嬉しく思います。Flangの親切で協力的なコミュ ニティに大変感銘を受けました。

富士通は、より高いパフォーマンスと使いやすさを実現し、我々のプロセッサを最大限に活用するために、引き続きFlangに取り組んでいきます。Flangが改善を続け、ユーザーを増やしていくことを強く願っています。

Fujitsu is pleased to have contributed to the improvement of Flang with our test suite, which we have developed through our decades of HPC experience. Flang’s helpful and collaborative community really impressed us.

Fujitsu will continue to work on Flang to achieve higher performance and usability, to make the best of our processors. We hope that Flang will continue to improve and gain users.

  • 富士通株式会社 コンパイラ開発担当 マネージャー 鎌塚 俊 (Shun Kamatsuka, Manager of the Compiler Development Team at Fujitsu).

Collaboration between Linaro and Fujitsu on an active CI using Fujitsu’s testsuite helped find several issues and make Flang more robust, in addition to detecting any regressions early.

Linaro has been contributing to Flang development for two years now, fixing a great number of issues found by the Fujitsu testsuite.

  • Carlos Seo, Tech Lead at Linaro.

SciPy is a foundational Python package. It provides easy access to scientific algorithms, many of which are written in Fortran.

This has caused a long stream of problems for packaging and shipping SciPy, especially because users expect first-class support for Windows; a platform that (prior to Flang) had no license-free Fortran compilers that would work with the default platform runtime.

As maintainers of SciPy and redistributors in the conda-forge ecosystem, we hoped for a solution to this problem for many years. In the end, we switched to using Flang, and that process was a minor miracle.

Huge thanks to the Flang developers for removing a major source of pain for us!

  • Axel Obermeier, Quansight Labs.

At the Barcelona Supercomputing Center, like many other HPC centers, we cannot ignore Fortran.

As part of our research activities, Flang has allowed us to apply our work in long vectors for RISC-V to complex Fortran applications which we have been able to run and analyze in our prototype systems. We have also used Flang to support an in-house task-based directive-based programming model.

These developments have proved to us that Flang is a powerful infrastructure.

  • Roger Ferrer Ibáñez, Senior Research Engineer at the Barcelona Supercomputing Center (BSC).

I am thrilled to see the LLVM Flang project achieve this milestone. It is a unique project in that it marries state of the art compiler technologies like MLIR with the venerable Fortran language and its large community of developers focused on high performance compute.

Flang has set the standard for LLVM frontends by adopting MLIR and C++17 features earlier than others, and I am thrilled to see Clang and other frontends modernize based on those experiences.

Flang also continues something very precious to me: the LLVM Project’s ability to enable collaboration by uniting people with shared interests even if they span organizations like academic institutions, companies, and other research groups.

  • Chris Lattner, serving member of the LLVM Board of Directors, co-founder of the LLVM Project, the Clang C++ compiler and MLIR.

The need for a more modern Fortran compiler motivated the creation of the LLVM Flang project and AMD fully supports that path.

In following with community trends, AMD’s Next-Gen Fortran Compiler will be a downstream flavor of LLVM Flang and will in time supplant the current AMD Flang compiler, a downstream flavor of “Classic Flang”.

Our mission is to allow anyone that is using and developing a Fortran HPC codebase to directly leverage the power of AMD’s GPUs. AMD’s Next-Gen Fortran Compiler’s goal is fulfilling our vision by allowing you to deploy and accelerate your Fortran codes on AMD GPUs using OpenMP offloading, and to directly interface and invoke HIP and ROCm kernels.

Getting Involved

Flang might not be new anymore, but it is definitely still improving. If you want to try Flang on your own projects, you can download it right now.

If you want to contribute, there are many ways to do so. Bug reports, code contributions, documentation improvements and so on. Flang follows the LLVM contribution process and you can find links to the forums, community calls and anything else you might need here.

Credits

Thank you to the following people for their contributions to this article:

  • Alex Bradbury (Igalia)
  • Andrzej Warzyński (Arm)
  • Axel Obermeier (Quansight Labs)
  • Brad Richardson (Lawrence Berkeley National Laboratory)
  • Carlos Seo (Linaro)
  • Daniel C Chen (IBM)
  • Eric Schweitz (NVIDIA)
  • Hao Jin
  • Jeff Hammond (NVIDIA)
  • Kiran Chandramohan (Arm)
  • Leandro Lupori (Linaro)
  • Luis Machado (Arm)
  • Mehdi Amini
  • Pat McCormick (Los Alamos National Laboratory)
  • Peter Waller (Arm)
  • Steve Scalpone (NVIDIA)
  • Tarun Prabhu (Los Alamos National Laboratory)

Further reading