The LLVM Project Blog

LLVM Project News and Details from the Trenches

GSoC 2025: Bfloat16 in LLVM libc

Introduction

BFloat16 is a 16-bit floating-point format introduced by Google and standardized in C++23 as std::bfloat16_t. It uses 1 sign bit, 8 exponent bits (the same as float), and 7 mantissa bits. Because it keeps float's 8-bit exponent, BFloat16 covers a much wider dynamic range than IEEE binary16 (a maximum finite value of ~3×10^38 versus 65,504), though with lower precision. BFloat16 has become popular in AI and machine-learning workloads, where it offers significant performance advantages over IEEE binary32 while preserving approximately the same dynamic range.
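
To make the layout concrete, here is a small, self-contained illustration (not the LLVM libc code; the helper names are made up for this example) of how a bfloat16 value relates to an IEEE-754 binary32 value: it is simply the upper 16 bits of the float encoding, which is why it inherits float's 8-bit exponent and dynamic range.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// bfloat16 layout: 1 sign bit | 8 exponent bits | 7 mantissa bits.
// This is exactly the upper half of an IEEE-754 binary32 value, so the
// exponent range (and thus the dynamic range) matches float.

// Truncating conversion: keep the top 16 bits (rounds toward zero).
uint16_t bf16_from_float_truncate(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  return static_cast<uint16_t>(bits >> 16);
}

// Widening conversion is exact: the low 16 mantissa bits are simply zero.
float float_from_bf16(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;
  float x;
  std::memcpy(&x, &bits, sizeof(x));
  return x;
}

int main() {
  uint16_t pi_bits = bf16_from_float_truncate(3.14159265f);
  std::printf("bf16(pi) = 0x%04x -> %.6f\n", pi_bits, float_from_bf16(pi_bits));
  // prints: bf16(pi) = 0x4049 -> 3.140625
}
```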

The goal of this project was to implement a BFloat16 type in LLVM libc along with basic math functions such as fabsbf16, fmaxbf16, and so on.
We also wanted every function to be generic (platform-independent) and correctly rounded in all rounding modes.
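
As a rough sketch of what "generic" means in practice, assuming a hypothetical storage-only wrapper (the real type in bfloat16.h differs in detail): the value is kept as a plain 16-bit integer, widened exactly to float, operated on in float, and narrowed back with an explicit rounding step, so neither compiler bfloat16 support nor special hardware is required.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical storage-only bfloat16 wrapper; the real type lives in
// libc/src/__support/FPUtil/bfloat16.h and differs in detail.
struct BF16 {
  uint16_t bits; // raw sign | exponent | mantissa bits

  // Widening to float is exact: the low 16 mantissa bits are simply zero.
  float to_float() const {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
  }

  // Narrow from float with round-to-nearest-even on the 16 discarded bits.
  // (NaN inputs and non-default rounding modes need extra care, omitted here.)
  static BF16 from_float(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    uint32_t bias = 0x7FFF + ((u >> 16) & 1); // ties round to even
    return BF16{static_cast<uint16_t>((u + bias) >> 16)};
  }
};

// Example: every bfloat16 value converts exactly to float, and float's 24-bit
// significand is wide enough that a single float addition followed by one
// narrowing step still yields a correctly rounded bfloat16 sum under the
// default round-to-nearest mode.
BF16 bf16_add(BF16 a, BF16 b) {
  return BF16::from_float(a.to_float() + b.to_float());
}
```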

What was done

  • A BFloat16 type was added to LLVM libc (libc/src/__support/FPUtil/bfloat16.h) in #144463.
  • All 70 expected basic math functions for bfloat16 were implemented, using a generic approach that works on every platform supported by LLVM libc (ARM, RISC-V, GPUs, x86, Darwin); see the table below.
  • Implemented two additional basic math functions: iscanonicalbf16 and issignalingbf16.
  • Implemented higher math functions: sqrtbf16 #156654 and log_bf16 #157811 (open).
  • Comparison operations for the FPBits class were added #144983.
Basic Math Function(s) | PR
fabsbf16 | #148398
ceilbf16, floorbf16, roundbf16, roundevenbf16, truncbf16 | #152352
bf16add, bf16addf, bf16addl, bf16addf128, bf16sub, bf16subf, bf16subl, bf16subf128 | #152774
fmaxbf16, fminbf16 | #152782
bf16mul, bf16mulf, bf16mull, bf16mulf128 | #152847
fmaximumbf16, fmaximum_magbf16, fmaximum_mag_numbf16, fmaximum_numbf16, fminimumbf16, fminimum_magbf16, fminimum_mag_numbf16, fminimum_numbf16 | #152881
bf16div, bf16divf, bf16divl, bf16divf128 | #153191
bf16fma, bf16fmaf, bf16fmal, bf16fmaf128 | #153231
llrintbf16, llroundbf16, lrintbf16, lroundbf16, nearbyintbf16, rintbf16 | #153882
fromfpbf16, fromfpxbf16, ufromfpbf16, ufromfpxbf16 | #153992
nextafterbf16, nextdownbf16, nexttowardbf16, nextupbf16 | #153993
getpayloadbf16, setpayloadbf16, setpayloadsigbf16 | #153994
nanbf16 | #153995
frexpbf16, ilogbbf16, ldexpbf16, llogbbf16, logbbf16 | #154427
modfbf16, remainderbf16, remquobf16 | #154652
canonicalizebf16, iscanonicalbf16, issignalingbf16, copysignbf16, fdimbf16 | #155567
totalorderbf16, totalordermagbf16 | #155568
scalbnbf16, scalblnbf16 | #155569
fmodbf16 | #155575

The implementation status can be viewed at the libc math.h header implementation status page, which is updated regularly.

What was not done

  • The implementation uses a generic approach and does not rely on the compiler-provided __bf16 type, as it is not available in all compiler versions. Our goal is to ensure that the type works with every compiler and version supported by LLVM libc.
  • Hardware optimizations provided by Intel’s AVX-512_BF16 were not utilized. These instructions only support round-to-nearest-even, always flush output denormals to zero, and treat input denormals as zero, which conflicts with our goal of correct rounding in every rounding mode (see the mode-aware narrowing sketch after this list). See the VCVTNE2PS2BF16 instruction description.
  • ARMv9 SVE instructions were not utilized, as they are relatively new and not yet widely supported.
  • Not all higher math functions were implemented due to time constraints.
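
To illustrate the rounding-mode point above, here is a hedged sketch (again, not the libc implementation) of a float-to-bfloat16 narrowing that honours the current rounding mode via <cfenv>, which is exactly what a round-to-nearest-even-only, denormal-flushing instruction cannot provide.

```cpp
#include <cfenv>
#include <cstdint>
#include <cstring>

// Illustrative only: narrow a float to bfloat16 while honouring the current
// floating-point rounding mode. NaN, infinity and overflow handling are
// omitted to keep the sketch short.
uint16_t bf16_round_current_mode(float x) {
  uint32_t u;
  std::memcpy(&u, &x, sizeof(u));
  uint16_t truncated = static_cast<uint16_t>(u >> 16);
  uint16_t discarded = static_cast<uint16_t>(u & 0xFFFF);
  if (discarded == 0)
    return truncated; // exact, no rounding needed
  bool negative = (u >> 31) != 0;
  switch (fegetround()) {
  case FE_TONEAREST: // round to nearest, ties to even
    return truncated + ((discarded > 0x8000) ||
                        (discarded == 0x8000 && (truncated & 1)));
  case FE_UPWARD: // toward +inf: bump magnitude only for positive values
    return truncated + (negative ? 0 : 1);
  case FE_DOWNWARD: // toward -inf: bump magnitude only for negative values
    return truncated + (negative ? 1 : 0);
  case FE_TOWARDZERO:
  default:
    return truncated; // truncation already rounds toward zero
  }
}
```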

Future Work

  • Implement the remaining higher math functions.
  • Benchmark against other libc implementations once they gain bfloat16 support, as well as against the CORE-MATH project.
  • Update the test suite when the mpfr_get_bfloat16 function becomes available.
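
Until then, the kind of cross-check involved can be approximated along these lines. This is a hypothetical sketch rather than the LLVM libc test harness (the function name and structure are made up for illustration); it exploits the fact that MPFR can compute a reference result directly at bfloat16's 8-bit significand precision. NaN and subnormal corner cases are ignored.

```cpp
#include <mpfr.h>
#include <cstdint>
#include <cstring>

// Widening a bfloat16 bit pattern to float is exact.
static float bf16_bits_to_float(uint16_t b) {
  uint32_t u = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Hypothetical check of a bfloat16 square-root result against MPFR.
bool matches_mpfr_sqrt(uint16_t input_bits, uint16_t result_bits) {
  mpfr_t x, ref;
  mpfr_init2(x, 24);  // enough precision to hold any bfloat16 value exactly
  mpfr_init2(ref, 8); // bfloat16 significand: 1 implicit + 7 stored bits

  mpfr_set_flt(x, bf16_bits_to_float(input_bits), MPFR_RNDN); // exact
  mpfr_sqrt(ref, x, MPFR_RNDN); // reference, correctly rounded to 8 bits

  // An 8-bit-significand result converts exactly to float, so compare in float.
  bool ok = (mpfr_get_flt(ref, MPFR_RNDN) == bf16_bits_to_float(result_bits));
  mpfr_clear(x);
  mpfr_clear(ref);
  return ok;
}
```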

Acknowledgements

I would like to thank my mentors, Tue Ly and Nicolas Celik, for their invaluable guidance and support throughout this project. The project wouldn’t have been possible without them. I am also grateful to the LLVM Foundation and the GSoC admins for giving me this opportunity.