GSoC 2025: Bfloat16 in LLVM libc
Introduction
BFloat16 is a 16-bit floating-point format, introduced by Google and standardized in C++23 as std::bfloat16_t
. It uses 1 sign bit, 8 exponent bits (the same as float
), and 7 mantissa bits. This configuration allows BFloat16 to represent a much wider dynamic range than IEEE binary16
(~3×10^38 values compared to 65,504), though with lower precision. BFloat16 has become popular in AI and machine learning use-cases where it offers significant performance advantages over IEEE binary32
while preserving its approximate dynamic range.
The goal of this project was to implement the BFloat16 type in LLVM libc along with the basic math functions like fabsbf16
, fmaxbf16
, etc.
We also want all functions to be generic (platform-independent) and correctly rounded for all rounding modes.
What was done
- BFloat16 type was added to LLVM libc (
libc/src/__support/FPUtil/bfloat16.h
) #144463. - All 70 expected basic math functions for
bfloat16
were implemented, using a generic approach that supports all libc supported architectures (ARM, RISC-V, GPUs, x86, Darwin) (see table below). - Implemented two additional basic math functions:
iscanonicalbf16
andissignalingbf16
. - Implemented higher math functions:
sqrtbf16
#156654 andlog_bf16
#157811 (open). - Comparison operations for the
FPBits
class were added #144983.
Basic Math Function | PR |
---|---|
fabsbf16 | #148398 |
ceilbf16 , floorbf16 , roundbf16 , roundevenbf16 , truncbf16 | #152352 |
bf16add , bf16addf , bf16addl , bf16addf128 , bf16sub , bf16subf , bf16subl , bf16subf128 | #152774 |
fmaxbf16 , fminbf16 | #152782 |
bf16mul , bf16mulf , bf16mull , bf16mulf128 | #152847 |
fmaximumbf16 , fmaximum_magbf16 , fmaximum_mag_numbf16 , fmaximum_numbf16 , fminimumbf16 , fminimum_magbf16 , fminimum_mag_numbf16 , fminimum_numbf16 | #152881 |
bf16div , bf16divf , bf16divl , bf16divf128 | #153191 |
bf16fma , bf16fmaf , bf16fmal , bf16fmaf128 | #153231 |
llrintbf16 , llroundbf16 , lrintbf16 , lroundbf16 , nearbyintbf16 , rintbf16 | #153882 |
fromfpbf16 , fromfpxbf16 , ufromfpbf16 , ufromfpxbf16 | #153992 |
nextafterbf16 , nextdownbf16 , nexttowardbf16 , nextupbf16 | #153993 |
getpayloadbf16 , setpayloadbf16 , setpayloadsigbf16 | #153994 |
nanbf16 | #153995 |
frexpbf16 , ilogbbf16 , ldexpbf16 , llogbbf16 , logbbf16 | #154427 |
modfbf16 , remainderbf16 , remquobf16 | #154652 |
canonicalizebf16 , iscanonicalbf16 , issignalingbf16 , copysignbf16 , fdimbf16 | #155567 |
totalorderbf16 , totalordermagbf16 | #155568 |
scalbnbf16 , scalblnbf16 | #155569 |
fmodbf16 | #155575 |
The implementation status can be viewed at the libc math.h
header implementation status page, which is updated regularly.
What was not done
- The implementation used a generic approach and did not rely on the
__bf16
compiler intrinsic, as it is not available in all compilers versions. Our goal is to ensure that the type is supported by all compilers and versions supported by LLVM libc. - Hardware optimizations provided by Intel’s AVX-512_BF16 were not utilized. These instructions only support round-to-nearest-even mode, always flush output denormals to zero, and treat input denormals as zero, which does not align with our goal. See VCVTNE2PS2BF16 instruction description.
- ARMv9 SVE instructions were not utilized, as they are relatively new and not yet widely supported.
- Not all higher math functions were implemented due to time constraints.
Future Work
- Implement the remaining higher math functions.
- Perform performance comparisons with other libc implementations once their
bfloat16
support is available and also with the CORE-MATH project. - Update the test suite when the
mpfr_get_bfloat16
function becomes available.
Acknowledgements
I would like to thank my mentors, Tue Ly and Nicolas Celik, for their invaluable guidance and support throughout this project. The project wouldn’t have been possible without them. I am also grateful to the LLVM Foundation and the GSoC admins for giving me this opportunity.