LLVM Project News and Details from the Trenches

Monday, December 19, 2011

NVIDIA CUDA 4.1 Compiler Now Built on LLVM

From the NVIDIA CUDA compiler team:

CUDA is a parallel programming model and platform created by NVIDIA for harnessing the power of the hundreds of cores in modern graphics processing units (GPUs). NVIDIA provides free support for CUDA C and C++ in the CUDA toolkit. The CUDA programming environment is built around a compiler targeting NVIDIA GPUs and has been adopted by thousands of developers.

At NVIDIA we have switched over to using LLVM inside the CUDA C/C++ compiler for Fermi and future architectures. We use LLVM for optimizations, for PTX code generation, and for generating the debug information used in CUDA debugging. From a developer's perspective the new compiler is functionally on par with the previous compilers, produces better code, and has faster compile times. We have extended the LLVM core compiler to understand the data-parallel programming model. It is now available as part of CUDA 4.1, and you can learn more here.

Our experience with LLVM has been very positive: we started from a modern compiler infrastructure with high-quality optimizations contributed by a large community of developers, and the effort required to learn the LLVM infrastructure was quite small and reasonable.

LLVM 3.1 vector changes

Intel uses the Low-Level Virtual Machine (LLVM) in a number of products, including the Intel® OpenCL SDK. The SDK's implicit vectorization module generates LLVM-IR (intermediate representation) that uses vector types.
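As a rough illustration (the type and function names below are hypothetical, not taken from the SDK), source-level code that operates on short vectors, such as the C vector-extension snippet below, is expressed in LLVM-IR through vector operands like <4 x float>.

    /* Hypothetical sketch: a 4-wide float vector declared with the
     * Clang/GCC vector_size extension. Arithmetic on this type shows up
     * in the generated LLVM-IR as operations on <4 x float> values. */
    typedef float float4 __attribute__((vector_size(16)));

    float4 mad4(float4 a, float4 b, float4 c) {
        return a * b + c;   /* fmul + fadd on <4 x float> in the IR */
    }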

LLVM-IR supports operations on vector data types, and the LLVM code generator has to do non-trivial work to compile these operations into efficient SIMD instructions. Recent changes to the code generator have improved the code produced for vector operations. Beyond many low-level optimizations, this post covers two major changes: the implementation of vector-select and the support for vectors-of-pointers.
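To make the two changes concrete, here are two small illustrative C sketches (the function names are hypothetical, not from the SDK). Vectorizing the first loop requires if-converting the branch into a select on vector operands; vectorizing the second requires forming a whole vector of addresses at once.

    #include <stddef.h>

    /* Element-wise maximum with zero: when this loop is vectorized, the
     * conditional is if-converted into an IR 'select' whose condition is a
     * vector of i1 and whose operands are <N x float> values (vector-select). */
    void max0(float *out, const float *in, size_t n) {
        for (size_t i = 0; i < n; ++i)
            out[i] = (in[i] < 0.0f) ? 0.0f : in[i];
    }

    /* Indexed load (a gather): vectorizing this loop means computing a vector
     * of addresses, i.e. a getelementptr over a vector of pointers in the IR
     * (vectors-of-pointers). */
    void gather(float *out, const float *table, const int *idx, size_t n) {
        for (size_t i = 0; i < n; ++i)
            out[i] = table[idx[i]];
    }

On x86 with SSE4, for example, the vector select in the first loop can lower to a compare plus a single blend instruction rather than a scalarized branch, which is where the code-generator improvements pay off.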