NOTE: Support for NVIDIA GPUs via the CUDA backend is currently experimental and many features may be missing or incomplete.
The experimental CUDA backend provides support for CUDA-capable NVIDIA GPUs under Linux or macOS. The goal of this backend is to provide an open-source alternative to the proprietary NVIDIA OpenCL implementation. This makes use of the NVPTX backend in LLVM and the CUDA driver API.
Aside from the usual pocl dependencies, you will also need the CUDA toolkit. Currently this backend has only been tested against CUDA 8.0, but it may also be possible to build against other versions.
If you experience build failures regarding missing CUDA headers or libraries, you may need to add the include directory containing cuda.h to your header search path, and/or the library directory containing libcuda.{so,dylib} to your library search path.

The CUDA backend requires LLVM built with the NVPTX backend enabled.
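If the CUDA toolkit is installed in a non-standard location, one way to make the headers and libraries visible to the compiler is via environment variables. This is a sketch assuming a GCC/Clang toolchain and a typical /usr/local/cuda install path; adjust the paths to match your system:

```shell
# Assumes CUDA is installed under /usr/local/cuda (adjust as needed).
export CPATH=/usr/local/cuda/include:$CPATH              # so cuda.h is found
export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH  # so libcuda.{so,dylib} is found
```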
To enable the CUDA backend, add -DENABLE_CUDA=ON to your CMake configuration command line. Otherwise, build and install pocl as normal.
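A typical out-of-tree configure and build might look like the following. The LLVM path is illustrative; point WITH_LLVM_CONFIG at the llvm-config of an LLVM build with the NVPTX backend enabled:

```shell
# Run from an empty build directory alongside the pocl source tree.
cmake -DENABLE_CUDA=ON \
      -DWITH_LLVM_CONFIG=/usr/local/llvm/bin/llvm-config \
      ..
make && make install
```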
After building pocl, you can smoke test the CUDA backend by executing the subset of pocl’s tests that are known to pass on NVIDIA GPUs:
../tools/scripts/run_cuda_tests
Use POCL_DEVICES=CUDA to select only CUDA devices. If the system has more than one GPU, specify the CUDA device multiple times (e.g. POCL_DEVICES=CUDA,CUDA for two GPUs).

The CUDA backend currently has a runtime dependency on the CUDA toolkit. If you receive errors regarding a failure to load libdevice, you may need to set the POCL_CUDA_TOOLKIT_PATH environment variable to tell pocl where the CUDA toolkit is installed. Set this variable to the root of the toolkit installation (the directory containing the nvvm directory).

The POCL_CUDA_GPU_ARCH environment variable can be set to override the target GPU architecture (e.g. POCL_CUDA_GPU_ARCH=sm_35), which may be necessary in cases where LLVM doesn't yet support the architecture.

The POCL_CUDA_VERIFY_MODULE environment variable can be set to 0 to skip verification that the LLVM module produced by the CUDA backend is well formed. It currently defaults to 1 (on).

The POCL_CUDA_DUMP_NVVM environment variable can be set to 1 to dump the LLVM IR that is fed into the NVPTX backend for debugging purposes (requires POCL_DEBUG=1).

The POCL_CUDA_DISABLE_QUEUE_THREADS environment variable can be set to 1 to disable the background threads used for handling command submission. This can potentially reduce command launch latency, but can cause problems if using user events or sharing a context with a non-CUDA device.
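Putting the environment variables above together, a hypothetical invocation of an OpenCL application on a two-GPU system might look like the following. All values are illustrative, and my_opencl_app stands in for your own binary:

```shell
# Adjust paths and the target architecture to your system.
export POCL_DEVICES=CUDA,CUDA                  # expose two CUDA devices
export POCL_CUDA_TOOLKIT_PATH=/usr/local/cuda  # root containing the nvvm directory
export POCL_CUDA_GPU_ARCH=sm_61                # override the target GPU architecture
export POCL_CUDA_VERIFY_MODULE=0               # skip LLVM module verification
./my_opencl_app                                # hypothetical application binary
```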
(last updated: 2017-06-02)
The CUDA backend currently passes 73 tests from pocl's internal testsuite, and is capable of running various real OpenCL codes. Unlike NVIDIA's proprietary OpenCL implementation, pocl supports SPIR consumption, and so this backend has also been able to run (for example) SYCL codes using Codeplay's ComputeCpp implementation on NVIDIA GPUs. Since it uses CUDA under the hood, this backend also works with all of NVIDIA's CUDA profiling and debugging tools, many of which don't work with NVIDIA's own OpenCL implementation.
The following categories of the Khronos OpenCL 1.2 conformance tests are known to pass on at least one NVIDIA GPU using pocl's CUDA backend:
The CUDA backend has been tested on Linux (CentOS 7.3) with SM_35, SM_52, SM_60, and SM_61 capable NVIDIA GPUs.
The backend is also functional on macOS, with just one additional test failure compared to Linux (test_event_cycle).
The following is a non-comprehensive list of known issues in the CUDA backend:
Additionally, there has been little effort to optimize the performance of this backend so far; the current focus is on implementing the remaining functionality. Once the core functionality is completed, optimization of the code generation and runtime can begin.
For bug reports and questions, please use pocl’s GitHub issue tracker. Pull requests and other contributions are also very welcome.
This work has primarily been done by James Price from the University of Bristol’s High Performance Computing Group.