NOTE: Support for NVIDIA GPUs via the CUDA backend is currently experimental and many features may be missing or incomplete.
The experimental CUDA backend provides support for CUDA-capable NVIDIA GPUs under Linux or macOS. The goal of this backend is to provide an open-source alternative to the proprietary NVIDIA OpenCL implementation. This makes use of the NVPTX backend in LLVM and the CUDA driver API.
Aside from the usual pocl dependencies, you will also need the CUDA toolkit. Currently this backend has only been tested against CUDA 8.0, but it may also be possible to build against other versions.
If you experience build failures regarding missing CUDA headers or libraries, you may need to add the include directory containing cuda.h to your header search path, and/or the library directory containing libcuda.{so,dylib} to your library search path.

The CUDA backend requires LLVM built with the NVPTX backend enabled.
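If the CUDA toolkit is installed in a non-standard location, one way to make the headers and libraries visible to the compiler is via environment variables. This is a sketch assuming a GCC/Clang toolchain and a typical /usr/local/cuda install path; adjust the paths to match your system:

```shell
# Assumes CUDA is installed under /usr/local/cuda (adjust as needed).
export CPATH=/usr/local/cuda/include:$CPATH              # so cuda.h is found
export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH  # so libcuda.{so,dylib} is found
```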
To enable the CUDA backend, add -DENABLE_CUDA=ON to your CMake configuration command line. Otherwise, build and install pocl as normal.
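A typical out-of-tree configure and build might look like the following. The LLVM path is illustrative; point WITH_LLVM_CONFIG at the llvm-config of an LLVM build with the NVPTX backend enabled:

```shell
# Run from an empty build directory alongside the pocl source tree.
cmake -DENABLE_CUDA=ON \
      -DWITH_LLVM_CONFIG=/usr/local/llvm/bin/llvm-config \
      ..
make && make install
```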
After building pocl, you can smoke test the CUDA backend by executing the subset of pocl’s tests that are known to pass on NVIDIA GPUs:
../tools/scripts/run_cuda_tests
Use POCL_DEVICES=CUDA to select only CUDA devices. If the system has more than one GPU, specify the CUDA device multiple times (e.g. POCL_DEVICES=CUDA,CUDA for two GPUs).

The CUDA backend currently has a runtime dependency on the CUDA toolkit. If you receive errors regarding a failure to load libdevice, you may need to set the POCL_CUDA_TOOLKIT_PATH environment variable to tell pocl where the CUDA toolkit is installed. Set this variable to the root of the toolkit installation (the directory containing the nvvm directory).

The POCL_CUDA_GPU_ARCH environment variable can be set to override the target GPU architecture (e.g. POCL_CUDA_GPU_ARCH=sm_35), which may be necessary in cases where LLVM doesn't yet support the architecture.

The POCL_CUDA_VERIFY_MODULE environment variable can be set to 0 to skip verification that the LLVM module produced by the CUDA backend is well formed. It currently defaults to 1 (on).

The POCL_CUDA_DUMP_NVVM environment variable can be set to 1 to dump the LLVM IR that is fed into the NVPTX backend for debugging purposes (requires POCL_DEBUG=1).

The POCL_CUDA_DISABLE_QUEUE_THREADS environment variable can be set to 1 to disable the background threads used for handling command submission. This can potentially reduce command launch latency, but can cause problems if using user events or sharing a context with a non-CUDA device.
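Putting the environment variables above together, a hypothetical invocation of an OpenCL application on a two-GPU system might look like the following. All values are illustrative, and my_opencl_app stands in for your own binary:

```shell
# Adjust paths and the target architecture to your system.
export POCL_DEVICES=CUDA,CUDA                  # expose two CUDA devices
export POCL_CUDA_TOOLKIT_PATH=/usr/local/cuda  # root containing the nvvm directory
export POCL_CUDA_GPU_ARCH=sm_61                # override the target GPU architecture
export POCL_CUDA_VERIFY_MODULE=0               # skip LLVM module verification
./my_opencl_app                                # hypothetical application binary
```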
(last updated: 2017-06-02)
The CUDA backend currently passes 73 tests from pocl's internal testsuite, and is capable of running various real OpenCL codes. Unlike NVIDIA's proprietary OpenCL implementation, pocl supports SPIR consumption, and so this backend has also been able to run (for example) SYCL codes using Codeplay's ComputeCpp implementation on NVIDIA GPUs. Since it uses CUDA under the hood, this backend also works with all of NVIDIA's CUDA profiling and debugging tools, many of which don't work with NVIDIA's own OpenCL implementation.
The following categories of the Khronos OpenCL 1.2 conformance tests are known to pass on at least one NVIDIA GPU using pocl's CUDA backend:
The CUDA backend has been tested on Linux (CentOS 7.3) with SM_35, SM_52, SM_60, and SM_61 capable NVIDIA GPUs.
The backend is also functional on macOS, with just one additional test failure compared to Linux (test_event_cycle).
The following is a non-comprehensive list of known issues in the CUDA backend:
Additionally, there has been little effort to optimize the performance of this backend so far; the current focus is on implementing the remaining functionality. Once the core functionality is completed, optimization of the code generation and runtime can begin.
For bug reports and questions, please use pocl’s GitHub issue tracker. Pull requests and other contributions are also very welcome.
This work has primarily been done by James Price from the University of Bristol’s High Performance Computing Group.