PeterCalifano/cpp_cuda_template_project


cpp_cuda_template_project

A CMake template for building GPU-accelerated C++ libraries with optional CUDA/OptiX, Python/MATLAB bindings, and profiling support. Shared builds are the default; static builds are selected through the standard CMake BUILD_SHARED_LIBS option. Designed to be cloned and renamed into a real project.

Requirements

Dependency               Version  Notes
CMake                    ≥ 3.15
C++ compiler             C++20    GCC 11+, Clang 13+
Eigen3                   ≥ 3.4    Required
CUDA Toolkit             ≥ 12.0   Optional (-DENABLE_CUDA=ON)
OptiX SDK                any      Optional (-DENABLE_OPTIX=ON), requires CUDA
oneTBB                   any      Optional (-DENABLE_TBB=ON)
Catch2                   3.x      Auto-fetched from GitHub if not found
pyparsing                latest   Required for gtwrap Python/MATLAB code generation
Valgrind / perf          any      Optional, for profiling scripts
libgoogle-perftools-dev  any      Optional (-DENABLE_PROFILING=ON)

Quick Start

git clone <repo-url> my_project && cd my_project

# Default shared build (RelWithDebInfo) + run tests
./build_lib.sh

# Static library build
./build_lib.sh -D BUILD_SHARED_LIBS=OFF

# Debug build, Ninja generator, 8 jobs
./build_lib.sh -t debug -N -j 8

# Build + install to ./install
./build_lib.sh -t release -i

Optimized builds (Release, RelWithDebInfo) enable -march=native -mtune=native by default. Disable this for portable binaries with -D CPU_ENABLE_NATIVE_TUNING=OFF.

Run tests manually after a build:

cd build && ctest --output-on-failure
# Run a single test by name
ctest --output-on-failure -R <test_name>

Using as a Template

To start a new project from this template, rename the following (all in one pass with your editor's global find-and-replace):

Placeholder           Replace with
template_project      your project name (snake_case)
template_src          your library module name
template_src_kernels  your CUDA module name (or delete if no CUDA)

Files/directories to rename:

src/template_src/            --> src/<your_lib>/
src/template_src_kernels/    --> src/<your_lib>_kernels/    (if using CUDA)
src/cmake/template_projectConfig.cmake.in  --> src/cmake/<your_project>Config.cmake.in

CMakeLists.txt (root, line 11):

set(project_name "your_project_name")

What to keep as-is: the entire cmake/ module system, build_lib.sh, configure_devcontainer.sh, generate_version.sh, and the profiling/ scripts - these are project-agnostic.


Build Options

All options are passed via build_lib.sh flags or directly as -D<VAR>=<VAL> to CMake.

build_lib.sh reference

-B, --buildpath <dir>     Build directory (default: ./build)
-t, --type <type>         debug | release | relwithdebinfo | minsizerel
-j, --jobs <N>            Parallel jobs (default: nproc or 4)
-r, --rebuild-only        Skip CMake configure; rebuild sources only
-N, --ninja-build         Use Ninja generator
-f, --flagsCXX "<flags>"  Extra compiler flags (e.g. "-march=native")
-D, --define <VAR=VAL>    Extra CMake cache definitions (repeatable)
    --clean               Delete build dir before configure
    --profile             Enable profiling build (see Profiling section)
    --skip-tests          Do not run tests after build
-i, --install             Run install target after tests
-p, --python-wrap         Enable Python wrappers
-m, --matlab-wrap         Enable MATLAB wrappers
    --gtwrap-root <dir>   Path to local wrap checkout root
    --no-wrap-update      Disable auto-update of local wrap checkout to latest master
    --no-wrap-submodule-init
                          Disable wrap submodule initialization fallback
    --toolchain <file>    CMake toolchain file
-h, --help                Show full help

See doc/build_script_doc.md for a detailed option reference.

CMake feature flags

Option                                  Default                 Description
ENABLE_CUDA                             OFF                     CUDA GPU acceleration
ENABLE_OPTIX                            OFF                     NVIDIA OptiX (enables CUDA automatically)
ENABLE_TBB                              OFF                     Intel oneTBB support (find_package(TBB))
ENABLE_OPENGL                           OFF                     OpenGL support
ENABLE_TESTS                            ON                      Build and run Catch2 tests
ENABLE_PROFILING                        OFF                     Profiling-friendly flags + optional gperftools
BUILD_SHARED_LIBS                       ON                      Build compiled libraries as shared (OFF builds static archives)
SANITIZE_BUILD                          OFF                     Enable sanitizers (see SANITIZERS variable)
SANITIZERS                              address,undefined,leak  Comma-separated sanitizer list
CPU_ENABLE_NATIVE_TUNING                ON                      Adds -march=native -mtune=native for GNU/Clang optimized builds
CPU_ENABLE_SIMD                         OFF                     Adds explicit SIMD ISA flag from CPU_SIMD_LEVEL
CPU_SIMD_LEVEL                          native                  SIMD target: native, sse4.2, avx, avx2, avx512f
CPU_ENABLE_FMA                          OFF                     Adds -mfma for GNU/Clang optimized builds
CPU_EXTRA_OPT_FLAGS                     ""                      Extra CPU optimization flags for optimized builds
CUDA_ENABLE_FMAD                        ON                      NVCC fused multiply-add control (--fmad=true/false)
CUDA_ENABLE_EXTRA_DEVICE_VECTORIZATION  OFF                     Adds NVCC --extra-device-vectorization
CUDA_USE_FAST_MATH                      OFF                     Adds NVCC --use_fast_math to regular CUDA builds
CUDA_PTX_USE_FAST_MATH                  ON                      Adds NVCC --use_fast_math to PTX generation path
CUDA_NVCC_EXTRA_FLAGS                   ""                      Extra NVCC flags for CUDA and PTX compilation
NO_OPTIMIZATION                         OFF                     Force -O0 regardless of build type
WARNINGS_ARE_ERRORS                     OFF                     Treat all warnings as errors (-Werror)

Build type compiler flags

Build type      Flags                       Notes
Debug           -Og -g + sanitizers         Max debug info
RelWithDebInfo  -O2 -g + stricter warnings  Default
Release         -O3                         Tests forced on
MinSizeRel      -Os
NOPTIM          -O0 -g                      Stricter warnings, no optimization

Optional Features

CUDA / OptiX

./build_lib.sh -D ENABLE_CUDA=ON
./build_lib.sh -D ENABLE_CUDA=ON -D ENABLE_OPTIX=ON

GPU architecture is auto-detected via nvidia-smi. CUDA kernels live in src/template_src_kernels/:

  • .cu files - standard CUDA kernels
  • .ptx.cu files - compiled to embedded const char[] arrays for OptiX modules

Auto-detection is intentionally strict:

  • On x86_64/amd64, a working nvidia-smi is required unless you set CUDA_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES explicitly.
  • On aarch64/arm64, the template first tries nvidia-smi, then falls back to native Jetson/Tegra markers for Xavier (72), Orin (87), and Thor (101).
  • If detection is unavailable or ambiguous, configure fails with guidance to set CUDA_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES explicitly.
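
The detection rules above can be read as a small decision chain. The following C++ sketch only illustrates that ordering and is not the template's CMake implementation: `ArchInputs` and `ResolveCudaArch` are hypothetical names, the probe results are passed in as plain values instead of being queried from the system, and the "ambiguous" case is not modeled.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical snapshot of what a configure step could know about the host.
struct ArchInputs {
    std::string host_processor;                // e.g. "x86_64" or "aarch64"
    std::optional<std::string> explicit_arch;  // user-set CUDA_ARCHITECTURES / CMAKE_CUDA_ARCHITECTURES
    std::optional<std::string> smi_arch;       // architecture reported by a working nvidia-smi
    std::optional<std::string> jetson_model;   // "xavier", "orin" or "thor" when a Jetson/Tegra marker exists
};

// Returns the CUDA architecture to build for, or throws when nothing is usable.
std::string ResolveCudaArch(const ArchInputs& in) {
    // An explicit setting always wins and makes nvidia-smi unnecessary.
    if (in.explicit_arch) {
        return *in.explicit_arch;
    }
    // Next, trust nvidia-smi on any host where it works.
    if (in.smi_arch) {
        return *in.smi_arch;
    }
    // Only ARM hosts fall back to the Jetson/Tegra markers.
    const bool is_arm = in.host_processor == "aarch64" || in.host_processor == "arm64";
    if (is_arm && in.jetson_model) {
        if (*in.jetson_model == "xavier") return "72";
        if (*in.jetson_model == "orin") return "87";
        if (*in.jetson_model == "thor") return "101";
    }
    // Strict mode: fail instead of guessing.
    throw std::runtime_error(
        "CUDA architecture detection failed; set CUDA_ARCHITECTURES or "
        "CMAKE_CUDA_ARCHITECTURES explicitly");
}
```

In this sketch an x86_64 host with neither an explicit value nor a working nvidia-smi ends in the exception, which mirrors the configure-time failure described above.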

Example with explicit CUDA optimization toggles:

./build_lib.sh -D ENABLE_CUDA=ON \
  -D CUDA_ARCHITECTURES=87 \
  -D CUDA_ENABLE_FMAD=ON \
  -D CUDA_ENABLE_EXTRA_DEVICE_VECTORIZATION=ON \
  -D CUDA_NVCC_EXTRA_FLAGS="--maxrregcount=128"

When ENABLE_OPTIX=ON, configuration also fails fast unless the project contains:

  • at least one compilable library source under src/ (*.cpp or *.cu, excluding *.ptx.cu, and excluding src/bin/)
  • at least one PTX kernel source (*.ptx.cu)

This template treats OptiX on a header-only library as a configuration error.

TBB

./build_lib.sh -D ENABLE_TBB=ON

CPU vectorization tuning

CPU_ENABLE_NATIVE_TUNING is ON by default for optimized builds.

# Disable native tuning for portable binaries
./build_lib.sh -D CPU_ENABLE_NATIVE_TUNING=OFF

# Enable explicit AVX2 + FMA flags
./build_lib.sh -D CPU_ENABLE_SIMD=ON -D CPU_SIMD_LEVEL=avx2 -D CPU_ENABLE_FMA=ON

Sanitizers

./build_lib.sh -t debug -D SANITIZE_BUILD=ON
# Custom sanitizer set:
./build_lib.sh -t debug -D SANITIZE_BUILD=ON -D SANITIZERS="address,undefined"

Python and MATLAB Wrappers (gtwrap)

This template supports wrappers via gtwrap in two modes:

  1. Installed package mode (find_package(gtwrap)).
  2. Local checkout mode (--gtwrap-root /path/to/wrap or -D<project>_GTWRAP_ROOT_DIR=...).

When -p and/or -m is used, wrapper resolution follows this order:

  1. Use an explicit --gtwrap-root or an existing local checkout at ./wrap, ./lib/wrap, or ../wrap.
  2. Fall back to an installed gtwrap package discoverable via find_package(gtwrap).
  3. If still unresolved and GTWRAP_INIT_SUBMODULE_IF_MISSING=ON, initialize a declared wrap or lib/wrap git submodule and use that checkout.
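
The resolution order above can be sketched as a plain priority check. This is a minimal C++ illustration under stated assumptions, not the logic in build_lib.sh or the CMake modules: `WrapSource`, `WrapInputs`, and `ResolveWrapSource` are hypothetical names, the boolean defaults are placeholders rather than the template's defaults, and the `Unresolved` outcome is an assumption because the text does not say what happens when all three steps fail.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical outcome of the gtwrap lookup.
enum class WrapSource { LocalCheckout, InstalledPackage, Submodule, Unresolved };

// Hypothetical inputs; in the real build these come from flags and the filesystem.
struct WrapInputs {
    std::optional<std::string> explicit_root;     // value of --gtwrap-root, if given
    std::vector<std::string> existing_checkouts;  // those of ./wrap, ./lib/wrap, ../wrap that exist
    bool installed_package_found = false;         // find_package(gtwrap) succeeded
    bool init_submodule_if_missing = false;       // GTWRAP_INIT_SUBMODULE_IF_MISSING
    bool submodule_declared = false;              // a wrap or lib/wrap submodule is declared
};

WrapSource ResolveWrapSource(const WrapInputs& in) {
    // 1. An explicit root or an existing local checkout has priority.
    if (in.explicit_root || !in.existing_checkouts.empty()) {
        return WrapSource::LocalCheckout;
    }
    // 2. Otherwise use an installed gtwrap package.
    if (in.installed_package_found) {
        return WrapSource::InstalledPackage;
    }
    // 3. Last resort: initialize a declared submodule, if that fallback is enabled.
    if (in.init_submodule_if_missing && in.submodule_declared) {
        return WrapSource::Submodule;
    }
    return WrapSource::Unresolved;
}
```

Note that a local checkout shadows an installed package in this ordering, so passing --gtwrap-root selects the local checkout even when find_package(gtwrap) would also succeed.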

Existing local wrap roots are updated to the latest origin/master by default. This also applies to checkouts in a detached-HEAD or tag state, where the update switches to (or creates) a local master from origin/master. Pass --no-wrap-update to disable the update step, or --no-wrap-submodule-init to disable the submodule fallback entirely.

Prerequisites

Install pyparsing in the same Python environment used for wrapping:

python3 -m pip install pyparsing

pybind11 is provided by gtwrap (installed package or local checkout).

The default wrapper entrypoint is src/wrap_interface.i. If it is missing or the configured interface list is invalid, wrapper generation is auto-disabled during configure.

Build examples

# Python wrapper only
./build_lib.sh -p

# Python + MATLAB wrappers
./build_lib.sh -p -m

# Force local wrap checkout
./build_lib.sh -p --gtwrap-root /path/to/wrap

# Rebuild an already-configured wrapper build
./build_lib.sh -r -p

-p enables namespaced CMake wrapper options and ensures the resolved Python wrapper target is built when that target exists in the configured cache.

--rebuild-only does not reconfigure CMake. If you use ./build_lib.sh -r -p, the existing build directory must already have been configured with Python wrapping enabled.

Generated sources

If your wrapper interface uses gtsam::Vector/gtsam::Matrix without a full GTSAM dependency, include src/utils/wrap_adapters/GtsamAliases.h in src/wrap_interface.i to alias them to Eigen types.

Wrapper generators produce different C++ files by design:

  1. Python (pybind): <build>/wrap_interface.cpp (from top-level wrap_interface.i).
  2. MATLAB: <build>/wrap/<project>/<project>_wrapper.cpp.

Python package install workflow

Python package metadata is owned by python/pyproject.toml.in and configured into python/pyproject.toml when Python wrapping is requested. The optional setup.py.in augments installation behavior without duplicating package name/version metadata.

The checked-in python/<project>/__init__.py is the public package entrypoint:

  • import <project> is the supported import path.
  • HAS_WRAPPER is True when the compiled wrapper imports successfully.
  • HAS_WRAPPER is False when the pure-Python package imports without the wrapper.
  • WRAPPER_IMPORT_ERROR stores the wrapper import exception when fallback is active.

When Python wrapping is requested, the source package becomes the public install entrypoint. CMake updates it with:

  • generated python/pyproject.toml
  • generated python/setup.py
  • generated python/<project>/_wrapper_build.py linking the latest wrapper build

Install from the source Python package directory:

cd python
python -m pip install .

For convenience, the main project also provides:

cmake --build build --target python-install

When using Conda, activate the target environment first, then run the same command.


Versioning

The version is resolved from the following sources, in order:

  1. Git tags - tag format vMAJOR.MINOR.PATCH (e.g. v1.2.0)
  2. VERSION file - parsed from Project version: X.Y.Z if git is unavailable
  3. CMake defaults - 0.0.0 if neither source is available
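
As an illustration of this fallback chain, here is a minimal C++ sketch. It is not the template's CMake or generate_version.sh logic: `Version`, `ParseVersion`, and `ResolveVersion` are hypothetical names, the git tag and the VERSION file line are passed in as strings rather than read from git or disk, and falling through to the VERSION file when a tag is present but malformed is an assumption of this sketch.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical value type for a MAJOR.MINOR.PATCH version (0.0.0 by default).
struct Version {
    int major_part = 0;
    int minor_part = 0;
    int patch_part = 0;
};

// Parses "X.Y.Z" that follows `prefix`; returns nullopt when the prefix is
// missing or the three numbers cannot be read.
std::optional<Version> ParseVersion(const std::string& text, const std::string& prefix) {
    if (text.compare(0, prefix.size(), prefix) != 0) {
        return std::nullopt;
    }
    Version v;
    const char* numbers = text.c_str() + prefix.size();
    const int fields =
        std::sscanf(numbers, "%d.%d.%d", &v.major_part, &v.minor_part, &v.patch_part);
    if (fields != 3 || v.major_part < 0 || v.minor_part < 0 || v.patch_part < 0) {
        return std::nullopt;
    }
    return v;
}

// Applies the documented order: git tag, then VERSION file, then 0.0.0.
Version ResolveVersion(const std::optional<std::string>& git_tag,
                       const std::optional<std::string>& version_file_line) {
    if (git_tag) {
        if (auto v = ParseVersion(*git_tag, "v")) return *v;
    }
    if (version_file_line) {
        if (auto v = ParseVersion(*version_file_line, "Project version: ")) return *v;
    }
    return Version{};
}
```

With a tag of v1.2.0 the sketch returns 1.2.0 regardless of the VERSION file; with no tag and no file it returns the 0.0.0 default.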

The VERSION file is always written to the source and build directories during CMake configure. To write it without building:

./generate_version.sh

Version is available in C++ via the generated config.h:

#include "config.h"
PrintVersion();          // prints to stdout
GetVersionString();      // returns std::string
PROJECT_VERSION_MAJOR    // integer macros

Installation and Consuming as a Library

Install to the default prefix (./install) or a custom one:

./build_lib.sh -t release -i
# or with custom prefix:
./build_lib.sh -t release -i -D CMAKE_INSTALL_PREFIX=/opt/my_project
# or install a static library package:
./build_lib.sh -t release -i -D BUILD_SHARED_LIBS=OFF

In a downstream CMake project:

# Option 1: set the path explicitly
set(my_project_DIR "/path/to/install/lib/cmake/my_project")
find_package(my_project REQUIRED)

# Option 2: via CMAKE_PREFIX_PATH
cmake -DCMAKE_PREFIX_PATH=/path/to/install ...

Then link:

target_link_libraries(my_target PRIVATE my_project::my_project)

See examples/template_consumer_project/ for a complete working example.


Profiling

Profiling-friendly build

--profile adds -fno-omit-frame-pointer -fno-inline-functions to all build types, so that perf and callgrind can produce accurate call stacks even in optimized builds. It also links gperftools when it is found.

./build_lib.sh --profile -t relwithdebinfo

Profiling scripts

Three wrapper scripts live in profiling/. All share common options: -e <executable>, -o <output_dir>, -a "<args>", -t <trials>, -i <start_index>.

# Call graph analysis (valgrind callgrind)
./profiling/run_call_profiling.sh -e ./build/my_exe -o prof_results -t 3

# Heap memory profiling (valgrind massif)
./profiling/run_mem_complexity.sh -e ./build/my_exe -o prof_results

# CPU cycles / instruction count (perf)
./profiling/run_ops_profiling.sh  -e ./build/my_exe -o prof_results

Scripts auto-detect whether sudo is needed (skipped when running as root, e.g. inside a devcontainer).

Output files are written to <output_dir>/ and are gitignored by default.


DevContainer

The project ships a VS Code DevContainer configuration. To reconfigure it (base image, ROS, CUDA):

# Interactive
./configure_devcontainer.sh

# Non-interactive
./configure_devcontainer.sh --cuda --base ubuntu-22.04 --ros2 humble
./configure_devcontainer.sh --base ubuntu-20.04 --ros noetic --ros-profile desktop
./configure_devcontainer.sh --non-interactive --base ubuntu-24.04

ROS 1 requires Ubuntu 18.04 (melodic) or 20.04 (noetic). ROS 2 requires Ubuntu 22.04+.


Documentation

Doxygen documentation is auto-built when CMake finds doxygen:

cmake -B build && cmake --build build --target doc

Output goes to build/doc/html/index.html.


Project Structure

├── src/
│   ├── template_src/            Core C++ library implementation
│   ├── template_src_kernels/    CUDA kernels (.cu) and PTX sources (.ptx.cu)
│   ├── wrapped_impl/            C wrapper layer for Python/MATLAB bindings
│   ├── config.h.in              CMake-configured header (version, feature flags)
│   └── global_includes.h        Shared utilities (ANSI colors, precision constants)
├── cmake/                       CMake module system (Handle*.cmake)
├── profiling/                   Valgrind/perf wrapper scripts
├── tests/                       Catch2 unit tests and fixtures
├── examples/
│   ├── template_consumer_project/   Using the library via find_package()
│   └── template_examples/           Standalone usage examples
├── doc/                         Doxygen configuration
├── build_lib.sh                 Primary build entry point
├── generate_version.sh          Write VERSION file without building
└── configure_devcontainer.sh    Reconfigure VS Code DevContainer

About

A C++/CUDA template CMake project. Unit tests are supported through Catch2, with Doxygen auto-documentation and bindings generated by gtwrap (Python, MATLAB). It provides tag-based versioning, profiling tools, dependency management, and semi-automatically configured Docker containers (devcontainer).
