- https://docs.python.org/3/c-api
- https://numpy.org/doc/stable/reference/c-api
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/
- https://pytorch.org/cppdocs/
- https://pytorch.org/tutorials/advanced/cpp_extension.html
- https://pybind11.readthedocs.io/
- uv
- gcc
- clang-format
- CUDA 12.6 or later
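A quick preflight check that the prerequisites above are on `PATH` can be sketched in Python (the tool list mirrors the bullets above; `nvcc` stands in for the CUDA toolkit):

```python
import shutil

# Report whether each prerequisite tool from the list above is on PATH.
for tool in ("uv", "gcc", "clang-format", "nvcc"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'MISSING'}")
```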
Install OS packages required for building each of the extension modules:

```shell
find src/*/ext -name "packages.json" -exec jq -r '.[].build[], .[].runtime[]' {} \; \
  | sort -u | xargs sudo apt-get install -y
```

Lock and synchronize Python packages:
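The `jq` filter above assumes each `packages.json` is a top-level array of objects whose `build` and `runtime` fields are arrays of apt package names. A sketch of that assumed schema, with the extract-and-dedupe step mirrored in Python (package names are illustrative):

```python
import json

# Hypothetical contents of one extension's packages.json (illustrative names):
raw = '[{"build": ["libopenblas-dev", "jq"], "runtime": ["libopenblas0", "jq"]}]'

entries = json.loads(raw)
# Mirrors `jq -r '.[].build[], .[].runtime[]' ... | sort -u`:
names = sorted({pkg for e in entries for key in ("build", "runtime") for pkg in e[key]})
print(names)  # → ['jq', 'libopenblas-dev', 'libopenblas0']
```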
```shell
uv sync
```

Activate the virtual environment:

```shell
source .venv/bin/activate
```

Deactivate:

```shell
deactivate
```

Generate type stubs for the pybind11 extension:

```shell
.venv/bin/python -c '
import torch
import pybind11_stubgen
pybind11_stubgen.main(["my_proj.ext.torch_ext", "-o", "src"])
'
```

Run the tests:

```shell
uv run pytest ./src/tests -v
```

Build and run a standalone program:

```shell
make clean run PROGRAM=reduce_add
```

Format and lint Python sources:
```shell
uvx ruff format
uvx ruff check --fix
```

Format C/C++/CUDA sources:

```shell
find src -name "*.c" -o -name "*.cpp" -o -name "*.h" -o -name "*.cu" | xargs clang-format -i
```

Prepare the build context:
```shell
mkdir -p build
uv export --no-dev --no-emit-project > build/requirements.txt
find src/my_proj/ext -name "packages.json" -exec jq -r '.[].build[]' {} \; | sort -u > build/packages-build.txt
find src/my_proj/ext -name "packages.json" -exec jq -r '.[].runtime[]' {} \; | sort -u > build/packages-runtime.txt
```

Build the Docker image:
```shell
docker buildx build --pull -t my-proj . \
  --build-arg OS_VERSION=ubuntu24.04 \
  --build-arg CUDA_VERSION=12.6.0 \
  --build-arg PYTHON_VERSION=$(cat .python-version)
```

Build and run with ncu:
```shell
uv sync && ncu .venv/bin/python examples/torch_reduce_add.py
```

Build with debug symbols and run with cuda-gdb:
```shell
DEBUG_MODE=1 uv sync && cuda-gdb --args .venv/bin/python examples/torch_reduce_add.py
```

Project layout:

```
.
├── CMakeLists.txt       # Not for builds, just for CLion integration
├── Dockerfile
├── examples/...
├── MANIFEST.in
├── pyproject.toml
├── setup.py
├── src
│   ├── my_proj
│   │   ├── ext/...      # C/C++/CUDA extensions
│   │   ├── __init__.py  # Extension re-exports
│   │   └── torch.py     # PyTorch wrappers for the `torch` C++/CUDA extension
│   └── tests/...
└── uv.lock
```
Every C/C++/CUDA extension has the following structure:
```
_my_ext                    # Builds as `my_proj.my_ext` (no leading underscore)
├── binds
│   ├── <feature>.c/cpp    # Feature interfacing with Python/PyTorch/NumPy types
│   └── <feature>.h
├── lib
│   ├── <feature>.c/cpp/cu # Standalone feature implementation
│   └── <feature>.h
├── packages.json          # Required OS packages
└── module.c/cpp           # Methods table
```
The following extensions are available:
- `_math`: Pure C.
- `_linalg`: CUDA-only implementation with a NumPy interface in C (not installed if CUDA is missing; requires `numpy`).
- `_torch_ext`: Hybrid CUDA/CPU implementation with a PyTorch interface in C++ (installed in CPU-only mode if CUDA is missing; requires `torch`).
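Since `_linalg` may be absent (no CUDA at install time) and extensions build without their leading underscore, the package-level re-exports in `__init__.py` presumably tolerate a missing module. A minimal sketch of that pattern (the `solve` symbol is hypothetical, not taken from this repo):

```python
# Tolerant re-export for a CUDA-only extension: expose the symbol when the
# extension was built, fall back to None otherwise. Names are illustrative.
try:
    from my_proj.linalg import solve  # absent when installed without CUDA
except ImportError:
    solve = None
```

In an environment without the built extension, `solve` ends up as `None`, so callers can feature-detect with a plain `is None` check.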
Issue:

```
error: nvcc not found at 'XXXX/bin/nvcc'. Ensure CUDA path 'XXXX' is correct.
```

Solution:

```shell
# Replace `XXXX` with the expected CUDA path
mkdir -p XXXX/bin
sudo ln -s $(which nvcc) XXXX/bin/nvcc
```

Issue:
```
No CMAKE_CUDA_COMPILER could be found.

Tell CMake where to find the compiler by setting either the environment
variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
path to the compiler, or to the compiler name if it is in the PATH.
```
Solution:

- Find the `nvcc` path: `which nvcc`
- Go to `File -> Settings -> Build, Execution, Deployment -> CMake`.
- In every profile, go to `Environment` and add the environment variable `CUDACXX=.../nvcc` pointing to the `nvcc` path.