- https://docs.python.org/3/c-api
- https://numpy.org/doc/stable/reference/c-api
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/
- https://pytorch.org/cppdocs/
- https://pytorch.org/tutorials/advanced/cpp_extension.html
- https://pybind11.readthedocs.io/
- uv
- gcc
- clang-format
- CUDA 12.6 or later
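A quick preflight check that the prerequisites above are on `PATH` can be sketched in Python (the tool list mirrors the bullets above; `nvcc` stands in for the CUDA toolkit):

```python
import shutil

# Report whether each prerequisite tool from the list above is on PATH.
for tool in ("uv", "gcc", "clang-format", "nvcc"):
    path = shutil.which(tool)
    print(f"{tool}: {path or 'MISSING'}")
```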
Install OS packages required for building each of the extension modules:

```shell
find src/*/ext -name "packages.json" -exec jq -r '.[].build[], .[].runtime[]' {} \; \
  | sort -u | xargs sudo apt-get install -y
```

Lock and synchronize Python packages:
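The `jq` filter above assumes each `packages.json` is a top-level array of objects whose `build` and `runtime` fields are arrays of apt package names. A sketch of that assumed schema, with the extract-and-dedupe step mirrored in Python (package names are illustrative):

```python
import json

# Hypothetical contents of one extension's packages.json (illustrative names):
raw = '[{"build": ["libopenblas-dev", "jq"], "runtime": ["libopenblas0", "jq"]}]'

entries = json.loads(raw)
# Mirrors `jq -r '.[].build[], .[].runtime[]' ... | sort -u`:
names = sorted({pkg for e in entries for key in ("build", "runtime") for pkg in e[key]})
print(names)  # → ['jq', 'libopenblas-dev', 'libopenblas0']
```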
```shell
uv sync
```

Activate the virtual environment:

```shell
source .venv/bin/activate
```

Deactivate:

```shell
deactivate
```

Generate type stubs for the pybind11 extension:

```shell
.venv/bin/python -c '
import torch
import pybind11_stubgen
pybind11_stubgen.main(["my_proj.ext.torch_ext", "-o", "src"])
'
```

Run the tests:

```shell
uv run pytest ./src/tests -v
```

Build and run a standalone program:

```shell
make clean run PROGRAM=reduce_add
```

Format and lint Python sources:
```shell
uvx ruff format
uvx ruff check --fix
```

Format C/C++/CUDA sources:

```shell
find src -name "*.c" -o -name "*.cpp" -o -name "*.h" -o -name "*.cu" | xargs clang-format -i
```

Prepare the build context:
```shell
mkdir -p build
uv export --no-dev --no-emit-project > build/requirements.txt
find src/my_proj/ext -name "packages.json" -exec jq -r '.[].build[]' {} \; | sort -u > build/packages-build.txt
find src/my_proj/ext -name "packages.json" -exec jq -r '.[].runtime[]' {} \; | sort -u > build/packages-runtime.txt
```

Build the Docker image:
```shell
docker buildx build --pull -t my-proj . \
  --build-arg OS_VERSION=ubuntu24.04 \
  --build-arg CUDA_VERSION=12.6.0 \
  --build-arg PYTHON_VERSION=$(cat .python-version)
```

Build and run with ncu:
```shell
uv sync && ncu .venv/bin/python examples/torch_reduce_add.py
```

Build with debug symbols and run with cuda-gdb:
```shell
DEBUG_MODE=1 uv sync && cuda-gdb --args .venv/bin/python examples/torch_reduce_add.py
```

Project layout:

```
.
├── CMakeLists.txt       # Not for builds, just for CLion integration
├── Dockerfile
├── examples/...
├── MANIFEST.in
├── pyproject.toml
├── setup.py
├── src
│   ├── my_proj
│   │   ├── ext/...      # C/C++/CUDA extensions
│   │   ├── __init__.py  # Extension re-exports
│   │   └── torch.py     # PyTorch wrappers for the `torch` C++/CUDA extension
│   └── tests/...
└── uv.lock
```
Every C/C++/CUDA extension has the following structure:
```
_my_ext                    # Builds as `my_proj.my_ext` (no leading underscore)
├── binds
│   ├── <feature>.c/cpp    # Feature interfacing with Python/PyTorch/NumPy types
│   └── <feature>.h
├── lib
│   ├── <feature>.c/cpp/cu # Standalone feature implementation
│   └── <feature>.h
├── packages.json          # Required OS packages
└── module.c/cpp           # Methods table
```
The following extensions are available:
- `_math`: Pure C.
- `_linalg`: CUDA-only implementation with a NumPy interface in C (not installed if CUDA is missing; requires `numpy`).
- `_torch_ext`: Hybrid CUDA/CPU implementation with a PyTorch interface in C++ (installed in CPU-only mode if CUDA is missing; requires `torch`).
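Since `_linalg` may be absent (no CUDA at install time) and extensions build without their leading underscore, the package-level re-exports in `__init__.py` presumably tolerate a missing module. A minimal sketch of that pattern (the `solve` symbol is hypothetical, not taken from this repo):

```python
# Tolerant re-export for a CUDA-only extension: expose the symbol when the
# extension was built, fall back to None otherwise. Names are illustrative.
try:
    from my_proj.linalg import solve  # absent when installed without CUDA
except ImportError:
    solve = None
```

In an environment without the built extension, `solve` ends up as `None`, so callers can feature-detect with a plain `is None` check.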
Issue:

```
error: nvcc not found at 'XXXX/bin/nvcc'. Ensure CUDA path 'XXXX' is correct.
```

Solution:

```shell
# Replace `XXXX` with the expected CUDA path
mkdir -p XXXX/bin
sudo ln -s $(which nvcc) XXXX/bin/nvcc
```

Issue:
```
No CMAKE_CUDA_COMPILER could be found.

Tell CMake where to find the compiler by setting either the environment
variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
path to the compiler, or to the compiler name if it is in the PATH.
```
Solution:

- Find the `nvcc` path: `which nvcc`
- Go to `File -> Settings -> Build, Execution, Deployment -> CMake`.
- In every profile, go to `Environment` and add the environment variable `CUDACXX=.../nvcc` pointing to the `nvcc` path.