CCCL Python Libraries
=====================

Overview
--------

The CUDA Core Compute Libraries (CCCL) for Python are a collection of modules with the shared goal of providing high-quality, high-performance, and easy-to-use abstractions for CUDA Python developers.

- :doc:`cuda.compute <compute>`: Composable device-level primitives for building custom parallel algorithms, without writing CUDA kernels directly.
- :doc:`cuda.coop <coop>`: Cooperative block- and warp-level algorithms for writing highly efficient CUDA kernels with Numba CUDA.
- :doc:`cuda.stf <stf>`: Sequential Task Flow for CUDA. Define logical data and tasks with read/write annotations, and STF orchestrates execution and data movement.

These libraries expose the generic, highly optimized algorithms from the CCCL C++ libraries, which are tuned for performance across GPU architectures.

Who is this for?
----------------

- Library authors building parallel algorithms that need portable performance across GPU architectures without dropping to CUDA C++.
- Application developers using PyTorch, CuPy, or other GPU-accelerated frameworks who need custom algorithms beyond what those libraries provide.

.. toctree::
   :maxdepth: 2
   :caption: CCCL Python Libraries

   setup
   compute
   coop
   stf
   resources
   api_reference
