373 commits
b04cebf
Implement like_empty
caugonnet Aug 28, 2025
9ed5ace
More comprehensive FHE test
caugonnet Aug 28, 2025
e27ef5b
test fhe with stf decorator
caugonnet Aug 28, 2025
d0f915e
Merge branch 'main' into stf_c_api
caugonnet Aug 28, 2025
6963ec0
fix merge error
caugonnet Aug 28, 2025
06fab11
Appropriate checks
caugonnet Aug 29, 2025
2fc802e
Add missing ;
caugonnet Aug 29, 2025
a43db62
- Make it possible to create a borrowed context from a handle
caugonnet Aug 29, 2025
9c07679
invert ctx and exec place in the decorator
caugonnet Aug 29, 2025
947bbcc
fix decorator api
caugonnet Aug 29, 2025
22b2d19
Add ciphertext.like_empty()
caugonnet Aug 29, 2025
66bcde3
Removing prints
caugonnet Aug 29, 2025
84534c8
do not import specific methods
caugonnet Aug 29, 2025
acf0cce
fix decorator api
caugonnet Aug 29, 2025
6a6e84f
Add a pytorch experiment
Aug 29, 2025
297a69b
more pytorch test
Aug 29, 2025
533ca5a
better interop with pytorch
Aug 29, 2025
9aa749f
remove useless pass
Aug 29, 2025
b11aa4b
tensor_arguments
Aug 29, 2025
0af151f
simpler code
Aug 29, 2025
746d308
pre-commit hooks
caugonnet Aug 29, 2025
d9195f5
try to remove dependency on torch and have adapters (WIP)
caugonnet Aug 31, 2025
f5ac828
remove unused code
caugonnet Aug 31, 2025
454a5da
cleanups
caugonnet Aug 31, 2025
ccfbb6b
fix numba adapter
caugonnet Aug 31, 2025
c6e7c07
skip torch test if torch is not available
caugonnet Aug 31, 2025
842a651
add dot vertex even in the low level api
caugonnet Aug 31, 2025
00c649c
fix types
caugonnet Aug 31, 2025
b0fc18d
pre-commit hooks
caugonnet Aug 31, 2025
3b257df
Merge branch 'main' into stf_c_api
caugonnet Aug 31, 2025
04cc07a
dot add_vertex is done in start() now
caugonnet Aug 31, 2025
bce25b8
Start to implement the FDTD example in pytorch
caugonnet Sep 1, 2025
d9c5f11
Start to port in STF version of pytorch
caugonnet Sep 1, 2025
70fa5d8
Adapt the FDTD example to use STF constructs and add methods to initi…
caugonnet Sep 1, 2025
5587a8d
format issue
caugonnet Sep 1, 2025
5ea5243
charset issue
caugonnet Sep 1, 2025
f7fbd34
rank agnostic method to init
caugonnet Sep 1, 2025
aec2d71
use .zero_() to blank fields
caugonnet Sep 1, 2025
eb71880
print values
caugonnet Sep 1, 2025
aaf6ec6
Experiment to display output as an image
caugonnet Sep 1, 2025
ae4c6d6
Use non blocking API
caugonnet Sep 2, 2025
9029fda
remove dead code
caugonnet Sep 2, 2025
ce7a33b
remove dead code
caugonnet Sep 2, 2025
cbde742
minor cleanup
caugonnet Sep 2, 2025
1936db6
Merge branch 'main' into stf_c_api
caugonnet Sep 2, 2025
c91e814
clang-format
caugonnet Sep 2, 2025
3fe6178
Add a C library for CUDASTF (to be used in the python bindings)
caugonnet Sep 2, 2025
666bd07
Merge branch 'main' into stf_c_lib
caugonnet Sep 2, 2025
522b630
remove dead code
caugonnet Sep 2, 2025
4315314
do define and use CCCL_C_EXPERIMENTAL_STF_ENABLE_TESTING
caugonnet Sep 2, 2025
48627aa
Add CUDASTF C lib to tests
caugonnet Sep 2, 2025
410aadd
Merge branch 'main' into stf_c_lib
caugonnet Sep 2, 2025
c87cdaa
Add missing headers
caugonnet Sep 2, 2025
02a9eb6
use snake_case
caugonnet Sep 2, 2025
232133b
Do define CCCL_C_EXPERIMENTAL=1
caugonnet Sep 2, 2025
b60eb6b
Do not do redundant tests
caugonnet Sep 2, 2025
c4c99f0
Add a project to ci/inspect_changes.sh
caugonnet Sep 2, 2025
2f5925b
missing changes in previous commit
caugonnet Sep 2, 2025
3417075
add presets
caugonnet Sep 2, 2025
8c05034
Add override matrix
alliepiper Sep 2, 2025
20faa8f
Properly define structs with a typedef and remove superfluous struct …
caugonnet Sep 3, 2025
d378f5a
Merge branch 'main' into stf_c_lib
caugonnet Sep 3, 2025
8c5e760
fix previous merge
caugonnet Sep 3, 2025
78dc197
Change tensor_arguments to return an element instead of a tuple of on…
caugonnet Sep 3, 2025
2eb2ace
Remove intermediate structures and use opaque pointers instead
caugonnet Sep 3, 2025
6557067
Automatically generated documentation
caugonnet Sep 3, 2025
60266ff
Better implementation of the help to convert C places to the C++ API,…
caugonnet Sep 3, 2025
59f1983
Tell where to find cudax, and remove unnecessary libs
caugonnet Sep 3, 2025
c7fa9e6
Merge branch 'main' into stf_c_lib
caugonnet Sep 3, 2025
97dd6f7
CCCL_ENABLE_C enables c/parallel, CCCL_ENABLE_C_EXPERIMENTAL_STF enab…
caugonnet Sep 3, 2025
1610f0b
Remove unnecessary definitions
caugonnet Sep 3, 2025
4383eaf
Merge branch 'main' into stf_c_lib
caugonnet Sep 3, 2025
101fd0b
Merge branch 'main' into stf_c_lib
caugonnet Sep 4, 2025
4db210b
Merge branch 'main' into stf_c_lib
caugonnet Sep 5, 2025
90a8d20
use more consistent option names
caugonnet Sep 5, 2025
f2d7528
Merge branch 'main' into stf_c_lib
caugonnet Sep 9, 2025
ac667ca
Do not use [[maybe_unused]] for the C lib header because this is only…
caugonnet Sep 9, 2025
5bf62b3
Return an error code in stf_cuda_kernel_add_desc rather than use asse…
caugonnet Sep 9, 2025
c0a54f1
clang-format
caugonnet Sep 9, 2025
4573f9f
Merge branch 'main' into stf_c_lib
caugonnet Sep 9, 2025
abc58d8
Merge branch 'main' into stf_c_api
caugonnet Sep 9, 2025
af43da5
Merge stf_c_lib: Update c/ directory with complete C library implemen…
caugonnet Sep 9, 2025
c00c915
Revert Python linting changes
caugonnet Sep 9, 2025
cdd0d85
Fix Python CMakeLists.txt: Update C library feature flags
caugonnet Sep 9, 2025
afda29f
Fix Python build: Add missing CCCL_ENABLE_C master flag
caugonnet Sep 9, 2025
4f1f079
Complete STF C library configuration: Enable all C library features a…
caugonnet Sep 9, 2025
ccfc41d
Remove obsolete CCCL_ENABLE_C flag
caugonnet Sep 9, 2025
e4b8277
Update CMake configuration to match stf_c_lib structure
caugonnet Sep 9, 2025
6931fa8
Optimize Python build: Remove unnecessary C parallel library
caugonnet Sep 9, 2025
a1a1139
clang-format
caugonnet Sep 9, 2025
a3071f7
Merge branch 'stf_c_lib' into stf_c_api
caugonnet Sep 9, 2025
ecd9f4e
fix pytorch example
caugonnet Sep 9, 2025
4b2ae75
use ascii symbols
caugonnet Sep 9, 2025
5881081
Merge branch 'main' into stf_c_api
caugonnet Sep 9, 2025
4eef870
Merge branch 'main' into stf_c_api
caugonnet Sep 10, 2025
dcb3d39
Cleanup some changes in the infra from a previous merge
caugonnet Sep 10, 2025
1284eb2
Implement logical_data_empty logical_data_zeros, and logical_data_full
caugonnet Sep 10, 2025
0514f29
short names for torch.cuda
caugonnet Sep 10, 2025
5e9b4d5
Introduce pytorch_task
caugonnet Sep 10, 2025
53a4542
clang-format and some minor comment
caugonnet Sep 10, 2025
989f58b
Merge branch 'main' into stf_c_api
caugonnet Sep 17, 2025
93055c0
Merge branch 'main' into stf_c_api
caugonnet Sep 23, 2025
218fda2
make sure stf python tests are wrapped into functions so that pytest …
caugonnet Sep 25, 2025
1f97482
fix the return values of pytests
caugonnet Sep 25, 2025
1e482a4
Merge branch 'main' into stf_c_api
caugonnet Sep 25, 2025
7a58d68
Start to experiment with Warp
caugonnet Sep 25, 2025
9fb1c26
logical_data in python are now initialized with a data place, and the…
caugonnet Sep 25, 2025
5c1d50e
Save WIP: add access modes
caugonnet Sep 25, 2025
9f31b1e
cleanups
caugonnet Sep 25, 2025
c0bb070
Save WIP
caugonnet Sep 25, 2025
7094dd5
Merge branch 'main' into stf_c_api
caugonnet Oct 7, 2025
76d78b4
Adopt to new python hierarchy
caugonnet Oct 8, 2025
e03b062
Merge branch 'main' into stf_c_api
caugonnet Oct 8, 2025
0c11b6a
fix errors in a previous merge
caugonnet Oct 8, 2025
f6c50e1
cuda.cccl.experimental.stf => cuda.stf
caugonnet Oct 8, 2025
efea184
Misc stf python tests improvements
caugonnet Oct 8, 2025
c0d3592
Save WIP on this warp example
caugonnet Oct 8, 2025
eba61eb
Add sanity checks to test the is_void_interface() API
caugonnet Oct 8, 2025
e17c261
support tokens in python
caugonnet Oct 8, 2025
ec9c955
remove debug print
caugonnet Oct 8, 2025
52f4823
python cholesky with cupy
caugonnet Oct 8, 2025
5a32881
improve cholesky example
caugonnet Oct 8, 2025
abd5778
POTRI and Cholesky
caugonnet Oct 9, 2025
80e1085
clang-format
caugonnet Oct 9, 2025
865cf7b
Merge branch 'main' into stf_c_api
caugonnet Oct 9, 2025
4c1551a
how changes to numba-cuda have been merged
caugonnet Oct 9, 2025
77d6af1
Merge branch 'main' into stf_c_api
caugonnet Nov 14, 2025
acc8f49
Merge branch 'main' into stf_c_api
andralex Nov 14, 2025
de333b2
Fix CI precommit
andralex Nov 14, 2025
3834c8f
Merge branch 'main' into stf_c_api
andralex Nov 15, 2025
9a5c265
no need for numba.cuda.config.CUDA_ENABLE_PYNVJITLINK = 1 anymore
caugonnet Nov 24, 2025
9932a24
Merge origin/main into stf_c_api
caugonnet Nov 24, 2025
e7e2adb
Our numba-cuda fix is part of 0.21.0
caugonnet Nov 24, 2025
39040a9
Minor doc fix
caugonnet Nov 25, 2025
8f27fa2
Ensure matplotlib is only used if available
caugonnet Nov 25, 2025
73ac963
Cleanup examples
caugonnet Nov 25, 2025
d90ed64
cmake fix
caugonnet Nov 25, 2025
eb77519
Cmake fixes (need extra cleanup)
caugonnet Nov 25, 2025
b38ff80
Work-around for lazy resource init during graph capture in cuda core
caugonnet Nov 25, 2025
0a3e667
Use a relaxed capture mode
caugonnet Nov 25, 2025
8642fdd
This work-around is not needed anymore with a relaxed capture mode
caugonnet Nov 25, 2025
2a75766
Merge branch 'main' into stf_c_api
caugonnet Nov 25, 2025
0f9865d
cleanup warp example
caugonnet Nov 25, 2025
6466347
Cleanups in the cython code for STF
caugonnet Nov 25, 2025
cfb2930
no need for math.prod for such a simple thing
caugonnet Nov 26, 2025
130ee2a
Simpler code to handle vector types
caugonnet Nov 26, 2025
4bb4d23
fix grid dimension
caugonnet Nov 26, 2025
b8c745e
Use from_dlpack
caugonnet Nov 26, 2025
fb2a3ba
Change the mock-up FHE toy example to have operations that are homomo…
caugonnet Nov 26, 2025
6c2f850
Merge branch 'main' into stf_c_api
caugonnet Nov 26, 2025
da2e1aa
Add some explanation for the use of a relaxed capture mode
caugonnet Nov 26, 2025
852b400
cleaner pytorch adapter
caugonnet Nov 26, 2025
9308af5
Merge branch 'main' into stf_c_api
caugonnet Nov 27, 2025
09913dc
Code simplification
caugonnet Nov 26, 2025
237b2c1
minor fixes
caugonnet Dec 16, 2025
dd6cc26
Merge branch 'main' into stf_c_api
caugonnet Feb 3, 2026
ac148e8
Merge branch 'main' into stf_c_api
caugonnet Feb 8, 2026
5fedcfb
remove a change from main
caugonnet Feb 9, 2026
1fa449f
Merge branch 'main' into stf_c_api
caugonnet Feb 9, 2026
9839495
avoid a pre-commit fail
caugonnet Feb 9, 2026
65155d1
Include STF python bindings in CI
caugonnet Feb 9, 2026
1cce4d4
Make the script executable
caugonnet Feb 9, 2026
1dbfd64
Disable CUFILE in the python build
caugonnet Feb 9, 2026
5545ffb
Attempt to fix compilation on aarch64
caugonnet Feb 9, 2026
291e00c
fix a type conversion issue
caugonnet Feb 9, 2026
3a12081
Merge branch 'main' into stf_c_api
caugonnet Feb 9, 2026
4b54abc
try to fix python packages
caugonnet Feb 9, 2026
97d9b8b
Merge branch 'main' into stf_c_api
caugonnet Feb 9, 2026
5f950c4
gersemi pre-commit hook
caugonnet Feb 10, 2026
8727b24
Conditionally provide the jit decorator if numba-cuda is available
caugonnet Feb 10, 2026
f4c8800
clang-format
caugonnet Feb 10, 2026
ac980ec
Skip STF with MSVC in CI
caugonnet Feb 11, 2026
4821ebd
Merge branch 'main' into stf_c_api
caugonnet Feb 11, 2026
8559e8c
More consistent examples
caugonnet Feb 12, 2026
698739e
pre-commit hooks
caugonnet Feb 12, 2026
4d73287
Add missing copyrights
caugonnet Feb 12, 2026
97ae928
add missing file
caugonnet Feb 12, 2026
6903af7
like_empty -> empty_like
caugonnet Feb 12, 2026
99655d3
Report if the STF bindings cannot be loaded
caugonnet Feb 12, 2026
096ea44
Avoid a global context variable in fhe tests
caugonnet Feb 12, 2026
08fa67d
support an optional name= field in logical_data init methods to have …
caugonnet Feb 12, 2026
5145dff
more consistent aliases in example
caugonnet Feb 12, 2026
cd51231
Fix cmake message
caugonnet Feb 12, 2026
5245067
Remove commented debug leftovers
caugonnet Feb 12, 2026
c055d52
Merge branch 'main' into stf_c_api
caugonnet Feb 12, 2026
79be7ec
fix string format
caugonnet Feb 12, 2026
606896d
Do not tamper HOST_COMPILER
caugonnet Feb 12, 2026
da0487e
Use the existing mechanism to cleanly exclude the test_py_stf job fro…
caugonnet Feb 12, 2026
0fd485a
Merge branch 'main' into stf_c_api
caugonnet Feb 12, 2026
9d8ed4b
Merge branch 'main' into stf_c_api
caugonnet Feb 13, 2026
d9b1ca5
Merge branch 'main' into stf_c_api
caugonnet Feb 13, 2026
f78290a
Merge branch 'main' into stf_c_api
caugonnet Feb 13, 2026
da27328
Restore C in STF's C lib
caugonnet Feb 13, 2026
f22aa6b
Merge branch 'main' into stf_c_api
caugonnet Feb 23, 2026
e4bafa9
Use cuda.core.Buffer.fill (except for 8 bytes values) instead of cupy…
caugonnet Feb 25, 2026
a0c8227
Move fill utilities
caugonnet Feb 25, 2026
2e63918
Make pytorch_task a free function and move it to the test directory
caugonnet Feb 25, 2026
b21d930
wrappers to build pytorch tensors outside of cuda.stf
caugonnet Feb 25, 2026
de6f2f8
Remove dead code
caugonnet Feb 25, 2026
2f89cc1
Move numba utilities outside of the core cuda.stf
caugonnet Feb 25, 2026
1df9b9b
clang-format
caugonnet Feb 25, 2026
97ef675
Move the jit numba decorator in tests too
caugonnet Feb 25, 2026
16e8eec
Add missing file
caugonnet Feb 25, 2026
9fc3250
Merge branch 'main' into stf_c_api
caugonnet Feb 25, 2026
a516a2e
Only keep a cupy fallback to fill 8bytes values, not both cupy and numba
caugonnet Feb 25, 2026
ae43907
Some details about the stf_cai for CAI v3
caugonnet Feb 25, 2026
64cf200
Use relative paths to fix tests in CI
caugonnet Feb 25, 2026
b7abbac
pre-commit hooks
caugonnet Feb 25, 2026
daae555
Add a doc for cuda.stf
caugonnet Feb 25, 2026
0089958
Ensure cuda.stf is usable
caugonnet Feb 26, 2026
201e198
Merge branch 'main' into stf_c_api
caugonnet Feb 26, 2026
96c2421
Try to fix cuda.stf CI
caugonnet Feb 26, 2026
f6d5c2a
Merge branch 'main' into stf_c_api
caugonnet Feb 26, 2026
664f61d
remove __init__.py from test/stf to avoid confusion between libs
caugonnet Feb 26, 2026
5d735d7
Merge branch 'main' into stf_c_api
caugonnet Feb 26, 2026
5f0b044
Merge branch 'main' into stf_c_api
caugonnet Feb 28, 2026
cf7c11e
pre-commit hooks
caugonnet Feb 28, 2026
2570bae
Merge branch 'main' into stf_c_api
caugonnet Mar 2, 2026
62404a6
Merge branch 'main' into stf_c_api
caugonnet Mar 3, 2026
c7b6aab
Merge branch 'main' into stf_c_api
caugonnet Mar 9, 2026
b0df198
Merge branch 'main' into stf_c_api
caugonnet Mar 10, 2026
1f9b89a
There should be no __init__.py file here, otherwise tests becomes a p…
caugonnet Mar 10, 2026
94bdd96
Merge branch 'main' into stf_c_api
caugonnet Mar 10, 2026
832fd76
Disable SASS verification for tests which might generate LDL instruct…
caugonnet Mar 10, 2026
52fb401
Merge branch 'main' into stf_c_api
caugonnet Mar 10, 2026
5156a03
Merge branch 'main' into stf_c_api
caugonnet Mar 10, 2026
7993f55
Merge branch 'main' into stf_c_api
caugonnet Mar 11, 2026
113276c
Merge branch 'main' into stf_c_api
caugonnet Mar 13, 2026
5f12cca
Merge branch 'main' into stf_c_api
caugonnet Mar 13, 2026
961ae8b
Merge branch 'main' into stf_c_api
caugonnet Mar 16, 2026
7bbb57c
Merge branch 'main' into stf_c_api
caugonnet Mar 20, 2026
04f3d6c
Fix STF init helpers with exec_place
caugonnet Mar 22, 2026
3553134
Use CAI v3 for STF task views
caugonnet Mar 22, 2026
dc78d7d
Merge branch 'main' into stf_c_api
caugonnet Mar 25, 2026
779153f
Merge branch 'main' into stf_c_api
caugonnet Mar 25, 2026
6cf5276
Make STF Python bindings opt-in during source builds
caugonnet Mar 25, 2026
4f6d518
pre-commit hooks
caugonnet Mar 25, 2026
223c74d
Merge branch 'main' into stf_c_api
caugonnet Mar 26, 2026
fca46b0
Extract STF into separate cuda-cccl-experimental wheel
caugonnet Mar 26, 2026
a94034a
Merge branch 'main' into stf_c_api
caugonnet Mar 26, 2026
d3baa83
Fix CI workflow build: use single string for `needs` in test_py_stf
caugonnet Mar 26, 2026
c662728
pre-commit hooks
caugonnet Mar 26, 2026
157c772
Update inspect_changes test fixtures for python_experimental project
caugonnet Mar 26, 2026
9b7676a
Rename CI scripts to match project name convention
caugonnet Mar 26, 2026
95af444
Leave python/cuda_cccl/ untouched
caugonnet Mar 26, 2026
f491332
Merge branch 'main' into stf_c_api
caugonnet Mar 26, 2026
03ed80e
fixes for doc
caugonnet Mar 26, 2026
2f4a651
Merge branch 'main' into stf_c_api
caugonnet Mar 27, 2026
a4a001c
Merge branch 'main' into stf_c_api
andralex Mar 28, 2026
443adbc
Merge branch 'main' into stf_c_api
caugonnet Mar 29, 2026
130 changes: 130 additions & 0 deletions ci/build_cuda_cccl_experimental_python_experimental.sh
@@ -0,0 +1,130 @@
#!/bin/bash
set -euo pipefail

ci_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

usage="Usage: $0 -py-version <python_version> [additional options...]"

source "$ci_dir/util/python/common_arg_parser.sh"
parse_python_args "$@"

# Check if py_version was provided (this script requires it)
require_py_version "$usage" || exit 1

echo "Docker socket: $(ls /var/run/docker.sock)"

if [[ -n "${GITHUB_ACTIONS:-}" ]]; then
# Prepare mount points etc for getting artifacts in/out of the container.
source "$ci_dir/util/artifacts/common.sh"
action_mounts=$(cat <<EOF
--mount type=bind,source=${ARTIFACT_ARCHIVES},target=${ARTIFACT_ARCHIVES} \
--mount type=bind,source=${ARTIFACT_UPLOAD_STAGE},target=${ARTIFACT_UPLOAD_STAGE}
EOF
)

else
action_mounts=""
fi

readonly cuda12_version=12.9.1
readonly cuda13_version=13.0.2
readonly devcontainer_version=26.02
readonly devcontainer_distro=rockylinux8

if [[ "$(uname -m)" == "aarch64" ]]; then
readonly cuda12_image=rapidsai/ci-wheel:${devcontainer_version}-cuda${cuda12_version}-${devcontainer_distro}-py${py_version}-arm64
readonly cuda13_image=rapidsai/ci-wheel:${devcontainer_version}-cuda${cuda13_version}-${devcontainer_distro}-py${py_version}-arm64
else
readonly cuda12_image=rapidsai/ci-wheel:${devcontainer_version}-cuda${cuda12_version}-${devcontainer_distro}-py${py_version}
readonly cuda13_image=rapidsai/ci-wheel:${devcontainer_version}-cuda${cuda13_version}-${devcontainer_distro}-py${py_version}
fi

mkdir -p wheelhouse_experimental

for ctk in 12 13; do
image=$(eval echo \$cuda${ctk}_image)
echo "::group::⚒️ Building CUDA ${ctk} experimental wheel on ${image}"
(
set -x
docker pull $image
docker run --rm -i \
--workdir /workspace/python/cuda_cccl_experimental \
--mount type=bind,source=${HOST_WORKSPACE},target=/workspace/ \
${action_mounts} \
--env py_version=${py_version} \
--env GITHUB_ACTIONS=${GITHUB_ACTIONS:-} \
--env GITHUB_RUN_ID=${GITHUB_RUN_ID:-} \
--env JOB_ID=${JOB_ID:-} \
$image \
/workspace/ci/build_cuda_cccl_experimental_wheel.sh
# Prevent GHA runners from exhausting available storage with leftover images:
if [[ -n "${GITHUB_ACTIONS:-}" ]]; then
docker rmi -f $image
fi
)
echo "::endgroup::"
done

echo "Merging CUDA experimental wheels..."

# Needed for unpacking and repacking wheels.
python -m pip install wheel

# Find the built wheels
cu12_wheel=$(find wheelhouse_experimental -name "*cu12*.whl" | head -1)
cu13_wheel=$(find wheelhouse_experimental -name "*cu13*.whl" | head -1)

if [[ -z "$cu12_wheel" ]]; then
echo "Error: CUDA 12 experimental wheel not found in wheelhouse_experimental/"
ls -la wheelhouse_experimental/
exit 1
fi

if [[ -z "$cu13_wheel" ]]; then
echo "Error: CUDA 13 experimental wheel not found in wheelhouse_experimental/"
ls -la wheelhouse_experimental/
exit 1
fi

echo "Found CUDA 12 wheel: $cu12_wheel"
echo "Found CUDA 13 wheel: $cu13_wheel"

# Merge the wheels
python python/cuda_cccl_experimental/merge_cuda_wheels.py "$cu12_wheel" "$cu13_wheel" --output-dir wheelhouse_experimental_merged

# Install auditwheel and repair the merged wheel
python -m pip install patchelf auditwheel
for wheel in wheelhouse_experimental_merged/cuda_cccl_experimental-*.whl; do
echo "Repairing merged wheel: $wheel"
python -m auditwheel repair \
--exclude 'libnvrtc.so.12' \
--exclude 'libnvrtc.so.13' \
--exclude 'libnvJitLink.so.12' \
--exclude 'libnvJitLink.so.13' \
--exclude 'libcuda.so.1' \
"$wheel" \
--wheel-dir wheelhouse_experimental_final
done

# Clean up intermediate files and move only the final merged wheel
rm -rf wheelhouse_experimental/*
mkdir -p wheelhouse_experimental

if ls wheelhouse_experimental_final/cuda_cccl_experimental-*.whl 1> /dev/null 2>&1; then
mv wheelhouse_experimental_final/cuda_cccl_experimental-*.whl wheelhouse_experimental/
echo "Final merged experimental wheel moved to wheelhouse_experimental"
else
echo "No final repaired wheel found, moving unrepaired merged wheel"
mv wheelhouse_experimental_merged/cuda_cccl_experimental-*.whl wheelhouse_experimental/
fi

# Clean up temporary directories
rm -rf wheelhouse_experimental_merged wheelhouse_experimental_final

echo "Final experimental wheels in wheelhouse_experimental:"
ls -la wheelhouse_experimental/

if [[ -n "${GITHUB_ACTIONS:-}" ]]; then
wheel_artifact_name="$(ci/util/workflow/get_wheel_artifact_name.sh)_experimental"
ci/util/artifacts/upload.sh "$wheel_artifact_name" 'wheelhouse_experimental/.*'
fi
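The per-CTK image lookup in the build loop above goes through `eval echo`. As a minimal sketch only, the same lookup can be written with bash indirect expansion, which avoids re-evaluating the string. The image names and the `image_for_ctk` helper below are hypothetical and are not part of this PR.

```shell
set -euo pipefail

# Hypothetical image names, for illustration only.
cuda12_image="example/ci-wheel:cuda12"
cuda13_image="example/ci-wheel:cuda13"

# Resolve the variable named "cuda<major>_image" without eval.
image_for_ctk() {
  local var="cuda${1}_image"
  # ${!var:-} expands the variable whose name is stored in $var,
  # yielding an empty string when it is unset (safe under `set -u`).
  local image="${!var:-}"
  if [[ -z "$image" ]]; then
    echo "No image configured for CUDA $1" >&2
    return 1
  fi
  printf '%s\n' "$image"
}

for ctk in 12 13; do
  echo "CUDA ${ctk} -> $(image_for_ctk "$ctk")"
done
```

An unknown CTK major then fails with an explicit message instead of silently producing an empty image name.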
61 changes: 61 additions & 0 deletions ci/build_cuda_cccl_experimental_wheel.sh
@@ -0,0 +1,61 @@
#!/bin/bash
set -euo pipefail

# Target script for `docker run` command in build_cuda_cccl_experimental_python_experimental.sh
# The /workspace pathnames are hard-wired here.

# Install GCC 13 toolset (needed for the build)
/workspace/ci/util/retry.sh 5 30 dnf -y install gcc-toolset-13-gcc gcc-toolset-13-gcc-c++
echo -e "#!/bin/bash\nsource /opt/rh/gcc-toolset-13/enable" >/etc/profile.d/enable_devtools.sh
source /etc/profile.d/enable_devtools.sh

# Check what's available
which gcc
gcc --version
which nvcc
nvcc --version

# Set up Python environment
source /workspace/ci/pyenv_helper.sh
setup_python_env "${py_version}"
which python
python --version
echo "Done setting up python env"

# Determine the package version; this needs the full repo history
if [[ "$(git rev-parse --is-shallow-repository)" == "true" ]]; then
git fetch --unshallow
fi
export PACKAGE_VERSION_PREFIX="0.1."
package_version=$(/workspace/ci/generate_version.sh)
echo "Using package version ${package_version}"
# Override the version used by setuptools_scm to the custom version
export SETUPTOOLS_SCM_PRETEND_VERSION_FOR_CUDA_CCCL_EXPERIMENTAL="${package_version}"

cd /workspace/python/cuda_cccl_experimental

# Determine CUDA version from nvcc
cuda_version=$(nvcc --version | grep -oP 'release \K[0-9]+\.[0-9]+' | cut -d. -f1)
echo "Detected CUDA version: ${cuda_version}"

# Configure compilers:
export CXX="$(which g++)"
export CUDACXX="$(which nvcc)"
export CUDAHOSTCXX="$(which g++)"

# Build the wheel
python -m pip wheel --no-deps --verbose --wheel-dir dist .

# Rename wheel to include CUDA version suffix
for wheel in dist/cuda_cccl_experimental-*.whl; do
if [[ -f "$wheel" ]]; then
base_name=$(basename "$wheel" .whl)
new_name="${base_name}.cu${cuda_version}.whl"
mv "$wheel" "dist/${new_name}"
echo "Renamed wheel to: ${new_name}"
fi
done

# Move wheel to output directory
mkdir -p /workspace/wheelhouse_experimental
mv dist/cuda_cccl_experimental-*.cu*.whl /workspace/wheelhouse_experimental/
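The build script and the test script each derive the CUDA major version from `nvcc --version` with a different pipeline. A minimal sketch of one shared helper follows, assuming the output contains a `release <major>.<minor>` token. The function name `cuda_major_from_nvcc_output` and the sample line are illustrative assumptions, not code from this PR.

```shell
# Extract the CUDA major version from text shaped like `nvcc --version` output.
# Reads stdin and prints the major version of the first "release X.Y" token.
cuda_major_from_nvcc_output() {
  sed -n 's/.*release \([0-9][0-9]*\)\.[0-9][0-9]*.*/\1/p' | head -n 1
}

# Illustrative sample line; real output has several more lines.
sample='Cuda compilation tools, release 12.9, V12.9.86'
printf '%s\n' "$sample" | cuda_major_from_nvcc_output
```

In a script this would be used as `cuda_version=$(nvcc --version | cuda_major_from_nvcc_output)`. It relies only on POSIX `sed`, so it does not need `grep -P`.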
20 changes: 20 additions & 0 deletions ci/matrix.yaml
@@ -70,6 +70,8 @@ workflows:
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.X'], py_version: ['3.10'], gpu: 'l4', cxx: ['gcc13', 'msvc']}
- {jobs: ['test'], project: 'python', ctk: ['12.X','13.0', '13.X'], py_version: ['3.13'], gpu: 'l4', cxx: ['gcc13', 'msvc']}
- {jobs: ['test'], project: 'python', py_version: '3.13', gpu: 'h100', cxx: 'gcc13'}
# Python experimental (STF) -- pinned to gcc13, Linux only
- {jobs: ['test'], project: 'python_experimental', ctk: ['12.X', '13.X'], py_version: ['3.13'], gpu: 'l4', cxx: 'gcc13'}
# CCCL packaging:
- {jobs: ['test'], project: 'packaging', ctk: '12.0', cxx: ['gcc10', 'clang14'], gpu: 'rtx2080', args: '-min-cmake'}
- {jobs: ['test'], project: 'packaging', ctk: '12.X', cxx: ['gcc10', 'clang14'], gpu: 'rtx2080'}
@@ -118,6 +120,7 @@ workflows:
- {project: 'cccl_c_parallel', jobs: ['test'], ctk: '13.X', cxx: ['gcc13', 'msvc'], gpu: 'rtx2080', sm: 'gpu'}
- {project: 'cccl_c_stf', jobs: ['test'], ctk: '13.X', cxx: 'gcc13', gpu: 'rtx2080', sm: 'gpu'}
- {project: 'python', jobs: ['test'], ctk: '13.X', py_version: '3.13', gpu: 'l4', cxx: ['gcc13', 'msvc']}
- {project: 'python_experimental', jobs: ['test'], ctk: '13.X', py_version: '3.13', gpu: 'l4', cxx: 'gcc13'}
# Packaging / install
- {project: 'packaging', jobs: ['test'], ctk: '13.X', cxx: ['gcc', 'clang'], gpu: 'rtx2080', sm: 'gpu'}
- {project: 'packaging', jobs: ['test'], args: '-min-cmake', gpu: 'rtx2080', sm: 'gpu'}
@@ -192,6 +195,9 @@ workflows:
# Python -- pinned to gcc13 on Linux for consistency across CTK images
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.0', '13.X'], py_version: ['3.10', '3.11', '3.12', '3.13'], gpu: 'l4', cxx: ['gcc13', 'msvc']}
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.X'], py_version: '3.13', gpu: 'h100', cxx: 'gcc13'}
# Python experimental (STF) -- pinned to gcc13, Linux only
- {jobs: ['test'], project: 'python_experimental', ctk: ['12.X', '13.X'], py_version: ['3.10', '3.13'], gpu: 'l4', cxx: 'gcc13'}
- {jobs: ['test'], project: 'python_experimental', ctk: '13.X', py_version: '3.13', gpu: 'h100', cxx: 'gcc13'}
# CCCL packaging:
- {jobs: ['test'], project: 'packaging', ctk: '12.0', cxx: ['gcc10', 'clang14'], gpu: 'rtx2080', args: '-min-cmake'}
- {jobs: ['test'], project: 'packaging', ctk: '12.X', cxx: ['gcc10', 'clang14'], gpu: 'rtx2080'}
@@ -279,6 +285,9 @@ workflows:
# Python -- pinned to gcc13 for consistency across CTK images
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.0', '13.X'], py_version: ['3.10', '3.11', '3.12', '3.13'], gpu: 'l4', cxx: ['gcc13', 'msvc']}
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.X'], py_version: '3.13', gpu: 'h100', cxx: ['gcc13', 'msvc']}
# Python experimental (STF) -- pinned to gcc13, Linux only
- {jobs: ['test'], project: 'python_experimental', ctk: ['12.X', '13.X'], py_version: ['3.10', '3.13'], gpu: 'l4', cxx: 'gcc13'}
- {jobs: ['test'], project: 'python_experimental', ctk: '13.X', py_version: '3.13', gpu: 'h100', cxx: 'gcc13'}
# CCCL packaging:
- {jobs: ['test'], project: 'packaging', ctk: '12.0', cxx: ['gcc10', 'clang14'], gpu: 'rtx2080', args: '-min-cmake'}
- {jobs: ['test'], project: 'packaging', ctk: '12.X', cxx: ['gcc10', 'clang14'], gpu: 'rtx2080'}
@@ -302,6 +311,7 @@ workflows:
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.0', '13.X'], py_version: ['3.10', '3.11', '3.12', '3.13'], gpu: 'l4', cxx: ['gcc13', 'msvc']}
- {jobs: ['test'], project: 'python', ctk: ['12.X', '13.X'], py_version: '3.13', gpu: 'h100', cxx: ['gcc13', 'msvc']}
- {jobs: ['test'], project: 'python', cpu: 'arm64', ctk: ['12.X', '13.X'], py_version: ['3.10', '3.11', '3.12', '3.13'], gpu: 'l4', cxx: 'gcc13'}
- {jobs: ['test'], project: 'python_experimental', ctk: ['12.X', '13.X'], py_version: ['3.10', '3.13'], gpu: 'l4', cxx: 'gcc13'}


# This is just used to ensure that we generate devcontainers for all images we build.
@@ -325,6 +335,8 @@ workflows:
exclude:
# GPU runners are not available on Windows.
- {jobs: ['test', 'test_gpu', 'test_nolid', 'test_lid0', 'test_lid1', 'test_lid2'], cxx: ['msvc2019', 'msvc14.39', 'msvc2022']}
# STF experimental Python bindings are not built for MSVC:
- {project: 'python_experimental', cxx: ['msvc2019', 'msvc14.39', 'msvc2022']}
# cudax doesn't support C++17 on msvc:
- {project: 'cudax', std: 17, cxx: ['msvc2019', 'msvc14.39', 'msvc2022']}

@@ -478,6 +490,9 @@ jobs:
test_py_coop: { name: "Test cuda.coop", gpu: true, needs: 'build_py_wheel', force_producer_ctk: "pybuild", invoke: { prefix: 'test_cuda_coop'} }
test_py_par: { name: "Test cuda.compute", gpu: true, needs: 'build_py_wheel', force_producer_ctk: "pybuild", invoke: { prefix: 'test_cuda_compute'} }
test_py_examples: { name: "Test cuda.cccl.examples", gpu: true, needs: 'build_py_wheel', force_producer_ctk: "pybuild", invoke: { prefix: 'test_cuda_cccl_examples'} }
# Python experimental (cuda-cccl-experimental wheel):
build_py_experimental_wheel: { name: "Build cuda.cccl.experimental", gpu: false, invoke: { prefix: 'build_cuda_cccl_experimental'} }
test_py_stf: { name: "Test cuda.stf", gpu: true, needs: 'build_py_experimental_wheel', force_producer_ctk: "pybuild", invoke: { prefix: 'test_cuda_stf'} }

# Run jobs for 'target' project (ci/util/build_and_test_targets.sh):
run_cpu: { gpu: false }
@@ -536,6 +551,11 @@ projects:
job_map:
build: ['build_py_wheel']
test: ['test_py_headers', 'test_py_coop', 'test_py_par', 'test_py_examples']
python_experimental:
name: "Python Experimental"
job_map:
build: ['build_py_experimental_wheel']
test: ['test_py_stf']
cccl_c_parallel:
name: 'CCCL C Parallel'
stds: [20]
12 changes: 11 additions & 1 deletion ci/project_files_and_dependencies.yaml
@@ -128,8 +128,18 @@ projects:
lite_dependencies: [cccl_c_parallel_public]
full_dependencies: []
include_regexes:
- - "python/"
+ - "python/cuda_cccl/"
- "pyproject.toml"
exclude_regexes:
- "python/cuda_cccl_experimental/"

python_experimental:
name: "Python Experimental"
matrix_project: "python_experimental"
lite_dependencies: [cccl_c_stf, python]
full_dependencies: []
include_regexes:
- "python/cuda_cccl_experimental/"

packaging:
name: "CCCL Packaging"
2 changes: 1 addition & 1 deletion ci/test/inspect_changes/c2h_dependency.output
@@ -1,2 +1,2 @@
FULL_BUILD=
- LITE_BUILD=libcudacxx cub cudax cccl_c_parallel cccl_c_stf packaging
+ LITE_BUILD=libcudacxx cub cudax cccl_c_parallel cccl_c_stf python_experimental packaging
2 changes: 1 addition & 1 deletion ci/test/inspect_changes/core_dirty.output
@@ -1,2 +1,2 @@
- FULL_BUILD=libcudacxx cub thrust cudax cccl_c_parallel cccl_c_stf python packaging stdpar nvbench_helper nvrtcc
+ FULL_BUILD=libcudacxx cub thrust cudax cccl_c_parallel cccl_c_stf python python_experimental packaging stdpar nvbench_helper nvrtcc
LITE_BUILD=
2 changes: 1 addition & 1 deletion ci/test/inspect_changes/libcudacxx_both.output
@@ -1,2 +1,2 @@
FULL_BUILD=libcudacxx
- LITE_BUILD=cub thrust cudax cccl_c_parallel cccl_c_stf python packaging stdpar nvbench_helper
+ LITE_BUILD=cub thrust cudax cccl_c_parallel cccl_c_stf python python_experimental packaging stdpar nvbench_helper
2 changes: 1 addition & 1 deletion ci/test/inspect_changes/libcudacxx_public_only.output
@@ -1,2 +1,2 @@
FULL_BUILD=libcudacxx
- LITE_BUILD=cub thrust cudax cccl_c_parallel cccl_c_stf python packaging stdpar nvbench_helper
+ LITE_BUILD=cub thrust cudax cccl_c_parallel cccl_c_stf python python_experimental packaging stdpar nvbench_helper
2 changes: 1 addition & 1 deletion ci/test/inspect_changes/libcudacxx_thrust.output
@@ -1,2 +1,2 @@
FULL_BUILD=libcudacxx thrust
- LITE_BUILD=cub cudax cccl_c_parallel cccl_c_stf python packaging stdpar nvbench_helper
+ LITE_BUILD=cub cudax cccl_c_parallel cccl_c_stf python python_experimental packaging stdpar nvbench_helper
2 changes: 1 addition & 1 deletion ci/test/inspect_changes/multiple_projects.output
@@ -1,2 +1,2 @@
FULL_BUILD=python packaging
- LITE_BUILD=
+ LITE_BUILD=python_experimental
42 changes: 42 additions & 0 deletions ci/test_cuda_stf_python_experimental.sh
@@ -0,0 +1,42 @@
#!/bin/bash

set -euo pipefail

ci_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "$ci_dir/pyenv_helper.sh"

# Parse common arguments
source "$ci_dir/util/python/common_arg_parser.sh"
parse_python_args "$@"
cuda_major_version=$(nvcc --version | grep release | awk '{print $6}' | tr -d ',' | cut -d '.' -f 1 | cut -d 'V' -f 2)

# Setup Python environment
setup_python_env "${py_version}"

# Fetch or build the cuda_cccl wheel (base dependency):
if [[ -n "${GITHUB_ACTIONS:-}" ]]; then
wheel_artifact_name=$("$ci_dir/util/workflow/get_wheel_artifact_name.sh")
"$ci_dir/util/artifacts/download.sh" ${wheel_artifact_name} /home/coder/cccl/
else
"$ci_dir/build_cuda_cccl_python.sh" -py-version "${py_version}"
fi

# Install cuda_cccl base wheel
CUDA_CCCL_WHEEL_PATH="$(ls /home/coder/cccl/wheelhouse/cuda_cccl-*.whl)"
python -m pip install "${CUDA_CCCL_WHEEL_PATH}[cu${cuda_major_version}]"

# Fetch or build the experimental wheel:
if [[ -n "${GITHUB_ACTIONS:-}" ]]; then
experimental_artifact_name="${wheel_artifact_name}_experimental"
"$ci_dir/util/artifacts/download.sh" ${experimental_artifact_name} /home/coder/cccl/
else
"$ci_dir/build_cuda_cccl_experimental_python_experimental.sh" -py-version "${py_version}"
fi

# Install cuda_cccl_experimental wheel
EXPERIMENTAL_WHEEL_PATH="$(ls /home/coder/cccl/wheelhouse_experimental/cuda_cccl_experimental-*.whl)"
python -m pip install "${EXPERIMENTAL_WHEEL_PATH}[test-cu${cuda_major_version}]"

# Run tests for STF module
cd "/home/coder/cccl/python/cuda_cccl_experimental/tests/"
python -m pytest -n auto -v stf/
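Both the merge step and the install steps locate wheels with a glob (`find ... | head -1` or `ls ...*.whl`), which silently picks one file or passes several to `pip` if more than one matches. The sketch below shows one way to require exactly one match. The helper `find_single_wheel` and the wheel filename used in the demo are hypothetical and not part of this PR.

```shell
# Print the single path in <dir> matching <pattern>; fail on zero or several.
find_single_wheel() {
  local dir="$1" pattern="$2"
  local matches=()
  local f
  # $pattern is intentionally unquoted so the shell expands the glob.
  for f in "$dir"/$pattern; do
    if [[ -e "$f" ]]; then
      matches+=("$f")
    fi
  done
  if (( ${#matches[@]} != 1 )); then
    echo "Expected exactly one file matching $pattern in $dir, found ${#matches[@]}" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}

# Demo in a temporary directory with a made-up wheel name.
demo_dir="$(mktemp -d)"
touch "$demo_dir/cuda_cccl_experimental-0.1.0-py3-none-any.whl"
find_single_wheel "$demo_dir" 'cuda_cccl_experimental-*.whl'
rm -rf "$demo_dir"
```

With such a helper, a stale wheel left in the wheelhouse would surface as an explicit error rather than an ambiguous `pip install` argument.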
4 changes: 4 additions & 0 deletions docs/python/index.rst
@@ -14,6 +14,9 @@ abstractions for CUDA Python developers.
* :doc:`cuda.coop <coop>` — Cooperative block- and warp-level algorithms for
writing highly efficient CUDA kernels with `Numba CUDA <https://nvidia.github.io/numba-cuda/>`_.

* :doc:`cuda.stf <stf>` — Sequential Task Flow for CUDA: define logical data and
tasks with read/write annotations; STF orchestrates execution and data movement.

These libraries expose the generic, highly-optimized algorithms from the
`CCCL C++ libraries <https://nvidia.github.io/cccl/cpp.html>`_,
which have been tuned to provide optimal performance across GPU architectures.
@@ -34,5 +37,6 @@ Who is this for?
setup
compute
coop
stf
resources
api_reference