
ensure 'torch' CUDA wheels are installed in CI #2279

Open
jameslamb wants to merge 5 commits into rapidsai:main from jameslamb:torch-testing

Conversation

@jameslamb
Member

Description

Contributes to rapidsai/build-planning#256

Broken out from #2270

Proposes a stricter pattern for installing torch wheels, to prevent bugs of the form "accidentally used a CPU-only torch from pypi.org". This should help us to catch compatibility issues, improving release confidence.

Other small changes:

  • splits torch wheel testing into "oldest" (PyTorch 2.9) and "latest" (PyTorch 2.10)
  • introduces a require_gpu_pytorch matrix filter so conda jobs can explicitly request pytorch-gpu (to similarly ensure solvers don't fall back to the GPU-only variant)
  • appends rapids-generate-pip-constraint output to the file that PIP_CONSTRAINT points to
    • (to reduce duplication and the risk of failing to apply constraints)
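The failure mode described above is detectable from the wheel filename alone: CUDA-variant torch wheels from download.pytorch.org carry a local version tag such as +cu129, while the CPU-only wheels on pypi.org do not. A minimal sketch of such a check (a hypothetical helper, not code from this PR):

```python
# Hypothetical helper, not part of this PR: detect whether a torch wheel
# is a CUDA variant by looking for a "+cu" local version tag in its
# filename (e.g. torch-2.9.0+cu129-cp311-...whl).
def is_cuda_torch_wheel(wheel_filename: str) -> bool:
    # wheel filenames follow the pattern: name-version-pytag-abitag-platform.whl
    version = wheel_filename.split("-")[1]
    return "+cu" in version

print(is_cuda_torch_wheel("torch-2.9.0+cu129-cp311-cp311-manylinux_2_28_x86_64.whl"))  # True
print(is_cuda_torch_wheel("torch-2.9.0-cp311-cp311-manylinux_2_28_x86_64.whl"))        # False
```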

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@jameslamb added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels Mar 6, 2026
@copy-pr-bot

copy-pr-bot bot commented Mar 6, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@jameslamb
Member Author

/ok to test

Comment on lines +423 to +426
# avoid pulling in 'torch' in places like DLFW builds that prefer to install it other ways
- matrix:
    no_pytorch: "true"
  packages:
Member Author


This follows the pattern @trxcllnt has been introducing across RAPIDS: rapidsai/cugraph-gnn#421

I think rmm never needed patches for DLFW and so was missed in that round of PRs, because its depends_on_pytorch group doesn't end up in test_python or similar commonly-used lists.

@jameslamb
Member Author

/ok to test

@coderabbitai

coderabbitai bot commented Mar 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 92c54388-7bf6-4169-a2b9-e75808aa98f6

📥 Commits

Reviewing files that changed from the base of the PR and between 9eefdea and 9efce26.

📒 Files selected for processing (1)
  • ci/test_wheel_integrations.sh

📝 Walkthrough

Summary by CodeRabbit

  • Chores
    • Added an automated PyTorch wheel downloader to ensure CUDA-targeted wheels are fetched for CI.
    • Centralized and clarified constraint handling for pip installs using an environment-driven constraint source.
    • Expanded dependency declarations to support multiple PyTorch/CUDA matrix combinations and added a new torch-only dependency group; removed the prior test-wheels grouping.
  • Tests
    • Updated GPU test flows to install CUDA-specific wheels and adjusted compatibility checks for newer CUDA versions.

Walkthrough

Adds a CI script to download CUDA-specific PyTorch wheels, updates CI test scripts to use an environment-driven PIP constraint and to download/use CUDA wheels for PyTorch tests, and restructures dependencies.yaml to replace a simple PyTorch entry with a multi-matrix depends_on_pytorch and a new torch_only group.

Changes

Cohort / File(s) Summary
PyTorch wheel downloader
ci/download-torch-wheels.sh
New executable script that generates torch-specific constraints via rapids-dependency-file-generator and downloads CUDA-variant PyTorch wheels with rapids-pip-retry into a specified directory.
CI test scripts
ci/test_python_integrations.sh, ci/test_wheel.sh, ci/test_wheel_integrations.sh
Switch constraint generation/usage to environment-driven ${PIP_CONSTRAINT}, add ;require_gpu_pytorch=true to the PyTorch GPU matrix entry, and refactor integrations flow to download/use CUDA-specific PyTorch wheels for GPU test runs.
Dependency configuration
dependencies.yaml
Remove test_wheels_pytorch file-group, add new torch_only group, and replace the simple depends_on_pytorch common block with a detailed specific multi-matrix declaration covering CUDA versions, GPU/non-GPU variants, and multiple output types (requirements, pyproject, conda).
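For context on how the downloader script targets the right wheels: the --matrix selector it passes to rapids-dependency-file-generator is assembled from CI environment variables (the exact string appears in a review comment further down). A hypothetical Python restatement of that string-building step, with the shell `%.*` trim spelled out:

```python
# Illustrative only: how the --matrix selector quoted in the review comments
# could be assembled from CI environment variables. Names mirror the
# RAPIDS_* variables used in the shell scripts.
def build_matrix(cuda_version: str, arch: str, py: str, dependencies: str) -> str:
    # "${RAPIDS_CUDA_VERSION%.*}" strips the patch component: "12.9.1" -> "12.9"
    cuda = cuda_version.rsplit(".", 1)[0]
    return (
        f"cuda={cuda};arch={arch};py={py};"
        f"dependencies={dependencies};require_gpu_pytorch=true"
    )

print(build_matrix("12.9.1", "x86_64", "3.11", "oldest"))
# cuda=12.9;arch=x86_64;py=3.11;dependencies=oldest;require_gpu_pytorch=true
```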

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: ensuring CUDA-enabled torch wheels are installed in CI, which is the primary objective across all modified files.
Description check ✅ Passed The description is directly related to the changeset, explaining the rationale for stricter torch wheel installation patterns, the split into PyTorch versions, the new matrix filter, and constraint handling improvements.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
ci/download-torch-wheels.sh (1)

27-40: Keep the generated constraint file inside TORCH_WHEEL_DIR.

Writing torch-constraints.txt to ./ leaves shared state in the working tree even though the caller already gives this helper a per-run temp directory. Keeping the file under ${TORCH_WHEEL_DIR} makes the whole download step self-contained and avoids cross-run collisions.

♻️ Suggested refactor
+TORCH_CONSTRAINTS="${TORCH_WHEEL_DIR}/torch-constraints.txt"
+
 rapids-dependency-file-generator \
     --output requirements \
     --file-key "torch_only" \
     --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};require_gpu_pytorch=true" \
-| tee ./torch-constraints.txt
+| tee "${TORCH_CONSTRAINTS}"
 
 rapids-pip-retry download \
   --isolated \
   --prefer-binary \
   --no-deps \
   -d "${TORCH_WHEEL_DIR}" \
   --constraint "${PIP_CONSTRAINT}" \
-  --constraint ./torch-constraints.txt \
+  --constraint "${TORCH_CONSTRAINTS}" \
   'torch'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/download-torch-wheels.sh` around lines 27 - 40, The generated constraint
file is written to ./torch-constraints.txt which leaves shared state; change the
rapids-dependency-file-generator pipeline so tee writes into the per-run
directory (use ${TORCH_WHEEL_DIR}/torch-constraints.txt) and update the
subsequent rapids-pip-retry download --constraint argument to point to that file
instead of ./torch-constraints.txt, keeping all references to
torch-constraints.txt, rapids-dependency-file-generator, tee,
${TORCH_WHEEL_DIR}, and the rapids-pip-retry download --constraint option
consistent.
dependencies.yaml (1)

401-409: Mirror the oldest/latest split on the conda path or explain why conda can use relaxed version constraints.

The caller passes dependencies=${RAPIDS_DEPENDENCIES} to the generator for PyTorch conda, but the conda matrices ignore this selector. The requirements (wheel) path pins specific PyTorch versions for dependencies=oldest (e.g., torch==2.9.0+cu129), whereas the conda path always uses relaxed constraints (pytorch-gpu>=2.9). Either add the same oldest/latest branching to the conda matrices, or document why conda can rely on the solver to handle any PyTorch >=2.9 version safely while wheels cannot.
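If the first option (mirroring the split) were taken, the conda matrices might look something like the following. This is a hypothetical sketch only; the version pins shown are illustrative, not values from this PR:

```yaml
# Hypothetical sketch: conda matrices branching on dependencies=oldest,
# mirroring the wheel path. Version pins are illustrative.
- output_types: [conda]
  matrices:
    - matrix: {dependencies: "oldest", require_gpu_pytorch: "true"}
      packages: [pytorch-gpu==2.9.*]
    - matrix: {dependencies: "oldest"}
      packages: [pytorch==2.9.*]
    - matrix: {require_gpu_pytorch: "true"}
      packages: [pytorch-gpu>=2.9]
    - matrix:
      packages: [pytorch>=2.9]
```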

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@dependencies.yaml` around lines 401 - 409, The conda matrices under
output_types: conda currently ignore the caller's dependencies selector
(dependencies=${RAPIDS_DEPENDENCIES}) and always use relaxed package specs
(packages: - pytorch-gpu>=2.9 / - pytorch>=2.9), which diverges from the wheel
path that branches on oldest/latest and pins versions (e.g.,
torch==2.9.0+cu129); update the conda matrices to mirror the oldest/latest split
(add separate matrix entries for dependencies=oldest that pin exact
pytorch/pytorch-gpu versions and for dependencies=latest that keep >=
constraints) or add a clear comment/documentation explaining why conda can
safely use relaxed constraints and referencing the matrix keys (matrices,
require_gpu_pytorch, packages) and the caller variable RAPIDS_DEPENDENCIES so
reviewers can verify the intended behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci/test_wheel_integrations.sh`:
- Around line 40-45: The CI skip log in test_wheel_integrations.sh is out of
sync with the gate that checks CUDA_MAJOR/CUDA_MINOR (the conditional using
CUDA_MAJOR and CUDA_MINOR around the if block) — update the skip/message string
(the message printed around line 66) to match the actual gate: indicate "CUDA
12.9+ (for 12.x) or 13.0" instead of "12.6-12.9 or 13.0" so it accurately
reflects the condition in the if that checks CUDA_MAJOR and CUDA_MINOR.



📥 Commits

Reviewing files that changed from the base of the PR and between d1563fc and 5d328ff.

📒 Files selected for processing (5)
  • ci/download-torch-wheels.sh
  • ci/test_python_integrations.sh
  • ci/test_wheel.sh
  • ci/test_wheel_integrations.sh
  • dependencies.yaml


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci/test_wheel_integrations.sh`:
- Around line 40-45: The top comment stating "requires CUDA 12.8+" is now
inaccurate relative to the conditional that permits CUDA 12.9+ for 12.x and only
13.0 for 13.x; update that comment above the gating if-block (the block checking
CUDA_MAJOR and CUDA_MINOR) to accurately describe the new policy (e.g., require
CUDA 12.9+ on 12.x and CUDA 13.0 on 13.x) so future triage matches the condition
in the { [ "${CUDA_MAJOR}" -eq 12 ] && [ "${CUDA_MINOR}" -ge 9 ]; } || { [
"${CUDA_MAJOR}" -eq 13 ] && [ "${CUDA_MINOR}" -le 0 ]; } check.
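For clarity, the shell condition the bot quotes reduces to "CUDA 12.9+ on the 12.x series, or exactly 13.0". A hypothetical Python restatement, consistent with the skip/run results in the build logs later in this thread:

```python
# Restates the quoted shell gate (illustration only):
#   { [ CUDA_MAJOR -eq 12 ] && [ CUDA_MINOR -ge 9 ]; } ||
#   { [ CUDA_MAJOR -eq 13 ] && [ CUDA_MINOR -le 0 ]; }
def should_run_pytorch_tests(cuda_version: str) -> bool:
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    return (major == 12 and minor >= 9) or (major == 13 and minor <= 0)

print(should_run_pytorch_tests("12.2.2"))  # False (skipped in CI)
print(should_run_pytorch_tests("12.9.1"))  # True
print(should_run_pytorch_tests("13.0.2"))  # True
print(should_run_pytorch_tests("13.1.1"))  # False (skipped in CI)
```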


📥 Commits

Reviewing files that changed from the base of the PR and between 5d328ff and 9eefdea.

📒 Files selected for processing (1)
  • ci/test_wheel_integrations.sh

-v \
"${PIP_INSTALL_SHARED_ARGS[@]}" \
-r test-pytorch-requirements.txt
"${TORCH_WHEEL_DIR}"/torch-*.whl
Member Author


It looks to me like this is working and pulling in what we want!

CUDA 12.2.2, Python 3.11, arm64, ubuntu22.04, a100, latest-driver, latest-deps

(build link)

  RAPIDS logger » [03/06/26 19:41:00]
  ┌──────────────────────────────────────────────────────────────────────────┐
  |    Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 12.2.2)    |
  └──────────────────────────────────────────────────────────────────────────┘

CUDA 12.9.1, Python 3.11, amd64, ubuntu22.04, l4, latest-driver, oldest-deps

(build link)

  Successfully installed ... torch-2.9.0+cu129 ...

CUDA 12.9.1, Python 3.14, amd64, ubuntu24.04, h100, latest-driver, latest-deps

(build link)

  Successfully installed ... torch-2.10.0+cu129 ...

CUDA 13.0.2, Python 3.12, amd64, ubuntu24.04, l4, latest-driver, latest-deps

(build link)

  Successfully installed ... torch-2.10.0+cu130 ...

CUDA 13.0.2, Python 3.12, arm64, rockylinux8, l4, latest-driver, latest-deps

(build link)

  Successfully installed ... torch-2.10.0+cu130 ...

CUDA 13.1.1, Python 3.13, amd64, rockylinux8, rtxpro6000, latest-driver, latest-deps

(build link)

  RAPIDS logger » [03/06/26 19:35:46]
  ┌──────────────────────────────────────────────────────────────────────────┐
  |    Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 13.1.1)    |
  └──────────────────────────────────────────────────────────────────────────┘

CUDA 13.1.1, Python 3.14, amd64, ubuntu24.04, rtxpro6000, latest-driver, latest-deps

(build link)

  RAPIDS logger » [03/06/26 19:34:37]
  ┌──────────────────────────────────────────────────────────────────────────┐
  |    Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 13.1.1)    |
  └──────────────────────────────────────────────────────────────────────────┘

CUDA 13.1.1, Python 3.14, arm64, ubuntu24.04, l4, latest-driver, latest-deps

(build link)

  RAPIDS logger » [03/06/26 19:36:06]
  ┌──────────────────────────────────────────────────────────────────────────┐
  |    Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 13.1.1)    |
  └──────────────────────────────────────────────────────────────────────────┘

@jameslamb jameslamb changed the title WIP: ensure 'torch' CUDA wheels are installed in CI ensure 'torch' CUDA wheels are installed in CI Mar 6, 2026
@jameslamb jameslamb requested a review from bdice March 6, 2026 19:58
@jameslamb jameslamb marked this pull request as ready for review March 6, 2026 19:58
@jameslamb jameslamb requested review from a team as code owners March 6, 2026 19:58