ensure 'torch' CUDA wheels are installed in CI #2279
jameslamb wants to merge 5 commits into rapidsai:main from
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test
```yaml
# avoid pulling in 'torch' in places like DLFW builds that prefer to install it other ways
- matrix:
    no_pytorch: "true"
  packages:
```
This follows the pattern @trxcllnt has been introducing across RAPIDS: rapidsai/cugraph-gnn#421
I think rmm never needed patches for DLFW, and so was missed in that round of PRs, because its `depends_on_pytorch` group doesn't end up in `test_python` or similar commonly-used lists.
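For context, the `no_pytorch` fallback in the diff above presumably sits alongside a GPU-requiring entry in the `depends_on_pytorch` group. A rough sketch, with the surrounding structure assumed rather than taken from the actual file (the `pytorch-gpu>=2.9` spec is copied from the conda matrices discussed later in this review):

```yaml
# Sketch of a depends_on_pytorch dependency group (structure assumed).
# The require_gpu_pytorch entry forces the GPU build; the no_pytorch
# entry (from the diff above) lets DLFW-style builds opt out entirely.
depends_on_pytorch:
  specific:
    - output_types: conda
      matrices:
        - matrix:
            require_gpu_pytorch: "true"
          packages:
            - pytorch-gpu>=2.9
        # avoid pulling in 'torch' in places like DLFW builds that
        # prefer to install it other ways
        - matrix:
            no_pytorch: "true"
          packages:
```

An empty `packages:` list on the fallback means the group contributes nothing when `no_pytorch=true` is selected, so the solver never sees a PyTorch requirement at all.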
/ok to test
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

Adds a CI script to download CUDA-specific PyTorch wheels, updates CI test scripts to use an environment-driven PIP constraint and to download/use CUDA wheels for PyTorch tests, and restructures …

Estimated code review effort: 🎯 4 (Complex), ⏱️ ~60 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
ci/download-torch-wheels.sh (1)
27-40: Keep the generated constraint file inside `TORCH_WHEEL_DIR`.

Writing `torch-constraints.txt` to `./` leaves shared state in the working tree even though the caller already gives this helper a per-run temp directory. Keeping the file under `${TORCH_WHEEL_DIR}` makes the whole download step self-contained and avoids cross-run collisions.

♻️ Suggested refactor

```diff
+TORCH_CONSTRAINTS="${TORCH_WHEEL_DIR}/torch-constraints.txt"
+
 rapids-dependency-file-generator \
   --output requirements \
   --file-key "torch_only" \
   --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};require_gpu_pytorch=true" \
-| tee ./torch-constraints.txt
+| tee "${TORCH_CONSTRAINTS}"

 rapids-pip-retry download \
   --isolated \
   --prefer-binary \
   --no-deps \
   -d "${TORCH_WHEEL_DIR}" \
   --constraint "${PIP_CONSTRAINT}" \
-  --constraint ./torch-constraints.txt \
+  --constraint "${TORCH_CONSTRAINTS}" \
   'torch'
```

dependencies.yaml (1)
401-409: Mirror the oldest/latest split on the conda path, or explain why conda can use relaxed version constraints.

The caller passes `dependencies=${RAPIDS_DEPENDENCIES}` to the generator for PyTorch conda, but the conda matrices ignore this selector. The requirements (wheel) path pins specific PyTorch versions for `dependencies=oldest` (e.g., `torch==2.9.0+cu129`), whereas the conda path always uses relaxed constraints (`pytorch-gpu>=2.9`). Either add the same oldest/latest branching to the conda matrices, or document why conda can rely on the solver to handle any PyTorch >=2.9 version safely while wheels cannot.
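If the first option in that nitpick were taken, the conda matrices might gain an oldest-pinned branch mirroring the wheel path. A sketch only; the exact pin spelling and matrix keys are assumptions:

```yaml
# Sketch: oldest/latest split for the conda PyTorch matrices (assumed
# structure; mirrors the wheel path's torch==2.9.0+cu129 pin for 'oldest').
- output_types: conda
  matrices:
    - matrix:
        require_gpu_pytorch: "true"
        dependencies: "oldest"
      packages:
        - pytorch-gpu==2.9.*
    - matrix:
        require_gpu_pytorch: "true"
      packages:
        - pytorch-gpu>=2.9
```

With matrix selectors resolved most-specific first, `dependencies=oldest` would hit the pinned entry and every other value would fall through to the relaxed one.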
Actionable inline comment:

In `ci/test_wheel_integrations.sh` (around lines 40-45): the skip log is out of sync with the gate that checks `CUDA_MAJOR`/`CUDA_MINOR`. Update the skip message (printed around line 66) to say "CUDA 12.9+ (for 12.x) or 13.0" instead of "12.6-12.9 or 13.0" so it accurately reflects the condition in the `if` block.
ℹ️ Review info: configuration from `.coderabbit.yaml`, review profile CHILL, plan Pro, Run ID 0ef2a43d-0a13-4967-a6ef-94f870a18ed8
📒 Files selected for processing (5)
- ci/download-torch-wheels.sh
- ci/test_python_integrations.sh
- ci/test_wheel.sh
- ci/test_wheel_integrations.sh
- dependencies.yaml
Actionable comments posted: 1
Actionable inline comment:

In `ci/test_wheel_integrations.sh` (around lines 40-45): the top comment stating "requires CUDA 12.8+" is now inaccurate relative to the conditional, which permits CUDA 12.9+ on 12.x and only 13.0 on 13.x. Update the comment above the gating if-block so future triage matches the actual condition:

```shell
{ [ "${CUDA_MAJOR}" -eq 12 ] && [ "${CUDA_MINOR}" -ge 9 ]; } || { [ "${CUDA_MAJOR}" -eq 13 ] && [ "${CUDA_MINOR}" -le 0 ]; }
```
ℹ️ Review info: configuration from `.coderabbit.yaml`, review profile CHILL, plan Pro, Run ID 23206a42-586c-4680-b0ff-0c16836fa1f9
📒 Files selected for processing (1)
- ci/test_wheel_integrations.sh
```shell
  -v \
  "${PIP_INSTALL_SHARED_ARGS[@]}" \
  -r test-pytorch-requirements.txt \
  "${TORCH_WHEEL_DIR}"/torch-*.whl
```
It looks to me like this is working and pulling in what we want!
CUDA 12.2.2, Python 3.11, arm64, ubuntu22.04, a100, latest-driver, latest-deps
RAPIDS logger » [03/06/26 19:41:00]
┌──────────────────────────────────────────────────────────────────────────┐
| Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 12.2.2) |
└──────────────────────────────────────────────────────────────────────────┘
CUDA 12.9.1, Python 3.11, amd64, ubuntu22.04, l4, latest-driver, oldest-deps
Successfully installed ... torch-2.9.0+cu129 ...
CUDA 12.9.1, Python 3.14, amd64, ubuntu24.04, h100, latest-driver, latest-deps
Successfully installed ... torch-2.10.0+cu129 ...
CUDA 13.0.2, Python 3.12, amd64, ubuntu24.04, l4, latest-driver, latest-deps
Successfully installed ... torch-2.10.0+cu130 ...
CUDA 13.0.2, Python 3.12, arm64, rockylinux8, l4, latest-driver, latest-deps
Successfully installed ... torch-2.10.0+cu130 ...
CUDA 13.1.1, Python 3.13, amd64, rockylinux8, rtxpro6000, latest-driver, latest-deps
RAPIDS logger » [03/06/26 19:35:46]
┌──────────────────────────────────────────────────────────────────────────┐
| Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 13.1.1) |
└──────────────────────────────────────────────────────────────────────────┘
CUDA 13.1.1, Python 3.14, amd64, ubuntu24.04, rtxpro6000, latest-driver, latest-deps
RAPIDS logger » [03/06/26 19:34:37]
┌──────────────────────────────────────────────────────────────────────────┐
| Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 13.1.1) |
└──────────────────────────────────────────────────────────────────────────┘
CUDA 13.1.1, Python 3.14, arm64, ubuntu24.04, l4, latest-driver, latest-deps
RAPIDS logger » [03/06/26 19:36:06]
┌──────────────────────────────────────────────────────────────────────────┐
| Skipping PyTorch tests (requires CUDA 12.9+ or 13.0, found 13.1.1) |
└──────────────────────────────────────────────────────────────────────────┘
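The wheel variants installed above follow PyTorch's local-version convention: CUDA 12.9.x maps to `+cu129`, 13.0.x to `+cu130`, i.e. major and minor concatenated. A sketch of that mapping (the helper name is hypothetical; the real scripts pass the version through the dependency-file-generator matrix instead):

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the PyTorch CUDA suffix from a full
# CUDA version string, e.g. 12.9.1 -> cu129, 13.0.2 -> cu130.
cuda_to_torch_suffix() {
    local version="$1"
    local major="${version%%.*}"     # text before the first dot
    local rest="${version#*.}"       # text after the first dot
    local minor="${rest%%.*}"        # text before the next dot
    echo "cu${major}${minor}"
}

cuda_to_torch_suffix "12.9.1"   # cu129
cuda_to_torch_suffix "13.0.2"   # cu130
```

This is the same `${RAPIDS_CUDA_VERSION%.*}` style of parameter expansion already used in the download script's `--matrix` argument.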
Description
Contributes to rapidsai/build-planning#256
Broken out from #2270
Proposes a stricter pattern for installing `torch` wheels, to prevent bugs of the form "accidentally used a CPU-only `torch` from pypi.org". This should help us catch compatibility issues, improving release confidence.

Other small changes:

- adds a `require_gpu_pytorch` matrix filter so conda jobs can explicitly request `pytorch-gpu` (to similarly ensure solvers don't fall back to the CPU-only variant)
- writes `rapids-generate-pip-constraint` output to the file `PIP_CONSTRAINT` points to

Checklist