Changes from 9 commits
2 changes: 2 additions & 0 deletions .github/workflows/pr.yaml
@@ -257,6 +257,7 @@ jobs:
build_type: pull-request
script: ci/test_wheel_pylibwholegraph.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
Member Author:

Suggested change (remove): matrix_type: 'nightly'

TODO: revert this. Just added here for testing, to confirm this will fix the issues we've been seeing in nightlies.

wheel-build-cugraph-pyg:
needs: checks
secrets: inherit
@@ -279,3 +280,4 @@ jobs:
build_type: pull-request
script: ci/test_wheel_cugraph-pyg.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
Member Author:

Suggested change (remove): matrix_type: 'nightly'

Revert before merging.

2 changes: 2 additions & 0 deletions .gitignore
@@ -40,6 +40,8 @@ wheels/
wheelhouse/
_skbuild/
cufile.log
*.tar.gz
*.whl

## Patching
*.diff
37 changes: 37 additions & 0 deletions ci/download-torch-wheels.sh
@@ -0,0 +1,37 @@
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# [description]
#
# Downloads a CUDA variant of 'torch' from the correct index, based on CUDA major version.
#
# This exists to avoid using 'pip --extra-index-url', which could allow for CPU-only 'torch'
# to be downloaded from pypi.org.
#

set -e -u -o pipefail

# Ensure CUDA-enabled 'torch' packages are always used.
#
# Downloading + adding the downloaded file to the constraint forces the use of this
# package, so we don't accidentally end up with a CPU-only 'torch' from 'pypi.org'
# (which can happen because --extra-index-url doesn't imply a priority).
rapids-logger "Downloading 'torch' wheel"
CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}"
if [[ "${CUDA_MAJOR}" == "12" ]]; then
    PYTORCH_INDEX="https://download.pytorch.org/whl/cu126"
else
    PYTORCH_INDEX="https://download.pytorch.org/whl/cu130"
fi

TORCH_WHEEL_DIR=$(mktemp -d)
rapids-pip-retry download \
    --prefer-binary \
    --no-deps \
    -d "${TORCH_WHEEL_DIR}" \
    --constraint "${PIP_CONSTRAINT}" \
    --index-url "${PYTORCH_INDEX}" \
    'torch'

echo "torch @ file://$(echo ${TORCH_WHEEL_DIR}/torch-*.whl)" >> "${PIP_CONSTRAINT}"
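The last line of the script relies on 'echo' to expand the shell glob into the downloaded wheel's exact path before writing a 'torch @ file://...' direct reference into the constraints file. A minimal standalone sketch of that trick (the wheel filename here is hypothetical):

```shell
# Create a fake wheel to stand in for the downloaded torch wheel (hypothetical name).
tmpdir=$(mktemp -d)
touch "${tmpdir}/torch-2.9.1+cu126-cp311-cp311-manylinux_2_28_x86_64.whl"

# 'echo' expands the wildcard to the single matching file, producing a
# direct reference suitable for a pip constraints file.
constraint="torch @ file://$(echo "${tmpdir}"/torch-*.whl)"
echo "${constraint}"
```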
16 changes: 9 additions & 7 deletions ci/test_wheel_cugraph-pyg.sh
@@ -15,13 +15,15 @@ LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")
CUGRAPH_PYG_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="${package_name}_${RAPIDS_PY_CUDA_SUFFIX}" RAPIDS_PY_WHEEL_PURE="1" rapids-download-wheels-from-github python)

CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}"
# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
    --output requirements \
    --file-key "test_cugraph_pyg" \
    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"
Member Author:

This is a new one for me 😭

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.

(build link)

All cugraph-pyg wheel tests are failing like this, not only the oldest dependencies one.

Example constraints file (not including all the requirements of all these packages):

--extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple
--extra-index-url=https://pypi.nvidia.com/
cudf==26.4.*,>=0.0.0a0
cugraph==26.4.*,>=0.0.0a0
cuml==26.4.*,>=0.0.0a0
ogb
pylibwholegraph==26.4.*,>=0.0.0a0
pytest-benchmark
pytest-cov
pytest-xdist
pytest<9.0.0
sentence-transformers
torch>=2.9.0

I'll try that advice from the error message, let's see if it'll help us get a little farther.
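A toy illustration of the hint in that error message, a lower bound shrinks the set of candidate versions the backtracking resolver may wander through (pure-Python sketch, not pip's actual resolver):

```python
# Candidate versions for a hypothetical package: many old 1.x releases plus two 2.x.
candidates = [(1, minor, 0) for minor in range(50)] + [(2, 0, 0), (2, 1, 0)]

unconstrained = candidates                           # requirement: 'package'
bounded = [v for v in candidates if v >= (2, 0, 0)]  # requirement: 'package>=2.0.0'

# Every candidate the resolver cannot rule out up front is one it may backtrack into.
print(len(unconstrained), len(bounded))  # 52 2
```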

Member Author:

Getting further with local testing, adding more pins to force out some solver errors.

test code:
docker run \
    --rm \
    --gpus all \
    --env GH_TOKEN=$(gh auth token) \
    --env RAPIDS_BUILD_TYPE="pull-request" \
    --env RAPIDS_REPOSITORY="rapidsai/cugraph-gnn" \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:26.04-cuda12.9.1-rockylinux8-py3.11 \
    bash

source rapids-init-pip

package_name="cugraph-pyg"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
COMMIT_ID=843296e5e99ebb017e3a4a63b046abfc672ce279

LIBWHOLEGRAPH_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact cugraph-gnn 413 cpp wheel "${COMMIT_ID}"
)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(
  rapids-get-pr-artifact cugraph-gnn 413 python wheel --pkg_name pylibwholegraph --stable "${COMMIT_ID}"
)
CUGRAPH_PYG_WHEELHOUSE=$(
    RAPIDS_PY_WHEEL_NAME="cugraph-pyg_cu12" RAPIDS_PY_WHEEL_PURE="1" rapids-get-pr-artifact cugraph-gnn 413 python wheel "${COMMIT_ID}"
)

# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
    --output requirements \
    --file-key "test_cugraph_pyg" \
    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"

# ensure a CUDA variant of 'torch' is used
./ci/download-torch-wheels.sh

# notes:
#
#   * echo to expand wildcard before adding `[extra]` requires for pip
#   * '--extra-index-url pypi.nvidia.com' can be removed when 'cugraph' and
#     its dependencies are available from pypi.org
#
rapids-pip-retry install \
    --dry-run \
    -v \
    --constraint "${PIP_CONSTRAINT}" \
    --extra-index-url 'https://pypi.nvidia.com' \
    "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
    "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
    "$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]" \
    "cuda-bindings[all]==12.9.4" \
    "cudf-cu12==26.4.0a289" \
    "cugraph-cu12==26.4.0a30" \
    "cuml-cu12==26.4.0a77" \
    "dask-cuda==26.4.0a18" \
    "distributed-ucxx-cu12==0.49.0a20" \
    "libcudf-cu12==26.4.0a289" \
    "libcugraph-cu12==26.4.0a30" \
    "libcuml-cu12==26.4.0a77" \
    "libucxx-cu12==0.49.0a20" \
    "numba-cuda[cu12]==0.27.0" \
    "pylibcugraph-cu12==26.4.0a30" \
    "pylibcudf-cu12==26.4.0a289" \
    "pylibraft-cu12==26.4.0a34" \
    "raft-dask-cu12==26.4.0a33" \
    "rapids-dask-dependency==26.4.0a7" \
    "rmm-cu12==26.4.0a30" \
    "ucxx-cu12==0.49.0a20"

I think torch's very tight pinnings are leading to these expensive solves.

TORCH_WHEEL_DIR=$(mktemp -d)
rapids-pip-retry download \
  --prefer-binary \
  --no-deps \
  -d "${TORCH_WHEEL_DIR}" \
  --index-url "https://download.pytorch.org/whl/cu126" \
  'torch==2.10'

pushd "${TORCH_WHEEL_DIR}"
pip install pkginfo
pkginfo --json *.whl
    "cuda-bindings==12.9.4; platform_system == \"Linux\"",
    "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-cuda-runtime-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-cuda-cupti-cu12==12.6.80; platform_system == \"Linux\"",
    "nvidia-cudnn-cu12==9.10.2.21; platform_system == \"Linux\"",
    "nvidia-cublas-cu12==12.6.4.1; platform_system == \"Linux\"",
    "nvidia-cufft-cu12==11.3.0.4; platform_system == \"Linux\"",
    "nvidia-curand-cu12==10.3.7.77; platform_system == \"Linux\"",
    "nvidia-cusolver-cu12==11.7.1.2; platform_system == \"Linux\"",
    "nvidia-cusparse-cu12==12.5.4.2; platform_system == \"Linux\"",
    "nvidia-cusparselt-cu12==0.7.1; platform_system == \"Linux\"",
    "nvidia-nccl-cu12==2.27.5; platform_system == \"Linux\"",
    "nvidia-nvshmem-cu12==3.4.5; platform_system == \"Linux\"",
    "nvidia-nvtx-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-nvjitlink-cu12==12.6.85; platform_system == \"Linux\"",
    "nvidia-cufile-cu12==1.11.1.6; platform_system == \"Linux\"",
    "triton==3.6.0; platform_system == \"Linux\"",

Pinning to the latest versions of RAPIDS nightlies as well as a few other packages is yielding solver errors like this:

ERROR: Cannot install cuda-bindings[all]==12.9.4, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.0.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.0.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.1.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.1.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.2, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.9.1, cudf-cu12==26.4.0a289, cuml-cu12==26.4.0a77, libcuml-cu12==26.4.0a77 and numba-cuda[cu12]==0.27.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    cudf-cu12 26.4.0a289 depends on cuda-toolkit==12.*
    cuml-cu12 26.4.0a77 depends on cuda-toolkit==12.*
    libcuml-cu12 26.4.0a77 depends on cuda-toolkit==12.*
    numba-cuda[cu12] 0.27.0 depends on cuda-toolkit==12.*; extra == "cu12"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.9.1 depends on cuda-toolkit 12.9.1 (from https://pypi.nvidia.com/cuda-toolkit/cuda_toolkit-12.9.1-py2.py3-none-any.whl#sha256=0c8636dfacbecfe9867a949a211864f080a805bc54023ce4a361aa4e1fd8738b (from https://pypi.nvidia.com/cuda-toolkit/))
    cuda-bindings[all] 12.9.4 depends on nvidia-nvjitlink-cu12>=12.3; extra == "all"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.2 depends on nvidia-nvjitlink-cu12==12.2.140.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.1 depends on nvidia-nvjitlink-cu12==12.2.128.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.0 depends on nvidia-nvjitlink-cu12==12.2.91.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.1.1 depends on nvidia-nvjitlink-cu12==12.1.105.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.1.0 depends on nvidia-nvjitlink-cu12==12.1.55.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.0.1 depends on nvidia-nvjitlink-cu12==12.0.140.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.0.0 depends on nvidia-nvjitlink-cu12==12.0.76.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"

Additionally, some packages in these conflicts have no matching distributions available for your environment:
    cuda-toolkit
    nvidia-nvjitlink-cu12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Looks like in recent successful runs on main, the jobs are falling back to torch==2.9.1 wheels even though 2.10.0 wheels are available: https://github.com/rapidsai/cugraph-gnn/actions/runs/22192581186/job/64185894306#step:13:838

Member Author:

I've pushed eb6be78 adding a ceiling of torch<2.10.

Let's just see if that allows all the environments to be solved. If it does, maybe it's worth putting that ceiling in place temporarily and handling removing it as a follow-up issue / PR (to at least get nightly tests working again here).
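A quick pure-Python sketch of which torch releases the combined range torch>=2.9.0,<2.10 would admit (toy tuple comparison, not a PEP 440 specifier parser):

```python
def parse(version: str) -> tuple:
    # Split a dotted version string into an integer tuple for comparison.
    return tuple(int(part) for part in version.split("."))

def in_range(version: str) -> bool:
    # Mirrors the proposed range: torch>=2.9.0,<2.10
    return parse("2.9.0") <= parse(version) < parse("2.10")

for v in ["2.6.0", "2.9.0", "2.9.1", "2.10.0"]:
    print(v, in_range(v))
# 2.6.0 False
# 2.9.0 True
# 2.9.1 True
# 2.10.0 False
```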

Member Author:

Oy, this is brutal.

CI is still failing here and I see pip backtracking over a bunch of different versions of cuda-pathfinder, cuda-toolkit, and RAPIDS libraries.

I'm still testing locally, let's see if I can find a different path through this.

Member Author:

Ah! Ok had an idea, I think this gets us further along here.

Setting that locally-downloaded torch file as a constraint means it enters pip's resolution algorithm pretty late in the process. Passing it as a requirement upfront gets it and all of its requirements into pip's solution early, which makes the search space small enough that instead of resolution-too-deep, we get a more informative solver error.

Pushed a commit doing that: 4e923d4

Locally, I got something like this:

The conflict is caused by:
    cudf-cu12 26.4.0a289 depends on cuda-toolkit==12.*
    cuml-cu12 26.4.0a78 depends on cuda-toolkit==12.*
    libcuml-cu12 26.4.0a78 depends on cuda-toolkit==12.*
    libraft-cu12 26.4.0a33 depends on cuda-toolkit==12.*
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.9.1 depends on cuda-toolkit 12.9.1 (from https://pypi.nvidia.com/cuda-toolkit/cuda_toolkit-12.9.1-py2.py3-none-any.whl#sha256=0c8636dfacbecfe9867a949a211864f080a805bc54023ce4a361aa4e1fd8738b (from https://pypi.nvidia.com/cuda-toolkit/))
    torch 2.9.1+cu126 depends on nvidia-cublas-cu12==12.6.4.1; platform_system == "Linux"
    nvidia-cudnn-cu12 9.10.2.21 depends on nvidia-cublas-cu12
    nvidia-cusolver-cu12 11.7.1.2 depends on nvidia-cublas-cu12
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.9.0 depends on nvidia-cublas-cu12==12.9.0.13.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "cublas"
...
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.6.1 depends on nvidia-cublas-cu12==12.6.1.4.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "cublas"
...
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.0.0 depends on nvidia-cublas-cu12==12.0.1.189.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "cublas"

Additionally, some packages in these conflicts have no matching distributions available for your environment:
    cuda-toolkit
    nvidia-cublas-cu12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

This is saying "you asked me to install cuda-toolkit==12.9.1, but its nvidia-cublas-cu12 pin is incompatible with torch's nvidia-cublas-cu12==12.6.4.1".

We can work with this! Just need to figure out where that cuda-toolkit==12.9.1 is coming from.

Member Author (@jameslamb, Feb 24, 2026):

Ok here's an interesting clue... looks like in recent successful cugraph-pyg runs, CUDA torch might have been getting replaced with a CPU-only one from pypi.org:

...
  Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/56/be/76eaa36c9cd032d3b01b001e2c5a05943df75f26211f68fae79e62f87734/torch-2.9.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB)
...

(build link)

That would explain why I'm not able to get the environment to solve with similar versions as were found in those jobs!
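One way to spot that in logs: wheels from download.pytorch.org carry a '+cuXXX' local version tag in the filename, while the wheel in that log has none. A small sketch of reading the tag out of a wheel filename (filenames hypothetical):

```python
def wheel_local_tag(filename: str) -> str:
    # Wheel filename layout: {dist}-{version}(+{local})?-{python}-{abi}-{platform}.whl
    version = filename.split("-")[1]
    return version.split("+", 1)[1] if "+" in version else "(none)"

print(wheel_local_tag("torch-2.9.1+cu126-cp311-cp311-manylinux_2_28_x86_64.whl"))  # cu126
print(wheel_local_tag("torch-2.9.1-cp311-cp311-manylinux_2_28_x86_64.whl"))        # (none)
```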

Member Author:

This is a known issue (I was just late to it), where some CUDA 12 torch wheels were not installable alongside ANY cuda-toolkit wheels because they mixed == pins across CTK versions.

Documented that here: rapidsai/build-planning#255

I've pushed commits here pinning to specific known-compatible, CUDA variant torch wheels in wheel testing... hopefully that will work.

Member Author:

Ok lots of wheel tests are passing now! All pylibwholegraph and CUDA 12 cugraph-pyg tests are looking good (using the nightly matrix).

Looks like there was another issue hiding in here though... cugraph-pyg CUDA 13 wheel tests are failing like this:

/__w/cugraph-gnn/cugraph-gnn/python/cugraph-pyg/cugraph_pyg /__w/cugraph-gnn/cugraph-gnn
ImportError while loading conftest '/__w/cugraph-gnn/cugraph-gnn/python/cugraph-pyg/cugraph_pyg/tests/conftest.py'.
tests/conftest.py:9: in <module>
    from pylibcugraph.comms import (
/pyenv/versions/3.12.12/lib/python3.12/site-packages/pylibcugraph/__init__.py:15: in <module>
    import pylibcugraph.comms
/pyenv/versions/3.12.12/lib/python3.12/site-packages/pylibcugraph/comms/__init__.py:4: in <module>
    from .comms_wrapper import init_subcomms
E   ImportError: libcugraph.so: cannot open shared object file: No such file or directory
Error: Process completed with exit code 4.

(build link)

Ignore "No such file or directory", that's misleading (we'll fix that in rapidsai/build-planning#119 at some point).

The real issue is that libcugraph.so cannot be loaded. I've opened an issue about it here: rapidsai/cugraph#5443


if [[ "${CUDA_MAJOR}" == "12" ]]; then
    PYTORCH_INDEX="https://download.pytorch.org/whl/cu126"
else
    PYTORCH_INDEX="https://download.pytorch.org/whl/cu130"
fi
# ensure a CUDA variant of 'torch' is used
./ci/download-torch-wheels.sh

# notes:
#
@@ -31,7 +33,7 @@ fi
#
rapids-pip-retry install \
    -v \
    --extra-index-url "${PYTORCH_INDEX}" \
    --constraint "${PIP_CONSTRAINT}" \
    --extra-index-url 'https://pypi.nvidia.com' \
    "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
    "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
25 changes: 13 additions & 12 deletions ci/test_wheel_pylibwholegraph.sh
@@ -2,9 +2,7 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0

set -e # abort the script on error
set -o pipefail # piped commands propagate their error
set -E # ERR traps are inherited by subcommands
set -euo pipefail

# Delete system libnccl.so to ensure the wheel is used.
# (but only do this in CI, to avoid breaking local dev environments)
@@ -18,23 +16,26 @@ RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-github cpp)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")

# determine pytorch source
if [[ "${CUDA_MAJOR}" == "12" ]]; then
PYTORCH_INDEX="https://download.pytorch.org/whl/cu126"
else
PYTORCH_INDEX="https://download.pytorch.org/whl/cu130"
fi
RAPIDS_TESTS_DIR=${RAPIDS_TESTS_DIR:-"${PWD}/test-results"}
RAPIDS_COVERAGE_DIR=${RAPIDS_COVERAGE_DIR:-"${PWD}/coverage-results"}
mkdir -p "${RAPIDS_TESTS_DIR}" "${RAPIDS_COVERAGE_DIR}"

# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
    --output requirements \
    --file-key "test_pylibwholegraph" \
    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"

# ensure a CUDA variant of 'torch' is used
./ci/download-torch-wheels.sh

# echo to expand wildcard before adding `[extra]` requires for pip
rapids-logger "Installing Packages"
rapids-pip-retry install \
    --extra-index-url ${PYTORCH_INDEX} \
    --constraint "${PIP_CONSTRAINT}" \
    "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph*.whl)[test]" \
    "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
    'torch>=2.3'
    "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl

rapids-logger "pytest pylibwholegraph"
cd python/pylibwholegraph/pylibwholegraph/tests
33 changes: 27 additions & 6 deletions dependencies.yaml
@@ -135,6 +135,7 @@ files:
table: project.optional-dependencies
key: test
includes:
- depends_on_pytorch
- test_python_common
- test_python_pylibwholegraph
py_build_cugraph_pyg:
@@ -324,6 +325,10 @@ dependencies:
- *cmake_ver
test_python_common:
common:
- output_types: [conda]
packages:
- torchdata
- pydantic
Member Author:

Just moving this here so depends_on_pytorch only ever contains torch / pytorch.

This test_python_common group is used everywhere that depends_on_pytorch is.

- output_types: [conda, pyproject, requirements]
packages:
- pytest<9.0.0
@@ -343,14 +348,15 @@ dependencies:
- pytest-forked
- scipy
depends_on_pytorch:
common:
- output_types: [conda]
packages:
- torchdata
- pydantic
specific:
- output_types: [requirements]
matrices:
# If 'include_torch_extra_index=false' is passed, avoid these --extra-index-url.
# (useful in CI scripts where we want to tightly control which indices 'pip' uses).
- matrix:
include_torch_extra_index: "false"
packages:
Member Author:

rapids-dependency-file-generator uses the first matching matrix (see https://github.com/rapidsai/dependency-file-generator?tab=readme-ov-file#how-dependency-lists-are-merged).

This will only affect cases where include_torch_extra_index=false is passed (as in CI here). Other cases (like RAPIDS devcontainers) will fall through to the other groups that pull in --extra-index-url lines.

So this should not break any other uses of this file.
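A toy sketch of that "first matching matrix wins" selection (illustrative only, not the real rapids-dependency-file-generator logic; real fallback matching on keys like cuda uses patterns, simplified here to equality):

```python
def select_packages(matrices: list, request: dict) -> list:
    # Return the package list from the first matrix whose keys all match the
    # request; an empty/absent matrix matches anything, acting as the fallback.
    for entry in matrices:
        matrix = entry.get("matrix") or {}
        if all(request.get(key) == value for key, value in matrix.items()):
            return entry["packages"]
    return []

matrices = [
    {"matrix": {"include_torch_extra_index": "false"}, "packages": []},
    {"matrix": {"cuda": "12.*"},
     "packages": ["--extra-index-url=https://download.pytorch.org/whl/cu126"]},
    {"matrix": None, "packages": ["torch>=2.3"]},
]

# CI passes include_torch_extra_index=false, so the empty package list wins first:
print(select_packages(matrices, {"include_torch_extra_index": "false", "cuda": "12.*"}))  # []
# Other consumers fall through to the CUDA-version-specific index:
print(select_packages(matrices, {"cuda": "12.*"}))
```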

# otherwise, choose a CUDA-version-specific index
- matrix:
cuda: "12.*"
packages:
@@ -374,13 +380,28 @@ dependencies:
- matrix:
packages:
- *pytorch_pip
- output_types: [requirements]
matrices:
- matrix:
cuda: "12.*"
dependencies: "oldest"
packages:
# 2.6.0 is the oldest version on https://download.pytorch.org/whl/cu126 with CUDA wheels
- torch==2.6.0
Member Author:

@alexbarghi-nv see this note.

There aren't CUDA 12 wheels available for PyTorch older than 2.6.0.

pip download \
  --isolated \
  --no-deps \
  --index-url=https://download.pytorch.org/whl/cu126 \
  'torch==2.3.0'

# ERROR: Could not find a version that satisfies the requirement torch==2.3.0
# (from versions: 2.6.0+cu126, 2.7.0+cu126, 2.7.1+cu126, 2.8.0+cu126, 2.9.0+cu126, 2.9.1+cu126, 2.10.0+cu126)

Do you want to bump the floor in dependency metadata here to >=2.6.0, or leave it at >=2.3 so that these libraries are still installable alongside older PyTorch releases (for example, if people build PyTorch 2.4 from source)?

Your call.

Contributor:

I would vote for bumping the floor to >=2.6.0. It's a little over a year old at this point. https://github.com/pytorch/pytorch/releases/tag/v2.6.0

- matrix:
cuda: "13.*"
dependencies: "oldest"
packages:
- torch==2.9.0
- matrix:
packages:
- output_types: [conda]
matrices:
# Prevent fallback to CPU-only pytorch when we want a CUDA variant.
- matrix:
require_gpu: "true"
packages:
- pytorch-gpu
- pytorch-gpu >=2.3
# Default to falling back to whatever 'pytorch' is pulled in via cugraph-pyg's dependencies.
- matrix:
packages:
1 change: 1 addition & 0 deletions python/pylibwholegraph/pyproject.toml
@@ -39,6 +39,7 @@ test = [
"pytest-xdist",
"pytest<9.0.0",
"scipy",
"torch>=2.9.0",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.

[tool.rapids-build-backend]