2 changes: 2 additions & 0 deletions .github/workflows/pr.yaml
@@ -257,6 +257,7 @@ jobs:
build_type: pull-request
script: ci/test_wheel_pylibwholegraph.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
Member Author:

Suggested change (remove this line):
matrix_type: 'nightly'

TODO: revert this. Just added here for testing, to confirm this will fix the issues we've been seeing in nightlies.

wheel-build-cugraph-pyg:
needs: checks
secrets: inherit
@@ -279,3 +280,4 @@ jobs:
build_type: pull-request
script: ci/test_wheel_cugraph-pyg.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
Member Author:

Suggested change (remove this line):
matrix_type: 'nightly'

Revert before merging.

2 changes: 2 additions & 0 deletions .gitignore
@@ -40,6 +40,8 @@ wheels/
wheelhouse/
_skbuild/
cufile.log
*.tar.gz
*.whl

## Patching
*.diff
34 changes: 34 additions & 0 deletions ci/download-torch-wheels.sh
@@ -0,0 +1,34 @@
#!/bin/bash
# SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# [description]
#
# Downloads a CUDA variant of 'torch' from the correct index, based on CUDA major version.
#
# This exists to avoid using 'pip --extra-index-url', which could allow for CPU-only 'torch'
# to be downloaded from pypi.org.
#

set -e -u -o pipefail

TORCH_WHEEL_DIR="${1}"

# Ensure CUDA-enabled 'torch' packages are always used.
#
# Downloading + passing the downloaded file as a requirement forces the use of this
# package, so we don't accidentally end up with a CPU-only 'torch' from 'pypi.org'
# (which can happen because pip doesn't support index priority).
rapids-dependency-file-generator \
--output requirements \
--file-key "torch_only" \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false;require_gpu=true" \
| tee "${PIP_CONSTRAINT}"

rapids-pip-retry download \
--isolated \
--prefer-binary \
--no-deps \
-d "${TORCH_WHEEL_DIR}" \
--constraint "${PIP_CONSTRAINT}" \
'torch'
22 changes: 13 additions & 9 deletions ci/test_wheel_cugraph-pyg.sh
@@ -15,13 +15,16 @@ LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-github cpp)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")
CUGRAPH_PYG_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="${package_name}_${RAPIDS_PY_CUDA_SUFFIX}" RAPIDS_PY_WHEEL_PURE="1" rapids-download-wheels-from-github python)

CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}"
# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
--output requirements \
--file-key "test_cugraph_pyg" \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"
Member Author:

This is a new one for me 😭

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.

(build link)

All cugraph-pyg wheel tests are failing like this, not only the oldest dependencies one.

Example constraints file (not including all the requirements of all these packages):

--extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple
--extra-index-url=https://pypi.nvidia.com/
cudf==26.4.*,>=0.0.0a0
cugraph==26.4.*,>=0.0.0a0
cuml==26.4.*,>=0.0.0a0
ogb
pylibwholegraph==26.4.*,>=0.0.0a0
pytest-benchmark
pytest-cov
pytest-xdist
pytest<9.0.0
sentence-transformers
torch>=2.9.0

I'll try the advice from the error message; let's see if it helps us get a little farther.
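For illustration, the kind of change that hint asks for looks like this (package names and bounds here are hypothetical, not this job's actual requirements):

```shell
# Hypothetical constraints file: each unbounded requirement gains an explicit
# lower bound, so the resolver never has to walk back through years of old
# releases while searching for a compatible set.
cat > /tmp/constraints-bounded.txt <<'EOF'
ogb>=1.3.0
sentence-transformers>=3.0.0
pytest>=7.0,<9.0.0
EOF

# Every requirement now carries a lower bound.
grep -c '>=' /tmp/constraints-bounded.txt  # prints 3
```

The ceiling on `pytest` shows the other half of the advice: upper bounds also prune the search space, at the cost of needing maintenance as new majors ship.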

Member Author:

Getting further with local testing, adding more pins to force out some solver errors.

test code:
docker run \
    --rm \
    --gpus all \
    --env GH_TOKEN=$(gh auth token) \
    --env RAPIDS_BUILD_TYPE="pull-request" \
    --env RAPIDS_REPOSITORY="rapidsai/cugraph-gnn" \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:26.04-cuda12.9.1-rockylinux8-py3.11 \
    bash

source rapids-init-pip

package_name="cugraph-pyg"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
COMMIT_ID=843296e5e99ebb017e3a4a63b046abfc672ce279

LIBWHOLEGRAPH_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact cugraph-gnn 413 cpp wheel "${COMMIT_ID}"
)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(
  rapids-get-pr-artifact cugraph-gnn 413 python wheel --pkg_name pylibwholegraph --stable "${COMMIT_ID}"
)
CUGRAPH_PYG_WHEELHOUSE=$(
    RAPIDS_PY_WHEEL_NAME="cugraph-pyg_cu12" RAPIDS_PY_WHEEL_PURE="1" rapids-get-pr-artifact cugraph-gnn 413 python wheel "${COMMIT_ID}"
)

# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
    --output requirements \
    --file-key "test_cugraph_pyg" \
    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"

# ensure a CUDA variant of 'torch' is used
./ci/download-torch-wheels.sh

# notes:
#
#   * echo to expand wildcard before adding `[extra]` requires for pip
#   * '--extra-index-url pypi.nvidia.com' can be removed when 'cugraph' and
#     its dependencies are available from pypi.org
#
rapids-pip-retry install \
    --dry-run \
    -v \
    --constraint "${PIP_CONSTRAINT}" \
    --extra-index-url 'https://pypi.nvidia.com' \
    "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
    "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
    "$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]" \
    "cuda-bindings[all]==12.9.4" \
    "cudf-cu12==26.4.0a289" \
    "cugraph-cu12==26.4.0a30" \
    "cuml-cu12==26.4.0a77" \
    "dask-cuda==26.4.0a18" \
    "distributed-ucxx-cu12==0.49.0a20" \
    "libcudf-cu12==26.4.0a289" \
    "libcugraph-cu12==26.4.0a30" \
    "libcuml-cu12==26.4.0a77" \
    "libucxx-cu12==0.49.0a20" \
    "numba-cuda[cu12]==0.27.0" \
    "pylibcugraph-cu12==26.4.0a30" \
    "pylibcudf-cu12==26.4.0a289" \
    "pylibraft-cu12==26.4.0a34" \
    "raft-dask-cu12==26.4.0a33" \
    "rapids-dask-dependency==26.4.0a7" \
    "rmm-cu12==26.4.0a30" \
    "ucxx-cu12==0.49.0a20"

I think torch's very tight pinnings are leading to these expensive solves.

TORCH_WHEEL_DIR=$(mktemp -d)
rapids-pip-retry download \
  --prefer-binary \
  --no-deps \
  -d "${TORCH_WHEEL_DIR}" \
  --index-url "https://download.pytorch.org/whl/cu126" \
  'torch==2.10'

pushd "${TORCH_WHEEL_DIR}"
pip install pkginfo
pkginfo --json *.whl
    "cuda-bindings==12.9.4; platform_system == \"Linux\"",
    "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-cuda-runtime-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-cuda-cupti-cu12==12.6.80; platform_system == \"Linux\"",
    "nvidia-cudnn-cu12==9.10.2.21; platform_system == \"Linux\"",
    "nvidia-cublas-cu12==12.6.4.1; platform_system == \"Linux\"",
    "nvidia-cufft-cu12==11.3.0.4; platform_system == \"Linux\"",
    "nvidia-curand-cu12==10.3.7.77; platform_system == \"Linux\"",
    "nvidia-cusolver-cu12==11.7.1.2; platform_system == \"Linux\"",
    "nvidia-cusparse-cu12==12.5.4.2; platform_system == \"Linux\"",
    "nvidia-cusparselt-cu12==0.7.1; platform_system == \"Linux\"",
    "nvidia-nccl-cu12==2.27.5; platform_system == \"Linux\"",
    "nvidia-nvshmem-cu12==3.4.5; platform_system == \"Linux\"",
    "nvidia-nvtx-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-nvjitlink-cu12==12.6.85; platform_system == \"Linux\"",
    "nvidia-cufile-cu12==1.11.1.6; platform_system == \"Linux\"",
    "triton==3.6.0; platform_system == \"Linux\"",

Pinning to the latest versions of RAPIDS nightlies as well as a few other packages is yielding solver errors like this:

ERROR: Cannot install cuda-bindings[all]==12.9.4, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.0.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.0.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.1.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.1.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.2, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.9.1, cudf-cu12==26.4.0a289, cuml-cu12==26.4.0a77, libcuml-cu12==26.4.0a77 and numba-cuda[cu12]==0.27.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    cudf-cu12 26.4.0a289 depends on cuda-toolkit==12.*
    cuml-cu12 26.4.0a77 depends on cuda-toolkit==12.*
    libcuml-cu12 26.4.0a77 depends on cuda-toolkit==12.*
    numba-cuda[cu12] 0.27.0 depends on cuda-toolkit==12.*; extra == "cu12"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.9.1 depends on cuda-toolkit 12.9.1 (from https://pypi.nvidia.com/cuda-toolkit/cuda_toolkit-12.9.1-py2.py3-none-any.whl#sha256=0c8636dfacbecfe9867a949a211864f080a805bc54023ce4a361aa4e1fd8738b (from https://pypi.nvidia.com/cuda-toolkit/))
    cuda-bindings[all] 12.9.4 depends on nvidia-nvjitlink-cu12>=12.3; extra == "all"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.2 depends on nvidia-nvjitlink-cu12==12.2.140.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.1 depends on nvidia-nvjitlink-cu12==12.2.128.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.0 depends on nvidia-nvjitlink-cu12==12.2.91.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.1.1 depends on nvidia-nvjitlink-cu12==12.1.105.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.1.0 depends on nvidia-nvjitlink-cu12==12.1.55.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.0.1 depends on nvidia-nvjitlink-cu12==12.0.140.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.0.0 depends on nvidia-nvjitlink-cu12==12.0.76.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"

Additionally, some packages in these conflicts have no matching distributions available for your environment:
    cuda-toolkit
    nvidia-nvjitlink-cu12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Looks like in recent successful runs on main, the jobs are falling back to torch==2.9.1 wheels even though 2.10.0 wheels are available: https://github.com/rapidsai/cugraph-gnn/actions/runs/22192581186/job/64185894306#step:13:838

Member Author:

I've pushed eb6be78 adding a ceiling of torch<2.10.

Let's just see if that allows all the environments to be solved. If it does, maybe it's worth putting that ceiling in place temporarily and handling its removal as a follow-up issue / PR (to at least get nightly tests working again here).

Member Author:

Oy, this is brutal.

CI is still failing here and I see pip backtracking over a bunch of different versions of cuda-pathfinder, cuda-toolkit, and RAPIDS libraries.

I'm still testing locally, let's see if I can find a different path through this.

Member Author:

Ah! Ok, I had an idea that I think gets us further along here.

Setting that locally-downloaded torch file as a constraint means it enters pip's resolution fairly late in the process. Passing it as a requirement up front gets it and all of its requirements into pip's solution early, which shrinks the search space enough that instead of a resolution-too-deep failure, we get a more informative solver error.

Pushed a commit doing that: 4e923d4

Locally, I got something like this:

The conflict is caused by:
    cudf-cu12 26.4.0a289 depends on cuda-toolkit==12.*
    cuml-cu12 26.4.0a78 depends on cuda-toolkit==12.*
    libcuml-cu12 26.4.0a78 depends on cuda-toolkit==12.*
    libraft-cu12 26.4.0a33 depends on cuda-toolkit==12.*
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.9.1 depends on cuda-toolkit 12.9.1 (from https://pypi.nvidia.com/cuda-toolkit/cuda_toolkit-12.9.1-py2.py3-none-any.whl#sha256=0c8636dfacbecfe9867a949a211864f080a805bc54023ce4a361aa4e1fd8738b (from https://pypi.nvidia.com/cuda-toolkit/))
    torch 2.9.1+cu126 depends on nvidia-cublas-cu12==12.6.4.1; platform_system == "Linux"
    nvidia-cudnn-cu12 9.10.2.21 depends on nvidia-cublas-cu12
    nvidia-cusolver-cu12 11.7.1.2 depends on nvidia-cublas-cu12
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.9.0 depends on nvidia-cublas-cu12==12.9.0.13.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "cublas"
...
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.6.1 depends on nvidia-cublas-cu12==12.6.1.4.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "cublas"
...
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.0.0 depends on nvidia-cublas-cu12==12.0.1.189.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "cublas"

Additionally, some packages in these conflicts have no matching distributions available for your environment:
    cuda-toolkit
    nvidia-cublas-cu12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

This is saying "you asked me to install cuda-toolkit==12.9.1, but its nvidia-cublas-cu12 pin is incompatible with torch's nvidia-cublas-cu12==12.6.4.1".

We can work with this! Just need to figure out where that cuda-toolkit==12.9.1 is coming from.

Member Author (@jameslamb, Feb 24, 2026):

Ok here's an interesting clue... looks like in recent successful cugraph-pyg runs, CUDA torch might have been getting replaced with a CPU-only one from pypi.org:

...
  Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/56/be/76eaa36c9cd032d3b01b001e2c5a05943df75f26211f68fae79e62f87734/torch-2.9.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB)
...

(build link)

That would explain why I'm not able to get the environment to solve with versions similar to the ones found in those jobs!
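A quick way to confirm which variant landed in an environment is the version's local identifier: CUDA torch wheels report a `+cuXXX` suffix, while CPU-only wheels from pypi.org do not. A minimal sketch of that check, where the version string is a hypothetical stand-in for whatever `python -c "import torch; print(torch.__version__)"` reports:

```shell
# Hypothetical value copied out of 'torch.__version__': a wheel from
# download.pytorch.org/whl/cu126 reports e.g. '2.9.1+cu126', while the
# CPU-only build on pypi.org reports plain '2.9.1'.
version="2.9.1+cu126"

case "${version}" in
  *+cu*) echo "CUDA variant" ;;
  *)     echo "CPU-only wheel (likely from pypi.org)" ;;
esac
# prints: CUDA variant
```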

Member Author:

This is a known issue (I was just late to it), where some CUDA 12 torch wheels were not installable alongside ANY cuda-toolkit wheels because they mixed == pins across CTK versions.

Documented that here: rapidsai/build-planning#255

I've pushed commits here pinning to specific known-compatible, CUDA variant torch wheels in wheel testing... hopefully that will work.

Member Author:

Ok lots of wheel tests are passing now! All pylibwholegraph and CUDA 12 cugraph-pyg tests are looking good (using the nightly matrix).

Looks like there was another issue hiding in here though... cugraph-pyg CUDA 13 wheel tests are failing like this:

/__w/cugraph-gnn/cugraph-gnn/python/cugraph-pyg/cugraph_pyg /__w/cugraph-gnn/cugraph-gnn
ImportError while loading conftest '/__w/cugraph-gnn/cugraph-gnn/python/cugraph-pyg/cugraph_pyg/tests/conftest.py'.
tests/conftest.py:9: in <module>
    from pylibcugraph.comms import (
/pyenv/versions/3.12.12/lib/python3.12/site-packages/pylibcugraph/__init__.py:15: in <module>
    import pylibcugraph.comms
/pyenv/versions/3.12.12/lib/python3.12/site-packages/pylibcugraph/comms/__init__.py:4: in <module>
    from .comms_wrapper import init_subcomms
E   ImportError: libcugraph.so: cannot open shared object file: No such file or directory
Error: Process completed with exit code 4.

(build link)

Ignore "No such file or directory", that's misleading (we'll fix that in rapidsai/build-planning#119 at some point).

The real issue is that libcugraph.so cannot be loaded. I've opened an issue about it here: rapidsai/cugraph#5443


if [[ "${CUDA_MAJOR}" == "12" ]]; then
PYTORCH_INDEX="https://download.pytorch.org/whl/cu126"
else
PYTORCH_INDEX="https://download.pytorch.org/whl/cu130"
fi
# ensure a CUDA variant of 'torch' is used
TORCH_WHEEL_DIR="$(mktemp -d)"
./ci/download-torch-wheels.sh "${TORCH_WHEEL_DIR}"

# notes:
#
@@ -30,12 +33,13 @@ fi
# its dependencies are available from pypi.org
#
rapids-pip-retry install \
-v \
--extra-index-url "${PYTORCH_INDEX}" \
--prefer-binary \
--constraint "${PIP_CONSTRAINT}" \
--extra-index-url 'https://pypi.nvidia.com' \
"${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
"$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
"$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]"
"$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]" \
"${TORCH_WHEEL_DIR}"/torch-*.whl

# RAPIDS_DATASET_ROOT_DIR is used by test scripts
export RAPIDS_DATASET_ROOT_DIR="$(realpath datasets)"
26 changes: 15 additions & 11 deletions ci/test_wheel_pylibwholegraph.sh
@@ -2,9 +2,7 @@
# SPDX-FileCopyrightText: Copyright (c) 2023-2026, NVIDIA CORPORATION.
# SPDX-License-Identifier: Apache-2.0

set -e # abort the script on error
set -o pipefail # piped commands propagate their error
set -E # ERR traps are inherited by subcommands
set -euo pipefail

# Delete system libnccl.so to ensure the wheel is used.
# (but only do this in CI, to avoid breaking local dev environments)
@@ -18,23 +16,29 @@ RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-download-wheels-from-github cpp)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")")

# determine pytorch source
if [[ "${CUDA_MAJOR}" == "12" ]]; then
PYTORCH_INDEX="https://download.pytorch.org/whl/cu126"
else
PYTORCH_INDEX="https://download.pytorch.org/whl/cu130"
fi
RAPIDS_TESTS_DIR=${RAPIDS_TESTS_DIR:-"${PWD}/test-results"}
RAPIDS_COVERAGE_DIR=${RAPIDS_COVERAGE_DIR:-"${PWD}/coverage-results"}
mkdir -p "${RAPIDS_TESTS_DIR}" "${RAPIDS_COVERAGE_DIR}"

# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
--output requirements \
--file-key "test_pylibwholegraph" \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"

# ensure a CUDA variant of 'torch' is used
TORCH_WHEEL_DIR="$(mktemp -d)"
./ci/download-torch-wheels.sh "${TORCH_WHEEL_DIR}"

# echo to expand wildcard before adding `[extra]` requires for pip
rapids-logger "Installing Packages"
rapids-pip-retry install \
--extra-index-url ${PYTORCH_INDEX} \
--prefer-binary \
--constraint "${PIP_CONSTRAINT}" \
"$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph*.whl)[test]" \
"${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
'torch>=2.3'
"${TORCH_WHEEL_DIR}"/torch-*.whl

rapids-logger "pytest pylibwholegraph"
cd python/pylibwholegraph/pylibwholegraph/tests
2 changes: 1 addition & 1 deletion conda/recipes/cugraph-pyg/recipe.yaml
@@ -40,7 +40,7 @@ requirements:
# This is intentionally spelled 'pytorch' (not 'pytorch-gpu' and not using build string selectors)
# because we want it to be possible to at least install `cugraph-pyg` in an environment without a GPU,
# to support use cases like building container images.
- pytorch >=2.3
- pytorch >=2.4
- pytorch_geometric >=2.5,<2.8

tests:
105 changes: 82 additions & 23 deletions dependencies.yaml
@@ -60,12 +60,11 @@ files:
- depends_on_cudf
- depends_on_pytorch
- depends_on_cuml
- depends_on_ogb
- depends_on_sentence_transformers
- py_version
- test_python_common
- depends_on_pylibwholegraph
- depends_on_cugraph_pyg
- test_python_cugraph_pyg
test_pylibwholegraph:
output: none
includes:
@@ -76,6 +75,10 @@ files:
- test_python_common
- depends_on_pylibwholegraph
- test_python_pylibwholegraph
torch_only:
output: none
includes:
- depends_on_pytorch
py_build_libwholegraph:
output: pyproject
pyproject_dir: python/libwholegraph
@@ -135,6 +138,7 @@ files:
table: project.optional-dependencies
key: test
includes:
- depends_on_pytorch
- test_python_common
- test_python_pylibwholegraph
py_build_cugraph_pyg:
@@ -324,6 +328,10 @@ dependencies:
- *cmake_ver
test_python_common:
common:
- output_types: [conda]
packages:
- torchdata
- pydantic
Member Author:

Just moving this here so depends_on_pytorch only ever contains torch / pytorch.

This test_python_common group is used everywhere that depends_on_pytorch is.

- output_types: [conda, pyproject, requirements]
packages:
- pytest<9.0.0
@@ -335,6 +343,7 @@
- output_types: [conda, pyproject, requirements]
packages:
- ogb
# for MovieLens example
- sentence-transformers
test_python_pylibwholegraph:
common:
@@ -343,45 +352,106 @@
- pytest-forked
- scipy
depends_on_pytorch:
common:
- output_types: [conda]
packages:
- torchdata
- pydantic
specific:
- output_types: [requirements]
matrices:
# If 'include_torch_extra_index=false' is passed, avoid these --extra-index-url.
# (useful in CI scripts where we want to tightly control which indices 'pip' uses).
- matrix:
cuda: "12.*"
include_torch_extra_index: "false"
packages:
Member Author:

rapids-dependency-file-generator uses the first matching matrix (see https://github.com/rapidsai/dependency-file-generator?tab=readme-ov-file#how-dependency-lists-are-merged).

This will only affect cases where include_torch_extra_index=false is passed (as in CI here). Other cases (like RAPIDS devcontainers) will fall through to other groups that pull in --extra-index-url lines.

So this should not break any other uses of this file.
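That first-match behavior works like a shell `case` statement. This toy sketch (not rapids-dependency-file-generator's real implementation, with made-up matrix values) shows why the `include_torch_extra_index=false` entry has to come before the CUDA-version-specific ones:

```shell
# Toy illustration of "first matching matrix wins": only the first pattern
# that matches the input selects an outcome, so the most specific entry
# must be listed first.
matrix="cuda=12.6;include_torch_extra_index=false"

case "${matrix}" in
  *include_torch_extra_index=false*) echo "no --extra-index-url entries" ;;
  *cuda=12.[234]*)                   echo "cu124 index" ;;
  *cuda=12*)                         echo "cu126/cu129 index" ;;
esac
# prints: no --extra-index-url entries
```

Remove the first branch and the same input would fall through to a branch that emits an `--extra-index-url` line, which is exactly the breakage the ordering avoids.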

# otherwise, choose a CUDA-version-specific index
- matrix:
cuda: "12.[234]"
packages:
# oldest torch: 2.4.0+cu124
- &oldest_torch_index_cu12 --extra-index-url=https://download.pytorch.org/whl/cu124
- matrix:
cuda: "12.[56]"
packages:
# oldest torch: 2.6.0+cu126
- --extra-index-url=https://download.pytorch.org/whl/cu126
- matrix:
cuda: "12.[78]"
packages:
# oldest torch: 2.7.0+cu128
- --extra-index-url=https://download.pytorch.org/whl/cu128
- matrix:
cuda: "12.*"
packages:
# oldest torch: 2.8.0+cu129
- &latest_torch_index_cu12 --extra-index-url=https://download.pytorch.org/whl/cu129
- matrix:
cuda: "13.*"
packages:
- --extra-index-url=https://download.pytorch.org/whl/cu130
# oldest torch: 2.9.0+cu130
- &torch_index_cu13 --extra-index-url=https://download.pytorch.org/whl/cu130
- matrix:
packages:
# For pyproject.toml (and therefore wheel metadata), avoid --extra-index-url
# and use pins that don't have CUDA-specific version modifiers (so installing alongside
# CPU-only torch is technically possible).
#
# --extra-index-url for requirements is handled by other lists here.
- output_types: [requirements, pyproject]
matrices:
- matrix:
cuda: "12.*"
packages:
- torch>=2.3
- torch>=2.4
- matrix:
cuda: "13.*"
packages:
- &pytorch_pip torch>=2.9.0
- matrix:
packages:
- *pytorch_pip
# For [requirements], include --extra-index-url and CUDA-specific version modifiers
# to ensure we get CUDA builds at test time.
- output_types: [requirements]
matrices:
- matrix:
cuda: "12.*"
dependencies: "oldest"
require_gpu: "true"
packages:
- *oldest_torch_index_cu12
- torch==2.4.0+cu124
- matrix:
cuda: "13.*"
dependencies: "oldest"
require_gpu: "true"
packages:
- *torch_index_cu13
- torch==2.9.0+cu130
- matrix:
cuda: "12.*"
require_gpu: "true"
packages:
- *latest_torch_index_cu12
- torch==2.8.0+cu129
- matrix:
cuda: "13.*"
require_gpu: "true"
packages:
- *torch_index_cu13
- torch==2.10.0+cu130
# Nothing above matches, don't add a CUDA-specific 'torch' requirement.
#
# This keeps these tight pins out of [test] extras in wheels.
#
# Also useful for cases like RAPIDS DLFW builds, where 'torch' is provided a different
# way and we want to avoid installing it from external repos.
- matrix:
packages:
- output_types: [conda]
matrices:
# Prevent fallback to CPU-only pytorch when we want a CUDA variant.
- matrix:
require_gpu: "true"
packages:
- pytorch-gpu
# Default to falling back to whatever 'pytorch' is pulled in via cugraph-pyg's dependencies.
- pytorch-gpu >=2.4
# Default to falling back to whatever 'pytorch' is pulled in via cugraph-pyg's / pylibwholegraph's dependencies.
- matrix:
packages:
depends_on_nccl:
@@ -399,17 +469,6 @@
- nvidia-nccl-cu12>=2.19
- matrix:
packages:
depends_on_ogb:
common:
- output_types: [conda, requirements, pyproject]
packages:
- ogb
# for MovieLens example
depends_on_sentence_transformers:
common:
- output_types: [conda, requirements, pyproject]
packages:
- sentence-transformers
depends_on_pyg:
common:
- output_types: [conda]
1 change: 1 addition & 0 deletions python/pylibwholegraph/pyproject.toml
@@ -39,6 +39,7 @@ test = [
"pytest-xdist",
"pytest<9.0.0",
"scipy",
"torch>=2.9.0",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.

[tool.rapids-build-backend]