
wheels CI: stricter torch index selection, test oldest versions of dependencies #413

Open
jameslamb wants to merge 17 commits into rapidsai:main from jameslamb:fix/index-selection

Conversation


jameslamb (Member) commented Feb 23, 2026

Fixes #410

In that issue, some nightly wheel tests were failing because CUDA 13 packages were being installed in jobs that were testing CUDA 12 pylibwholegraph packages. This PR fixes that, along with some other improvements to wheel testing:

  • ensures CUDA variants of torch are always installed (no fallback to pypi.org CPU-only packages)
  • ensures the correct torch index (based on CUDA major version) is used
  • adds coverage of "oldest" dependencies group, to check that lower bounds on dependencies are correct
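The "correct torch index based on CUDA major version" selection can be sketched roughly like this (an illustration only; the helper name is made up, and the cu126/cu130 index suffixes are assumptions based on the indices discussed later in this thread):

```python
# Sketch (hypothetical helper): map a CUDA version to the matching
# PyTorch wheel index, so CUDA 12 jobs never pull CUDA 13 wheels
# (or silently fall back to CPU-only wheels from pypi.org).
def torch_index_url(cuda_version: str) -> str:
    major = cuda_version.split(".")[0]
    # assumed mapping: cu126 for CUDA 12.x, cu130 for CUDA 13.x
    suffix = {"12": "cu126", "13": "cu130"}[major]
    return f"https://download.pytorch.org/whl/{suffix}"

print(torch_index_url("12.9"))  # https://download.pytorch.org/whl/cu126
```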


@jameslamb jameslamb added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Feb 23, 2026
# (useful in CI scripts where we want to tightly control which indices 'pip' uses).
- matrix:
include_torch_extra_index: "false"
packages:
jameslamb (Member Author):

rapids-dependency-file-generator uses the first matching matrix (see https://github.com/rapidsai/dependency-file-generator?tab=readme-ov-file#how-dependency-lists-are-merged).

This will only affect cases where include_torch_extra_index=false is passed (as in CI here). Other cases (like RAPIDS devcontainers) will fall through to other groups that pull in --extra-index-url lines.

So this should not break any other uses of this file.
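The fallthrough behavior described above can be illustrated with a small sketch of the documented "first matching matrix wins" rule (illustrative only; `select_matrix` and these entries are made up, not the tool's real code):

```python
# Sketch of the documented selection rule: the first matrix entry whose
# keys are all satisfied by the requested matrix is the one used.
# An empty matrix ({}) matches any request, acting as the fallthrough.
def select_matrix(entries, requested):
    for matrix, packages in entries:
        if all(requested.get(k) == v for k, v in matrix.items()):
            return packages
    return None

entries = [
    # CI passes include_torch_extra_index=false -> no extra-index lines
    ({"include_torch_extra_index": "false"}, []),
    # other consumers (e.g. devcontainers) fall through to this entry
    ({}, ["--extra-index-url=https://download.pytorch.org/whl/cu126"]),
]

print(select_matrix(entries, {"include_torch_extra_index": "false"}))
print(select_matrix(entries, {"cuda": "12.9"}))
```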

build_type: pull-request
script: ci/test_wheel_pylibwholegraph.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
jameslamb (Member Author):

Suggested change
matrix_type: 'nightly'

TODO: revert this. Just added here for testing, to confirm this will fix the issues we've been seeing in nightlies.

build_type: pull-request
script: ci/test_wheel_cugraph-pyg.sh
matrix_filter: map(select(.ARCH == "amd64"))
matrix_type: 'nightly'
jameslamb (Member Author):

Suggested change
matrix_type: 'nightly'

Revert before merging.

@jameslamb jameslamb changed the title WIP: [NOT READY FOR REVIEW] wheels CI: stricter torch index selection, test oldest versions of dependencies wheels CI: stricter torch index selection, test oldest versions of dependencies Feb 23, 2026
@jameslamb jameslamb marked this pull request as ready for review February 23, 2026 22:57
@jameslamb jameslamb requested review from a team as code owners February 23, 2026 22:57

- output_types: [conda]
packages:
- torchdata
- pydantic
jameslamb (Member Author):

Just moving this here so depends_on_pytorch only ever contains torch / pytorch.

This test_python_common group is used everywhere that depends_on_pytorch is.

Comment on lines +389 to +390
# 2.6.0 is the oldest version on https://download.pytorch.org/whl/cu126 with CUDA wheels
- torch==2.6.0
jameslamb (Member Author):

@alexbarghi-nv see this note.

There aren't CUDA 12 wheels available for PyTorch older than 2.6.0.

pip download \
  --isolated \
  --no-deps \
  --index-url=https://download.pytorch.org/whl/cu126 \
  'torch==2.3.0'

# ERROR: Could not find a version that satisfies the requirement torch==2.3.0
# (from versions: 2.6.0+cu126, 2.7.0+cu126, 2.7.1+cu126, 2.8.0+cu126, 2.9.0+cu126, 2.9.1+cu126, 2.10.0+cu126)

Do you want to bump the floor in dependency metadata here to >=2.6.0? Or to leave it at >=2.3 so that these libraries are still installable alongside older PyTorch releases (for example, if people build PyTorch 2.4 from source)?

Your call.

Contributor:

I would vote for bumping the floor to >=2.6.0. It's a little over a year old at this point. https://github.com/pytorch/pytorch/releases/tag/v2.6.0

--output requirements \
--file-key "test_cugraph_pyg" \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"
jameslamb (Member Author):

This is a new one for me 😭

× Dependency resolution exceeded maximum depth
╰─> Pip cannot resolve the current dependencies as the dependency graph is too complex for pip to solve efficiently.

hint: Try adding lower bounds to constrain your dependencies, for example: 'package>=2.0.0' instead of just 'package'.

(build link)

All cugraph-pyg wheel tests are failing like this, not only the oldest dependencies one.

Example constraints file (not including all the requirements of all these packages):

--extra-index-url=https://pypi.anaconda.org/rapidsai-wheels-nightly/simple
--extra-index-url=https://pypi.nvidia.com/
cudf==26.4.*,>=0.0.0a0
cugraph==26.4.*,>=0.0.0a0
cuml==26.4.*,>=0.0.0a0
ogb
pylibwholegraph==26.4.*,>=0.0.0a0
pytest-benchmark
pytest-cov
pytest-xdist
pytest<9.0.0
sentence-transformers
torch>=2.9.0

I'll try that advice from the error message, let's see if it'll help us get a little farther.

jameslamb (Member Author):

Getting further with local testing, adding more pins to force out some solver errors.

test code:
docker run \
    --rm \
    --gpus all \
    --env GH_TOKEN=$(gh auth token) \
    --env RAPIDS_BUILD_TYPE="pull-request" \
    --env RAPIDS_REPOSITORY="rapidsai/cugraph-gnn" \
    -v $(pwd):/opt/work \
    -w /opt/work \
    -it rapidsai/citestwheel:26.04-cuda12.9.1-rockylinux8-py3.11 \
    bash

source rapids-init-pip

package_name="cugraph-pyg"

RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"

# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
COMMIT_ID=843296e5e99ebb017e3a4a63b046abfc672ce279

LIBWHOLEGRAPH_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact cugraph-gnn 413 cpp wheel "${COMMIT_ID}"
)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(
  rapids-get-pr-artifact cugraph-gnn 413 python wheel --pkg_name pylibwholegraph --stable "${COMMIT_ID}"
)
CUGRAPH_PYG_WHEELHOUSE=$(
    RAPIDS_PY_WHEEL_NAME="cugraph-pyg_cu12" RAPIDS_PY_WHEEL_PURE="1" rapids-get-pr-artifact cugraph-gnn 413 python wheel "${COMMIT_ID}"
)

# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
    --output requirements \
    --file-key "test_cugraph_pyg" \
    --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"

# ensure a CUDA variant of 'torch' is used
./ci/download-torch-wheels.sh

# notes:
#
#   * echo to expand wildcard before adding `[extra]` requires for pip
#   * '--extra-index-url pypi.nvidia.com' can be removed when 'cugraph' and
#     its dependencies are available from pypi.org
#
rapids-pip-retry install \
    --dry-run \
    -v \
    --constraint "${PIP_CONSTRAINT}" \
    --extra-index-url 'https://pypi.nvidia.com' \
    "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
    "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
    "$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]" \
    "cuda-bindings[all]==12.9.4" \
    "cudf-cu12==26.4.0a289" \
    "cugraph-cu12==26.4.0a30" \
    "cuml-cu12==26.4.0a77" \
    "dask-cuda==26.4.0a18" \
    "distributed-ucxx-cu12==0.49.0a20" \
    "libcudf-cu12==26.4.0a289" \
    "libcugraph-cu12==26.4.0a30" \
    "libcuml-cu12==26.4.0a77" \
    "libucxx-cu12==0.49.0a20" \
    "numba-cuda[cu12]==0.27.0" \
    "pylibcugraph-cu12==26.4.0a30" \
    "pylibcudf-cu12==26.4.0a289" \
    "pylibraft-cu12==26.4.0a34" \
    "raft-dask-cu12==26.4.0a33" \
    "rapids-dask-dependency==26.4.0a7" \
    "rmm-cu12==26.4.0a30" \
    "ucxx-cu12==0.49.0a20"

I think torch's very tight pinnings are leading to these expensive solves.

TORCH_WHEEL_DIR=$(mktemp -d)
rapids-pip-retry download \
  --prefer-binary \
  --no-deps \
  -d "${TORCH_WHEEL_DIR}" \
  --index-url "https://download.pytorch.org/whl/cu126" \
  'torch==2.10'

pushd "${TORCH_WHEEL_DIR}"
pip install pkginfo
$ pkginfo --json *.whl
    "cuda-bindings==12.9.4; platform_system == \"Linux\"",
    "nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-cuda-runtime-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-cuda-cupti-cu12==12.6.80; platform_system == \"Linux\"",
    "nvidia-cudnn-cu12==9.10.2.21; platform_system == \"Linux\"",
    "nvidia-cublas-cu12==12.6.4.1; platform_system == \"Linux\"",
    "nvidia-cufft-cu12==11.3.0.4; platform_system == \"Linux\"",
    "nvidia-curand-cu12==10.3.7.77; platform_system == \"Linux\"",
    "nvidia-cusolver-cu12==11.7.1.2; platform_system == \"Linux\"",
    "nvidia-cusparse-cu12==12.5.4.2; platform_system == \"Linux\"",
    "nvidia-cusparselt-cu12==0.7.1; platform_system == \"Linux\"",
    "nvidia-nccl-cu12==2.27.5; platform_system == \"Linux\"",
    "nvidia-nvshmem-cu12==3.4.5; platform_system == \"Linux\"",
    "nvidia-nvtx-cu12==12.6.77; platform_system == \"Linux\"",
    "nvidia-nvjitlink-cu12==12.6.85; platform_system == \"Linux\"",
    "nvidia-cufile-cu12==1.11.1.6; platform_system == \"Linux\"",
    "triton==3.6.0; platform_system == \"Linux\"",

Pinning to the latest versions of RAPIDS nightlies as well as a few other packages is yielding solver errors like this:

ERROR: Cannot install cuda-bindings[all]==12.9.4, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.0.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.0.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.1.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.1.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.0, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.1, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.2.2, cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc]==12.9.1, cudf-cu12==26.4.0a289, cuml-cu12==26.4.0a77, libcuml-cu12==26.4.0a77 and numba-cuda[cu12]==0.27.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    cudf-cu12 26.4.0a289 depends on cuda-toolkit==12.*
    cuml-cu12 26.4.0a77 depends on cuda-toolkit==12.*
    libcuml-cu12 26.4.0a77 depends on cuda-toolkit==12.*
    numba-cuda[cu12] 0.27.0 depends on cuda-toolkit==12.*; extra == "cu12"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.9.1 depends on cuda-toolkit 12.9.1 (from https://pypi.nvidia.com/cuda-toolkit/cuda_toolkit-12.9.1-py2.py3-none-any.whl#sha256=0c8636dfacbecfe9867a949a211864f080a805bc54023ce4a361aa4e1fd8738b (from https://pypi.nvidia.com/cuda-toolkit/))
    cuda-bindings[all] 12.9.4 depends on nvidia-nvjitlink-cu12>=12.3; extra == "all"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.2 depends on nvidia-nvjitlink-cu12==12.2.140.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.1 depends on nvidia-nvjitlink-cu12==12.2.128.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.2.0 depends on nvidia-nvjitlink-cu12==12.2.91.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.1.1 depends on nvidia-nvjitlink-cu12==12.1.105.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.1.0 depends on nvidia-nvjitlink-cu12==12.1.55.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.0.1 depends on nvidia-nvjitlink-cu12==12.0.140.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "nvjitlink"
    cuda-toolkit[cccl,cudart,nvcc,nvjitlink,nvrtc] 12.0.0 depends on nvidia-nvjitlink-cu12==12.0.76.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "nvjitlink"

Additionally, some packages in these conflicts have no matching distributions available for your environment:
    cuda-toolkit
    nvidia-nvjitlink-cu12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Looks like in recent successful runs on main, the jobs are falling back to torch==2.9.1 wheels even though 2.10.0 wheels are available: https://github.com/rapidsai/cugraph-gnn/actions/runs/22192581186/job/64185894306#step:13:838

jameslamb (Member Author):

I've pushed eb6be78 adding a ceiling of torch<2.10.

Let's just see if that allows all the environments to be solved. If it does, maybe it's worth putting that ceiling in place temporarily and handling removing it as a follow-up issue / PR (to at least get nightly tests working again here).

jameslamb (Member Author):

Oy, this is brutal.

CI is still failing here and I see pip backtracking over a bunch of different versions of cuda-pathfinder, cuda-toolkit, and RAPIDS libraries.

I'm still testing locally, let's see if I can find a different path through this.

jameslamb (Member Author):

Ah! Ok had an idea, I think this gets us further along here.

Setting that locally-downloaded torch file as a constraint means it enters pip's resolution algorithm pretty late in the process. Passing it as a requirement upfront gets it and all of its requirements into pip's solution early, which makes the search space small enough that instead of resolution-too-deep, we get a more informative solver error.

Pushed a commit doing that: 4e923d4

Locally, I got something like this:

The conflict is caused by:
    cudf-cu12 26.4.0a289 depends on cuda-toolkit==12.*
    cuml-cu12 26.4.0a78 depends on cuda-toolkit==12.*
    libcuml-cu12 26.4.0a78 depends on cuda-toolkit==12.*
    libraft-cu12 26.4.0a33 depends on cuda-toolkit==12.*
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.9.1 depends on cuda-toolkit 12.9.1 (from https://pypi.nvidia.com/cuda-toolkit/cuda_toolkit-12.9.1-py2.py3-none-any.whl#sha256=0c8636dfacbecfe9867a949a211864f080a805bc54023ce4a361aa4e1fd8738b (from https://pypi.nvidia.com/cuda-toolkit/))
    torch 2.9.1+cu126 depends on nvidia-cublas-cu12==12.6.4.1; platform_system == "Linux"
    nvidia-cudnn-cu12 9.10.2.21 depends on nvidia-cublas-cu12
    nvidia-cusolver-cu12 11.7.1.2 depends on nvidia-cublas-cu12
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.9.0 depends on nvidia-cublas-cu12==12.9.0.13.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "cublas"
...
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.6.1 depends on nvidia-cublas-cu12==12.6.1.4.*; (sys_platform == "linux" or sys_platform == "win32") and extra == "cublas"
...
    cuda-toolkit[cublas,cufft,curand,cusolver,cusparse,nvjitlink] 12.0.0 depends on nvidia-cublas-cu12==12.0.1.189.*; (sys_platform == "win32" or sys_platform == "linux") and extra == "cublas"

Additionally, some packages in these conflicts have no matching distributions available for your environment:
    cuda-toolkit
    nvidia-cublas-cu12

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip to attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

This is saying "you asked me to install cuda-toolkit==12.9.1, but its nvidia-cublas-cu12 pin is incompatible with torch's nvidia-cublas-cu12==12.6.4.1".

We can work with this! Just need to figure out where that cuda-toolkit==12.9.1 is coming from.

jameslamb (Member Author) commented Feb 24, 2026:

Ok here's an interesting clue... looks like in recent successful cugraph-pyg runs, CUDA torch might have been getting replaced with a CPU-only one from pypi.org:

...
  Downloading http://pip-cache.local.gha-runners.nvidia.com/packages/56/be/76eaa36c9cd032d3b01b001e2c5a05943df75f26211f68fae79e62f87734/torch-2.9.1-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (30 kB)
...

(build link)

That would explain why I'm not able to get the environment to solve with similar versions as were found in those jobs!
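One way to catch this kind of silent replacement is to check torch's local version tag: CUDA-variant wheels from download.pytorch.org carry a `+cuXXX` suffix (e.g. 2.9.1+cu126), while pypi.org wheels do not. A minimal sketch (the helper name is made up):

```python
# Sketch: CUDA-variant torch wheels from download.pytorch.org carry a
# local version tag like "+cu126"; CPU-only builds from pypi.org do not.
def is_cuda_torch(version: str) -> bool:
    return "+cu" in version

print(is_cuda_torch("2.9.1+cu126"))  # True
print(is_cuda_torch("2.9.1"))        # False (the pip-cache wheel above)
```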

jameslamb (Member Author):

This is a known issue (I was just late to it), where some CUDA 12 torch wheels were not installable alongside ANY cuda-toolkit wheels because they mixed == pins across CTK versions.

Documented that here: rapidsai/build-planning#255

I've pushed commits here pinning to specific known-compatible, CUDA variant torch wheels in wheel testing... hopefully that will work.

jameslamb (Member Author):

Ok lots of wheel tests are passing now! All pylibwholegraph and CUDA 12 cugraph-pyg tests are looking good (using the nightly matrix).

Looks like there was another issue hiding in here though... cugraph-pyg CUDA 13 wheel tests are failing like this:

/__w/cugraph-gnn/cugraph-gnn/python/cugraph-pyg/cugraph_pyg /__w/cugraph-gnn/cugraph-gnn
ImportError while loading conftest '/__w/cugraph-gnn/cugraph-gnn/python/cugraph-pyg/cugraph_pyg/tests/conftest.py'.
tests/conftest.py:9: in <module>
    from pylibcugraph.comms import (
/pyenv/versions/3.12.12/lib/python3.12/site-packages/pylibcugraph/__init__.py:15: in <module>
    import pylibcugraph.comms
/pyenv/versions/3.12.12/lib/python3.12/site-packages/pylibcugraph/comms/__init__.py:4: in <module>
    from .comms_wrapper import init_subcomms
E   ImportError: libcugraph.so: cannot open shared object file: No such file or directory
Error: Process completed with exit code 4.

(build link)

Ignore "No such file or directory", that's misleading (we'll fix that in rapidsai/build-planning#119 at some point).

The real issue is that libcugraph.so cannot be loaded. I've opened an issue about it here: rapidsai/cugraph#5443
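To confirm it really is a loader problem (rather than a missing file, as the misleading message suggests), the same failure mode can be reproduced with a bare dlopen via `ctypes` (a generic diagnostic sketch, not specific to cugraph):

```python
import ctypes

def can_dlopen(libname: str) -> bool:
    """Return True if the dynamic loader can resolve and load `libname`."""
    try:
        ctypes.CDLL(libname)
        return True
    except OSError:
        # dlopen reports both "not found" and unresolved-dependency
        # failures through the same OSError
        return False

# on the failing CI image this would print False for "libcugraph.so"
print(can_dlopen("libcugraph.so"))
```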

jameslamb and others added 2 commits February 25, 2026 12:11
greptile-apps[bot] left a review: 8 files reviewed, no comments

rapids-bot bot pushed a commit that referenced this pull request Mar 4, 2026
Nightly CI here has been failing for a couple weeks, and the root cause is "some jobs are installing incorrect `torch` wheels".

That's tracked in #410 and being worked on in #413. That work unfortunately uncovered some other significant compatibility issues that will require RAPIDS-wide fixes:

* rapidsai/build-planning#256
* rapidsai/build-planning#257
* rapidsai/cugraph#5443

As a short-term patch, this proposes allowing `cugraph-gnn` nightlies to fail for a few more weeks, so regular PR CI can be unblocked while we focus on the more permanent fix.

Targeting the more permanent fix (and reverting this back to 7 days) for the 26.04 release (so over the next few weeks).

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Gil Forsyth (https://github.com/gforsyth)

URL: #419
rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Mar 4, 2026
Contributes to #5443

Related to rapidsai/build-planning#143 

`libcugraph.so` dynamically links to several CUDA Toolkit libraries

```console
$ ldd /pyenv/versions/3.11.14/lib/python3.11/site-packages/libcugraph/lib64/libcugraph.so
        ...
        libcusolver.so.12 => /usr/local/cuda/lib64/libcusolver.so.12 (0x00007c616aba7000)
        libcublas.so.13 => /usr/local/cuda/lib64/libcublas.so.13 (0x00007c61675d5000)
        libcublasLt.so.13 => /usr/local/cuda/lib64/libcublasLt.so.13 (0x00007c6143c83000)
        libcusparse.so.12 => /usr/local/cuda/lib64/libcusparse.so.12 (0x00007c613987c000)
        libcurand.so.10 => /usr/local/cuda/lib64/libcurand.so.10 (0x00007c6131161000)
        ...
        libnvJitLink.so.13 => /usr/local/cuda/lib64/libnvJitLink.so.13 (0x00007c612af5f000)
        ...
```

This proposes getting them from `cuda-toolkit` wheels, instead of system installations.

## Notes for Reviewers

### Benefits of this change

* reduces the risk of multiple copies of the same library being loaded
* allows the use of Python package versioning to manage compatibility
* consistency with other RAPIDS libraries (see rapidsai/build-planning#35)
* reduces the risk of runtime issues with other libraries that use CTK wheels, like `torch` (rapidsai/cugraph-gnn#413 (comment))

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Kyle Edwards (https://github.com/KyleFromNVIDIA)

URL: #5444


Successfully merging this pull request may close these issues:

[BUG] pylibwholegraph: nightly wheel tests failing: "driver on your system is too old"