wheels CI: stricter torch index selection, test oldest versions of dependencies #413
Changes from 9 commits
974c79a
d705541
7e15156
43efc44
34b8853
25e4fd7
12ec8ca
7a5dd11
843296e
d246b1a
eb6be78
4e923d4
c054b60
29e1769
c39b7b1
fef3fe1
333d00b
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -257,6 +257,7 @@ jobs: | |||
| build_type: pull-request | ||||
| script: ci/test_wheel_pylibwholegraph.sh | ||||
| matrix_filter: map(select(.ARCH == "amd64")) | ||||
| matrix_type: 'nightly' | ||||
| wheel-build-cugraph-pyg: | ||||
| needs: checks | ||||
| secrets: inherit | ||||
|
|
@@ -279,3 +280,4 @@ jobs: | |||
| build_type: pull-request | ||||
| script: ci/test_wheel_cugraph-pyg.sh | ||||
| matrix_filter: map(select(.ARCH == "amd64")) | ||||
| matrix_type: 'nightly' | ||||
|
Member
Author
Suggested change
Revert before merging.
||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -40,6 +40,8 @@ wheels/ | |
| wheelhouse/ | ||
| _skbuild/ | ||
| cufile.log | ||
| *.tar.gz | ||
| *.whl | ||
|
|
||
| ## Patching | ||
| *.diff | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,37 @@ | ||
| #!/bin/bash | ||
| # SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||
| # SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| # [description] | ||
| # | ||
| # Downloads a CUDA variant of 'torch' from the correct index, based on CUDA major version. | ||
| # | ||
| # This exists to avoid using 'pip --extra-index-url', which could allow for CPU-only 'torch' | ||
| # to be downloaded from pypi.org. | ||
| # | ||
|
|
||
| set -e -u -o pipefail | ||
|
|
||
| # Ensure CUDA-enabled 'torch' packages are always used. | ||
| # | ||
| # Downloading + adding the downloaded file to the constraint forces the use of this | ||
| # package, so we don't accidentally end up with a CPU-only 'torch' from 'pypi.org' | ||
| # (which can happen because --extra-index-url doesn't imply a priority). | ||
| rapids-logger "Downloading 'torch' wheel" | ||
| CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}" | ||
| if [[ "${CUDA_MAJOR}" == "12" ]]; then | ||
| PYTORCH_INDEX="https://download.pytorch.org/whl/cu126" | ||
| else | ||
| PYTORCH_INDEX="https://download.pytorch.org/whl/cu130" | ||
| fi | ||
|
|
||
| TORCH_WHEEL_DIR=$(mktemp -d) | ||
| rapids-pip-retry download \ | ||
| --prefer-binary \ | ||
| --no-deps \ | ||
| -d "${TORCH_WHEEL_DIR}" \ | ||
| --constraint "${PIP_CONSTRAINT}" \ | ||
| --index-url "${PYTORCH_INDEX}" \ | ||
| 'torch' | ||
|
|
||
| echo "torch @ file://$(echo ${TORCH_WHEEL_DIR}/torch-*.whl)" >> "${PIP_CONSTRAINT}" | ||
|
||
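The wildcard in the last line of the script stays a literal pattern if no wheel was downloaded, and expands to several paths if more than one file matches. A minimal sketch of a stricter variant follows. It assumes that wheels from the PyTorch CUDA indices carry a `+cuNNN` local version tag in their filename; `torch_constraint_line` is a hypothetical helper and is not part of this PR.

```shell
# Hypothetical helper (not part of this PR): print a pip constraint line for
# the single 'torch' wheel in a directory. Fails if zero or several wheels
# match, or if the filename has no '+cuNNN' tag (assumed to mark CUDA builds).
torch_constraint_line() {
    wheel_dir="$1"
    # Expand the glob into the positional parameters.
    set -- "${wheel_dir}"/torch-*.whl
    # With no match the pattern stays literal, so also test that the file exists.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one torch wheel in ${wheel_dir}" >&2
        return 1
    fi
    case "$(basename "$1")" in
        torch-*+cu[0-9]*) ;;
        *)
            echo "not a CUDA build: $1" >&2
            return 1
            ;;
    esac
    echo "torch @ file://$1"
}
```

In the script this could be used as `torch_constraint_line "${TORCH_WHEEL_DIR}" >> "${PIP_CONSTRAINT}"`, so a missing or CPU-only wheel stops the job before `pip install` runs.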
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -15,13 +15,15 @@ LIBWHOLEGRAPH_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_ | |
| PYLIBWHOLEGRAPH_WHEELHOUSE=$(rapids-download-from-github "$(rapids-package-name "wheel_python" pylibwholegraph --stable --cuda "$RAPIDS_CUDA_VERSION")") | ||
| CUGRAPH_PYG_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="${package_name}_${RAPIDS_PY_CUDA_SUFFIX}" RAPIDS_PY_WHEEL_PURE="1" rapids-download-wheels-from-github python) | ||
|
|
||
| CUDA_MAJOR="${RAPIDS_CUDA_VERSION%%.*}" | ||
| # generate constraints, accounting for 'oldest' and 'latest' dependencies | ||
| rapids-dependency-file-generator \ | ||
| --output requirements \ | ||
| --file-key "test_cugraph_pyg" \ | ||
| --matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \ | ||
| | tee "${PIP_CONSTRAINT}" | ||
|
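The hunk above uses two different parameter expansions on `RAPIDS_CUDA_VERSION`: `%%.*` for the CUDA major version and `%.*` for the `cuda=` matrix entry. A small sketch of the difference, with a hypothetical `build_matrix` helper that assembles the same `--matrix` string (the helper name and its argument order are assumptions for illustration):

```shell
# '%.*' strips the shortest suffix matching '.*'; '%%.*' strips the longest.
cuda_version_example="12.9.1"
echo "${cuda_version_example%.*}"     # prints 12.9
echo "${cuda_version_example%%.*}"    # prints 12

# Hypothetical helper (not part of this PR): build the --matrix string passed
# to rapids-dependency-file-generator in the hunk above.
# Arguments: full CUDA version, Python version, dependencies mode, CPU arch.
build_matrix() {
    cuda_version="$1"
    py_version="$2"
    dependencies="$3"
    cpu_arch="$4"
    printf 'cuda=%s;arch=%s;py=%s;dependencies=%s;include_torch_extra_index=false\n' \
        "${cuda_version%.*}" "${cpu_arch}" "${py_version}" "${dependencies}"
}
```

For example, `build_matrix "12.9.1" "3.11" "oldest" "$(arch)"` on an x86_64 machine yields a string starting with `cuda=12.9;arch=x86_64`.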
Member
Author
This is a new one for me 😭
All
Example constraints file (not including all the requirements of all these packages):
I'll try that advice from the error message, let's see if it'll help us get a little farther.
Member
Author
Getting further with local testing, adding more pins to force out some solver errors.
test code (click me)
docker run \
--rm \
--gpus all \
--env GH_TOKEN=$(gh auth token) \
--env RAPIDS_BUILD_TYPE="pull-request" \
--env RAPIDS_REPOSITORY="rapidsai/cugraph-gnn" \
-v $(pwd):/opt/work \
-w /opt/work \
-it rapidsai/citestwheel:26.04-cuda12.9.1-rockylinux8-py3.11 \
bash
source rapids-init-pip
package_name="cugraph-pyg"
RAPIDS_PY_CUDA_SUFFIX="$(rapids-wheel-ctk-name-gen ${RAPIDS_CUDA_VERSION})"
# Download the libwholegraph, pylibwholegraph, and cugraph-pyg built in the previous step
COMMIT_ID=843296e5e99ebb017e3a4a63b046abfc672ce279
LIBWHOLEGRAPH_WHEELHOUSE=$(
RAPIDS_PY_WHEEL_NAME="libwholegraph_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact cugraph-gnn 413 cpp wheel "${COMMIT_ID}"
)
PYLIBWHOLEGRAPH_WHEELHOUSE=$(
rapids-get-pr-artifact cugraph-gnn 413 python wheel --pkg_name pylibwholegraph --stable "${COMMIT_ID}"
)
CUGRAPH_PYG_WHEELHOUSE=$(
RAPIDS_PY_WHEEL_NAME="cugraph-pyg_cu12" RAPIDS_PY_WHEEL_PURE="1" rapids-get-pr-artifact cugraph-gnn 413 python wheel "${COMMIT_ID}"
)
# generate constraints, accounting for 'oldest' and 'latest' dependencies
rapids-dependency-file-generator \
--output requirements \
--file-key "test_cugraph_pyg" \
--matrix "cuda=${RAPIDS_CUDA_VERSION%.*};arch=$(arch);py=${RAPIDS_PY_VERSION};dependencies=${RAPIDS_DEPENDENCIES};include_torch_extra_index=false" \
| tee "${PIP_CONSTRAINT}"
# ensure a CUDA variant of 'torch' is used
./ci/download-torch-wheels.sh
# notes:
#
# * echo to expand wildcard before adding `[extra]` requires for pip
# * '--extra-index-url pypi.nvidia.com' can be removed when 'cugraph' and
# its dependencies are available from pypi.org
#
rapids-pip-retry install \
--dry-run \
-v \
--constraint "${PIP_CONSTRAINT}" \
--extra-index-url 'https://pypi.nvidia.com' \
"${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \
"$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \
"$(echo "${CUGRAPH_PYG_WHEELHOUSE}"/cugraph_pyg_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]" \
"cuda-bindings[all]==12.9.4" \
"cudf-cu12==26.4.0a289" \
"cugraph-cu12==26.4.0a30" \
"cuml-cu12==26.4.0a77" \
"dask-cuda==26.4.0a18" \
"distributed-ucxx-cu12==0.49.0a20" \
"libcudf-cu12==26.4.0a289" \
"libcugraph-cu12==26.4.0a30" \
"libcuml-cu12==26.4.0a77" \
"libucxx-cu12==0.49.0a20" \
"numba-cuda[cu12]==0.27.0" \
"pylibcugraph-cu12==26.4.0a30" \
"pylibcudf-cu12==26.4.0a289" \
"pylibraft-cu12==26.4.0a34" \
"raft-dask-cu12==26.4.0a33" \
"rapids-dask-dependency==26.4.0a7" \
"rmm-cu12==26.4.0a30" \
"ucxx-cu12==0.49.0a20"
I think
TORCH_WHEEL_DIR=$(mktemp -d)
rapids-pip-retry download \
--prefer-binary \
--no-deps \
-d "${TORCH_WHEEL_DIR}" \
--index-url "https://download.pytorch.org/whl/cu126" \
'torch==2.10'
pushd "${TORCH_WHEEL_DIR}"
pip install pkginfo
$ pkginfo --json *.whl
"cuda-bindings==12.9.4; platform_system == \"Linux\"",
"nvidia-cuda-nvrtc-cu12==12.6.77; platform_system == \"Linux\"",
"nvidia-cuda-runtime-cu12==12.6.77; platform_system == \"Linux\"",
"nvidia-cuda-cupti-cu12==12.6.80; platform_system == \"Linux\"",
"nvidia-cudnn-cu12==9.10.2.21; platform_system == \"Linux\"",
"nvidia-cublas-cu12==12.6.4.1; platform_system == \"Linux\"",
"nvidia-cufft-cu12==11.3.0.4; platform_system == \"Linux\"",
"nvidia-curand-cu12==10.3.7.77; platform_system == \"Linux\"",
"nvidia-cusolver-cu12==11.7.1.2; platform_system == \"Linux\"",
"nvidia-cusparse-cu12==12.5.4.2; platform_system == \"Linux\"",
"nvidia-cusparselt-cu12==0.7.1; platform_system == \"Linux\"",
"nvidia-nccl-cu12==2.27.5; platform_system == \"Linux\"",
"nvidia-nvshmem-cu12==3.4.5; platform_system == \"Linux\"",
"nvidia-nvtx-cu12==12.6.77; platform_system == \"Linux\"",
"nvidia-nvjitlink-cu12==12.6.85; platform_system == \"Linux\"",
"nvidia-cufile-cu12==1.11.1.6; platform_system == \"Linux\"",
"triton==3.6.0; platform_system == \"Linux\"",
Pinning to the latest versions of RAPIDS nightlies as well as a few other packages is yielding solver errors like this:
Looks like in recent successful runs on
Member
Author
I've pushed eb6be78 adding a ceiling of
Let's just see if that allows all the environments to be solved. If it does, maybe it's worth putting that ceiling in place temporarily and handling removing it as a follow-up issue / PR (to at least get nightly tests working again here).
Member
Author
Oy, this is brutal. CI is still failing here and I see
I'm still testing locally, let's see if I can find a different path through this.
Member
Author
Ah! Ok had an idea, I think this gets us further along here. Setting that locally-downloaded
Pushed a commit doing that: 4e923d4
Locally, I got something like this:
This is saying "you asked me to install
We can work with this! Just need to figure out where that
Member
Author
Ok here's an interesting clue... looks like in recent successful
That would explain why I'm not able to get the environment to solve with similar versions as were found in those jobs!
Member
Author
This is a known issue (I was just late to it), where some CUDA 12
Documented that here: rapidsai/build-planning#255
I've pushed commits here pinning to specific known-compatible, CUDA variant
Member
Author
Ok lots of wheel tests are passing now! All
Looks like there was another issue hiding in here though...
Ignore "No such file or directory", that's misleading (we'll fix that in rapidsai/build-planning#119 at some point). The real issue is that
||
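Earlier in this thread, `pkginfo --json` was used to list the exact pins that the downloaded `torch` wheel declares. A rough sketch of turning that output into plain `name==version` lines, which can then be compared against the generated constraints file, is below. `requires_to_pins` is a hypothetical helper and assumes input lines shaped like the ones quoted above.

```shell
# Hypothetical helper (not part of this PR): read requirement lines such as
#   "nvidia-nccl-cu12==2.27.5; platform_system == \"Linux\"",
# on stdin and print only the 'name==version' part of each exact pin.
# Lines without an '==' pin followed by an environment marker are skipped.
requires_to_pins() {
    sed -n 's/^[[:space:]]*"\([A-Za-z0-9._-]*==[^;"]*\);.*$/\1/p'
}

# Small demonstration with one line copied from the output above.
printf '%s\n' '"triton==3.6.0; platform_system == \"Linux\"",' | requires_to_pins
# prints: triton==3.6.0
```

Piping the real output through it, for example `pkginfo --json *.whl | requires_to_pins`, gives a list that is easier to diff against `"${PIP_CONSTRAINT}"` when hunting for the conflicting pin.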
|
|
||
| if [[ "${CUDA_MAJOR}" == "12" ]]; then | ||
| PYTORCH_INDEX="https://download.pytorch.org/whl/cu126" | ||
| else | ||
| PYTORCH_INDEX="https://download.pytorch.org/whl/cu130" | ||
| fi | ||
| # ensure a CUDA variant of 'torch' is used | ||
| ./ci/download-torch-wheels.sh | ||
|
|
||
| # notes: | ||
| # | ||
|
|
@@ -31,7 +33,7 @@ fi | |
| # | ||
| rapids-pip-retry install \ | ||
| -v \ | ||
| --extra-index-url "${PYTORCH_INDEX}" \ | ||
| --constraint "${PIP_CONSTRAINT}" \ | ||
| --extra-index-url 'https://pypi.nvidia.com' \ | ||
| "${LIBWHOLEGRAPH_WHEELHOUSE}"/*.whl \ | ||
| "$(echo "${PYLIBWHOLEGRAPH_WHEELHOUSE}"/pylibwholegraph_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)" \ | ||
|
|
||
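The CUDA-major-to-index mapping shown in this diff sends every CUDA major version other than 12 to the `cu130` index. A sketch of a variant that fails loudly on an unexpected version is below, under the assumption that only CUDA 12 and 13 are meant to be supported; `pytorch_index_for_cuda` is a hypothetical name, while the two index URLs are the ones from the diff.

```shell
# Hypothetical helper (not part of this PR): map a CUDA version such as
# '12.9.1' to the PyTorch wheel index used in this diff. Unknown major
# versions are treated as an error instead of defaulting to cu130.
pytorch_index_for_cuda() {
    cuda_major="${1%%.*}"
    case "${cuda_major}" in
        12) echo "https://download.pytorch.org/whl/cu126" ;;
        13) echo "https://download.pytorch.org/whl/cu130" ;;
        *)
            echo "unsupported CUDA version: $1" >&2
            return 1
            ;;
    esac
}
```

Usage would look like `PYTORCH_INDEX="$(pytorch_index_for_cuda "${RAPIDS_CUDA_VERSION}")"`; under `set -e` a failed lookup then aborts the script instead of silently selecting an index.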
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -135,6 +135,7 @@ files: | |
| table: project.optional-dependencies | ||
| key: test | ||
| includes: | ||
| - depends_on_pytorch | ||
| - test_python_common | ||
| - test_python_pylibwholegraph | ||
| py_build_cugraph_pyg: | ||
|
|
@@ -324,6 +325,10 @@ dependencies: | |
| - *cmake_ver | ||
| test_python_common: | ||
| common: | ||
| - output_types: [conda] | ||
| packages: | ||
| - torchdata | ||
| - pydantic | ||
|
Member
Author
Just moving this here so
This
||
| - output_types: [conda, pyproject, requirements] | ||
| packages: | ||
| - pytest<9.0.0 | ||
|
|
@@ -343,14 +348,15 @@ dependencies: | |
| - pytest-forked | ||
| - scipy | ||
| depends_on_pytorch: | ||
| common: | ||
| - output_types: [conda] | ||
| packages: | ||
| - torchdata | ||
| - pydantic | ||
| specific: | ||
| - output_types: [requirements] | ||
| matrices: | ||
| # If 'include_torch_extra_index=false' is passed, avoid these --extra-index-url. | ||
| # (useful in CI scripts where we want to tightly control which indices 'pip' uses). | ||
| - matrix: | ||
| include_torch_extra_index: "false" | ||
| packages: | ||
|
Member
Author
This will only affect cases where
So this should not break any other uses of this file.
||
| # otherwise, choose a CUDA-version-specific index | ||
| - matrix: | ||
| cuda: "12.*" | ||
| packages: | ||
|
|
@@ -374,13 +380,28 @@ dependencies: | |
| - matrix: | ||
| packages: | ||
| - *pytorch_pip | ||
| - output_types: [requirements] | ||
| matrices: | ||
| - matrix: | ||
| cuda: "12.*" | ||
| dependencies: "oldest" | ||
| packages: | ||
| # 2.6.0 is the oldest version on https://download.pytorch.org/whl/cu126 with CUDA wheels | ||
| - torch==2.6.0 | ||
|
||
| - matrix: | ||
| cuda: "13.*" | ||
| dependencies: "oldest" | ||
| packages: | ||
| - torch==2.9.0 | ||
| - matrix: | ||
| packages: | ||
| - output_types: [conda] | ||
| matrices: | ||
| # Prevent fallback to CPU-only pytorch when we want a CUDA variant. | ||
| - matrix: | ||
| require_gpu: "true" | ||
| packages: | ||
| - pytorch-gpu | ||
| - pytorch-gpu >=2.3 | ||
| # Default to falling back to whatever 'pytorch' is pulled in via cugraph-pyg's dependencies. | ||
| - matrix: | ||
| packages: | ||
|
|
||
TODO: revert this. Just added here for testing, to confirm this will fix the issues we've been seeing in nightlies.