
[CI] Revert CUDA, PyTorch and ONNX upgrade #18787

Closed

mshr-h wants to merge 2 commits into apache:main from mshr-h:revert-cuda

Conversation

@mshr-h (Contributor) commented Feb 16, 2026

With the 20260214-152058-2a448ce4 images, OpenCL tests are failing with a segmentation fault. I can't reproduce it on my local machine, but I guess it's due to the CUDA version upgrade. This PR reverts the upgrade and also downgrades PyTorch to match the reverted CUDA version.
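
A quick sanity check for the CUDA/PyTorch pairing (an illustrative sketch, not part of this PR; assumes it is run inside the CI image):

    # Print the torch version, the CUDA version the wheel was built against,
    # and whether torch can actually see a GPU in this container.
    python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
    # The wheel's CUDA build (e.g. cu118) should not be newer than the toolkit
    # in the base image, or kernels can fail in hard-to-debug ways.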

@mshr-h mshr-h changed the title "[CI} Revert CUDA and PyTorch" to "[CI] Revert CUDA and PyTorch" Feb 16, 2026
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @mshr-h, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request aims to resolve recent OpenCL test failures observed in the CI pipeline, which are believed to stem from an incompatibility introduced by a recent CUDA version upgrade. The changes involve reverting the CUDA base image to a known stable version and subsequently downgrading dependent libraries like PyTorch, ONNX, and their runtimes to ensure full compatibility and restore the reliability of the continuous integration tests.

Highlights

  • CUDA Version Reversion: The base CUDA image used in the CI GPU Dockerfile was reverted from version 12.8.1 to 12.4.1 to address suspected compatibility issues causing test failures.
  • PyTorch and ONNX Downgrade: The installation script for ONNX and PyTorch was updated to install specific, older versions of ONNX, ONNX Runtime, ONNX Optimizer, PyTorch, and torchvision. These versions are now conditionally selected based on the Python version (3.9, 3.11, or other) and the target device (CUDA/CPU) to ensure compatibility with the reverted CUDA version.


Changelog
  • docker/Dockerfile.ci_gpu
  • Reverted the CUDA base image from nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 to nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 (sketched below, after this list).
  • docker/install/ubuntu_install_onnx.sh
    • Introduced conditional installation logic for ONNX, ONNX Runtime, ONNX Optimizer, PyTorch, and torchvision based on the detected Python version (3.9, 3.11, or other) and the specified device (cuda/cpu).
    • Downgraded ONNX to 1.16.0 (for Python 3.9), 1.17.0 (for Python 3.11), or 1.12.0 (for other Python versions).
    • Downgraded ONNX Runtime to 1.19.2 (for Python 3.9), 1.20.1 (for Python 3.11), or 1.12.1 (for other Python versions).
    • Downgraded ONNX Optimizer to 0.2.7 across all conditional installation paths.
    • Downgraded PyTorch to 2.7.0 (for Python 3.9 and 3.11) or 2.4.1 (for other Python versions).
    • Downgraded torchvision to 0.22.0 (for Python 3.9 and 3.11) or 0.19.1 (for other Python versions).
    • Added pip3 install future to the installation steps.
    • Added comments explaining the rationale behind fixing ONNX versions due to potential test breakage.
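
The base-image revert itself is a one-line swap; roughly (a sketch reconstructed from the tags named above, not the verbatim diff):

    # docker/Dockerfile.ci_gpu
    -FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04
    +FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04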
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature               Command              Description
Code Review           /gemini review       Performs a code review for the current pull request in its current state.
Pull Request Summary  /gemini summary      Provides a summary of the current pull request in its current state.
Comment               @gemini-code-assist  Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                  /gemini help         Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

@gemini-code-assist (bot) left a comment


Code Review

This pull request reverts the CUDA and PyTorch versions to fix a CI failure, which is reflected in the changes to docker/Dockerfile.ci_gpu and docker/install/ubuntu_install_onnx.sh. The changes seem correct for the stated purpose. I've added one suggestion to refactor the installation script in docker/install/ubuntu_install_onnx.sh to reduce code duplication and improve maintainability.

Comment on lines 39 to 89 of docker/install/ubuntu_install_onnx.sh
if [ "$PYTHON_VERSION" == "3.9" ]; then
pip3 install \
onnx==1.16.0 \
onnxruntime==1.19.2 \
onnxoptimizer==0.2.7

if [ "$DEVICE" == "cuda" ]; then
if [ "$DEVICE" == "cuda" ]; then
pip3 install \
torch==2.7.0 \
torchvision==0.22.0 \
--index-url https://download.pytorch.org/whl/cu118
else
pip3 install \
torch==2.7.0 \
torchvision==0.22.0 \
--extra-index-url https://download.pytorch.org/whl/cpu
fi
elif [ "$PYTHON_VERSION" == "3.11" ]; then
pip3 install \
torch==2.10.0 \
torchvision==0.25.0
onnx==1.17.0 \
onnxruntime==1.20.1 \
onnxoptimizer==0.2.7

if [ "$DEVICE" == "cuda" ]; then
pip3 install \
torch==2.7.0 \
torchvision==0.22.0 \
--index-url https://download.pytorch.org/whl/cu118
else
pip3 install \
torch==2.7.0 \
torchvision==0.22.0 \
--extra-index-url https://download.pytorch.org/whl/cpu
fi
else
pip3 install \
torch==2.10.0 \
torchvision==0.25.0 \
--extra-index-url https://download.pytorch.org/whl/cpu
onnx==1.12.0 \
onnxruntime==1.12.1 \
onnxoptimizer==0.2.7

if [ "$DEVICE" == "cuda" ]; then
pip3 install \
torch==2.4.1 \
torchvision==0.19.1
else
pip3 install \
torch==2.4.1 \
torchvision==0.19.1 \
--extra-index-url https://download.pytorch.org/whl/cpu
fi
fi
Severity: medium

This if/elif/else block contains significant code duplication for installing torch and onnx packages across different Python versions. You can refactor this by defining package version strings in variables based on the Python version, and then using these variables in shared pip3 install commands. This will make the script more readable and easier to maintain in the future.

if [ "$PYTHON_VERSION" == "3.9" ]; then
    ONNX_PKGS="onnx==1.16.0 onnxruntime==1.19.2 onnxoptimizer==0.2.7"
elif [ "$PYTHON_VERSION" == "3.11" ]; then
    ONNX_PKGS="onnx==1.17.0 onnxruntime==1.20.1 onnxoptimizer==0.2.7"
else
    ONNX_PKGS="onnx==1.12.0 onnxruntime==1.12.1 onnxoptimizer==0.2.7"
fi
pip3 install ${ONNX_PKGS}

if [ "$PYTHON_VERSION" == "3.9" ] || [ "$PYTHON_VERSION" == "3.11" ]; then
    TORCH_PKGS="torch==2.7.0 torchvision==0.22.0"
    TORCH_CUDA_ARGS="--index-url https://download.pytorch.org/whl/cu118"
else
    TORCH_PKGS="torch==2.4.1 torchvision==0.19.1"
    TORCH_CUDA_ARGS=""
fi

if [ "$DEVICE" == "cuda" ]; then
    pip3 install ${TORCH_PKGS} ${TORCH_CUDA_ARGS}
else
    pip3 install ${TORCH_PKGS} --extra-index-url https://download.pytorch.org/whl/cpu
fi
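
Two notes on this sketch: leaving ${ONNX_PKGS} and ${TORCH_PKGS} unquoted is deliberate, since shell word splitting is what turns each space-separated version pin into its own pip3 argument; quoting them would pass one long string as a single package name. Also note the asymmetry the original script preserves: --index-url replaces PyPI entirely, so every requested package must come from the PyTorch CUDA index, while --extra-index-url keeps PyPI and merely adds the CPU index as an additional source.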

@mshr-h mshr-h changed the title "[CI] Revert CUDA and PyTorch" to "[CI] Revert CUDA, PyTorch and ONNX upgrade" Feb 16, 2026
@tqchen (Member) commented Feb 16, 2026

Do we know which test was failing? I feel it is important for the CI to be up to date in terms of CUDA/torch versions.

So for the case of OpenCL, perhaps we can temporarily skip some of the tests?

@mshr-h (Contributor, Author) commented Feb 16, 2026

I guess all of the OpenCL tests were failing. @tqchen

"I feel it is important for the CI to be up to date in terms of CUDA/torch versions."

Agreed.

@mshr-h (Contributor, Author) commented Feb 16, 2026

I'm trying to skip all OpenCL tests to see if CI passes. https://ci.tlcpack.ai/blue/organizations/jenkins/tvm-gpu/detail/PR-18775/36/pipeline
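
One blunt way to deselect a backend's tests in a pytest-based suite is name-based filtering; a sketch only, with a hypothetical test path, since TVM's actual CI wiring differs:

    # Deselect every test whose id contains "opencl"; -k matches against test names.
    python3 -m pytest tests/python -k "not opencl"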

@tqchen (Member) commented Feb 16, 2026

Yes, I think it is OK to skip the OpenCL tests for now.

    onnx==1.20.1 \
    onnxruntime==1.23.2 \
    onnxoptimizer==0.4.2
    future \
@tqchen (Member) commented

Let's wait and see if the skip works.

@mshr-h (Contributor, Author) commented Feb 17, 2026

Closing, as skipping the OpenCL tests works.

@mshr-h mshr-h closed this Feb 17, 2026
@mshr-h mshr-h deleted the revert-cuda branch February 17, 2026 05:06

3 participants