Use triton from pytorch in trustyai image #2558
Conversation
JIRA: https://issues.redhat.com/browse/RHOAIENG-34086
Triton here is a dependency of torch, and is just being explicitly re-declared so that the index to pull it from can be customized. Although torch was being pulled from the PyTorch index, all of its dependencies still default to coming from PyPI (or whatever default index is used for a build). In the triton case, PyPI only has x86_64 wheels for this version, but the PyTorch index has both x86_64 and aarch64 wheels, matching the arches that are available for torch. Since triton is a dependency of torch, I've also added the same platform exclusion marker to omit it on ppc64le.
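The pyproject hunk itself isn't rendered in this view; as a rough sketch, the uv-style override described above would look something like the following (the index name, index URL, and torch pin are illustrative assumptions, not copied from the PR):

# Sketch only: explicit re-declaration of triton with an index override (uv syntax).
# The "pytorch-cuda" name, the URL, and the torch pin are assumptions for illustration.
[project]
dependencies = [
    "torch==2.6.0; platform_machine != 'ppc64le'",
    "triton==3.3.1; platform_machine != 'ppc64le'",
]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu126"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cuda" }
triton = { index = "pytorch-cuda" }

With a mapping like this, the resolver pulls both torch and triton from the PyTorch index instead of PyPI, which is where the aarch64 triton wheels live.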
Walkthrough
Updated Triton dependency: added a platform-constrained requirement and source mapping in pyproject, and broadened the lockfile marker to exclude only ppc64le.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- jupyter/trustyai/ubi9-python-3.12/pylock.toml (1 hunks)
- jupyter/trustyai/ubi9-python-3.12/pyproject.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/pytorch/ubi9-python-3.12/pyproject.toml:53-66
Timestamp: 2025-08-27T15:52:09.615Z
Learning: jiridanek requested GitHub issue creation for triton and PyTorch components CUDA wheel mapping optimization during PR #2145 review, specifically to investigate whether triton and other PyTorch-hosted components have accelerator-specific wheels requiring explicit index mapping to pytorch-cuda. Issue #2162 was created with comprehensive investigation framework covering triton wheel analysis, other PyTorch components survey, architecture constraints evaluation, repository-wide pattern analysis, multiple solution options, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (jupyter-trustyai-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-trustyai-cpu-py312-ubi9-on-pull-request
- GitHub Check: code-static-analysis
🔇 Additional comments (1)
jupyter/trustyai/ubi9-python-3.12/pyproject.toml (1)
Lines 10-21: Index override for Triton looks solid. Explicitly redeclaring triton==3.3.1 with the same platform guard and pointing it to the PyTorch CUDA index is a clean way to ensure the proper aarch64 wheels are pulled alongside torch. Also applies to: 80-80.
marker = "platform_machine != 'ppc64le'"
wheels = [
{ url = "https://files.pythonhosted.org/packages/8d/a9/549e51e9b1b2c9b854fd761a1d23df0ba2fbc60bd0c13b489ffa518cfcb7/triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:36Z, size = 155600257, hashes = { sha256 = "b74db445b1c562844d3cfad6e9679c72e93fdfb1a90a24052b03bb5c49d1242e" } },
{ url = "https://files.pythonhosted.org/packages/21/2f/3e56ea7b58f80ff68899b1dbe810ff257c9d177d288c6b0f55bf2fe4eb50/triton-3.3.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:44Z, size = 155689937, hashes = { sha256 = "b31e3aa26f8cb3cc5bf4e187bf737cbacf17311e1112b781d4a059353dfd731b" } },
{ url = "https://files.pythonhosted.org/packages/24/5f/950fb373bf9c01ad4eb5a8cd5eaf32cdf9e238c02f9293557a2129b9c4ac/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:51Z, size = 155669138, hashes = { sha256 = "9999e83aba21e1a78c1f36f21bce621b77bcaa530277a50484a7cb4a822f6e43" } },
{ url = "https://files.pythonhosted.org/packages/74/1f/dfb531f90a2d367d914adfee771babbd3f1a5b26c3f5fbc458dee21daa78/triton-3.3.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:02Z, size = 155673035, hashes = { sha256 = "b89d846b5a4198317fec27a5d3a609ea96b6d557ff44b56c23176546023c4240" } },
{ url = "https://files.pythonhosted.org/packages/28/71/bd20ffcb7a64c753dc2463489a61bf69d531f308e390ad06390268c4ea04/triton-3.3.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:10Z, size = 155735832, hashes = { sha256 = "a3198adb9d78b77818a5388bff89fa72ff36f9da0bc689db2f0a651a67ce6a42" } },
{ url = "https://files.pythonhosted.org/packages/6d/81/ac4d50af22f594c4cb7c84fd2ad5ba1e0c03e2a83fe3483ddd79edcd7ec7/triton-3.3.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:18Z, size = 155596799, hashes = { sha256 = "f6139aeb04a146b0b8e0fbbd89ad1e65861c57cfed881f21d62d3cb94a36bab7" } },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
Restore the Linux guard on Triton.
Dropping sys_platform == 'linux' while keeping only manylinux wheels means Mac/Windows installs will now try (and fail) to resolve Triton from this lock. Please keep the Linux constraint so non-Linux developers aren't broken.
-marker = "platform_machine != 'ppc64le'"
+marker = "platform_machine != 'ppc64le' and sys_platform == 'linux'"
🤖 Prompt for AI Agents
In jupyter/trustyai/ubi9-python-3.12/pylock.toml around lines 4238 to 4251, the
platform guard for the Triton wheels was loosened and now allows non-Linux
platforms to attempt installing manylinux wheels; restore the Linux-only
constraint. Update the marker to include sys_platform == 'linux' (for example:
change the existing marker expression to also require sys_platform == 'linux')
so these wheel entries are only considered on Linux hosts, and verify
quoting/spacing matches the surrounding file convention.
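To make the effect of the two markers concrete, here is a quick check using the packaging library (illustrative only; the environment dict is a stand-in for a macOS arm64 machine, and unspecified keys fall back to the host's defaults):

from packaging.markers import Marker

# Lockfile marker before the PR (Linux-only) and after (ppc64le-only exclusion).
old = Marker("platform_machine != 'ppc64le' and sys_platform == 'linux'")
new = Marker("platform_machine != 'ppc64le'")

# Simulated macOS arm64 environment.
mac = {"platform_machine": "arm64", "sys_platform": "darwin"}

print(old.evaluate(mac))  # False -> triton is skipped on macOS
print(new.evaluate(mac))  # True  -> installer would select triton but find only manylinux wheels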
On the upstream -pull-request.yaml builds we need to add arm to the build platforms as well.
Done, thanks!
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/pytorch/ubi9-python-3.12/pyproject.toml:53-66
Timestamp: 2025-08-27T15:52:09.615Z
Learning: jiridanek requested GitHub issue creation for triton and PyTorch components CUDA wheel mapping optimization during PR #2145 review, specifically to investigate whether triton and other PyTorch-hosted components have accelerator-specific wheels requiring explicit index mapping to pytorch-cuda. Issue #2162 was created with comprehensive investigation framework covering triton wheel analysis, other PyTorch components survey, architecture constraints evaluation, repository-wide pattern analysis, multiple solution options, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2180
File: .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml:36-39
Timestamp: 2025-08-28T12:42:23.404Z
Learning: Konflux build system in opendatahub-io/notebooks repository requires "linux/x86_64" platform identifier format for Tekton PipelineRun build-platforms parameter, not the OCI-standard "linux/amd64". This is consistently used across all .tekton/ pipeline files and enforced by scripts/generate_pull_request_pipelineruns.py type definitions. Docker/buildah contexts use "linux/amd64" while Konflux/Tekton contexts use "linux/x86_64".
📚 Learning: 2025-08-28T12:42:23.404Z
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2180
File: .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml:36-39
Timestamp: 2025-08-28T12:42:23.404Z
Learning: Konflux build system in opendatahub-io/notebooks repository requires "linux/x86_64" platform identifier format for Tekton PipelineRun build-platforms parameter, not the OCI-standard "linux/amd64". This is consistently used across all .tekton/ pipeline files and enforced by scripts/generate_pull_request_pipelineruns.py type definitions. Docker/buildah contexts use "linux/amd64" while Konflux/Tekton contexts use "linux/x86_64".
Applied to files:
.tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-trustyai-cpu-py312-ubi9-on-pull-request
- GitHub Check: build (jupyter-trustyai-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- linux/x86_64
- linux-m2xlarge/arm64
- linux/ppc64le
Fix invalid build platform identifier.
The Konflux build-platforms parameter must use the linux/<arch> form (e.g., linux/x86_64, linux/ppc64le). Adding linux-m2xlarge/arm64 breaks that contract, so the pipeline will reject or ignore the ARM build instead of scheduling it. Update the new entry to the supported identifier (linux/arm64 or linux/aarch64, whichever matches the generator defaults) before merging.
🤖 Prompt for AI Agents
In .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml
around lines 38 to 40, the build-platforms list contains an invalid identifier
"linux-m2xlarge/arm64"; replace that entry with the supported form "linux/arm64"
(or "linux/aarch64" if your generator expects that) so all entries use the
required "linux/<arch>" format and the ARM build is scheduled correctly.
@grdryn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: atheo89
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
How Has This Been Tested?
Self checklist (all need to be checked):
- Run make test (gmake on macOS) before asking for review
- Changes to Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.
Merge criteria:
Summary by CodeRabbit
Bug Fixes
Chores