Use triton from pytorch in trustyai image #2558
Conversation
JIRA: https://issues.redhat.com/browse/RHOAIENG-34086
Triton here is a dependency of torch, and is just being explicitly re-declared so that the index to pull it from can be customized. Although torch was being pulled from the PyTorch index, all of its dependencies still default to coming from PyPI (or whatever default index is used for a build). In the triton case, PyPI only has x86_64 wheels for this version, but the PyTorch index has both x86_64 and aarch64 wheels, matching the arches that are available for torch. Since triton is a dependency of torch, I've also added the same platform exclusion marker to omit it on ppc64le.
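The pyproject hunk itself isn't rendered in this view; as a rough sketch, the uv-style override described above would look something like the following (the index name, index URL, and torch pin are illustrative assumptions, not copied from the PR):

# Sketch only: explicit re-declaration of triton with an index override (uv syntax).
# The "pytorch-cuda" name, the URL, and the torch pin are assumptions for illustration.
[project]
dependencies = [
    "torch==2.6.0; platform_machine != 'ppc64le'",
    "triton==3.3.1; platform_machine != 'ppc64le'",
]

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu126"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch-cuda" }
triton = { index = "pytorch-cuda" }

With a mapping like this, the resolver pulls both torch and triton from the PyTorch index instead of PyPI, which is where the aarch64 triton wheels live.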
Walkthrough
Updated Triton dependency: added a platform-constrained requirement and source mapping in pyproject, and broadened the lockfile marker to exclude only ppc64le.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- jupyter/trustyai/ubi9-python-3.12/pylock.toml (1 hunks)
- jupyter/trustyai/ubi9-python-3.12/pyproject.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/pytorch/ubi9-python-3.12/pyproject.toml:53-66
Timestamp: 2025-08-27T15:52:09.615Z
Learning: jiridanek requested GitHub issue creation for triton and PyTorch components CUDA wheel mapping optimization during PR #2145 review, specifically to investigate whether triton and other PyTorch-hosted components have accelerator-specific wheels requiring explicit index mapping to pytorch-cuda. Issue #2162 was created with comprehensive investigation framework covering triton wheel analysis, other PyTorch components survey, architecture constraints evaluation, repository-wide pattern analysis, multiple solution options, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (jupyter-trustyai-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-trustyai-cpu-py312-ubi9-on-pull-request
- GitHub Check: code-static-analysis
🔇 Additional comments (1)
jupyter/trustyai/ubi9-python-3.12/pyproject.toml (1)
Lines 10-21: Index override for Triton looks solid. Explicitly redeclaring triton==3.3.1 with the same platform guard and pointing it to the PyTorch CUDA index is a clean way to ensure the proper aarch64 wheels are pulled alongside torch. Also applies to: 80-80.
marker = "platform_machine != 'ppc64le'"
wheels = [
{ url = "https://files.pythonhosted.org/packages/8d/a9/549e51e9b1b2c9b854fd761a1d23df0ba2fbc60bd0c13b489ffa518cfcb7/triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:36Z, size = 155600257, hashes = { sha256 = "b74db445b1c562844d3cfad6e9679c72e93fdfb1a90a24052b03bb5c49d1242e" } },
{ url = "https://files.pythonhosted.org/packages/21/2f/3e56ea7b58f80ff68899b1dbe810ff257c9d177d288c6b0f55bf2fe4eb50/triton-3.3.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:44Z, size = 155689937, hashes = { sha256 = "b31e3aa26f8cb3cc5bf4e187bf737cbacf17311e1112b781d4a059353dfd731b" } },
{ url = "https://files.pythonhosted.org/packages/24/5f/950fb373bf9c01ad4eb5a8cd5eaf32cdf9e238c02f9293557a2129b9c4ac/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:51Z, size = 155669138, hashes = { sha256 = "9999e83aba21e1a78c1f36f21bce621b77bcaa530277a50484a7cb4a822f6e43" } },
{ url = "https://files.pythonhosted.org/packages/74/1f/dfb531f90a2d367d914adfee771babbd3f1a5b26c3f5fbc458dee21daa78/triton-3.3.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:02Z, size = 155673035, hashes = { sha256 = "b89d846b5a4198317fec27a5d3a609ea96b6d557ff44b56c23176546023c4240" } },
{ url = "https://files.pythonhosted.org/packages/28/71/bd20ffcb7a64c753dc2463489a61bf69d531f308e390ad06390268c4ea04/triton-3.3.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:10Z, size = 155735832, hashes = { sha256 = "a3198adb9d78b77818a5388bff89fa72ff36f9da0bc689db2f0a651a67ce6a42" } },
{ url = "https://files.pythonhosted.org/packages/6d/81/ac4d50af22f594c4cb7c84fd2ad5ba1e0c03e2a83fe3483ddd79edcd7ec7/triton-3.3.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:18Z, size = 155596799, hashes = { sha256 = "f6139aeb04a146b0b8e0fbbd89ad1e65861c57cfed881f21d62d3cb94a36bab7" } },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
Restore the Linux guard on Triton.
Dropping sys_platform == 'linux' while keeping only manylinux wheels means Mac/Windows installs will now try (and fail) to resolve Triton from this lock. Please keep the Linux constraint so non-Linux developers aren't broken.
-marker = "platform_machine != 'ppc64le'"
+marker = "platform_machine != 'ppc64le' and sys_platform == 'linux'"
🤖 Prompt for AI Agents
In jupyter/trustyai/ubi9-python-3.12/pylock.toml around lines 4238 to 4251, the
platform guard for the Triton wheels was loosened and now allows non-Linux
platforms to attempt installing manylinux wheels; restore the Linux-only
constraint. Update the marker to include sys_platform == 'linux' (for example:
change the existing marker expression to also require sys_platform == 'linux')
so these wheel entries are only considered on Linux hosts, and verify
quoting/spacing matches the surrounding file convention.
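To make the effect of the two markers concrete, here is a quick check using the packaging library (illustrative only; the environment dict is a stand-in for a macOS arm64 machine, and unspecified keys fall back to the host's defaults):

from packaging.markers import Marker

# Lockfile marker before the PR (Linux-only) and after (ppc64le-only exclusion).
old = Marker("platform_machine != 'ppc64le' and sys_platform == 'linux'")
new = Marker("platform_machine != 'ppc64le'")

# Simulated macOS arm64 environment.
mac = {"platform_machine": "arm64", "sys_platform": "darwin"}

print(old.evaluate(mac))  # False -> triton is skipped on macOS
print(new.evaluate(mac))  # True  -> installer would select triton but find only manylinux wheels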
On the upstream -pull-request.yaml builds we need to add arm to the build platforms as well.
Done, thanks!
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/pytorch/ubi9-python-3.12/pyproject.toml:53-66
Timestamp: 2025-08-27T15:52:09.615Z
Learning: jiridanek requested GitHub issue creation for triton and PyTorch components CUDA wheel mapping optimization during PR #2145 review, specifically to investigate whether triton and other PyTorch-hosted components have accelerator-specific wheels requiring explicit index mapping to pytorch-cuda. Issue #2162 was created with comprehensive investigation framework covering triton wheel analysis, other PyTorch components survey, architecture constraints evaluation, repository-wide pattern analysis, multiple solution options, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2180
File: .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml:36-39
Timestamp: 2025-08-28T12:42:23.404Z
Learning: Konflux build system in opendatahub-io/notebooks repository requires "linux/x86_64" platform identifier format for Tekton PipelineRun build-platforms parameter, not the OCI-standard "linux/amd64". This is consistently used across all .tekton/ pipeline files and enforced by scripts/generate_pull_request_pipelineruns.py type definitions. Docker/buildah contexts use "linux/amd64" while Konflux/Tekton contexts use "linux/x86_64".
📚 Learning: 2025-08-28T12:42:23.404Z
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2180
File: .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml:36-39
Timestamp: 2025-08-28T12:42:23.404Z
Learning: Konflux build system in opendatahub-io/notebooks repository requires "linux/x86_64" platform identifier format for Tekton PipelineRun build-platforms parameter, not the OCI-standard "linux/amd64". This is consistently used across all .tekton/ pipeline files and enforced by scripts/generate_pull_request_pipelineruns.py type definitions. Docker/buildah contexts use "linux/amd64" while Konflux/Tekton contexts use "linux/x86_64".
Applied to files:
.tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / odh-workbench-jupyter-trustyai-cpu-py312-ubi9-on-pull-request
- GitHub Check: build (jupyter-trustyai-ubi9-python-3.12, 3.12, linux/amd64, false) / build
- linux/x86_64
- linux-m2xlarge/arm64
- linux/ppc64le
Fix invalid build platform identifier.
The Konflux build-platforms parameter must use the linux/<arch> form (e.g., linux/x86_64, linux/ppc64le). Adding linux-m2xlarge/arm64 breaks that contract, so the pipeline will reject or ignore the ARM build instead of scheduling it. Update the new entry to the supported identifier (linux/arm64 or linux/aarch64, whichever matches the generator defaults) before merging.
🤖 Prompt for AI Agents
In .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml
around lines 38 to 40, the build-platforms list contains an invalid identifier
"linux-m2xlarge/arm64"; replace that entry with the supported form "linux/arm64"
(or "linux/aarch64" if your generator expects that) so all entries use the
required "linux/<arch>" format and the ARM build is scheduled correctly.
@grdryn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: atheo89
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
How Has This Been Tested?
Self checklist (all need to be checked):
- Run make test (gmake on macOS) before asking for review
- Changes to Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.
Merge criteria:
Summary by CodeRabbit
Bug Fixes
Chores