
Conversation

@grdryn (Member) commented Oct 7, 2025

Description

JIRA: https://issues.redhat.com/browse/RHOAIENG-34086

Triton here is a dependency of torch, and is just being explicitly re-declared so that the index it is pulled from can be customized. Although torch was being pulled from the PyTorch index, all of its dependencies still default to PyPI (or whatever default index is used for a build). In triton's case, PyPI only has x86_64 wheels for this version, but the PyTorch index has both x86_64 and aarch64 wheels, matching the arches available for torch.

Since it's a dependency of torch, I've also added the same platform-exclusion marker to omit it on ppc64le.
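
For context, this is roughly the pyproject.toml wiring involved: a minimal sketch assuming uv's standard index-pinning syntax. The pytorch-cuda index name matches the source mapping in this PR, while the exact index URL is an assumption for illustration.

[project]
dependencies = [
    "triton==3.3.1; platform_machine != 'ppc64le'",
]

[tool.uv.sources]
triton = { index = "pytorch-cuda" }

[[tool.uv.index]]
name = "pytorch-cuda"
url = "https://download.pytorch.org/whl/cu126"  # assumed URL; the repo defines the real index
explicit = true  # only consulted for packages explicitly pinned to it via tool.uv.sources

With this in place, uv resolves triton from the PyTorch index rather than PyPI, so the aarch64 wheels become visible to the resolver.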

How Has This Been Tested?

Self checklist (all need to be checked):

  • Ensure that you have run make test (gmake on macOS) before asking for review
  • Changes to everything except Dockerfile.konflux files should be done in odh/notebooks and automatically synced to rhds/notebooks. For Konflux-specific changes, modify Dockerfile.konflux files directly in rhds/notebooks as these require special attention in the downstream repository and flow to the upcoming RHOAI release.

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that they work.

Summary by CodeRabbit

  • Bug Fixes

    • Broadened the Triton marker so it applies to more platforms (excluding only ppc64le), improving installation reliability.
  • Chores

    • Declared Triton explicitly as a dependency and aligned its package source with the PyTorch CUDA index.
    • Added arm64 (linux-m2xlarge/arm64) to the multi-arch build platforms for CI builds.

@openshift-ci openshift-ci bot requested review from atheo89 and dibryant October 7, 2025 10:03
@github-actions github-actions bot added the review-requested GitHub Bot creates notification on #pr-review-ai-ide-team slack channel label Oct 7, 2025
@openshift-ci openshift-ci bot added the size/m label Oct 7, 2025
coderabbitai bot (Contributor) commented Oct 7, 2025

Walkthrough

Updated the Triton dependency: added a platform-constrained requirement and source mapping in pyproject.toml, broadened the lockfile marker to exclude only ppc64le, replaced the hash-pinned PyPI wheel entries with hash-less PyTorch-index URLs covering both x86_64 and aarch64, and added an arm64 build platform to the Tekton PipelineRun spec.

Changes

jupyter/trustyai/ubi9-python-3.12/pyproject.toml (PyProject Triton config)
Added the dependency triton==3.3.1; platform_machine != 'ppc64le' and the UV source mapping triton = { index = "pytorch-cuda" }.

jupyter/trustyai/ubi9-python-3.12/pylock.toml (Lockfile Triton marker and wheels)
Changed the triton marker from platform_machine != 'ppc64le' and sys_platform == 'linux' to platform_machine != 'ppc64le'. Replaced the hash-pinned PyPI x86_64 wheel entries with hash-less https://download.pytorch.org/whl/ entries covering both x86_64 and aarch64; see the condensed sketch below.

.tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml (CI Pipeline platforms)
Added linux-m2xlarge/arm64 to the build-platforms parameter in the PipelineRun spec.
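
Condensed, the locked triton entry then has roughly the shape below, a sketch trimmed to the cp312 wheels; the marker, wheel URLs, and hashes = {} layout are taken from the diff quoted later in this review, while the surrounding name/version fields are assumed:

[[packages]]
name = "triton"
version = "3.3.1"
marker = "platform_machine != 'ppc64le'"
wheels = [
    # wheels served from the PyTorch index carry no hashes in this lock
    { url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
    { url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
]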

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Description Check (⚠️ Warning): The description includes the required "Description" and "How Has This Been Tested?" headings and outlines the rationale for redeclaring Triton, but it lacks concrete details of the testing environment, commands executed, or observed results, and the self-checklist remains unchecked. This makes it impossible to assess whether the changes were actually validated. Resolution: add specific testing details under "How Has This Been Tested?" (build environment, commands run, outcomes) and update the self-checklist to reflect completed verification steps.

✅ Passed checks (2 passed)

Title Check (✅ Passed): The title succinctly and accurately captures the primary change, namely using Triton from the PyTorch index in the TrustyAI image, without extraneous information or ambiguity. It directly reflects the change to dependency sourcing and is clear to reviewers.
Docstring Coverage (✅ Passed): No functions found in the changes; docstring coverage check skipped.


@openshift-ci openshift-ci bot added size/m and removed size/m labels Oct 7, 2025
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 366c43e and dfc41a4.

📒 Files selected for processing (2)
  • jupyter/trustyai/ubi9-python-3.12/pylock.toml (1 hunks)
  • jupyter/trustyai/ubi9-python-3.12/pyproject.toml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/pytorch/ubi9-python-3.12/pyproject.toml:53-66
Timestamp: 2025-08-27T15:52:09.615Z
Learning: jiridanek requested GitHub issue creation for triton and PyTorch components CUDA wheel mapping optimization during PR #2145 review, specifically to investigate whether triton and other PyTorch-hosted components have accelerator-specific wheels requiring explicit index mapping to pytorch-cuda. Issue #2162 was created with comprehensive investigation framework covering triton wheel analysis, other PyTorch components survey, architecture constraints evaluation, repository-wide pattern analysis, multiple solution options, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (jupyter-trustyai-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-trustyai-cpu-py312-ubi9-on-pull-request
  • GitHub Check: code-static-analysis
🔇 Additional comments (1)
jupyter/trustyai/ubi9-python-3.12/pyproject.toml (1)

10-21: Index override for Triton looks solid.

Explicitly redeclaring triton==3.3.1 with the same platform guard and pointing it to the PyTorch CUDA index is a clean way to ensure the proper aarch64 wheels are pulled alongside torch.

Also applies to: 80-80

Comment on lines +4238 to +4251
marker = "platform_machine != 'ppc64le'"
wheels = [
{ url = "https://files.pythonhosted.org/packages/8d/a9/549e51e9b1b2c9b854fd761a1d23df0ba2fbc60bd0c13b489ffa518cfcb7/triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:36Z, size = 155600257, hashes = { sha256 = "b74db445b1c562844d3cfad6e9679c72e93fdfb1a90a24052b03bb5c49d1242e" } },
{ url = "https://files.pythonhosted.org/packages/21/2f/3e56ea7b58f80ff68899b1dbe810ff257c9d177d288c6b0f55bf2fe4eb50/triton-3.3.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:44Z, size = 155689937, hashes = { sha256 = "b31e3aa26f8cb3cc5bf4e187bf737cbacf17311e1112b781d4a059353dfd731b" } },
{ url = "https://files.pythonhosted.org/packages/24/5f/950fb373bf9c01ad4eb5a8cd5eaf32cdf9e238c02f9293557a2129b9c4ac/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:39:51Z, size = 155669138, hashes = { sha256 = "9999e83aba21e1a78c1f36f21bce621b77bcaa530277a50484a7cb4a822f6e43" } },
{ url = "https://files.pythonhosted.org/packages/74/1f/dfb531f90a2d367d914adfee771babbd3f1a5b26c3f5fbc458dee21daa78/triton-3.3.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:02Z, size = 155673035, hashes = { sha256 = "b89d846b5a4198317fec27a5d3a609ea96b6d557ff44b56c23176546023c4240" } },
{ url = "https://files.pythonhosted.org/packages/28/71/bd20ffcb7a64c753dc2463489a61bf69d531f308e390ad06390268c4ea04/triton-3.3.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:10Z, size = 155735832, hashes = { sha256 = "a3198adb9d78b77818a5388bff89fa72ff36f9da0bc689db2f0a651a67ce6a42" } },
{ url = "https://files.pythonhosted.org/packages/6d/81/ac4d50af22f594c4cb7c84fd2ad5ba1e0c03e2a83fe3483ddd79edcd7ec7/triton-3.3.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", upload-time = 2025-05-29T23:40:18Z, size = 155596799, hashes = { sha256 = "f6139aeb04a146b0b8e0fbbd89ad1e65861c57cfed881f21d62d3cb94a36bab7" } },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp310-cp310-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp39-cp39-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hashes = {} },
{ url = "https://download.pytorch.org/whl/triton-3.3.1-cp39-cp39-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hashes = {} },
coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🟠 Major

Restore the Linux guard on Triton.

Dropping sys_platform == 'linux' while keeping only manylinux wheels means Mac/Windows installs will now try (and fail) to resolve Triton from this lock. Please keep the Linux constraint so non-Linux developers aren’t broken.

-marker = "platform_machine != 'ppc64le'"
+marker = "platform_machine != 'ppc64le' and sys_platform == 'linux'"
🤖 Prompt for AI Agents
In jupyter/trustyai/ubi9-python-3.12/pylock.toml around lines 4238 to 4251, the
platform guard for the Triton wheels was loosened and now allows non-Linux
platforms to attempt installing manylinux wheels; restore the Linux-only
constraint. Update the marker to include sys_platform == 'linux' (for example:
change the existing marker expression to also require sys_platform == 'linux')
so these wheel entries are only considered on Linux hosts, and verify
quoting/spacing matches the surrounding file convention.

@atheo89 (Member) commented Oct 7, 2025

On the upstream -pull-request.yaml builds we need to add arm to the build platforms as well.
I have triggered it on downstream 2.25 though: https://konflux-ui.apps.stone-prd-rh01.pg1f.p1.openshiftapps.com/ns/rhoai-tenant/applications/automation/pipelineruns/odh-workbench-jupyter-trustyai-cpu-py312-on-pull-request-mmmlc

@openshift-ci openshift-ci bot added size/m and removed size/m labels Oct 7, 2025
@grdryn (Member, Author) commented Oct 7, 2025

On the upstream -pull-request.yaml builds we need to add arm to the build platforms as well.

Done, thanks!

@openshift-ci openshift-ci bot added size/m and removed size/m labels Oct 7, 2025
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfc41a4 and a0cb14c.

📒 Files selected for processing (1)
  • .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/pytorch/ubi9-python-3.12/pyproject.toml:53-66
Timestamp: 2025-08-27T15:52:09.615Z
Learning: jiridanek requested GitHub issue creation for triton and PyTorch components CUDA wheel mapping optimization during PR #2145 review, specifically to investigate whether triton and other PyTorch-hosted components have accelerator-specific wheels requiring explicit index mapping to pytorch-cuda. Issue #2162 was created with comprehensive investigation framework covering triton wheel analysis, other PyTorch components survey, architecture constraints evaluation, repository-wide pattern analysis, multiple solution options, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2180
File: .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml:36-39
Timestamp: 2025-08-28T12:42:23.404Z
Learning: Konflux build system in opendatahub-io/notebooks repository requires "linux/x86_64" platform identifier format for Tekton PipelineRun build-platforms parameter, not the OCI-standard "linux/amd64". This is consistently used across all .tekton/ pipeline files and enforced by scripts/generate_pull_request_pipelineruns.py type definitions. Docker/buildah contexts use "linux/amd64" while Konflux/Tekton contexts use "linux/x86_64".
📚 Learning: 2025-08-28T12:42:23.404Z
Learnt from: jiridanek
PR: opendatahub-io/notebooks#2180
File: .tekton/odh-base-image-cuda-py312-ubi9-pull-request.yaml:36-39
Timestamp: 2025-08-28T12:42:23.404Z
Learning: Konflux build system in opendatahub-io/notebooks repository requires "linux/x86_64" platform identifier format for Tekton PipelineRun build-platforms parameter, not the OCI-standard "linux/amd64". This is consistently used across all .tekton/ pipeline files and enforced by scripts/generate_pull_request_pipelineruns.py type definitions. Docker/buildah contexts use "linux/amd64" while Konflux/Tekton contexts use "linux/x86_64".

Applied to files:

  • .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-trustyai-cpu-py312-ubi9-on-pull-request
  • GitHub Check: build (jupyter-trustyai-ubi9-python-3.12, 3.12, linux/amd64, false) / build

Comment on lines 38 to 40
- linux/x86_64
- linux-m2xlarge/arm64
- linux/ppc64le
coderabbitai bot (Contributor) commented:

⚠️ Potential issue | 🔴 Critical

Fix invalid build platform identifier.

The Konflux build-platforms parameter must use the linux/<arch> form (e.g., linux/x86_64, linux/ppc64le). Adding linux-m2xlarge/arm64 breaks that contract, so the pipeline will reject or ignore the ARM build instead of scheduling it. Update the new entry to the supported identifier (linux/arm64 / linux/aarch64, whichever matches the generator defaults) before merging.

🤖 Prompt for AI Agents
In .tekton/odh-workbench-jupyter-trustyai-cpu-py312-ubi9-pull-request.yaml
around lines 38 to 40, the build-platforms list contains an invalid identifier
"linux-m2xlarge/arm64"; replace that entry with the supported form "linux/arm64"
(or "linux/aarch64" if your generator expects that) so all entries use the
required "linux/<arch>" format and the ARM build is scheduled correctly.

openshift-ci bot (Contributor) commented Oct 7, 2025

@grdryn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

ci/prow/notebooks-py312-ubi9-e2e-tests (commit dfc41a4, required): rerun with /test notebooks-py312-ubi9-e2e-tests
ci/prow/notebook-jupyter-tai-ubi9-python-3-12-pr-image-mirror (commit a0cb14c, required): rerun with /test notebook-jupyter-tai-ubi9-python-3-12-pr-image-mirror
ci/prow/images (commit a0cb14c, required): rerun with /test images

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@atheo89 (Member) left a comment

/lgtm

openshift-ci bot (Contributor) commented Oct 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: atheo89

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Oct 7, 2025
@atheo89 atheo89 merged commit 23f9dcf into opendatahub-io:main Oct 7, 2025
11 of 18 checks passed
@openshift-ci openshift-ci bot added size/m and removed size/m labels Oct 8, 2025
Labels: approved, lgtm, review-requested (GitHub Bot creates notification on #pr-review-ai-ide-team slack channel), size/m