RHOAIENG-28660: Updates to T&V runtimes for IBM Z Triton #879
Conversation
Walkthrough
The documentation was updated to add support for the "IBM Z Accelerated for NVIDIA Triton Inference Server" as a tested and verified model serving runtime. This includes new prerequisite details, YAML configuration examples, updated runtime tables, deployment requirements, and resource links. Existing NVIDIA Triton references were generalized and reorganized.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (1)
32-33
: Step wording still assumes single API selection but offers two YAML blocks. Consider clarifying that only one of the two manifests (REST or gRPC) should be applied; otherwise, duplicate metadata.name errors may occur. (See the sketch below for one way the two variants could coexist.)
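If the intent is ever to let both protocols coexist rather than choosing one, a minimal sketch of how the two ServingRuntime manifests could avoid a name collision — the metadata.name values below are hypothetical and not taken from the PR:

apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: ibmz-triton-rest    # hypothetical name for the REST variant
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: ibmz-triton-grpc    # hypothetical name for the gRPC variant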
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (4 hunks)
modules/customizable-model-serving-runtime-parameters.adoc (1 hunks)
modules/ref-tested-verified-runtimes.adoc (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: bredamc
PR: opendatahub-io/opendatahub-documentation#860
File: modules/using-llm-as-a-judge-metrics-with-lmeval.adoc:62-65
Timestamp: 2025-07-16T09:05:50.422Z
Learning: In the opendatahub-documentation project, bredamc prefers comprehensive error detection in code reviews. Continue to point out potential errors whether they are related to the code changes in the PR or not, but clearly distinguish between errors introduced in the PR versus errors in existing code.
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (1)
Learnt from: jiridanek
PR: #801
File: modules/running-a-pipeline-in-jupyterlab.adoc:8-8
Timestamp: 2025-06-17T12:18:47.675Z
Learning: PR #607 in opendatahub-io/kubeflow repository introduced automatic generation of ds-pipeline-config secret for Elyra runtime configuration when DSPA objects are present in the namespace, eliminating the need for manual runtime configuration creation in basic workbenches that have access to pipeline servers.
🔇 Additional comments (3)
modules/customizable-model-serving-runtime-parameters.adoc (1)
19-20
: Table entry added correctly – no blocking issues. The new IBM Z Triton runtime row is well-formed and the link resolves to the correct public docs. No further action required.
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (1)
55-56
: Placeholder image tag :xxx – remind authors to pin a digest. Before publishing, replace :xxx with a concrete tag or preferably a digest (@sha256:…) to ensure reproducible deployments.
Also applies to: 114-115
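A purely illustrative sketch of the pinned form — the version and digest below are placeholders, not actual published values for this image:

# replace <released-version> / <digest> with values from the IBM Z Triton release
image: icr.io/ibmz/ibmz-accelerated-for-nvidia-triton-inference-server:<released-version>
# or, preferably, pinned by digest for fully reproducible pulls:
image: icr.io/ibmz/ibmz-accelerated-for-nvidia-triton-inference-server@sha256:<digest>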
modules/ref-tested-verified-runtimes.adoc (1)
26-27
: Good addition, but double-check terminology consistency. Elsewhere the runtime is referred to as “IBM Z Accelerated for NVIDIA Triton Inference Server”; ensure the same phrasing is used across all docs (currently consistent here). No functional issues detected.
Also applies to: 37-38
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (outdated review comment, resolved)
limits:
  cpu: "2"
  memory: 4Gi
requests:
  cpu: "5"
  memory: 4Gi
requests.cpu greater than limits.cpu – scheduling will fail.
Kubernetes requires requests <= limits. Swap the values (or raise the limits) to avoid a spec.containers[].resources.limits.cpu: Invalid value error.
- limits:
-   cpu: "2"
+ limits:
+   cpu: "5"
  requests:
-   cpu: "5"
+   cpu: "2"
Apply to both the REST (lines 65-70) and gRPC (lines 124-129) variants.
Also applies to: 124-129
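For reference, a sketch of the resources block after the suggested swap, keeping the memory values from the snippet above (the exact CPU sizing is the authors' call; the point is only that requests must not exceed limits):

resources:
  limits:
    cpu: "5"      # limit raised so it is >= the request
    memory: 4Gi
  requests:
    cpu: "2"      # request now within the limit
    memory: 4Gi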
🤖 Prompt for AI Agents
In
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc
at lines 65-70 and 124-129, the cpu resource requests value is greater than the
limits value, which causes Kubernetes scheduling to fail. Fix this by ensuring
that the cpu requests value is less than or equal to the cpu limits value,
either by swapping the values or increasing the limits to be at least as large
as the requests.
- emptyDir: null
  medium: Memory
  sizeLimit: 2Gi
  name: shm
🛠️ Refactor suggestion
Volume definition malformed – medium and sizeLimit must nest under emptyDir.
Current structure:
- emptyDir: null
  medium: Memory
  sizeLimit: 2Gi
  name: shm
Valid structure:
emptyDir:
  medium: Memory
  sizeLimit: 2Gi
Without this fix the manifest fails to parse.
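Put together with the shm name from the snippet above, the corrected volumes entry would look roughly like this (a sketch of standard Kubernetes emptyDir syntax, not a verbatim excerpt from the PR):

volumes:
- name: shm
  emptyDir:          # medium and sizeLimit nested under emptyDir
    medium: Memory
    sizeLimit: 2Gi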
🤖 Prompt for AI Agents
In
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc
around lines 154 to 157, the volume definition is malformed because the keys
'medium' and 'sizeLimit' are not nested under 'emptyDir'. To fix this,
restructure the YAML so that 'emptyDir' is an object containing 'medium' and
'sizeLimit' keys, instead of setting 'emptyDir' to null and placing those keys
at the same level. This will ensure the manifest parses correctly.
Actionable comments posted: 0
♻️ Duplicate comments (2)
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (2)
64-69
: requests.cpu exceeds limits.cpu – Kubernetes will reject the manifest.
spec.containers[].resources.requests.cpu ("5") must be ≤ the corresponding limits.cpu ("2"). The same mis-ordering appears in both the REST and gRPC variants.
- limits:
-   cpu: "2"
+ limits:
+   cpu: "5"
  requests:
-   cpu: "5"
+   cpu: "2"
Apply this swap (or raise the limits) to both blocks to avoid a "requests must not exceed limits" validation error at admission time.
Also applies to: 121-127
152-155
: Volume definition still malformed – medium & sizeLimit must nest under emptyDir.
The YAML places medium and sizeLimit alongside emptyDir: null, yielding an invalid object:
- emptyDir: null
  medium: Memory
  sizeLimit: 2Gi
Correct structure:
emptyDir:
  medium: Memory
  sizeLimit: 2Gi
This change is required or the manifest will not parse.
🧹 Nitpick comments (1)
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (1)
54-55
: Placeholder tag :xxx should be replaced with a concrete, published image reference.
icr.io/ibmz/ibmz-accelerated-for-nvidia-triton-inference-server:xxx is not a resolvable image and will cause ImagePullBackOff. Replace xxx with an explicit version/tag or digest before publishing the documentation.
Also applies to: 112-113
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc
(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
modules/adding-a-tested-and-verified-runtime-for-the-single-model-serving-platform.adoc (1)
Learnt from: jiridanek
PR: #801
File: modules/running-a-pipeline-in-jupyterlab.adoc:8-8
Timestamp: 2025-06-17T12:18:47.675Z
Learning: PR #607 in opendatahub-io/kubeflow repository introduced automatic generation of ds-pipeline-config secret for Elyra runtime configuration when DSPA objects are present in the namespace, eliminating the need for manual runtime configuration creation in basic workbenches that have access to pipeline servers.
Description
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit