Update RHOAI manifests to include universal workbench image configura… by kramaranya · Pull Request #26 · opendatahub-io/trainer

kramaranya · 2025-11-21T00:01:51Z

Adding Universal Image to Workbench using Image Stream
RHOAIENG-34069

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

Docs included if any changes are user facing

Summary by CodeRabbit

Release Notes

New Features
- Added three new training environment options supporting CPU, CUDA, and ROCm GPU acceleration for flexible hardware deployment
- Environments include updated software stacks with PyTorch, Kubeflow SDK, Jupyter, and essential ML libraries for training and model development workflows
Chores
- Updated infrastructure configuration for training environment management


…tion Signed-off-by: kramaranya <kramaranya15@gmail.com>

coderabbitai · 2025-11-21T00:01:58Z

Walkthrough

Introduces three new OpenShift ImageStream manifests for training hub workbench images (CPU, CUDA, and ROCm variants), each versioned at 2025.1 with bundled software versions and Python dependencies. Updates kustomization configuration to include replacement mappings from ConfigMap values to ImageStream resources and registers the new manifest files.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Kustomization Configuration: `manifests/rhoai/kustomization.yaml`, `manifests/rhoai/params.env` | Extends kustomization with three new image replacement mappings (ConfigMap keys to ImageStream tag references) and registers three new ImageStream resource files; adds three environment variable definitions for CUDA, ROCm, and CPU image references. |
| ImageStream Definitions: `manifests/rhoai/training-hub-universal-cuda-imagestream.yaml`, `manifests/rhoai/training-hub-universal-rocm-imagestream.yaml`, `manifests/rhoai/training-hub-universal-cpu-imagestream.yaml` | Three new OpenShift ImageStream resources with identical structure; each defines metadata, local lookup policy, and a 2025.1 tag containing notebook software (Python, PyTorch, Kubeflow SDK, Training Hub), Python dependencies, and a reference to a corresponding DockerImage. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

- Image reference alignment: Verify that environment variable names in `params.env` match the `from.name` references in each ImageStream file.
- Kustomization replacement paths: Confirm that the fieldPath `spec.tags.0.from.name` correctly targets the tag reference in each ImageStream.
- Annotation consistency: Check that notebook software versions and Python dependency lists are consistent across variants (or intentionally diverge as appropriate).
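
The replacement wiring described in the walkthrough could look roughly like the sketch below. The ConfigMap name `trainer-params` and the ImageStream name `training-hub-universal-cuda` are assumptions made for illustration and are not taken from the PR; the param key and the `spec.tags.0.from.name` field path are the ones quoted in this thread.

```yaml
# kustomization.yaml (illustrative sketch, not the PR's actual file)
configMapGenerator:
  - name: trainer-params        # hypothetical ConfigMap name
    envs:
      - params.env              # key=value pairs become ConfigMap data keys
generatorOptions:
  disableNameSuffixHash: true   # generated ConfigMap keeps its plain name

resources:
  - training-hub-universal-cuda-imagestream.yaml

replacements:
  - source:
      kind: ConfigMap
      name: trainer-params
      fieldPath: data.odh-kubeflow-trainer-universal-workbench-image-cuda
    targets:
      - select:
          kind: ImageStream
          name: training-hub-universal-cuda   # hypothetical metadata.name
        fieldPaths:
          - spec.tags.0.from.name             # first tag's image reference
```

Rendering with `kustomize build manifests/rhoai` should then show the image value from `params.env` in the first tag's `from.name`, which is one way to perform the alignment check listed above.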

Poem

🐰 Three images hop into place,
CUDA, ROCm, CPU—a well-paced race.
Tags tagged at 2025.1 bright,
Training hub shines with workbench light! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding universal workbench image configurations (ImageStreams and parameters) to RHOAI manifests. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Signed-off-by: kramaranya <kramaranya15@gmail.com>

MStokluska · 2025-11-21T19:03:30Z

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml

+    opendatahub.io/notebook-image: "true"
+  annotations:
+    opendatahub.io/notebook-image-url: "https://github.com/opendatahub-io/trainer"
+    opendatahub.io/notebook-image-name: "Training Hub Universal (CUDA, Python 3.12)"


I would remove the version from here and include torch.
The reason the version should be removed, IMO, is that we can have a single ImageStream with multiple versions (ImageStreamTags), and these versions can have different Python versions over time.
Unless we agree to keep this versioned and drop it once a new version (with an updated Python version) is out.

So basically it's a choice between one core image with multiple versions of it vs. potentially multiple core images with different versions.

Agree, updated
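
To illustrate the trade-off discussed here: a single ImageStream can carry several tags whose Python versions differ, which is why a Python version in the display name can go stale. The sketch below is hypothetical. The `metadata.name`, the display name wording, the second tag `2025.2`, its Python version, and both image references are placeholders, not values from the PR.

```yaml
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: training-hub-universal-cuda        # hypothetical name
  labels:
    opendatahub.io/notebook-image: "true"
  annotations:
    opendatahub.io/notebook-image-url: "https://github.com/opendatahub-io/trainer"
    # Display name without a Python version (wording is an assumption)
    opendatahub.io/notebook-image-name: "Training Hub Universal (CUDA, PyTorch)"
spec:
  lookupPolicy:
    local: true
  tags:
    - name: "2025.1"
      annotations:
        opendatahub.io/notebook-software: |
          [{"name": "Python", "version": "v3.12"}]
      from:
        kind: DockerImage
        name: registry.example.com/placeholder/image:2025.1   # placeholder
    - name: "2025.2"                       # hypothetical later version
      annotations:
        opendatahub.io/notebook-software: |
          [{"name": "Python", "version": "v3.13"}]
      from:
        kind: DockerImage
        name: registry.example.com/placeholder/image:2025.2   # placeholder
```

With this layout the per-tag `notebook-software` annotation carries the Python version, so the stream-level display name does not need to change when a new tag is added.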

MStokluska · 2025-11-21T19:08:57Z

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml

+      annotations:
+        opendatahub.io/notebook-software: |
+          [
+            {"name": "CUDA", "version": "12.6"},


Suggested change

{"name": "CUDA", "version": "12.6"},

{"name": "CUDA", "version": "12.8"},

MStokluska · 2025-11-21T19:10:27Z

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml

+          [
+            {"name": "CUDA", "version": "12.6"},
+            {"name": "Python", "version": "v3.12"},
+            {"name": "Training Hub", "version": "v0.3.0"}


I think PyTorch 2.8.0 should also be included
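
With this suggestion and the CUDA 12.8 suggestion above applied, the annotation might read as follows. This is a sketch of the reviewers' proposals only; the exact PyTorch version string format (for example with or without a `v` prefix) is an assumption, and the merged manifest may differ.

```yaml
# Tag-level annotation. The value is a JSON array stored in a YAML block
# scalar, so no YAML comments can appear inside it.
annotations:
  opendatahub.io/notebook-software: |
    [
      {"name": "CUDA", "version": "12.8"},
      {"name": "Python", "version": "v3.12"},
      {"name": "PyTorch", "version": "2.8.0"},
      {"name": "Training Hub", "version": "v0.3.0"}
    ]
```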

MStokluska · 2025-11-21T19:11:08Z

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml

+  lookupPolicy:
+    local: true
+  tags:
+    - name: latest


I think `latest` is fine for WIP, but it might be good to set it to the actual tag value before merge, once we agree on what that should be (2026.1?).

I used 2025.1 since it's gonna be merged before 2026, right?
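
One detail worth checking when switching from `latest` to a numeric-looking tag: in YAML an unquoted `2025.1` is parsed as a floating-point number, while a tag name is a string field. A minimal sketch, assuming the tag is named `2025.1` as proposed in the reply (whether the PR quotes the value is not shown in this thread):

```yaml
tags:
  # Quoted so YAML treats the tag name as the string "2025.1",
  # not the number 2025.1.
  - name: "2025.1"
```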

MStokluska · 2025-11-21T19:21:35Z

manifests/rhoai/params.env

@@ -1 +1,2 @@
 odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0
+odh-kubeflow-trainer-universal-workbench-image=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest


@sutaakar just to clarify - will this value be essentially coming from related images at some point?

all productized images used by Trainer should and will be listed here

MStokluska · 2025-11-21T19:22:59Z

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml

+          [
+            {"name": "JupyterLab", "version": "4.4"}
+          ]
+        openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9


I wonder if this should be dynamically fetched? I guess at some point this will be coming from registry (??)

Notebooks repo keeps it static, but we can make it dynamic after images move to the final registry

Signed-off-by: kramaranya <kramaranya15@gmail.com>

briangallagher · 2025-12-03T09:48:36Z

manifests/rhoai/params.env

 odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0
+odh-kubeflow-trainer-universal-workbench-image-cuda=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest
+odh-kubeflow-trainer-universal-workbench-image-rocm=quay.io/mstoklus/workbench-images:py312-rocm64-torch280-3
+odh-kubeflow-trainer-universal-workbench-image-cpu=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest


@kramaranya I guess this is just temp until we have a CPU image?

MStokluska · 2025-12-03T10:04:25Z

manifests/rhoai/params.env

@@ -1 +1,4 @@
 odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0
+odh-kubeflow-trainer-universal-workbench-image-cuda=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest


hey @kramaranya
just something to collaborate on / think about...
With the controller image, a single param makes perfect sense, since only one operator image runs at any given time.
With ImageStreams we should think of it as:

- each ImageStream can have multiple tags
- each tag is essentially a different image
- if in version X our ImageStreams have a single image, that's fine, but if version X+n introduces another set of images, how are we going to map the params to the tags?

My point is: should these params be versioned?

Yeah, we could have one param per version in params.env. For example:

odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:aaa
odh-kubeflow-trainer-universal-workbench-image-cuda-2025-2=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:bbb

wdyt @MStokluska @briangallagher

Using 2025-1 style suffixes is IMO a good idea @kramaranya 👍
Just to add to it: let's say we have 2025-1 with cuda 2.8 and py312, and 2025-2 with cuda 2.9 and py312.
If 2025-1 hits a CVE unrelated to core packages, we will need to address it, and I guess at that point we will overwrite 2025-1, right?

Yes, exactly
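
A sketch of how versioned params could map onto tags, assuming the key naming proposed above. The ConfigMap name, the ImageStream name, and the existence of a second tag at index 1 are hypothetical. Because the field path addresses tags by position, the order of tags in the ImageStream has to stay in sync with these entries.

```yaml
replacements:
  # 2025-1 param -> first tag of the CUDA ImageStream
  - source:
      kind: ConfigMap
      name: trainer-params                    # hypothetical ConfigMap name
      fieldPath: data.odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1
    targets:
      - select:
          kind: ImageStream
          name: training-hub-universal-cuda   # hypothetical metadata.name
        fieldPaths:
          - spec.tags.0.from.name
  # 2025-2 param -> second tag (hypothetical future version)
  - source:
      kind: ConfigMap
      name: trainer-params
      fieldPath: data.odh-kubeflow-trainer-universal-workbench-image-cuda-2025-2
    targets:
      - select:
          kind: ImageStream
          name: training-hub-universal-cuda
        fieldPaths:
          - spec.tags.1.from.name
```

Under this scheme, a CVE fix for 2025-1 would only change the value behind the `-2025-1` key in `params.env`; the tag name and the replacement entry stay the same.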

kramaranya · 2025-12-03T12:41:11Z

Here's how this will look while creating a workbench:

Signed-off-by: kramaranya <kramaranya15@gmail.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

manifests/rhoai/params.env (1)

2-4: Params wiring looks good; consider pinning tags and confirming final registries

The new per-version params line up with the kustomize replacements, so the wiring to the ImageStreams looks correct. For release though, you may want to avoid :latest / :latest-cpu and point to explicit tags (or digests), and double‑check that the ROCm image under quay.io/mstoklus/workbench-images is the intended productized location vs a temporary namespace.

manifests/rhoai/training-hub-universal-cpu-imagestream.yaml (1)

16-41: Align openshift.io/imported-from with the actual CPU image

from.name is correctly parameterized, but the openshift.io/imported-from annotation still points at the generic CUDA image name without the CPU tag, which can be misleading when inspecting the ImageStream. Consider matching the real CPU image location (e.g. the value behind odh-kubeflow-trainer-universal-workbench-image-cpu-2025-1) so the metadata stays truthful.

-        openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9
+        openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest-cpu

(Adjust the tag if you use something other than latest-cpu.)

manifests/rhoai/training-hub-universal-rocm-imagestream.yaml (1)

17-36: Optional: include Training Hub in the ROCm software metadata for consistency

This ImageStream looks well-formed and matches the CUDA/CPU structure; one small nit is that opendatahub.io/notebook-software here omits the Training Hub entry that you include on the CUDA/CPU variants. If the ROCm image also ships Training Hub (same or similar version), consider adding it so the UI metadata stays consistent across runtimes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9212662 and 54f0089.

📒 Files selected for processing (5)

manifests/rhoai/kustomization.yaml (2 hunks)
manifests/rhoai/params.env (1 hunks)
manifests/rhoai/training-hub-universal-cpu-imagestream.yaml (1 hunks)
manifests/rhoai/training-hub-universal-cuda-imagestream.yaml (1 hunks)
manifests/rhoai/training-hub-universal-rocm-imagestream.yaml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: pre-commit

🔇 Additional comments (2)

manifests/rhoai/kustomization.yaml (1)

29-67: Kustomize wiring between params and ImageStreams looks correct

The new replacements and resources cleanly hook the versioned params into the three ImageStreams (spec.tags.0.from.name), and the selectors/fieldPaths all line up with the new manifests. I don’t see any structural or scoping issues here.

Also applies to: 83-85

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml (1)

16-47: CUDA ImageStream manifest is structurally sound and aligned with the other runtimes

The CUDA ImageStream follows the expected ODH notebook pattern (labels, annotations, tag structure, referencePolicy, recommended accelerators), and its from.name placeholder lines up with the new params/replacements; I don’t see any issues in this definition.

kramaranya · 2025-12-16T09:27:16Z

/hold

Update RHOAI manifests to include universal workbench image configura…

d8ecc0a

…tion Signed-off-by: kramaranya <kramaranya15@gmail.com>

kramaranya changed the title ~~WIP: Update RHOAI manifests to include universal workbench image configura…~~ Update RHOAI manifests to include universal workbench image configura… Nov 21, 2025

Update quay repo

f8b7c92

Signed-off-by: kramaranya <kramaranya15@gmail.com>

MStokluska reviewed Nov 21, 2025


Add ROCm and CPU ImageStreams

e657541

Signed-off-by: kramaranya <kramaranya15@gmail.com>

briangallagher reviewed Dec 3, 2025


MStokluska reviewed Dec 3, 2025


kramaranya added 2 commits December 3, 2025 13:45

Fix syntax error in CUDA Image Stream

9bbceeb

Signed-off-by: kramaranya <kramaranya15@gmail.com>

manifests: refactor params.env to handle multiple versions

54f0089

Signed-off-by: kramaranya <kramaranya15@gmail.com>

coderabbitai bot reviewed Dec 5, 2025




4 participants
