
Update RHOAI manifests to include universal workbench image configura… #26

Open
kramaranya wants to merge 5 commits into opendatahub-io:main from kramaranya:imagestream


kramaranya commented Nov 21, 2025

Adding Universal Image to Workbench using Image Stream
RHOAIENG-34069

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

Summary by CodeRabbit

Release Notes

  • New Features

    • Added three new training environment options supporting CPU, CUDA, and ROCm GPU acceleration for flexible hardware deployment
    • Environments include updated software stacks with PyTorch, Kubeflow SDK, Jupyter, and essential ML libraries for training and model development workflows
  • Chores

    • Updated infrastructure configuration for training environment management



Signed-off-by: kramaranya <kramaranya15@gmail.com>
coderabbitai bot commented Nov 21, 2025

Walkthrough

Introduces three new OpenShift ImageStream manifests for training hub workbench images (CPU, CUDA, and ROCm variants), each versioned at 2025.1 with bundled software versions and Python dependencies. Updates kustomization configuration to include replacement mappings from ConfigMap values to ImageStream resources and registers the new manifest files.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Kustomization Configuration | manifests/rhoai/kustomization.yaml, manifests/rhoai/params.env | Extends kustomization with three new image replacement mappings (ConfigMap keys to ImageStream tag references) and registers three new ImageStream resource files; adds three environment variable definitions for the CUDA, ROCm, and CPU image references. |
| ImageStream Definitions | manifests/rhoai/training-hub-universal-cuda-imagestream.yaml, manifests/rhoai/training-hub-universal-rocm-imagestream.yaml, manifests/rhoai/training-hub-universal-cpu-imagestream.yaml | Three new OpenShift ImageStream resources with identical structure; each defines metadata, a local lookup policy, and a 2025.1 tag containing notebook software (Python, PyTorch, Kubeflow SDK, Training Hub), Python dependencies, and a reference to the corresponding DockerImage. |
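For readers less familiar with the resource, here is a minimal sketch of what one of these ImageStreams might look like. It is assembled from fragments quoted elsewhere in this review (the notebook-image label, the image URL and name annotations, lookupPolicy, the software list, the imported-from annotation, the DockerImage reference). The metadata.name, the display name, the python-dependencies annotation key, and the placeholder in from.name are assumptions for illustration, not the contents of the PR's files.

```yaml
# Hypothetical sketch of training-hub-universal-cuda-imagestream.yaml.
# Names marked "assumed" are illustrative, not copied from the PR.
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: training-hub-universal-cuda          # assumed resource name
  labels:
    opendatahub.io/notebook-image: "true"    # marks the stream as a workbench image
  annotations:
    opendatahub.io/notebook-image-url: "https://github.com/opendatahub-io/trainer"
    opendatahub.io/notebook-image-name: "Training Hub Universal (CUDA)"  # assumed display name
spec:
  lookupPolicy:
    local: true
  tags:
    - name: "2025.1"                         # quoted so YAML keeps it a string, not a float
      annotations:
        opendatahub.io/notebook-software: |
          [
            {"name": "CUDA", "version": "12.8"},
            {"name": "Python", "version": "v3.12"},
            {"name": "PyTorch", "version": "2.8.0"},
            {"name": "Training Hub", "version": "v0.3.0"}
          ]
        opendatahub.io/notebook-python-dependencies: |   # assumed key name
          [
            {"name": "JupyterLab", "version": "4.4"}
          ]
        openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9
      from:
        kind: DockerImage
        # Placeholder value; a kustomize replacement overwrites it from params.env.
        name: odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1
      referencePolicy:
        type: Source
```

The tag name is quoted deliberately: an unquoted 2025.1 is parsed by YAML as a floating-point number, which would not match the string type Kubernetes expects for a tag name.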

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Image reference alignment: Verify that environment variable names in params.env match the from.name references in each ImageStream file.
  • Kustomization replacement paths: Confirm fieldPath spec.tags.0.from.name correctly targets the tag reference in each ImageStream.
  • Annotation consistency: Check that notebook software versions and Python dependency lists are consistent across variants (or intentionally diverge as appropriate).
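The first two checks can be made concrete with a sketch of the wiring the walkthrough describes: params.env becomes ConfigMap data, and a replacements entry copies one key into spec.tags.0.from.name of the matching ImageStream. The ConfigMap name rhoai-trainer-params and the ImageStream name are hypothetical; only the param key pattern and the fieldPath come from this PR discussion.

```yaml
# Hypothetical excerpt of manifests/rhoai/kustomization.yaml.
resources:
  - training-hub-universal-cuda-imagestream.yaml

configMapGenerator:
  - name: rhoai-trainer-params          # assumed name
    envs:
      - params.env                      # each KEY=VALUE line becomes a data entry
generatorOptions:
  disableNameSuffixHash: true           # no hash suffix on the generated ConfigMap name

replacements:
  - source:
      kind: ConfigMap
      name: rhoai-trainer-params
      fieldPath: data.odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1
    targets:
      - select:
          kind: ImageStream
          name: training-hub-universal-cuda   # assumed name
        fieldPaths:
          - spec.tags.0.from.name             # image reference of the first tag
```

As I understand kustomize's behavior, a source key that is missing from params.env makes kustomize build fail instead of silently leaving the placeholder, so running a build is a cheap way to verify that the names line up.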

Poem

🐰 Three images hop into place,
CUDA, ROCm, CPU—a well-paced race.
Tags tagged at 2025.1 bright,
Training hub shines with workbench light! ✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding universal workbench image configurations (ImageStreams and parameters) to RHOAI manifests. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

kramaranya changed the title from "WIP: Update RHOAI manifests to include universal workbench image configura…" to "Update RHOAI manifests to include universal workbench image configura…" on Nov 21, 2025
Signed-off-by: kramaranya <kramaranya15@gmail.com>
    opendatahub.io/notebook-image: "true"
  annotations:
    opendatahub.io/notebook-image-url: "https://github.com/opendatahub-io/trainer"
    opendatahub.io/notebook-image-name: "Training Hub Universal (CUDA, Python 3.12)"


I would remove the version from here and include torch.
The reason I think the version should be removed is that we can have a single ImageStream with multiple versions (ImageStreamTags), and these versions can have different Python versions over time.
Alternatively, we could agree to keep it versioned and drop it once a new version (with an updated Python version) is out.

So basically it's a choice between one core image with multiple versions of it vs. potentially multiple core images, each with different versions.
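To illustrate the first option (one core image, many versions), a single ImageStream can carry several tags, each pointing at a different image and advertising its own software versions. Everything below is a hypothetical placeholder: the 2025.2 tag, its Python v3.13 entry, and the quay.io/example image references do not correspond to real images.

```yaml
# Hypothetical: one ImageStream, one tag per release.
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: training-hub-universal-cuda            # assumed name
  labels:
    opendatahub.io/notebook-image: "true"
  annotations:
    # No Python version in the display name; each tag reports its own.
    opendatahub.io/notebook-image-name: "Training Hub Universal (CUDA)"
spec:
  lookupPolicy:
    local: true
  tags:
    - name: "2025.1"
      annotations:
        opendatahub.io/notebook-software: |
          [{"name": "Python", "version": "v3.12"}]
      from:
        kind: DockerImage
        name: quay.io/example/training-image:2025.1   # placeholder reference
    - name: "2025.2"                                  # hypothetical future tag
      annotations:
        opendatahub.io/notebook-software: |
          [{"name": "Python", "version": "v3.13"}]
      from:
        kind: DockerImage
        name: quay.io/example/training-image:2025.2   # placeholder reference
```

With this layout the version lives in the tag rather than in the image name, so retiring an older Python release means removing one tag instead of a whole ImageStream.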

Author

Agree, updated

annotations:
  opendatahub.io/notebook-software: |
    [
      {"name": "CUDA", "version": "12.6"},


Suggested change:
- {"name": "CUDA", "version": "12.6"},
+ {"name": "CUDA", "version": "12.8"},

[
  {"name": "CUDA", "version": "12.6"},
  {"name": "Python", "version": "v3.12"},
  {"name": "Training Hub", "version": "v0.3.0"}


I think PyTorch 2.8.0 should also be included

lookupPolicy:
  local: true
tags:
  - name: latest


I think latest is fine for WIP, but it might be good to set it to the actual tag value before merge, once we agree on what that should be (2026.1?).

Author

I used 2025.1 since it's gonna be merged before 2026, right?

@@ -1 +1,2 @@
odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0
odh-kubeflow-trainer-universal-workbench-image=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest


@sutaakar just to clarify - will this value be essentially coming from related images at some point?

Collaborator

All productized images used by Trainer should and will be listed here.

[
{"name": "JupyterLab", "version": "4.4"}
]
openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9


I wonder if this should be fetched dynamically. I guess at some point this will come from a registry?

Author

The notebooks repo keeps it static, but we can make it dynamic once the images move to the final registry.

Signed-off-by: kramaranya <kramaranya15@gmail.com>
odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0
odh-kubeflow-trainer-universal-workbench-image-cuda=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest
odh-kubeflow-trainer-universal-workbench-image-rocm=quay.io/mstoklus/workbench-images:py312-rocm64-torch280-3
odh-kubeflow-trainer-universal-workbench-image-cpu=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest


@kramaranya I guess this is just temp until we have a CPU image?

Author

yep!

@@ -1 +1,4 @@
odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0
odh-kubeflow-trainer-universal-workbench-image-cuda=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest


Hey @kramaranya,
just something to collaborate on / think about...
With the controller image it makes perfect sense to have a single param, since only one operator image runs at any given time.
With ImageStreams we should think of it as:

  • each ImageStream can have multiple tags
  • each tag is essentially a different image
  • if in version X our ImageStreams have a single image, that's fine, but if version X+n introduces another set of images, how are we going to map params to tags?

My point is: should these params be versioned?

Author

Yeah, we could have one param per version in params.env. For example:

odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:aaa
odh-kubeflow-trainer-universal-workbench-image-cuda-2025-2=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:bbb

wdyt @MStokluska @briangallagher
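The per-version naming proposed above could be wired up with one replacements entry per tag, each copying a versioned key into the matching element of spec.tags. This is a sketch under stated assumptions: the ConfigMap name, the ImageStream name, and the future -2025-2 key are hypothetical, and tags are targeted by list index, which stays correct only while the tag order in the ImageStream matches.

```yaml
# Hypothetical kustomization.yaml excerpt: one replacement per versioned param.
replacements:
  - source:
      kind: ConfigMap
      name: rhoai-trainer-params            # assumed name
      fieldPath: data.odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1
    targets:
      - select:
          kind: ImageStream
          name: training-hub-universal-cuda # assumed name
        fieldPaths:
          - spec.tags.0.from.name           # first tag, "2025.1"
  - source:
      kind: ConfigMap
      name: rhoai-trainer-params
      fieldPath: data.odh-kubeflow-trainer-universal-workbench-image-cuda-2025-2
    targets:
      - select:
          kind: ImageStream
          name: training-hub-universal-cuda
        fieldPaths:
          - spec.tags.1.from.name           # second tag, hypothetical "2025.2"
```

Under this scheme, rebuilding an existing version (for example to pick up a CVE fix) only changes the value of its versioned key in params.env; the ImageStream and kustomization files stay untouched.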


Versioned keys like 2025-1 are IMO a good idea @kramaranya 👍
Just to add to it: let's say we have 2025-1 with torch 2.8 and py312, and 2025-2 with torch 2.9 and py312.
If 2025-1 hits a CVE unrelated to the core packages, we will need to address it, and I guess at that point we will overwrite 2025-1, right?

Author
Yes, exactly

kramaranya (Author)

Here's how this will look while creating a workbench:
[Screenshot 2025-12-03 at 13 39 08]
[Screenshot 2025-12-03 at 13 39 24]

Signed-off-by: kramaranya <kramaranya15@gmail.com>
Signed-off-by: kramaranya <kramaranya15@gmail.com>
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
manifests/rhoai/params.env (1)

2-4: Params wiring looks good; consider pinning tags and confirming final registries

The new per-version params line up with the kustomize replacements, so the wiring to the ImageStreams looks correct. For release though, you may want to avoid :latest / :latest-cpu and point to explicit tags (or digests), and double‑check that the ROCm image under quay.io/mstoklus/workbench-images is the intended productized location vs a temporary namespace.

manifests/rhoai/training-hub-universal-cpu-imagestream.yaml (1)

16-41: Align openshift.io/imported-from with the actual CPU image

from.name is correctly parameterized, but the openshift.io/imported-from annotation still points at the generic CUDA image name without the CPU tag, which can be misleading when inspecting the ImageStream. Consider matching the real CPU image location (e.g. the value behind odh-kubeflow-trainer-universal-workbench-image-cpu-2025-1) so the metadata stays truthful.

-        openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9
+        openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest-cpu

(Adjust the tag if you use something other than latest-cpu.)

manifests/rhoai/training-hub-universal-rocm-imagestream.yaml (1)

17-36: Optional: include Training Hub in the ROCm software metadata for consistency

This ImageStream looks well-formed and matches the CUDA/CPU structure; one small nit is that opendatahub.io/notebook-software here omits the Training Hub entry that you include on the CUDA/CPU variants. If the ROCm image also ships Training Hub (same or similar version), consider adding it so the UI metadata stays consistent across runtimes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9212662 and 54f0089.

📒 Files selected for processing (5)
  • manifests/rhoai/kustomization.yaml (2 hunks)
  • manifests/rhoai/params.env (1 hunks)
  • manifests/rhoai/training-hub-universal-cpu-imagestream.yaml (1 hunks)
  • manifests/rhoai/training-hub-universal-cuda-imagestream.yaml (1 hunks)
  • manifests/rhoai/training-hub-universal-rocm-imagestream.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: pre-commit
🔇 Additional comments (2)
manifests/rhoai/kustomization.yaml (1)

29-67: Kustomize wiring between params and ImageStreams looks correct

The new replacements and resources cleanly hook the versioned params into the three ImageStreams (spec.tags.0.from.name), and the selectors/fieldPaths all line up with the new manifests. I don’t see any structural or scoping issues here.

Also applies to: 83-85

manifests/rhoai/training-hub-universal-cuda-imagestream.yaml (1)

16-47: CUDA ImageStream manifest is structurally sound and aligned with the other runtimes

The CUDA ImageStream follows the expected ODH notebook pattern (labels, annotations, tag structure, referencePolicy, recommended accelerators), and its from.name placeholder lines up with the new params/replacements; I don’t see any issues in this definition.

kramaranya (Author)

/hold


Labels

None yet

Projects

None yet


4 participants