Update RHOAI manifests to include universal workbench image configura…#26
Update RHOAI manifests to include universal workbench image configura…#26kramaranya wants to merge 5 commits intoopendatahub-io:mainfrom
Conversation
…tion Signed-off-by: kramaranya <kramaranya15@gmail.com>
WalkthroughIntroduces three new OpenShift ImageStream manifests for training hub workbench images (CPU, CUDA, and ROCm variants), each versioned at 2025.1 with bundled software versions and Python dependencies. Updates kustomization configuration to include replacement mappings from ConfigMap values to ImageStream resources and registers the new manifest files. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: kramaranya <kramaranya15@gmail.com>
| opendatahub.io/notebook-image: "true" | ||
| annotations: | ||
| opendatahub.io/notebook-image-url: "https://github.com/opendatahub-io/trainer" | ||
| opendatahub.io/notebook-image-name: "Training Hub Universal (CUDA, Python 3.12)" |
There was a problem hiding this comment.
I would remove the version from here and include torch.
Reason why version IMO should be removed is because we can have a single imageStream with multiple version (imageStreamTags) and these versions can have different python version over time.
Unless, we can agree to keep this versioned drop it once new version (with updated python version is out)
So basically an option between core image > versions of the core image vs potentially multiple core image with different versions.
| annotations: | ||
| opendatahub.io/notebook-software: | | ||
| [ | ||
| {"name": "CUDA", "version": "12.6"}, |
There was a problem hiding this comment.
| {"name": "CUDA", "version": "12.6"}, | |
| {"name": "CUDA", "version": "12.8"}, |
| [ | ||
| {"name": "CUDA", "version": "12.6"}, | ||
| {"name": "Python", "version": "v3.12"}, | ||
| {"name": "Training Hub", "version": "v0.3.0"} |
There was a problem hiding this comment.
I think PyTorch 2.8.0 should also be included
| lookupPolicy: | ||
| local: true | ||
| tags: | ||
| - name: latest |
There was a problem hiding this comment.
I think this is fine to be latest for WIP but might be good to set it to actual tag value before merge, once we agree what that should be (2026.1 ? )
There was a problem hiding this comment.
I used 2025.1 since it's gonna be merged before 2026, right?
manifests/rhoai/params.env
Outdated
| @@ -1 +1,2 @@ | |||
| odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0 | |||
| odh-kubeflow-trainer-universal-workbench-image=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest | |||
There was a problem hiding this comment.
@sutaakar just to clarify - will this value be essentially coming from related images at some point?
There was a problem hiding this comment.
all productized images used by Trainer should and will be listed here
| [ | ||
| {"name": "JupyterLab", "version": "4.4"} | ||
| ] | ||
| openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9 |
There was a problem hiding this comment.
I wonder if this should be dynamically fetched? I guess at some point this will be coming from registry (??)
There was a problem hiding this comment.
Notebooks repo keeps it static, but we can make it dynamic after images move to the final registry
Signed-off-by: kramaranya <kramaranya15@gmail.com>
manifests/rhoai/params.env
Outdated
| odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0 | ||
| odh-kubeflow-trainer-universal-workbench-image-cuda=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest | ||
| odh-kubeflow-trainer-universal-workbench-image-rocm=quay.io/mstoklus/workbench-images:py312-rocm64-torch280-3 | ||
| odh-kubeflow-trainer-universal-workbench-image-cpu=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest |
There was a problem hiding this comment.
@kramaranya I guess this is just temp until we have a CPU image?
manifests/rhoai/params.env
Outdated
| @@ -1 +1,4 @@ | |||
| odh-kubeflow-trainer-controller-image=quay.io/opendatahub/trainer:v2.1.0 | |||
| odh-kubeflow-trainer-universal-workbench-image-cuda=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest | |||
There was a problem hiding this comment.
hey @kramaranya
just something to collaborate on / think about...
So with controller image - it makes perfect sense that there's a single param - since only one operator image will run at any given time.
With imageStreams we should think of it as:
- each image stream can have multiple tags
- each tag is essentially a different image
- if in version X our image streams will have a single image that's fine but if version x+n introduces another set of images, how are we going to do the mapping between params and the tags?
My point is - should these params be versioned ?
There was a problem hiding this comment.
Yeah, we could have one param per version in params.env. For example:
odh-kubeflow-trainer-universal-workbench-image-cuda-2025-1=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:aaa
odh-kubeflow-trainer-universal-workbench-image-cuda-2025-2=quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:bbb
There was a problem hiding this comment.
2025-1 - using these is IMO a good idea @kramaranya 👍
Just to add it it, so lets say we have 2025-1 with cuda 2.8 and py312; and 2025-2 with cuda 2.9 and py312.
If 2025-1 hits a CVE unrelated to core packages we will need to address it and I guess at this point - we will overwrite 2025-1 right?
Signed-off-by: kramaranya <kramaranya15@gmail.com>
Signed-off-by: kramaranya <kramaranya15@gmail.com>
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
manifests/rhoai/params.env (1)
2-4: Params wiring looks good; consider pinning tags and confirming final registriesThe new per-version params line up with the kustomize replacements, so the wiring to the ImageStreams looks correct. For release though, you may want to avoid
:latest/:latest-cpuand point to explicit tags (or digests), and double‑check that the ROCm image underquay.io/mstoklus/workbench-imagesis the intended productized location vs a temporary namespace.manifests/rhoai/training-hub-universal-cpu-imagestream.yaml (1)
16-41: Alignopenshift.io/imported-fromwith the actual CPU image
from.nameis correctly parameterized, but theopenshift.io/imported-fromannotation still points at the generic CUDA image name without the CPU tag, which can be misleading when inspecting the ImageStream. Consider matching the real CPU image location (e.g. the value behindodh-kubeflow-trainer-universal-workbench-image-cpu-2025-1) so the metadata stays truthful.- openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9 + openshift.io/imported-from: quay.io/opendatahub/odh-training-th03-cuda128-torch28-py312-rhel9:latest-cpu(Adjust the tag if you use something other than
latest-cpu.)manifests/rhoai/training-hub-universal-rocm-imagestream.yaml (1)
17-36: Optional: include Training Hub in the ROCm software metadata for consistencyThis ImageStream looks well-formed and matches the CUDA/CPU structure; one small nit is that
opendatahub.io/notebook-softwarehere omits theTraining Hubentry that you include on the CUDA/CPU variants. If the ROCm image also ships Training Hub (same or similar version), consider adding it so the UI metadata stays consistent across runtimes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
manifests/rhoai/kustomization.yaml(2 hunks)manifests/rhoai/params.env(1 hunks)manifests/rhoai/training-hub-universal-cpu-imagestream.yaml(1 hunks)manifests/rhoai/training-hub-universal-cuda-imagestream.yaml(1 hunks)manifests/rhoai/training-hub-universal-rocm-imagestream.yaml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: pre-commit
🔇 Additional comments (2)
manifests/rhoai/kustomization.yaml (1)
29-67: Kustomize wiring between params and ImageStreams looks correctThe new replacements and resources cleanly hook the versioned params into the three ImageStreams (
spec.tags.0.from.name), and the selectors/fieldPaths all line up with the new manifests. I don’t see any structural or scoping issues here.Also applies to: 83-85
manifests/rhoai/training-hub-universal-cuda-imagestream.yaml (1)
16-47: CUDA ImageStream manifest is structurally sound and aligned with the other runtimesThe CUDA ImageStream follows the expected ODH notebook pattern (labels, annotations, tag structure, referencePolicy, recommended accelerators), and its
from.nameplaceholder lines up with the new params/replacements; I don’t see any issues in this definition.
|
/hold |


Adding Universal Image to Workbench using Image Stream
RHOAIENG-34069
What this PR does / why we need it:
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Fixes #
Checklist:
Summary by CodeRabbit
Release Notes
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.