@dibryant dibryant commented Jul 2, 2025

Fixes for https://issues.redhat.com/browse/RHOAIENG-27434

Description

Create the ROCm TensorFlow image for Python 3.12.

How Has This Been Tested?

$ make rocm-jupyter-tensorflow-ubi9-python-3.12 \
    -e RELEASE_PYTHON_VERSION=3.12 \
    -e IMAGE_REGISTRY="quay.io/$USER/workbench-images" \
    -e RELEASE="2025b" \
    -e CONTAINER_BUILD_CACHE_ARGS="" \
    -e PUSH_IMAGES="no"
$ export QUAY_IMAGE=$(podman images --format "{{.Repository}}:{{.Tag}}" | head -1)
$ podman run -it -p 8888:8888 $QUAY_IMAGE
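
The `head -1` step above assumes the freshly built image is the first entry that `podman images` prints. A minimal Python sketch of a stricter selection follows. `pick_image`, the substring it matches on, and the sample image names are illustrative assumptions, not part of this PR.

```python
from typing import Iterable, Optional


def pick_image(lines: Iterable[str], needle: str) -> Optional[str]:
    """Return the first 'repository:tag' line containing `needle`, else None.

    `lines` is expected to be the output of
    `podman images --format "{{.Repository}}:{{.Tag}}"`, one image per line.
    """
    for line in lines:
        candidate = line.strip()
        # Skip blank lines and dangling images, which are listed as <none>.
        if not candidate or "<none>" in candidate:
            continue
        if needle in candidate:
            return candidate
    return None


# Hypothetical listing; real names depend on IMAGE_REGISTRY and RELEASE.
sample = [
    "<none>:<none>",
    "quay.io/example/workbench-images:some-other-image",
    "quay.io/example/workbench-images:rocm-jupyter-tensorflow-example",
]
print(pick_image(sample, "rocm-jupyter-tensorflow"))
```

Matching on a name fragment rather than list position keeps the test step from silently picking up an unrelated image that happens to be newer.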

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

Summary by CodeRabbit

  • New Features

    • Added a ROCm-enabled TensorFlow Jupyter image for UBI9 with Python 3.12, Kubernetes manifests (Service/StatefulSet/kustomize), DB/MSSQL tooling, JupyterLab addons, PDF export and Elyra integration.
  • Chores

    • Added Pipfiles/requirements, pinned ROCm TensorFlow wheel and related packages, bumped multiple dependency versions, adjusted Makefile targets, and removed a problematic pip install constraint.
  • Tests

    • Added a notebook test suite validating Python/TensorFlow versions, tf2onnx conversion, MNIST training/evaluation, and TensorBoard logging.

coderabbitai bot commented Jul 2, 2025

Walkthrough

Adds a new ROCm-enabled multi-stage Jupyter TensorFlow image for UBI9/Python 3.12, with Pipfile/requirements updates, Kustomize manifests, a test notebook, and Makefile changes to enable building the new images.

Changes

Cohort / File(s) and summary of changes:

  • ROCm TF Jupyter 3.12 (new image & tests)
    • Files: jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm, jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile, jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb
    • Summary: New multi-stage ROCm Jupyter Dockerfile (mongocli-builder, base, rocm-base, rocm-jupyter-* stages), Pipfile with explicit tensorflow-rocm cp312 wheel and pinned deps, and a test notebook exercising TF/ONNX/TensorBoard and version checks.
  • Kustomize deployment manifests
    • Files: jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/kustomization.yaml, .../service.yaml, .../statefulset.yaml
    • Summary: New kustomize base, ClusterIP Service (port 8888), and StatefulSet (single-replica notebook container, probes, NOTEBOOK_ARGS env, resource requests/limits) targeting the new image.
  • Makefile build orchestration
    • Files: Makefile
    • Summary: Enables building rocm runtime/jupyter TensorFlow images for Python 3.12 (adds/uncomments BASE_DIRS and all-images entries) and fixes a comment typo.
  • ROCm TensorFlow runtime updates (3.12)
    • Files: runtimes/rocm-tensorflow/ubi9-python-3.12/Pipfile, .../requirements.txt, runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm
    • Summary: Pipfile: switch tensorflow-rocm to explicit 2.18.1 wheel URL and bump tensorboard; fix tf2onnx spacing. requirements.txt: multiple package version/hash updates. Dockerfile: remove --build-constraints option from uv pip install (comment added).
  • ROCm TensorFlow 3.11 formatting fixes
    • Files: jupyter/rocm/tensorflow/ubi9-python-3.11/Pipfile, runtimes/rocm-tensorflow/ubi9-python-3.11/Pipfile
    • Summary: Minor formatting: remove space in tf2onnx "~=" version spec.
  • Datascience 3.12 dependency bumps
    • Files: jupyter/datascience/ubi9-python-3.12/Pipfile, jupyter/datascience/ubi9-python-3.12/requirements.txt
    • Summary: Bump kubeflow-training 1.9.2 → 1.9.3 and update associated sha256 hashes (two occurrences).
  • Runtime deps pin updates
    • Files: runtimes/rocm-tensorflow/ubi9-python-3.12/requirements.txt
    • Summary: Multiple package version upgrades and updated hashes (aiohttp, certifi, debugpy, distlib, fonttools, fsspec, h5py hashes expanded, jsonschema, minio, narwhals, pyarrow, virtualenv).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

review-requested

Suggested reviewers

  • daniellutz

📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c83cdda and 26b399f.

📒 Files selected for processing (1)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: build (rocm-runtime-tensorflow-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (jupyter-datascience-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: build (rocm-jupyter-tensorflow-ubi9-python-3.11, 3.11, linux/amd64, false) / build
  • GitHub Check: build (rocm-runtime-tensorflow-ubi9-python-3.11, 3.11, linux/amd64, false) / build
  • GitHub Check: build (rocm-jupyter-tensorflow-ubi9-python-3.12, 3.12, linux/amd64, false) / build
  • GitHub Check: Red Hat Konflux / odh-pipeline-runtime-tensorflow-rocm-py311-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-pipeline-runtime-tensorflow-rocm-py312-ubi9-on-pull-request
  • GitHub Check: Red Hat Konflux / odh-workbench-jupyter-tensorflow-rocm-py311-ubi9-on-pull-request

@openshift-ci openshift-ci bot requested review from atheo89 and daniellutz July 2, 2025 14:14
@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Jul 2, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🧹 Nitpick comments (1)
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (1)

48-53: tf2onnx conversion test needs more comprehensive validation.

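The current test only checks that the conversion result is not None; it does not validate the actual ONNX model structure or functionality. Because tf2onnx.convert.from_keras returns a (model_proto, external_tensor_storage) tuple, the result as currently assigned is a tuple and can never be None, so the existing assertion always passes.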

     def test_tf2onnx_conversion(self):
-        # Replace this with an actual TensorFlow model conversion using tf2onnx
         model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
-        onnx_model = tf2onnx.convert.from_keras(model)
+        onnx_model, _ = tf2onnx.convert.from_keras(model)
         
         self.assertTrue(onnx_model is not None)
+        # Verify ONNX model has expected structure
+        self.assertTrue(hasattr(onnx_model, 'graph'), "ONNX model should have a graph")
+        self.assertTrue(len(onnx_model.graph.node) > 0, "ONNX model should have nodes")
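
To illustrate why the unpacking in the suggestion matters, the sketch below uses a stand-in function. `fake_from_keras` is purely hypothetical: it only mimics the two-value return shape of `tf2onnx.convert.from_keras` and is not the library itself.

```python
def fake_from_keras(conversion_ok: bool):
    """Hypothetical stand-in mimicking a (model_proto, external_storage) return."""
    model_proto = {"graph": {"node": ["MatMul", "Add"]}} if conversion_ok else None
    return model_proto, None


# Without unpacking, the result is a tuple, so the None check is vacuous.
result = fake_from_keras(conversion_ok=False)
print(result is not None)  # True even though no model was produced

# With unpacking, the check inspects the model itself.
model, _ = fake_from_keras(conversion_ok=False)
print(model is not None)  # False
```

The structural assertions on `graph` and `graph.node` in the suggested diff go one step further by checking that the converted model is non-empty.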
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bce04f9 and dc5976f.

⛔ Files ignored due to path filters (1)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (1 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile (1 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/kustomization.yaml (1 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/service.yaml (1 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (1 hunks)
  • jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu:21-24
Timestamp: 2025-07-01T06:48:13.154Z
Learning: jiridanek creates comprehensive follow-up issues from review comments that expand scope appropriately, include clear acceptance criteria, proper backlinks, and structured implementation guidance. Issue #1241 demonstrates this by turning a specific oc client checksum concern into a thorough security enhancement plan covering all downloaded binaries across the Python 3.12 implementation.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1247
File: .github/workflows/build-notebooks-TEMPLATE.yaml:50-53
Timestamp: 2025-07-01T14:36:52.852Z
Learning: In the opendatahub-io/notebooks repository, the test runner's Python version (configured in GitHub Actions UV setup) intentionally doesn't need to match the Python version of the container images being tested. jiridanek's team uses Python 3.12 for running tests while images may use different Python versions (like 3.11), and this approach works fine since the test code is separate from the application code running inside the containers.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-20T11:51:59.716Z
Learning: This project follows the practice of associating PRs with Jira tickets from https://issues.redhat.com for traceability between requirements, release process, and product documentation. This is critical for enterprise software development compliance and cross-team coordination.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: TrustyAI's jupyter-bokeh was pinned to 3.0.5 due to compatibility requirements with TrustyAI's visualization components, but the actual deployed version in requirements.txt shows 3.0.7, indicating incremental testing. The upgrade to 4.0.5 in this PR represents the completion of a gradual migration strategy from the 3.x series after confirming compatibility with Bokeh 3.7.3.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm:43-55
Timestamp: 2025-07-01T06:48:21.070Z
Learning: When security concerns are raised during PR reviews in opendatahub-io/notebooks, comprehensive follow-up issues are created (often by CodeRabbit) to track all related security enhancements with clear acceptance criteria and implementation guidance. This ensures security improvements are systematically addressed in dedicated efforts rather than blocking current deliverables.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: The jupyter-bokeh package was previously pinned to version 3.0.5 in the TrustyAI notebook image due to compatibility requirements with TrustyAI components, as indicated by the comment "Should be pinned down to this version in order to be compatible with trustyai" that was removed in this update.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/datascience/ubi9-python-3.11/Pipfile:34-36
Timestamp: 2025-06-28T14:13:27.890Z
Learning: In the opendatahub-io/notebooks repository, the dependency pinning strategy follows a deliberate pattern: core `jupyterlab` package uses exact pinning (==) across all notebook images to ensure UI consistency, while JupyterLab extensions and all server components (jupyter-server, jupyter-server-proxy, jupyter-server-terminals) use compatible release (~=) pinning to allow automatic security updates and bug fixes while maintaining API compatibility.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1197
File: runtimes/minimal/ubi9-python-3.11/requirements.txt:395-405
Timestamp: 2025-06-26T15:28:35.416Z
Learning: psutil version 7.x is compatible with UBI9, CentOS Stream 9, and RHEL 9 platforms in the opendatahub-io/notebooks repository. The upgrade from psutil 5.x to 7.x has been validated for these environments.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:21:09.429Z
Learning: TrustyAI 0.6.1 (latest version as of June 2025) has a hard dependency constraint on jupyter-bokeh~=3.0.5, preventing upgrades to jupyter-bokeh 4.x in notebook images that include TrustyAI. This requires either waiting for TrustyAI to update their dependency or excluding TrustyAI from jupyter-bokeh upgrades.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/base/statefulset.yaml:54-60
Timestamp: 2025-06-30T14:43:08.138Z
Learning: Issue #1212 in opendatahub-io/notebooks demonstrates that missing securityContext configuration (allowPrivilegeEscalation, runAsNonRoot, seccompProfile) causes runtime pods to fail reaching ready state and timeout after 300s on OpenShift due to PodSecurity policy violations.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Runtime deployment tests in opendatahub-io/notebooks may show PodSecurity warnings about allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile settings. These warnings occur on OpenShift but not on GitHub Actions because GitHub Actions uses upstream Kubernetes without SecurityContextConstraints (SCC).
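
Several learnings above rely on the difference between exact (`==`) and compatible-release (`~=`) pins, for example `jupyter-bokeh~=3.0.5` admitting 3.0.7 but not 4.0.5. The following is a minimal sketch of that rule for plain dotted numeric versions only. The helper names are illustrative, and real resolvers implement the full PEP 440 rules (pre-releases, epochs, and so on), which this does not attempt.

```python
def parse(version: str) -> tuple:
    """Split a simple dotted numeric version such as '3.0.5' into ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible(candidate: str, spec: str) -> bool:
    """Approximate `~=spec` for plain numeric versions only.

    `~=3.0.5` means: at least 3.0.5, and the same leading components
    once the last one is dropped (that is, 3.0.*).
    """
    cand, base = parse(candidate), parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    prefix = base[:-1]
    return cand >= base and cand[: len(prefix)] == prefix


def satisfies_exact(candidate: str, spec: str) -> bool:
    """`==spec` for plain numeric versions (no wildcards, no zero padding)."""
    return parse(candidate) == parse(spec)


print(satisfies_compatible("3.0.7", "3.0.5"))  # True
print(satisfies_compatible("4.0.5", "3.0.5"))  # False
print(satisfies_compatible("3.1.0", "3.0.5"))  # False
```

Under this reading, a package that declares `jupyter-bokeh~=3.0.5` cannot be installed alongside jupyter-bokeh 4.x, which matches the conflict described in the TrustyAI learnings.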
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/kustomization.yaml (5)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/rocm/pytorch/ubi9-python-3.11/requirements.txt:987-989
Timestamp: 2025-06-28T15:06:03.344Z
Learning: In the opendatahub-io/notebooks repository, checks for broken links and missing files in kustomization manifests are already performed by `ci/kustomize.sh`, which is invoked from `.github/workflows/code-quality.yaml` (lines 112–116). No additional pytest is needed for this purpose.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: StatefulSet selector issue with empty `spec.selector: {}` is a known systemic problem across multiple images in opendatahub-io/notebooks repository, tracked in issue #1236. This affects multiple StatefulSet manifests and is being addressed centrally rather than in individual PRs.
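
As a rough illustration of the selector issue mentioned in the learnings above, the sketch below checks a StatefulSet manifest, already loaded into a Python dict, for a non-empty `spec.selector.matchLabels` that agrees with the pod template labels. The function name and the sample manifests are assumptions made for illustration. It ignores `matchExpressions` and any labels that kustomize may inject at build time.

```python
def selector_matches_template(statefulset: dict) -> bool:
    """Return True if matchLabels is non-empty and a subset of template labels."""
    spec = statefulset.get("spec") or {}
    match_labels = (spec.get("selector") or {}).get("matchLabels") or {}
    template_meta = (spec.get("template") or {}).get("metadata") or {}
    template_labels = template_meta.get("labels") or {}
    if not match_labels:
        # Covers the `spec.selector: {}` case described in the learning.
        return False
    return all(template_labels.get(k) == v for k, v in match_labels.items())


# Hypothetical manifests, reduced to the fields the check reads.
empty_selector = {"spec": {"selector": {}, "template": {"metadata": {"labels": {}}}}}
good = {
    "spec": {
        "selector": {"matchLabels": {"app": "notebook"}},
        "template": {"metadata": {"labels": {"app": "notebook", "tier": "test"}}},
    }
}
print(selector_matches_template(empty_selector))  # False
print(selector_matches_template(good))  # True
```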
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/service.yaml (3)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (7)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T07:11:44.903Z
Learning: Resource limits in StatefulSet manifests in opendatahub-io/notebooks are configured for testing purposes, not production deployments. This affects risk assessment when reviewing resource configurations like memory and CPU limits.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: StatefulSet selector issue with empty `spec.selector: {}` is a known systemic problem across multiple images in opendatahub-io/notebooks repository, tracked in issue #1236. This affects multiple StatefulSet manifests and is being addressed centrally rather than in individual PRs.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/base/statefulset.yaml:54-60
Timestamp: 2025-06-30T14:43:08.138Z
Learning: Issue #1212 in opendatahub-io/notebooks demonstrates that missing securityContext configuration (allowPrivilegeEscalation, runAsNonRoot, seccompProfile) causes runtime pods to fail reaching ready state and timeout after 300s on OpenShift due to PodSecurity policy violations.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Runtime deployment tests in opendatahub-io/notebooks may show PodSecurity warnings about allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile settings. These warnings occur on OpenShift but not on GitHub Actions because GitHub Actions uses upstream Kubernetes without SecurityContextConstraints (SCC).
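
The PodSecurity learnings above name four settings: allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile. A minimal sketch of a check over a container spec held as a Python dict follows. The function name and the expected values reflect one reading of the Kubernetes "restricted" Pod Security Standard and are assumptions here, not something taken from this PR. Only container-level `securityContext` is inspected, although some of these fields may also be set at pod level.

```python
def missing_restricted_settings(container: dict) -> list:
    """List the securityContext settings that look non-compliant."""
    ctx = container.get("securityContext") or {}
    problems = []
    if ctx.get("allowPrivilegeEscalation") is not False:
        problems.append("allowPrivilegeEscalation must be false")
    if ctx.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    drops = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in drops:
        problems.append("capabilities.drop must include ALL")
    seccomp = (ctx.get("seccompProfile") or {}).get("type")
    if seccomp not in ("RuntimeDefault", "Localhost"):
        problems.append("seccompProfile.type must be RuntimeDefault or Localhost")
    return problems


# Hypothetical container entry from a StatefulSet pod template.
compliant = {
    "name": "notebook",
    "securityContext": {
        "allowPrivilegeEscalation": False,
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": {"type": "RuntimeDefault"},
    },
}
print(missing_restricted_settings(compliant))  # []
print(len(missing_restricted_settings({"name": "notebook"})))  # 4
```

A check like this could run against rendered manifests before deployment, so that a violation surfaces as a list of named fields rather than as a pod that times out while waiting to become ready.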
jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (9)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1154
File: manifests/base/jupyter-pytorch-notebook-imagestream.yaml:0-0
Timestamp: 2025-06-16T11:06:33.139Z
Learning: In the opendatahub-io/notebooks repository, N-1 versions of images in manifest files (like imagestream.yaml files) should not be updated regularly. The versions of packages like codeflare-sdk in N-1 images are frozen to what was released when the image was moved from N to N-1 version. N-1 images are only updated for security vulnerabilities of packages, not for regular version bumps. This is why the version of packages in N-1 images may be quite old compared to the latest N version.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:35:34.805Z
Learning: In the opendatahub-io/notebooks repository, mounting emptyDir volumes over /opt/app-root/src is intentional behavior that matches production deployment patterns where odh-dashboard mounts empty PVCs at this location (the $HOME directory). This mounting is expected to hide base image content.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/base/statefulset.yaml:54-60
Timestamp: 2025-06-30T14:43:08.138Z
Learning: Issue #1212 in opendatahub-io/notebooks demonstrates that missing securityContext configuration (allowPrivilegeEscalation, runAsNonRoot, seccompProfile) causes runtime pods to fail reaching ready state and timeout after 300s on OpenShift due to PodSecurity policy violations.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Runtime deployment tests in opendatahub-io/notebooks may show PodSecurity warnings about allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile settings. These warnings occur on OpenShift but not on GitHub Actions because GitHub Actions uses upstream Kubernetes without SecurityContextConstraints (SCC).
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: OpenShift CI infrastructure issues in opendatahub-io/notebooks can manifest as "ContainerFailed one or more containers exited" errors in release steps, or as "Entrypoint received interrupt: terminated" messages when pods are killed during CI runs. These are typically infrastructure-level issues rather than code problems.
jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile (7)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/datascience/ubi9-python-3.11/Pipfile:34-36
Timestamp: 2025-06-28T14:13:27.890Z
Learning: In the opendatahub-io/notebooks repository, the dependency pinning strategy follows a deliberate pattern: core `jupyterlab` package uses exact pinning (==) across all notebook images to ensure UI consistency, while JupyterLab extensions and all server components (jupyter-server, jupyter-server-proxy, jupyter-server-terminals) use compatible release (~=) pinning to allow automatic security updates and bug fixes while maintaining API compatibility.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: The jupyter-bokeh package was previously pinned to version 3.0.5 in the TrustyAI notebook image due to compatibility requirements with TrustyAI components, as indicated by the comment "Should be pinned down to this version in order to be compatible with trustyai" that was removed in this update.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:21:09.429Z
Learning: The jupyter-bokeh pinning to 3.0.5 in TrustyAI notebook image was not due to TrustyAI code compatibility issues, but because the trustyai package itself explicitly declares jupyter-bokeh~=3.0.5 as a hard dependency, causing pip dependency resolution conflicts when trying to upgrade to jupyter-bokeh 4.x.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:21:09.429Z
Learning: TrustyAI explicitly declares jupyter-bokeh~=3.0.5 as a hard dependency in both their requirements.txt and pyproject.toml files, with no open issues tracking jupyter-bokeh 4.x compatibility. This creates an unresolvable pip dependency conflict when trying to upgrade jupyter-bokeh to 4.x in notebook images that include TrustyAI.
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (7)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1247
File: .github/workflows/build-notebooks-TEMPLATE.yaml:50-53
Timestamp: 2025-07-01T14:36:52.852Z
Learning: In the opendatahub-io/notebooks repository, the test runner's Python version (configured in GitHub Actions UV setup) intentionally doesn't need to match the Python version of the container images being tested. jiridanek's team uses Python 3.12 for running tests while images may use different Python versions (like 3.11), and this approach works fine since the test code is separate from the application code running inside the containers.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: TrustyAI's jupyter-bokeh was pinned to 3.0.5 due to compatibility requirements with TrustyAI's visualization components, but the actual deployed version in requirements.txt shows 3.0.7, indicating incremental testing. The upgrade to 4.0.5 in this PR represents the completion of a gradual migration strategy from the 3.x series after confirming compatibility with Bokeh 3.7.3.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: The jupyter-bokeh package was previously pinned to version 3.0.5 in the TrustyAI notebook image due to compatibility requirements with TrustyAI components, as indicated by the comment "Should be pinned down to this version in order to be compatible with trustyai" that was removed in this update.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Test failures in opendatahub-io/notebooks should be categorized as either repository bugs (code/configuration issues) or infrastructure issues (CI platform problems) for proper diagnosis and resolution.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/rocm/pytorch/ubi9-python-3.11/requirements.txt:987-989
Timestamp: 2025-06-28T15:06:03.344Z
Learning: In the opendatahub-io/notebooks repository, checks for broken links and missing files in kustomization manifests are already performed by `ci/kustomize.sh`, which is invoked from `.github/workflows/code-quality.yaml` (lines 112–116). No additional pytest is needed for this purpose.
🪛 Checkov (3.2.334)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml

[MEDIUM] 2-60: Containers should not run with allowPrivilegeEscalation (CKV_K8S_20)

[MEDIUM] 2-60: Minimize the admission of root containers (CKV_K8S_23)

Member

@jiridanek jiridanek left a comment


Please add your new image to Makefile. There are two places. Uncommenting the relevant line(s) should do the trick. I'll try to ask rabbit if it can give precise instructions.

@coderabbitai the new Python 3.12 image has to be added to two lists in the Makefile: the one used for Pipfile lock updates and the all-images target, so that GitHub Actions CI runs on it. Can you please explain how this is to be accomplished?

Contributor

coderabbitai bot commented Jul 2, 2025


🧩 Analysis chain

@jiridanek, I'll help you identify where to add the new Python 3.12 ROCm TensorFlow image in the Makefile. Let me examine the current Makefile structure first.


🏁 Script executed:

#!/bin/bash
# Examine the Makefile to understand its structure and find the relevant sections
cat Makefile

Length of output: 23173


Perfect! I can see exactly what needs to be done. The Makefile already has the infrastructure in place for the ROCm TensorFlow Python 3.12 image, but the entries are currently commented out. You need to uncomment two specific lines:

1. For Pipfile lock updates (around line 445):

In the BASE_DIRS variable for Python 3.12, uncomment this line:

# jupyter/rocm/tensorflow/ubi9-python-$(PYTHON_VERSION)

So it becomes:

else ifeq ($(PYTHON_VERSION), 3.12)
	BASE_DIRS := \
	    jupyter/minimal/ubi9-python-$(PYTHON_VERSION) \
		jupyter/datascience/ubi9-python-$(PYTHON_VERSION) \
		jupyter/pytorch/ubi9-python-$(PYTHON_VERSION) \
		jupyter/tensorflow/ubi9-python-$(PYTHON_VERSION) \
		jupyter/rocm/pytorch/ubi9-python-$(PYTHON_VERSION) \
		jupyter/rocm/tensorflow/ubi9-python-$(PYTHON_VERSION)
		# jupyter/trustyai/ubi9-python-$(PYTHON_VERSION)
		# codeserver/ubi9-python-$(PYTHON_VERSION)

2. For all-images target (around line 515):

In the all-images target for Python 3.12, uncomment this line:

# rocm-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION)

So it becomes:

else ifeq ($(RELEASE_PYTHON_VERSION), 3.12)
all-images: \
	jupyter-minimal-ubi9-python-$(RELEASE_PYTHON_VERSION) \
	jupyter-datascience-ubi9-python-$(RELEASE_PYTHON_VERSION) \
	cuda-jupyter-minimal-ubi9-python-$(RELEASE_PYTHON_VERSION) \
	cuda-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION) \
	cuda-jupyter-pytorch-ubi9-python-$(RELEASE_PYTHON_VERSION) \
 	rocm-jupyter-pytorch-ubi9-python-$(RELEASE_PYTHON_VERSION) \
 	rocm-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION)

These changes will:

  • Include the ROCm TensorFlow Python 3.12 image in the Pipfile lock renewal process
  • Add it to the GitHub Actions CI matrix so it gets built and tested automatically
  • Ensure consistency with the other Python 3.12 images in the build system

The build target rocm-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION) is already defined earlier in the Makefile (around line 186), so no additional build configuration is needed.
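One way to confirm that both entries ended up uncommented is a line-based scan of the Makefile. This is a minimal sketch, not repository tooling: the helper name `uncommented_entries` is hypothetical, and it only inspects text lines rather than parsing Makefile syntax.

```python
def uncommented_entries(makefile_text: str, needle: str) -> list[int]:
    """Return 1-based line numbers where `needle` appears on a non-comment line."""
    hits = []
    for lineno, line in enumerate(makefile_text.splitlines(), start=1):
        stripped = line.strip()
        # Lines that start with '#' are commented out and do not count.
        if stripped.startswith("#"):
            continue
        if needle in stripped:
            hits.append(lineno)
    return hits


# Sample text modelled on the BASE_DIRS block above (illustrative only).
sample = (
    "BASE_DIRS := \\\n"
    "\tjupyter/rocm/pytorch/ubi9-python-$(PYTHON_VERSION) \\\n"
    "\tjupyter/rocm/tensorflow/ubi9-python-$(PYTHON_VERSION)\n"
    "\t# jupyter/trustyai/ubi9-python-$(PYTHON_VERSION)\n"
)

active = uncommented_entries(sample, "jupyter/rocm/tensorflow/ubi9-python-")
still_commented = uncommented_entries(sample, "jupyter/trustyai/ubi9-python-")
print(active, still_commented)
```

Running the same function against the real Makefile contents for both `jupyter/rocm/tensorflow/ubi9-python-` and `rocm-jupyter-tensorflow-ubi9-python-` should return a non-empty list for each once the lines are uncommented.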


🧠 Learnings used
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1247
File: .github/workflows/build-notebooks-TEMPLATE.yaml:50-53
Timestamp: 2025-07-01T14:36:52.852Z
Learning: In the opendatahub-io/notebooks repository, the test runner's Python version (configured in GitHub Actions UV setup) intentionally doesn't need to match the Python version of the container images being tested. jiridanek's team uses Python 3.12 for running tests while images may use different Python versions (like 3.11), and this approach works fine since the test code is separate from the application code running inside the containers.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu:21-24
Timestamp: 2025-07-01T06:48:13.154Z
Learning: jiridanek creates comprehensive follow-up issues from review comments that expand scope appropriately, include clear acceptance criteria, proper backlinks, and structured implementation guidance. Issue #1241 demonstrates this by turning a specific oc client checksum concern into a thorough security enhancement plan covering all downloaded binaries across the Python 3.12 implementation.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm:43-55
Timestamp: 2025-07-01T06:48:21.070Z
Learning: When security concerns are raised during PR reviews in opendatahub-io/notebooks, comprehensive follow-up issues are created (often by CodeRabbit) to track all related security enhancements with clear acceptance criteria and implementation guidance. This ensures security improvements are systematically addressed in dedicated efforts rather than blocking current deliverables.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-26T16:19:31.295Z
Learning: In the opendatahub-io/notebooks repository, the Playwright Docker image version in `.github/workflows/build-notebooks-TEMPLATE.yaml` (format: `mcr.microsoft.com/playwright:v1.53.1-noble`) must always match the `@playwright/test` version specified in the `tests/browser/package.json` file. Both versions need to be updated together to maintain consistency between CI/CD pipeline and project dependencies.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb:24-30
Timestamp: 2025-07-01T06:57:26.974Z
Learning: In the opendatahub-io/notebooks repository, jiridanek's team relies on strict CI process enforcement to handle potentially fragile code patterns like hard-coded relative paths. They accept some code fragility when it's mitigated through reliable CI setup and process controls, rather than always requiring code-level fixes.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-30T19:44:57.656Z
Learning: In the opendatahub-io/notebooks repository, jiridanek effectively manages comprehensive PR reviews by tracking all feedback as follow-up GitHub issues rather than blocking merge of high-quality implementations. This demonstrates excellent project management practices for balancing current deliverables with future improvements.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb:31-34
Timestamp: 2025-07-01T07:03:05.385Z
Learning: jiridanek demonstrates excellent pattern recognition for identifying duplicated code issues across the opendatahub-io/notebooks repository. When spotting a potential problem in test notebooks, he correctly assesses that such patterns are likely replicated across multiple similar files rather than being isolated incidents, leading to more effective systematic solutions.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-20T11:51:59.716Z
Learning: This project follows the practice of associating PRs with Jira tickets from https://issues.redhat.com for traceability between requirements, release process, and product documentation. This is critical for enterprise software development compliance and cross-team coordination.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: TrustyAI's jupyter-bokeh was pinned to 3.0.5 due to compatibility requirements with TrustyAI's visualization components, but the actual deployed version in requirements.txt shows 3.0.7, indicating incremental testing. The upgrade to 4.0.5 in this PR represents the completion of a gradual migration strategy from the 3.x series after confirming compatibility with Bokeh 3.7.3.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.

Contributor

openshift-ci bot commented Jul 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from jiridanek. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the size/xxl label Jul 3, 2025
jiridanek and others added 10 commits August 20, 2025 20:30
…erfile as these aren't available on ubi9+epel9
…ccording to Documentation in TensorFlow Dockerfile

```
Error:
 Problem: package rocm-opencl-sdk-6.4.3.60403-128.el9.x86_64 from ROCm requires rocm-opencl-devel = 2.0.0.60403-128.el9, but none of the providers can be installed
  - package rocm-opencl-devel-2.0.0.60403-128.el9.x86_64 from ROCm requires rocm-opencl, but none of the providers can be installed
  - conflicting requests
  - nothing provides opencl-filesystem needed by rocm-opencl-5.4.4-1.el9.x86_64 from epel
  - nothing provides ocl-icd(x86-64) needed by rocm-opencl-5.4.4-1.el9.x86_64 from epel
  - nothing provides ocl-icd needed by rocm-opencl-2.0.0.60403-128.el9.x86_64 from ROCm
```
@jiridanek
Member

@EmilienM hopefully this is the last time we're doing something like this on our own

```
./jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm:84:1 unexpected 'R' expecting a new line followed by the next instruction
```
@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Aug 20, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (5)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (2)

18-60: Missing minimal pod/container securityContext (tracked)

Hardening fields like runAsNonRoot, allowPrivilegeEscalation, and seccompProfile are missing. This is already tracked across images and managed centrally; no need to block this PR.

Reference: Issue #1212 (repository-wide pod security hardening).


12-16: Empty selector and template labels will be rejected by Kubernetes

The selector must match labels on the pod template. Use a non-empty matchLabels and mirror it on the template.

Apply:

-  selector: {}
+  selector:
+    matchLabels:
+      app: notebook
   serviceName: notebook
   template:
     metadata:
-      labels: {}
+      labels:
+        app: notebook
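The selector rule can be checked mechanically on a rendered manifest. The sketch below assumes the StatefulSet has already been parsed into a plain dict (for example from rendered kustomize output); the function name `selector_matches_template` is hypothetical, and `matchExpressions` selectors are deliberately ignored.

```python
def selector_matches_template(statefulset: dict) -> bool:
    """True when spec.selector.matchLabels is non-empty and every
    key/value pair also appears in spec.template.metadata.labels."""
    spec = statefulset.get("spec") or {}
    match_labels = (spec.get("selector") or {}).get("matchLabels") or {}
    template_meta = (spec.get("template") or {}).get("metadata") or {}
    template_labels = template_meta.get("labels") or {}
    if not match_labels:
        # An empty selector is what the comment above flags.
        return False
    return all(template_labels.get(k) == v for k, v in match_labels.items())


# The manifest as flagged: empty selector and empty template labels.
broken = {"spec": {"selector": {}, "template": {"metadata": {"labels": {}}}}}

# The manifest after the suggested diff is applied.
fixed = {
    "spec": {
        "selector": {"matchLabels": {"app": "notebook"}},
        "template": {"metadata": {"labels": {"app": "notebook"}}},
    }
}

print(selector_matches_template(broken), selector_matches_template(fixed))
```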
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (3)

22-29: Improve load_expected_versions() error handling (tracked)

Lacks handling for missing file or invalid JSON. Already tracked for systematic fix across notebooks.

Reference: Issue #1266.
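For illustration, a hardened loader could look roughly like the following. This is a sketch under assumptions: the notebook's actual function is not shown in this thread, so the `path` parameter and the empty-dict fallback are hypothetical choices, and the sample JSON content is made up for the demo.

```python
import json
import tempfile
from pathlib import Path


def load_expected_versions(path: str) -> dict:
    """Load a JSON mapping of expected versions.

    Returns an empty dict when the file is missing, is not valid JSON,
    or does not contain a JSON object at the top level.
    """
    try:
        with Path(path).open(encoding="utf-8") as fh:
            data = json.load(fh)
    except FileNotFoundError:
        print(f"Expected-versions file not found: {path}")
        return {}
    except json.JSONDecodeError as exc:
        print(f"Invalid JSON in {path}: {exc}")
        return {}
    return data if isinstance(data, dict) else {}


# Exercise the three cases with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    missing = load_expected_versions(str(Path(tmp) / "missing.json"))

    bad_file = Path(tmp) / "bad.json"
    bad_file.write_text("{not json", encoding="utf-8")
    malformed = load_expected_versions(str(bad_file))

    good_file = Path(tmp) / "good.json"
    good_file.write_text('{"TensorFlow": "2.18.1"}', encoding="utf-8")
    loaded = load_expected_versions(str(good_file))
```

In a test notebook it may be preferable to fail loudly instead of returning an empty dict; the fallback here only shows where the two error cases are caught.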


31-34: Guard get_expected_version() for missing keys (tracked)

Accessing a missing key leads to a TypeError in re.sub. Handle not-found and empty versions. Already tracked.

Reference: Issue #1243.
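A guarded lookup might be sketched as below. The signature (passing the versions dict explicitly) and the major.minor normalization are assumptions for illustration; the notebook's real `re.sub` pattern is not shown in this thread.

```python
import re


def get_expected_version(expected_versions: dict, name: str) -> str:
    """Return the major.minor prefix of the expected version for `name`.

    Raises KeyError when the entry is missing or empty, and ValueError when
    the stored value does not start with a major.minor pattern, instead of
    letting a None value reach the regex call.
    """
    raw = expected_versions.get(name)
    if not raw:
        raise KeyError(f"No expected version recorded for {name!r}")
    match = re.match(r"\d+\.\d+", str(raw))
    if match is None:
        raise ValueError(f"Unparseable version for {name!r}: {raw!r}")
    return match.group(0)


# Sample data for the demo only.
normalized = get_expected_version({"TensorFlow": "2.18.1"}, "TensorFlow")
print(normalized)
```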


80-95: Verify TensorBoard logs are actually created (tracked)

The callback is used, but the test doesn’t check that event files were written. Prefer a TemporaryDirectory and assert non-empty contents. Already tracked.

Reference: Issue #1263.
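The check itself needs only the standard library. A minimal sketch, assuming the TensorBoard callback writes files whose names start with `events.out.tfevents` somewhere under the log directory; the helper name `has_tensorboard_events` is hypothetical, and the demo fakes the event file rather than running `model.fit()`.

```python
import tempfile
from pathlib import Path


def has_tensorboard_events(log_dir: str) -> bool:
    """True if any file named events.out.tfevents.* exists under log_dir."""
    return any(Path(log_dir).rglob("events.out.tfevents.*"))


with tempfile.TemporaryDirectory() as log_dir:
    # Nothing has been written yet.
    empty_result = has_tensorboard_events(log_dir)

    # Stand-in for what the callback would write during training;
    # in the real test, model.fit(..., callbacks=[...]) would run here.
    train_dir = Path(log_dir) / "train"
    train_dir.mkdir()
    (train_dir / "events.out.tfevents.0.example").write_bytes(b"")

    populated_result = has_tensorboard_events(log_dir)

print(empty_result, populated_result)
```

Using a `TemporaryDirectory` as the callback's `log_dir` also keeps repeated test runs from seeing stale event files.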

🧹 Nitpick comments (5)
runtimes/rocm-tensorflow/ubi9-python-3.12/Pipfile (1)

13-15: Direct wheel reference is correct for AMD’s manylinux repo; drop redundant version to avoid drift

AMD’s repo is not a PyPI index, so using file=… with a hash is the right approach. Minor: keeping both version==… and file=… is redundant and risks divergence in the future. Suggest removing the version field and relying on the pinned wheel URL + hash.

Apply within this hunk:

- tensorflow-rocm = {version = "==2.18.1", file = "https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl", hashes = ["sha256:814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598"]}
+ tensorflow-rocm = {file = "https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl", hashes = ["sha256:814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598"]}
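With the version field gone, the hash is the only integrity pin, so it is worth verifying a downloaded wheel against it. A minimal sketch using only the standard library; the helper name `sha256_matches` is hypothetical, and the demo uses a throwaway file rather than the real wheel.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_matches(path: str, expected: str) -> bool:
    """Compare a file's SHA-256 digest with a Pipfile-style 'sha256:<hex>' value."""
    expected_hex = expected.removeprefix("sha256:").lower()
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in 1 MiB chunks so large wheels do not need to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


with tempfile.TemporaryDirectory() as tmp:
    payload = b"example wheel bytes"  # placeholder content, not a real wheel
    wheel = Path(tmp) / "example.whl"
    wheel.write_bytes(payload)

    good_hash = "sha256:" + hashlib.sha256(payload).hexdigest()
    bad_hash = "sha256:" + "0" * 64

    good_result = sha256_matches(str(wheel), good_hash)
    bad_result = sha256_matches(str(wheel), bad_hash)

print(good_result, bad_result)
```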
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (1)

23-31: Auth disabled in NOTEBOOK_ARGS — clarify test-only intent

Token/password are set to empty strings. If this manifest is used only in tests, consider adding an inline comment to avoid accidental reuse in production contexts.

Example:

         - name: NOTEBOOK_ARGS
           value: |-
+            # NOTE: Authentication intentionally disabled for test deployments.
             --ServerApp.port=8888
             --ServerApp.token=''
             --ServerApp.password=''
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (3)

15-15: Remove unused import

The tensorboard module is not used (the callback is referenced via tf.keras), and Ruff flags it (F401). Safe to remove.

-    "import tensorboard\n",

48-53: Stabilize tf2onnx conversion by providing an input signature

Without an input_signature, tf2onnx has to infer the input spec from the model, which can behave inconsistently across versions. Also, from_keras returns a (model_proto, external_tensor_storage) tuple, so unpack the result accordingly.

-    "        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])\n",
-    "        onnx_model = tf2onnx.convert.from_keras(model)\n",
-    "        \n",
-    "        self.assertTrue(onnx_model is not None)\n",
+    "        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])\n",
+    "        spec = (tf.TensorSpec((None, 10), tf.float32, name=\"input\"),)\n",
+    "        onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)\n",
+    "        self.assertTrue(onnx_model is not None)\n",

66-69: Fix useless expression; add a minimal assertion

The bare “predictions” expression is a no-op (Ruff B018). Make it validate shape to turn this into an actual check.

-    "        predictions = model(x_train[:1]).numpy()\n",
-    "        predictions\n",
+    "        predictions = model(x_train[:1]).numpy()\n",
+    "        self.assertEqual(predictions.shape, (1, 10), \"Predictions should be (1, 10)\")\n",

Note: Broader assertions for accuracy/loss are already tracked separately (Issue #1261).

@openshift-ci openshift-ci bot added size/xxl and removed size/xxl labels Aug 20, 2025
Contributor

openshift-ci bot commented Aug 20, 2025

@dibryant: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
| ci/prow/notebook-rocm-jupyter-tf-ubi9-python-3-12-pr-image-mirror | ae422fe | link | true | /test notebook-rocm-jupyter-tf-ubi9-python-3-12-pr-image-mirror |
| ci/prow/rocm-notebooks-py312-e2e-tests | ae422fe | link | true | /test rocm-notebooks-py312-e2e-tests |
| ci/prow/rocm-notebooks-e2e-tests | 26b399f | link | true | /test rocm-notebooks-e2e-tests |
| ci/prow/rocm-runtimes-ubi9-e2e-tests | 26b399f | link | true | /test rocm-runtimes-ubi9-e2e-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jiridanek jiridanek merged commit 93150ff into opendatahub-io:main Aug 21, 2025
21 of 27 checks passed
@jiridanek
Member

@ckhordiasma

/retest

@jiridanek
Copy link
Member

/kf-build odh-workbench-jupyter-tensorflow-rocm-py312-ubi9-on-pull-request


Successfully merging this pull request may close these issues.

tensorflow_rocm package lacks Python 3.12 wheel support causing build failures
6 participants