RHOAIENG-27434: create ROCm Tensorflow Python 3.12 Image #1259
Walkthrough
Adds a new ROCm-enabled multi-stage Jupyter TensorFlow image for UBI9/Python 3.12, with Pipfile/requirements updates, Kustomize manifests, a test notebook, and Makefile changes to enable building the new images.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 11
🧹 Nitpick comments (1)
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (1)
48-53: tf2onnx conversion test needs more comprehensive validation.
The current test only checks that the conversion result is not None; it does not validate the ONNX model's structure or functionality.
 def test_tf2onnx_conversion(self):
-    # Replace this with an actual TensorFlow model conversion using tf2onnx
     model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
-    onnx_model = tf2onnx.convert.from_keras(model)
+    onnx_model, _ = tf2onnx.convert.from_keras(model)
     self.assertTrue(onnx_model is not None)
+    # Verify ONNX model has expected structure
+    self.assertTrue(hasattr(onnx_model, 'graph'), "ONNX model should have a graph")
+    self.assertTrue(len(onnx_model.graph.node) > 0, "ONNX model should have nodes")
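The structural checks in this suggestion could also be factored into a small helper. The sketch below is illustrative only: `check_onnx_structure` is a hypothetical name that is not part of the PR, and the stand-in objects replace a real ONNX model so the example runs without TensorFlow or tf2onnx installed. It assumes only that the model exposes `graph.node`, `graph.input`, and `graph.output`, as an ONNX `ModelProto` does.

```python
from types import SimpleNamespace


def check_onnx_structure(model_proto):
    """Return a list of problems found in an ONNX ModelProto-like object.

    Hypothetical helper for illustration; an empty list means the basic
    structure looks sane.
    """
    graph = getattr(model_proto, "graph", None)
    if graph is None:
        return ["model has no graph"]
    problems = []
    if len(graph.node) == 0:
        problems.append("graph has no nodes")
    if len(graph.input) == 0:
        problems.append("graph has no inputs")
    if len(graph.output) == 0:
        problems.append("graph has no outputs")
    return problems


# In the notebook, the argument would be the first element of the tuple
# returned by tf2onnx.convert.from_keras(model). Stand-in objects are used
# here (an assumption made purely so the sketch is self-contained).
fake_model = SimpleNamespace(
    graph=SimpleNamespace(node=["MatMul", "Add"], input=["x"], output=["y"])
)
empty_model = SimpleNamespace(
    graph=SimpleNamespace(node=[], input=["x"], output=["y"])
)

print(check_onnx_structure(fake_model))
print(check_onnx_structure(empty_model))
```

Returning a list of problems rather than asserting directly keeps the helper usable from both `unittest` assertions and ad hoc notebook cells.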
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (1 hunks)
jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile (1 hunks)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/kustomization.yaml (1 hunks)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/service.yaml (1 hunks)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (1 hunks)
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.cpu:21-24
Timestamp: 2025-07-01T06:48:13.154Z
Learning: jiridanek creates comprehensive follow-up issues from review comments that expand scope appropriately, include clear acceptance criteria, proper backlinks, and structured implementation guidance. Issue #1241 demonstrates this by turning a specific oc client checksum concern into a thorough security enhancement plan covering all downloaded binaries across the Python 3.12 implementation.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1247
File: .github/workflows/build-notebooks-TEMPLATE.yaml:50-53
Timestamp: 2025-07-01T14:36:52.852Z
Learning: In the opendatahub-io/notebooks repository, the test runner's Python version (configured in GitHub Actions UV setup) intentionally doesn't need to match the Python version of the container images being tested. jiridanek's team uses Python 3.12 for running tests while images may use different Python versions (like 3.11), and this approach works fine since the test code is separate from the application code running inside the containers.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-20T11:51:59.716Z
Learning: This project follows the practice of associating PRs with Jira tickets from https://issues.redhat.com for traceability between requirements, release process, and product documentation. This is critical for enterprise software development compliance and cross-team coordination.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: TrustyAI's jupyter-bokeh was pinned to 3.0.5 due to compatibility requirements with TrustyAI's visualization components, but the actual deployed version in requirements.txt shows 3.0.7, indicating incremental testing. The upgrade to 4.0.5 in this PR represents the completion of a gradual migration strategy from the 3.x series after confirming compatibility with Bokeh 3.7.3.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/minimal/ubi9-python-3.12/Dockerfile.rocm:43-55
Timestamp: 2025-07-01T06:48:21.070Z
Learning: When security concerns are raised during PR reviews in opendatahub-io/notebooks, comprehensive follow-up issues are created (often by CodeRabbit) to track all related security enhancements with clear acceptance criteria and implementation guidance. This ensures security improvements are systematically addressed in dedicated efforts rather than blocking current deliverables.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: The jupyter-bokeh package was previously pinned to version 3.0.5 in the TrustyAI notebook image due to compatibility requirements with TrustyAI components, as indicated by the comment "Should be pinned down to this version in order to be compatible with trustyai" that was removed in this update.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/datascience/ubi9-python-3.11/Pipfile:34-36
Timestamp: 2025-06-28T14:13:27.890Z
Learning: In the opendatahub-io/notebooks repository, the dependency pinning strategy follows a deliberate pattern: core `jupyterlab` package uses exact pinning (==) across all notebook images to ensure UI consistency, while JupyterLab extensions and all server components (jupyter-server, jupyter-server-proxy, jupyter-server-terminals) use compatible release (~=) pinning to allow automatic security updates and bug fixes while maintaining API compatibility.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1197
File: runtimes/minimal/ubi9-python-3.11/requirements.txt:395-405
Timestamp: 2025-06-26T15:28:35.416Z
Learning: psutil version 7.x is compatible with UBI9, CentOS Stream 9, and RHEL 9 platforms in the opendatahub-io/notebooks repository. The upgrade from psutil 5.x to 7.x has been validated for these environments.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:21:09.429Z
Learning: TrustyAI 0.6.1 (latest version as of June 2025) has a hard dependency constraint on jupyter-bokeh~=3.0.5, preventing upgrades to jupyter-bokeh 4.x in notebook images that include TrustyAI. This requires either waiting for TrustyAI to update their dependency or excluding TrustyAI from jupyter-bokeh upgrades.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/base/statefulset.yaml:54-60
Timestamp: 2025-06-30T14:43:08.138Z
Learning: Issue #1212 in opendatahub-io/notebooks demonstrates that missing securityContext configuration (allowPrivilegeEscalation, runAsNonRoot, seccompProfile) causes runtime pods to fail reaching ready state and timeout after 300s on OpenShift due to PodSecurity policy violations.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Runtime deployment tests in opendatahub-io/notebooks may show PodSecurity warnings about allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile settings. These warnings occur on OpenShift but not on GitHub Actions because GitHub Actions uses upstream Kubernetes without SecurityContextConstraints (SCC).
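Several of the learnings above rely on the difference between exact (`==`) and compatible-release (`~=`) pinning, for example the TrustyAI constraint `jupyter-bokeh~=3.0.5` that blocks an upgrade to 4.x. The following is a minimal sketch of what `~=` means under PEP 440 for plain numeric versions. The function names are hypothetical, and the sketch deliberately ignores pre-releases, epochs, and local version labels, which real resolvers handle.

```python
def parse(version):
    """Split a plain numeric version such as '3.0.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate, pinned):
    """Check `candidate ~= pinned` for plain numeric versions.

    `~=3.0.5` is equivalent to `>=3.0.5, ==3.0.*`: the candidate must be at
    least the pinned version and share its prefix with the last component
    dropped.
    """
    cand, pin = parse(candidate), parse(pinned)
    if len(pin) < 2:
        raise ValueError("~= needs at least two version components")
    # Pad short candidates with zeros so that '4' compares like '4.0'.
    cand = cand + (0,) * (len(pin) - len(cand))
    prefix = pin[:-1]
    return cand >= pin and cand[: len(prefix)] == prefix


for candidate in ("3.0.5", "3.0.7", "3.1.0", "4.0.5"):
    print(candidate, satisfies_compatible_release(candidate, "3.0.5"))
```

Under this rule, 3.0.7 satisfies `~=3.0.5` while 4.0.5 does not, which matches the dependency conflict described in the TrustyAI learnings.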
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/kustomization.yaml (5)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/rocm/pytorch/ubi9-python-3.11/requirements.txt:987-989
Timestamp: 2025-06-28T15:06:03.344Z
Learning: In the opendatahub-io/notebooks repository, checks for broken links and missing files in kustomization manifests are already performed by `ci/kustomize.sh`, which is invoked from `.github/workflows/code-quality.yaml` (lines 112–116). No additional pytest is needed for this purpose.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: StatefulSet selector issue with empty `spec.selector: {}` is a known systemic problem across multiple images in opendatahub-io/notebooks repository, tracked in issue #1236. This affects multiple StatefulSet manifests and is being addressed centrally rather than in individual PRs.
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/service.yaml (3)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (7)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T07:11:44.903Z
Learning: Resource limits in StatefulSet manifests in opendatahub-io/notebooks are configured for testing purposes, not production deployments. This affects risk assessment when reviewing resource configurations like memory and CPU limits.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: StatefulSet selector issue with empty `spec.selector: {}` is a known systemic problem across multiple images in opendatahub-io/notebooks repository, tracked in issue #1236. This affects multiple StatefulSet manifests and is being addressed centrally rather than in individual PRs.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:36:53.890Z
Learning: The pod-patch.yaml file in jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/ is used only for running tests, not production deployments. This affects the risk assessment for resource management configurations like sizeLimit on emptyDir volumes.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/base/statefulset.yaml:54-60
Timestamp: 2025-06-30T14:43:08.138Z
Learning: Issue #1212 in opendatahub-io/notebooks demonstrates that missing securityContext configuration (allowPrivilegeEscalation, runAsNonRoot, seccompProfile) causes runtime pods to fail reaching ready state and timeout after 300s on OpenShift due to PodSecurity policy violations.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Runtime deployment tests in opendatahub-io/notebooks may show PodSecurity warnings about allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile settings. These warnings occur on OpenShift but not on GitHub Actions because GitHub Actions uses upstream Kubernetes without SecurityContextConstraints (SCC).
jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm (9)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1151
File: jupyter/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml:11-17
Timestamp: 2025-07-01T06:50:37.115Z
Learning: jiridanek manages StatefulSet selector issues systematically across multiple images in opendatahub-io/notebooks. When the same configuration issue (empty spec.selector and template.metadata.labels) appears in different images like jupyter/minimal and jupyter/tensorflow, he tracks them under a single coordinated effort rather than creating duplicate issues for each affected image.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1154
File: manifests/base/jupyter-pytorch-notebook-imagestream.yaml:0-0
Timestamp: 2025-06-16T11:06:33.139Z
Learning: In the opendatahub-io/notebooks repository, N-1 versions of images in manifest files (like imagestream.yaml files) should not be updated regularly. The versions of packages like codeflare-sdk in N-1 images are frozen to what was released when the image was moved from N to N-1 version. N-1 images are only updated for security vulnerabilities of packages, not for regular version bumps. This is why the version of packages in N-1 images may be quite old compared to the latest N version.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/components/accelerator/pod-patch.yaml:11-22
Timestamp: 2025-06-30T14:35:34.805Z
Learning: In the opendatahub-io/notebooks repository, mounting emptyDir volumes over /opt/app-root/src is intentional behavior that matches production deployment patterns where odh-dashboard mounts empty PVCs at this location (the $HOME directory). This mounting is expected to hide base image content.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1230
File: jupyter/pytorch/ubi9-python-3.12/kustomize/base/statefulset.yaml:54-60
Timestamp: 2025-06-30T14:43:08.138Z
Learning: Issue #1212 in opendatahub-io/notebooks demonstrates that missing securityContext configuration (allowPrivilegeEscalation, runAsNonRoot, seccompProfile) causes runtime pods to fail reaching ready state and timeout after 300s on OpenShift due to PodSecurity policy violations.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Runtime deployment tests in opendatahub-io/notebooks may show PodSecurity warnings about allowPrivilegeEscalation, capabilities, runAsNonRoot, and seccompProfile settings. These warnings occur on OpenShift but not on GitHub Actions because GitHub Actions uses upstream Kubernetes without SecurityContextConstraints (SCC).
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: OpenShift CI infrastructure issues in opendatahub-io/notebooks can manifest as "ContainerFailed one or more containers exited" errors in release steps, or as "Entrypoint received interrupt: terminated" messages when pods are killed during CI runs. These are typically infrastructure-level issues rather than code problems.
jupyter/rocm/tensorflow/ubi9-python-3.12/Pipfile (7)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: jiridanek's team uses containerized dependency locking for cross-platform compatibility in opendatahub-io/notebooks. They run `pipenv lock` inside UBI9 containers with specific platform arguments (`--platform=linux/amd64 --python-version 3.12`) to avoid host OS dependency conflicts when generating Pipfile.lock files.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/datascience/ubi9-python-3.11/Pipfile:34-36
Timestamp: 2025-06-28T14:13:27.890Z
Learning: In the opendatahub-io/notebooks repository, the dependency pinning strategy follows a deliberate pattern: core `jupyterlab` package uses exact pinning (==) across all notebook images to ensure UI consistency, while JupyterLab extensions and all server components (jupyter-server, jupyter-server-proxy, jupyter-server-terminals) use compatible release (~=) pinning to allow automatic security updates and bug fixes while maintaining API compatibility.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: The jupyter-bokeh package was previously pinned to version 3.0.5 in the TrustyAI notebook image due to compatibility requirements with TrustyAI components, as indicated by the comment "Should be pinned down to this version in order to be compatible with trustyai" that was removed in this update.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:21:09.429Z
Learning: The jupyter-bokeh pinning to 3.0.5 in TrustyAI notebook image was not due to TrustyAI code compatibility issues, but because the trustyai package itself explicitly declares jupyter-bokeh~=3.0.5 as a hard dependency, causing pip dependency resolution conflicts when trying to upgrade to jupyter-bokeh 4.x.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:21:09.429Z
Learning: TrustyAI explicitly declares jupyter-bokeh~=3.0.5 as a hard dependency in both their requirements.txt and pyproject.toml files, with no open issues tracking jupyter-bokeh 4.x compatibility. This creates an unresolvable pip dependency conflict when trying to upgrade jupyter-bokeh to 4.x in notebook images that include TrustyAI.
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (7)
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: In the opendatahub-io/notebooks repository, there is a known issue with missing `runtimes/rocm/pytorch/ubi9-python-3.11/kustomize/base/kustomization.yaml` file that causes rocm runtime tests to fail with "no such file or directory" error. This is tracked in JIRA RHOAIENG-22044 and was intended to be fixed in PR #1015.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1247
File: .github/workflows/build-notebooks-TEMPLATE.yaml:50-53
Timestamp: 2025-07-01T14:36:52.852Z
Learning: In the opendatahub-io/notebooks repository, the test runner's Python version (configured in GitHub Actions UV setup) intentionally doesn't need to match the Python version of the container images being tested. jiridanek's team uses Python 3.12 for running tests while images may use different Python versions (like 3.11), and this approach works fine since the test code is separate from the application code running inside the containers.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: TrustyAI's jupyter-bokeh was pinned to 3.0.5 due to compatibility requirements with TrustyAI's visualization components, but the actual deployed version in requirements.txt shows 3.0.7, indicating incremental testing. The upgrade to 4.0.5 in this PR represents the completion of a gradual migration strategy from the 3.x series after confirming compatibility with Bokeh 3.7.3.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-01T10:41:56.419Z
Learning: In the opendatahub-io/notebooks repository, TensorFlow packages with `extras = ["and-cuda"]` can cause build conflicts on macOS due to platform-specific CUDA packages. When the Dockerfile installs CUDA system-wide, removing the extras and letting TensorFlow find CUDA at runtime resolves these conflicts.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/trustyai/ubi9-python-3.11/Pipfile:49-49
Timestamp: 2025-06-28T14:15:41.168Z
Learning: The jupyter-bokeh package was previously pinned to version 3.0.5 in the TrustyAI notebook image due to compatibility requirements with TrustyAI components, as indicated by the comment "Should be pinned down to this version in order to be compatible with trustyai" that was removed in this update.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-06-16T11:32:09.203Z
Learning: Test failures in opendatahub-io/notebooks should be categorized as either repository bugs (code/configuration issues) or infrastructure issues (CI platform problems) for proper diagnosis and resolution.
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1218
File: jupyter/rocm/pytorch/ubi9-python-3.11/requirements.txt:987-989
Timestamp: 2025-06-28T15:06:03.344Z
Learning: In the opendatahub-io/notebooks repository, checks for broken links and missing files in kustomization manifests are already performed by `ci/kustomize.sh`, which is invoked from `.github/workflows/code-quality.yaml` (lines 112–116). No additional pytest is needed for this purpose.
🪛 Checkov (3.2.334)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml
[MEDIUM] 2-60: Containers should not run with allowPrivilegeEscalation (CKV_K8S_20)
[MEDIUM] 2-60: Minimize the admission of root containers (CKV_K8S_23)
Please add your new image to Makefile. There are two places. Uncommenting the relevant line(s) should do the trick. I'll try to ask rabbit if it can give precise instructions.
@coderabbitai the new python 3.12 image has to be added in the two lists in Makefile. For Pipfile lock updates and for all-images target so that GitHub actions CI runs on it. Can you please explain how this is to be accomplished?
@jiridanek, I'll help you identify where to add the new Python 3.12 ROCm TensorFlow image in the Makefile. Let me examine the current Makefile structure first.
🏁 Script executed:
Length of output: 23173
Perfect! I can see exactly what needs to be done. The Makefile already has the infrastructure in place for the ROCm TensorFlow Python 3.12 image, but the entries are currently commented out. You need to uncomment two specific lines:
1. For Pipfile lock updates (around line 445), uncomment this line:
# jupyter/rocm/tensorflow/ubi9-python-$(PYTHON_VERSION)
So it becomes:
else ifeq ($(PYTHON_VERSION), 3.12)
BASE_DIRS := \
jupyter/minimal/ubi9-python-$(PYTHON_VERSION) \
jupyter/datascience/ubi9-python-$(PYTHON_VERSION) \
jupyter/pytorch/ubi9-python-$(PYTHON_VERSION) \
jupyter/tensorflow/ubi9-python-$(PYTHON_VERSION) \
jupyter/rocm/pytorch/ubi9-python-$(PYTHON_VERSION) \
jupyter/rocm/tensorflow/ubi9-python-$(PYTHON_VERSION)
# jupyter/trustyai/ubi9-python-$(PYTHON_VERSION)
# codeserver/ubi9-python-$(PYTHON_VERSION)
2. For all-images target (around line 515), uncomment this line:
# rocm-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION)
So it becomes:
else ifeq ($(RELEASE_PYTHON_VERSION), 3.12)
all-images: \
jupyter-minimal-ubi9-python-$(RELEASE_PYTHON_VERSION) \
jupyter-datascience-ubi9-python-$(RELEASE_PYTHON_VERSION) \
cuda-jupyter-minimal-ubi9-python-$(RELEASE_PYTHON_VERSION) \
cuda-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION) \
cuda-jupyter-pytorch-ubi9-python-$(RELEASE_PYTHON_VERSION) \
rocm-jupyter-pytorch-ubi9-python-$(RELEASE_PYTHON_VERSION) \
rocm-jupyter-tensorflow-ubi9-python-$(RELEASE_PYTHON_VERSION)

These changes will include the new image in Pipfile lock updates and in the all-images target, so that GitHub Actions CI runs on it.
The build target
|
[APPROVALNOTIFIER] This PR is NOT APPROVED |
… requirements error
…erfile as these aren't available on ubi9+epel9
…ccording to Documentation in TensorFlow Dockerfile
```
Error:
 Problem: package rocm-opencl-sdk-6.4.3.60403-128.el9.x86_64 from ROCm requires rocm-opencl-devel = 2.0.0.60403-128.el9, but none of the providers can be installed
  - package rocm-opencl-devel-2.0.0.60403-128.el9.x86_64 from ROCm requires rocm-opencl, but none of the providers can be installed
  - conflicting requests
  - nothing provides opencl-filesystem needed by rocm-opencl-5.4.4-1.el9.x86_64 from epel
  - nothing provides ocl-icd(x86-64) needed by rocm-opencl-5.4.4-1.el9.x86_64 from epel
  - nothing provides ocl-icd needed by rocm-opencl-2.0.0.60403-128.el9.x86_64 from ROCm
```
@EmilienM hopefully this is the last time we're doing something like this on our own |
```
./jupyter/rocm/tensorflow/ubi9-python-3.12/Dockerfile.rocm:84:1 unexpected 'R' expecting a new line followed by the next instruction
```
Actionable comments posted: 0
♻️ Duplicate comments (5)
jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (2)
18-60: Missing minimal pod/container securityContext (tracked)
Hardening fields like runAsNonRoot, allowPrivilegeEscalation, and seccompProfile are missing. This is already tracked across images and managed centrally; no need to block this PR.
Reference: Issue #1212 (repository-wide pod security hardening).
12-16: Empty selector and template labels will be rejected by Kubernetes
The selector must match labels on the pod template. Use a non-empty matchLabels and mirror it on the template.
Apply:

    -  selector: {}
    +  selector:
    +    matchLabels:
    +      app: notebook
       serviceName: notebook
       template:
         metadata:
    -      labels: {}
    +      labels:
    +        app: notebook

jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (3)
22-29: Improve load_expected_versions() error handling (tracked)
Lacks handling for a missing file or invalid JSON. Already tracked for a systematic fix across notebooks.
Reference: Issue #1266.
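One way the missing-file and invalid-JSON cases could be handled is sketched below. This is not the notebook's actual code: the `path` parameter, the decision to re-raise as `RuntimeError`, and the check that the top-level value is a JSON object are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_expected_versions(path: str) -> dict:
    """Load a JSON object of expected versions, failing with a clear message.

    Hypothetical sketch: the real notebook helper may take no arguments
    and read a fixed file.
    """
    file_path = Path(path)
    try:
        with file_path.open(encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError as exc:
        raise RuntimeError(f"Expected versions file not found: {file_path}") from exc
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Invalid JSON in {file_path}: {exc}") from exc
    # A list or scalar at the top level cannot be looked up by dependency name.
    if not isinstance(data, dict):
        raise RuntimeError(f"{file_path} must contain a JSON object")
    return data
```

Raising one exception type with the file path in the message keeps the test failure readable, whichever of the two problems occurred.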
31-34: Guard get_expected_version() for missing keys (tracked)
Accessing a missing key leads to a TypeError in re.sub. Handle not-found and empty versions. Already tracked.
Reference: Issue #1243.
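A guarded lookup might look like the following sketch. The `expected_versions` dict parameter, the reduction to `major.minor`, and the choice of `KeyError` and `ValueError` are assumptions; the notebook's real helper may differ.

```python
import re


def get_expected_version(expected_versions: dict, dependency_name: str) -> str:
    """Return the 'major.minor' part of the expected version for a dependency.

    Hypothetical sketch: the dict parameter and the major.minor reduction
    are assumptions, not the notebook's actual helper.
    """
    raw_value = expected_versions.get(dependency_name)
    # Reject missing keys and empty values before re.sub ever sees them;
    # passing None to re.sub is what raises the TypeError.
    if not isinstance(raw_value, str) or not raw_value.strip():
        raise KeyError(f"No expected version recorded for {dependency_name!r}")
    # Strip any leading non-digit prefix such as 'v' or '=='.
    raw_version = re.sub(r"^\D+", "", raw_value.strip())
    match = re.match(r"(\d+)\.(\d+)", raw_version)
    if match is None:
        raise ValueError(f"Cannot parse version {raw_value!r} for {dependency_name!r}")
    return f"{match.group(1)}.{match.group(2)}"
```

For example, `get_expected_version({"TensorFlow": "v2.18.1"}, "TensorFlow")` yields `"2.18"`, while an unknown name raises `KeyError` instead of a `TypeError` from inside `re.sub`.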
80-95: Verify TensorBoard logs are actually created (tracked)
The callback is used, but the test doesn’t check that event files were written. Prefer a TemporaryDirectory and assert non-empty contents. Already tracked.
Reference: Issue #1263.
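The TemporaryDirectory approach could be sketched as below. The helper names and the `train_with_callback` callable are hypothetical, and the glob pattern assumes TensorBoard's default `events.out.tfevents.*` file naming.

```python
import tempfile
from pathlib import Path


def find_event_files(log_dir: str) -> list:
    """Return TensorBoard event files found anywhere under log_dir."""
    # Search recursively, since the callback may write into subdirectories.
    return sorted(Path(log_dir).rglob("events.out.tfevents.*"))


def assert_tensorboard_logs_written(train_with_callback) -> None:
    """Run training against a temporary log dir and check that logs exist.

    `train_with_callback` is a hypothetical callable that accepts a log
    directory and trains a model with a TensorBoard callback pointed at it.
    """
    with tempfile.TemporaryDirectory() as log_dir:
        train_with_callback(log_dir)
        event_files = find_event_files(log_dir)
        assert event_files, f"No TensorBoard event files written under {log_dir}"
        assert all(path.stat().st_size > 0 for path in event_files), "Empty event file"
```

In the notebook, the callable would presumably create `tf.keras.callbacks.TensorBoard(log_dir=log_dir)` and pass it to `model.fit(...)`; the temporary directory is removed automatically when the check finishes.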
🧹 Nitpick comments (5)
runtimes/rocm-tensorflow/ubi9-python-3.12/Pipfile (1)
13-15: Direct wheel reference is correct for AMD’s manylinux repo; drop redundant version to avoid drift
AMD’s repo is not a PyPI index, so using file=… with a hash is the right approach. Minor: keeping both version==… and file=… is redundant and risks divergence in the future. Suggest removing the version field and relying on the pinned wheel URL + hash.
Apply within this hunk:

    -tensorflow-rocm = {version = "==2.18.1", file = "https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl", hashes = ["sha256:814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598"]}
    +tensorflow-rocm = {file = "https://repo.radeon.com/rocm/manylinux/rocm-rel-6.4/tensorflow_rocm-2.18.1-cp312-cp312-manylinux_2_28_x86_64.whl", hashes = ["sha256:814a5e4842b0c92d63c7a0bb4df0baf51ff2db8615535d83fe8958204b840598"]}

jupyter/rocm/tensorflow/ubi9-python-3.12/kustomize/base/statefulset.yaml (1)
23-31: Auth disabled in NOTEBOOK_ARGS; clarify test-only intent
Token/password are set to empty strings. If this manifest is used only in tests, consider adding a comment to avoid accidental reuse in production contexts. The comment has to sit outside the |- block scalar, because a # line inside the block would become part of the NOTEBOOK_ARGS value.
Example:

    +        # NOTE: Authentication intentionally disabled for test deployments.
             - name: NOTEBOOK_ARGS
               value: |-
                 --ServerApp.port=8888
                 --ServerApp.token=''
                 --ServerApp.password=''
jupyter/rocm/tensorflow/ubi9-python-3.12/test/test_notebook.ipynb (3)
15-15: Remove unused import
The tensorboard module is not used (the callback is referenced via tf.keras), and Ruff flags it (F401). Safe to remove.

    - "import tensorboard\n",
48-53: Stabilize tf2onnx conversion by providing an input signature
Without an input_signature, tf2onnx may rely on inference and can behave inconsistently across versions. Also, from_keras often returns a tuple; capture accordingly.

    - " model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])\n",
    - " onnx_model = tf2onnx.convert.from_keras(model)\n",
    - " \n",
    - " self.assertTrue(onnx_model is not None)\n",
    + " model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])\n",
    + " spec = (tf.TensorSpec((None, 10), tf.float32, name=\"input\"),)\n",
    + " onnx_model, _ = tf2onnx.convert.from_keras(model, input_signature=spec, opset=13)\n",
    + " self.assertTrue(onnx_model is not None)\n",
66-69: Fix useless expression; add a minimal assertion
The bare “predictions” expression is a no-op (Ruff B018). Make it validate shape to turn this into an actual check.

    - " predictions = model(x_train[:1]).numpy()\n",
    - " predictions\n",
    + " predictions = model(x_train[:1]).numpy()\n",
    + " self.assertEqual(predictions.shape, (1, 10), \"Predictions should be (1, 10)\")\n",

Note: Broader assertions for accuracy/loss are already tracked separately (Issue #1261).
@dibryant: The following tests failed, say
Review done on slack, https://redhat-internal.slack.com/archives/C096ZR053RQ/p1755633454094739 |
/retest |
/kf-build odh-workbench-jupyter-tensorflow-rocm-py312-ubi9-on-pull-request |
Fixes for https://issues.redhat.com/browse/RHOAIENG-27434
Description
Create a ROCm TensorFlow Python 3.12 image.
How Has This Been Tested?
Merge criteria:
Summary by CodeRabbit
New Features
Chores
Tests