Merged
Changes from 3 commits
2 changes: 2 additions & 0 deletions ci/cached-builds/gen_gha_matrix_jobs.py
@@ -35,6 +35,8 @@
"runtime-minimal-ubi9-python-3.12",
"jupyter-minimal-ubi9-python-3.11",
"jupyter-minimal-ubi9-python-3.12",
"runtime-datascience-ubi9-python-3.11",
"runtime-datascience-ubi9-python-3.12",
# add more here
}

127 changes: 120 additions & 7 deletions runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
@@ -8,6 +8,8 @@ ARG BASE_IMAGE
####################
FROM ${BASE_IMAGE} AS cpu-base

ARG TARGETARCH

WORKDIR /opt/app-root/bin

# OS Packages needs to be installed as root
@@ -19,7 +21,40 @@ RUN dnf -y upgrade --refresh --best --nodocs --noplugins --setopt=install_weak_d
# upgrade first to avoid fixable vulnerabilities end

# Install useful OS packages
RUN dnf install -y mesa-libGL skopeo libxcrypt-compat && dnf clean all && rm -rf /var/cache/yum
RUN --mount=type=cache,target=/var/cache/dnf \
echo "Building for architecture: ${TARGETARCH}" && \
PACKAGES="mesa-libGL skopeo libxcrypt-compat" && \
# Additional dev tools only for s390x
if [ "$TARGETARCH" = "s390x" ]; then \
PACKAGES="$PACKAGES gcc gcc-c++ make openssl-devel autoconf automake libtool cmake python3-devel pybind11-devel openblas-devel unixODBC-devel openssl zlib-devel"; \
fi && \
if [ -n "$PACKAGES" ]; then \
dnf install -y $PACKAGES && \
dnf clean all && rm -rf /var/cache/yum; \
fi

# For s390x only, set ENV vars and install Rust
RUN if [ "$TARGETARCH" = "s390x" ]; then \
# Install Rust and set up environment
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
# Set environment variables
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
Comment on lines +36 to +50
Contributor


🛠️ Refactor suggestion

Download-and-execute of rustup-init.sh lacks integrity checks
Executing a remote script without verifying a checksum is a classic supply-chain risk and contradicts repo policy tracked in issue #1241. At minimum verify the SHA256 or import the official GPG key.

-    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
+    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
+    echo "<EXPECTED_SHA256>  rustup-init.sh" | sha256sum -c - && \
📝 Committable suggestion


Suggested change
# For s390x only, set ENV vars and install Rust
RUN if [ "$TARGETARCH" = "s390x" ]; then \
# Install Rust and set up environment
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
# Set environment variables
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
# For s390x only, set ENV vars and install Rust
RUN if [ "$TARGETARCH" = "s390x" ]; then \
# Install Rust and set up environment
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
echo "<EXPECTED_SHA256> rustup-init.sh" | sha256sum -c - && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
# Set environment variables
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
🧰 Tools
🪛 GitHub Actions: Code static analysis

[warning] 27-27: Hadolint info SC2016: Expressions don't expand in single quotes, use double quotes for that.
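
The SC2016 finding points at the single-quoted echo lines above; the quoting there looks intentional, since the goal is to write a literal $PATH reference into /etc/profile.d/cargo.sh rather than expand it at build time. A minimal sketch of the difference (illustration only, not part of the diff):

echo 'export PATH=/opt/.cargo/bin:$PATH'   # single quotes: writes the literal string export PATH=/opt/.cargo/bin:$PATH
echo "export PATH=/opt/.cargo/bin:$PATH"   # double quotes: expands $PATH now, baking the build-time value into the file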

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 26 to 40,
the rustup-init.sh script is downloaded and executed without verifying its
integrity, which poses a security risk. To fix this, add a step to verify the
SHA256 checksum of the downloaded rustup-init.sh file against the official
checksum before executing it. Alternatively, import and verify the script using
the official GPG key if available. Only proceed with execution if the
verification passes, otherwise fail the build.
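
One hedged way to address the integrity concern is to skip the sh.rustup.rs wrapper and fetch a pinned rustup-init binary together with its published .sha256 file from the rustup archive; the version, target triple, and URL layout below are assumptions to verify against the rustup documentation, not something taken from this PR:

# Sketch only: pin and verify rustup-init instead of piping sh.rustup.rs (version/target are placeholders).
RUSTUP_VERSION=1.27.1 && \
RUST_TARGET=s390x-unknown-linux-gnu && \
BASE_URL="https://static.rust-lang.org/rustup/archive/${RUSTUP_VERSION}/${RUST_TARGET}" && \
curl --proto '=https' --tlsv1.2 -sSfO "${BASE_URL}/rustup-init" && \
curl --proto '=https' --tlsv1.2 -sSfO "${BASE_URL}/rustup-init.sha256" && \
sha256sum -c rustup-init.sha256 && \
chmod +x rustup-init && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init -y --no-modify-path && \
rm -f rustup-init rustup-init.sha256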


# Set python alternatives only for s390x (not needed for other arches)
RUN if [ "$TARGETARCH" = "s390x" ]; then \
alternatives --install /usr/bin/python python /usr/bin/python3.12 1 && \
alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && \
python --version && python3 --version; \
fi

# Other apps and tools installed as default user
USER 1001
@@ -35,11 +70,69 @@ RUN curl -L https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/oc
rm -f /tmp/openshift-client-linux.tar.gz
# Install the oc client end

##############################
# wheel-builder stage #
# NOTE: Only used in s390x
##############################
FROM cpu-base AS s390x-builder

ARG TARGETARCH
USER 0
WORKDIR /tmp/build-wheels

# Build pyarrow optimized for s390x
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
# Install build dependencies (shared for pyarrow and onnx)
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
# Build and collect pyarrow wheel
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
Comment on lines +91 to +94
Contributor

@coderabbitai coderabbitai bot Aug 6, 2025


🛠️ Refactor suggestion

Pin Apache Arrow revision for reproducible builds.
Unpinned git clone fetches whatever HEAD is current, making image rebuilds nondeterministic.

-        git clone --depth 1 https://github.com/apache/arrow.git && \
+        ARROW_TAG=apache-arrow-16.1.0 && \
+        git clone --branch ${ARROW_TAG} --depth 1 https://github.com/apache/arrow.git && \
📝 Committable suggestion


Suggested change
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
ARROW_TAG=apache-arrow-16.1.0 && \
git clone --branch ${ARROW_TAG} --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 80 to 83,
the git clone command for Apache Arrow does not specify a commit or tag, causing
nondeterministic builds. Modify the Dockerfile to clone a specific, fixed
revision of the Apache Arrow repository by adding a checkout command after
cloning to pin the build to that commit, ensuring reproducible image builds.
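
If pinning to an exact commit rather than a release tag is preferred, a shallow fetch of that SHA also works (GitHub permits fetching by full commit hash); the commit value here is a placeholder, not a vetted revision:

# Sketch only: pin Arrow to a specific commit instead of a tag (ARROW_COMMIT is a placeholder).
ARROW_COMMIT=<full-commit-sha> && \
git init arrow && cd arrow && \
git remote add origin https://github.com/apache/arrow.git && \
git fetch --depth 1 origin "${ARROW_COMMIT}" && \
git checkout --detach FETCH_HEAD && \
cd cpp && mkdir release && cd release && \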

Member


@codrabbitai create issue for this, need to pin it

Contributor


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_BUILD_TESTS=OFF \
-DARROW_BUILD_BENCHMARKS=OFF \
Comment on lines +104 to +108
Contributor

@coderabbitai coderabbitai bot Sep 1, 2025


⚠️ Potential issue

Do not disable core codecs (Snappy/Zstd/LZ4) in Arrow

Disabling these breaks reading most real-world Parquet/Arrow datasets.

-              -DARROW_WITH_LZ4=OFF \
-              -DARROW_WITH_ZSTD=OFF \
-              -DARROW_WITH_SNAPPY=OFF \
+              -DARROW_WITH_LZ4=ON \
+              -DARROW_WITH_ZSTD=ON \
+              -DARROW_WITH_SNAPPY=ON \

With -DARROW_DEPENDENCY_SOURCE=BUNDLED, required codec libs will be built in-tree.

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 99–103, the
build is disabling core Arrow codecs (LZ4, Zstd, Snappy) which prevents reading
common Parquet/Arrow datasets; remove the -DARROW_WITH_LZ4=OFF,
-DARROW_WITH_ZSTD=OFF and -DARROW_WITH_SNAPPY=OFF flags (or set them to ON) so
those codecs are enabled, and ensure ARROW_DEPENDENCY_SOURCE=BUNDLED is set so
the required codec libraries are built in-tree.

Member


@coderabbitai sounds legit, create github issue

Contributor


🧩 Analysis chain

I'll create a GitHub issue for the Arrow codec configuration problem.

@jiridanek Issue created! This addresses the Arrow codec configuration problem that would prevent users from reading most compressed Parquet datasets on s390x. The solution should be straightforward since the build already uses bundled dependencies.


🏁 Script executed:

#!/bin/bash

# Create GitHub issue for Arrow codec configuration problem
gh issue create \
  --title "Enable core Arrow codecs in s390x pyarrow build for datascience runtime" \
  --body "## Problem Description

The s390x pyarrow build in \`runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu\` (lines 99-103) explicitly disables core Arrow codecs:

\`\`\`dockerfile
-DARROW_WITH_LZ4=OFF \\
-DARROW_WITH_ZSTD=OFF \\
-DARROW_WITH_SNAPPY=OFF \\
\`\`\`

This configuration prevents reading most real-world Parquet and Arrow datasets that use these common compression formats, significantly limiting the functionality of the datascience runtime on s390x architecture.

## Impact Analysis

- **Data Compatibility**: Users cannot read Parquet files compressed with LZ4, Zstd, or Snappy (the most common compression formats)
- **Runtime Failures**: Applications attempting to read compressed datasets will fail with codec-related errors
- **User Experience**: s390x datascience runtime becomes significantly less capable than other architectures

## Root Cause

The codecs were likely disabled to avoid build complexity or dependencies, but with \`-DARROW_DEPENDENCY_SOURCE=BUNDLED\` already set, the required codec libraries should be built in-tree without requiring additional system dependencies.

## Solution

Enable the core codecs in the Arrow build configuration:

\`\`\`dockerfile
# Change from:
-DARROW_WITH_LZ4=OFF \\
-DARROW_WITH_ZSTD=OFF \\
-DARROW_WITH_SNAPPY=OFF \\

# To:
-DARROW_WITH_LZ4=ON \\
-DARROW_WITH_ZSTD=ON \\
-DARROW_WITH_SNAPPY=ON \\
\`\`\`

## Acceptance Criteria

- [ ] Core Arrow codecs (LZ4, Zstd, Snappy) are enabled in s390x pyarrow build
- [ ] s390x datascience runtime can successfully read Parquet files compressed with these formats
- [ ] Build time impact is acceptable (should be minimal with BUNDLED dependencies)
- [ ] No regression in build success rate for s390x architecture

## Files Affected

- \`runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu\`

## Context

Identified during PR #1513 review: https://github.com/opendatahub-io/notebooks/pull/1513#discussion_r2314568303

The current implementation prioritizes build simplicity over runtime functionality, but enabling these codecs should not introduce significant complexity given the bundled dependency strategy." \
  --assignee jiridanek

Length of output: 2280


@jiridanek Issue #2305 created successfully! The issue comprehensively covers the Arrow codec configuration problem, including the impact on data compatibility and the straightforward solution to enable LZ4, Zstd, and Snappy codecs in the s390x pyarrow build.

Link: #2305
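
A cheap way to exercise the acceptance criteria once the codecs are re-enabled is to round-trip a small table through each compression format; pyarrow raises immediately if a codec was compiled out. This is a sketch of a verification step to run inside the built image, not something executed in this PR:

# Sketch only: smoke-test Parquet codec support in the built image.
python3 - <<'EOF'
import pyarrow as pa
import pyarrow.parquet as pq
table = pa.table({"x": [1, 2, 3]})
for codec in ("snappy", "zstd", "lz4"):
    path = f"/tmp/codec-check-{codec}.parquet"
    pq.write_table(table, path, compression=codec)  # raises if the codec is unavailable
    assert pq.read_table(path).equals(table)
    print(codec, "ok")
EOF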


✏️ Learnings added
Learnt from: jiridanek
PR: opendatahub-io/notebooks#1513
File: runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu:104-108
Timestamp: 2025-09-05T10:07:53.439Z
Learning: jiridanek requested GitHub issue creation for Arrow codec configuration problem during PR #1513 review. Issue #2305 was created addressing disabled core Arrow codecs (LZ4, Zstd, Snappy) in s390x pyarrow build that prevents reading compressed Parquet/Arrow datasets. The issue includes comprehensive problem description covering data compatibility impact, detailed solution enabling codecs with BUNDLED dependencies, clear acceptance criteria for functionality verification, and proper context linking to PR #1513 review comment, assigned to jiridanek.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: grdryn
PR: opendatahub-io/notebooks#1320
File: rstudio/rhel9-python-3.11/Dockerfile.cuda:40-42
Timestamp: 2025-07-04T17:08:02.399Z
Learning: In the opendatahub-io/notebooks repository, when using multi-architecture Dockerfiles with BuildKit, the implicit build argument TARGETARCH is automatically available in the global scope for FROM instructions without explicit declaration. However, if TARGETARCH is used within a build stage, it must be declared explicitly within that stage. The current placement pattern (declaring ARG TARGETARCH after FROM instructions that use it) is correct for modern Docker/Podman/Buildah environments and does not require compatibility with older Docker versions.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1720
File: jupyter/pytorch+llmcompressor/ubi9-python-3.11/requirements.txt:2659-2680
Timestamp: 2025-08-07T12:41:48.997Z
Learning: For opendatahub-io/notebooks, rpds-py 0.27.0 provides manylinux wheels for Python 3.11 and 3.12 on x86_64 and aarch64, so no Rust build is needed for these platforms. For s390x and ppc64le, wheels are not available, so a Rust build stage or version pinning is required if those images are built.

Learnt from: grdryn
PR: opendatahub-io/notebooks#1320
File: jupyter/minimal/ubi9-python-3.11/Dockerfile.cuda:29-38
Timestamp: 2025-07-04T17:07:52.656Z
Learning: In the opendatahub-io/notebooks repository, modern Docker with BuildKit automatically provides build arguments like TARGETARCH in the global scope for FROM instructions, but these arguments must be explicitly declared with ARG statements inside build stages where they will be used. The ARG declaration should be placed within the stage that uses it, not moved to the global scope, as this is the correct pattern for modern Docker/Podman/Buildah environments.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: runtimes/pytorch+llmcompressor/ubi9-python-3.12/Dockerfile.cuda:152-159
Timestamp: 2025-08-27T14:49:24.112Z
Learning: jiridanek requested GitHub issue creation for CUDA version alignment in pytorch+llmcompressor runtime during PR #2145 review. Issue #2148 was created addressing the mismatch between Dockerfile CUDA 12.6 and pylock.toml cu124 PyTorch wheels. The issue includes comprehensive problem description covering affected files (runtimes/pytorch+llmcompressor/ubi9-python-3.12/Dockerfile.cuda and pylock.toml), detailed solution with PyTorch index URL update from cu124 to cu126, lock regeneration steps using uv, clear acceptance criteria for wheel alignment verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: runtimes/pytorch+llmcompressor/ubi9-python-3.12/Dockerfile.cuda:152-159
Timestamp: 2025-08-27T14:49:24.112Z
Learning: jiridanek requested GitHub issue creation for CUDA version alignment in pytorch+llmcompressor runtime during PR #2145 review. Issue addresses mismatch between Dockerfile CUDA 12.6 and pylock.toml cu124 PyTorch wheels. Comprehensive issue created with detailed problem description covering affected files, current state analysis, step-by-step solution including PyTorch index URL update and lock regeneration, clear acceptance criteria for cu126 wheel verification, and proper context linking to PR #2145 review comment.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: jupyter/trustyai/ubi9-python-3.12/Dockerfile.cpu:134-140
Timestamp: 2025-08-27T15:05:31.656Z
Learning: jiridanek requested GitHub issue creation for CUDA wheel optimization in TrustyAI CPU image during PR #2145 review. The CPU Dockerfile currently uses pylock.toml with CUDA-enabled PyTorch wheels (torch==2.6.0+cu126) which was previously discussed with harshad16 and grdryn but deferred. Issue created with comprehensive problem analysis covering unnecessary CUDA wheels in CPU-only image, multiple solution options including lock regeneration and separate CPU/CUDA files, clear acceptance criteria for wheel optimization verification, and proper context linking to PR #2145 review comment.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:65-66
Timestamp: 2025-07-09T12:31:02.033Z
Learning: jiridanek requested GitHub issue creation for MSSQL repo file hardcoding problem during PR #1320 review. Issue #1363 was created and updated with comprehensive problem description covering hardcoded x86_64 MSSQL repo files breaking multi-architecture builds across 10 affected Dockerfiles (including datascience, CUDA, ROCm, and TrustyAI variants), detailed root cause analysis, three solution options with code examples, clear acceptance criteria for all image types, implementation guidance following established multi-architecture patterns, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/kustomize/base/kustomization.yaml:8-12
Timestamp: 2025-07-08T19:09:48.746Z
Learning: jiridanek requested GitHub issue creation for misleading CUDA prefix in TrustyAI image tags during PR #1306 review, affecting both Python 3.11 and 3.12 versions. Issue #1338 was created with comprehensive problem description covering both affected images, repository pattern analysis comparing correct vs incorrect naming conventions, clear solution with code examples, detailed acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: runtimes/pytorch/ubi9-python-3.12/pyproject.toml:25-30
Timestamp: 2025-08-27T15:39:58.693Z
Learning: jiridanek requested GitHub issue creation for missing OS dependencies for database connectors during PR #2145 review. Issue #2160 was created addressing the missing unixODBC and postgresql-libs packages required by pyodbc and psycopg Python dependencies in runtimes/pytorch/ubi9-python-3.12/Dockerfile.cuda. The issue includes comprehensive problem description covering runtime failure risks, detailed solution with Dockerfile updates, clear acceptance criteria for package installation and verification, and proper context linking to PR #2145 review comment, assigned to jiridanek.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1909
File: runtimes/pytorch+llmcompressor/ubi9-python-3.11/Dockerfile.cuda:11-15
Timestamp: 2025-08-12T08:40:55.286Z
Learning: jiridanek requested GitHub issue creation for redundant CUDA upgrade optimization during PR #1909 review. Issue covers duplicate yum/dnf upgrade commands in cuda-base stages that execute after base stages already performed comprehensive upgrades, causing unnecessary CI latency and build inefficiency across multiple CUDA Dockerfiles. The solution requires investigating NVIDIA upstream documentation requirements before removing redundant upgrades, with systematic testing of all CUDA variants and performance measurement. Issue follows established pattern of comprehensive problem analysis, multiple solution options, detailed acceptance criteria, and proper context linking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1306
File: jupyter/trustyai/ubi9-python-3.12/kustomize/base/kustomization.yaml:8-12
Timestamp: 2025-07-08T19:09:48.746Z
Learning: jiridanek requested GitHub issue creation for misleading CUDA prefix in TrustyAI image tags during PR #1306 review. Issue was created with comprehensive problem description covering both Python 3.11 and 3.12 versions, repository pattern analysis showing correct vs incorrect naming, clear solution with code examples, detailed acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-08-05T17:24:08.616Z
Learning: jiridanek requested PR review for #1521 covering s390x architecture support improvements, demonstrating continued focus on systematic multi-architecture compatibility enhancements in the opendatahub-io/notebooks repository through clean implementation with centralized configuration, proper CI integration, and architecture-aware testing patterns.

Learnt from: atheo89
PR: opendatahub-io/notebooks#1258
File: codeserver/ubi9-python-3.11/Dockerfile.cpu:32-32
Timestamp: 2025-07-07T11:08:48.524Z
Learning: atheo89 requested GitHub issue creation for multi-architecture Dockerfile improvements during PR #1258 review, specifically for enhancing structural consistency across Docker stages, replacing $(uname -m) with ${TARGETARCH} for cross-architecture builds, and adding OCI-compliant metadata labels. Issue #1332 was created with comprehensive problem description, phased implementation approach, detailed acceptance criteria, implementation guidance with code examples, and proper context linking, continuing the established pattern of systematic code quality improvements.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:42-52
Timestamp: 2025-07-09T12:29:56.162Z
Learning: jiridanek requested GitHub issue creation for OpenShift client architecture mapping problem affecting 29 Dockerfiles during PR #1320 review. Issue was created with comprehensive analysis covering all affected files using $(uname -m) returning 'aarch64' but OpenShift mirror expecting 'arm64', systematic solution using BuildKit TARGETARCH mapping with proper amd64→x86_64 and arm64→arm64 conversion, detailed acceptance criteria, and implementation guidance, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: scripts/buildinputs/dockerfile.go:35-35
Timestamp: 2025-07-10T15:02:13.228Z
Learning: jiridanek requested GitHub issue creation for improving architecture detection in buildinputs tool during PR #1320 review. The automated issue creation failed despite appearing successful, so comprehensive issue content was provided for manual creation covering qemu-user s390x build failures where runtime.GOARCH returns host architecture instead of target architecture, detailed impact analysis, four solution options (environment variable detection, configuration parameter, platform detection logic, hybrid approach), clear acceptance criteria for multi-architecture support, phased implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1333
File: runtimes/rocm-tensorflow/ubi9-python-3.12/Dockerfile.rocm:50-50
Timestamp: 2025-07-08T19:30:01.738Z
Learning: jiridanek requested GitHub issue creation for multi-architecture support in ROCm TensorFlow image during PR #1333 review. Issue #1346 was created with comprehensive problem description covering hardcoded x86_64 architecture breaking multi-arch support, detailed impact analysis, three solution options (runtime detection, BuildKit TARGETARCH integration, hybrid approach) with pros/cons analysis, comprehensive acceptance criteria covering core requirements and testing, phased implementation guidance, related files identification, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1909
File: runtimes/pytorch+llmcompressor/ubi9-python-3.11/Dockerfile.cuda:11-15
Timestamp: 2025-08-12T08:40:55.286Z
Learning: jiridanek requested GitHub issue creation for redundant CUDA upgrade optimization during PR #1909 review. Analysis revealed all 14 CUDA Dockerfiles contain redundant `yum upgrade -y` commands in cuda-base stages that execute after base stages already performed comprehensive `dnf upgrade` via pre-upgrade blocks, causing unnecessary CI latency and build inefficiency. Issue includes complete scope analysis with specific line numbers, investigation framework requiring NVIDIA upstream documentation review, multiple solution options, comprehensive acceptance criteria covering systematic testing and performance measurement, and proper context linking to PR #1909 review comment.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1333
File: runtimes/tensorflow/ubi9-python-3.12/Dockerfile.cuda:38-38
Timestamp: 2025-07-08T19:30:20.513Z
Learning: jiridanek requested GitHub issue creation for multi-architecture support in TensorFlow CUDA runtime image during PR #1333 review. Issue was created with comprehensive problem description covering hardcoded NVARCH limitation, multiple solution options using TARGETARCH build argument with architecture mapping, acceptance criteria for multi-architecture builds, implementation guidance with code examples, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1320
File: scripts/buildinputs/dockerfile.go:35-35
Timestamp: 2025-07-10T15:02:13.228Z
Learning: jiridanek requested GitHub issue creation for improving architecture detection in buildinputs tool during PR #1320 review. Issue was created with comprehensive problem description covering qemu-user s390x build failures where runtime.GOARCH returns host architecture instead of target architecture, detailed impact analysis, four solution options (environment variable detection, configuration parameter, platform detection logic, hybrid approach), clear acceptance criteria for multi-architecture support, phased implementation guidance, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#2185
File: jupyter/pytorch/ubi9-python-3.12/Dockerfile.cuda:109-117
Timestamp: 2025-08-29T08:48:55.985Z
Learning: jiridanek prefers to implement systematic cleanup improvements through dedicated GitHub issues (like #2076) rather than applying individual point fixes during PR reviews, maintaining consistency with their established pattern of comprehensive code quality improvements.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#2145
File: runtimes/rocm-tensorflow/ubi9-python-3.12/pyproject.toml:75-77
Timestamp: 2025-08-27T15:33:28.871Z
Learning: jiridanek requested GitHub issue creation for multi-architecture environment exploration during PR #2145 review, specifically to investigate architecture-specific environments for ROCm TensorFlow wheel optimization across x86_64, aarch64, ppc64le, and s390x architectures, noting uncertainty about implementation benefits but wanting systematic exploration through issue tracking. Issue #2158 was created with comprehensive analysis covering current limitations, investigation areas, multiple solution options, and clear acceptance criteria.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1333
File: runtimes/rocm-tensorflow/ubi9-python-3.12/Pipfile:13-15
Timestamp: 2025-07-09T08:07:24.937Z
Learning: jiridanek requested GitHub issue creation for tensorflow_rocm Python 3.12 compatibility problem during PR #1333 review. Issue #1354 was successfully created with comprehensive problem description covering missing cp312 wheels causing build failures, three solution options (upstream TensorFlow, Python 3.11 only, custom build), clear acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#1333
File: runtimes/pytorch/ubi9-python-3.12/utils/bootstrapper.py:619-626
Timestamp: 2025-07-08T19:33:14.340Z
Learning: jiridanek requested GitHub issue creation for Python 3.12 version check bug in bootstrapper.py during PR #1333 review. Issue #1348 was created with comprehensive problem description covering version check exclusion affecting all Python 3.12 runtime images, detailed impact analysis of bootstrapper execution failures, clear solution with code examples, affected files list including all 6 runtime bootstrapper copies, acceptance criteria for testing and verification, implementation notes about code duplication and upstream reporting, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

Learnt from: jiridanek
PR: opendatahub-io/notebooks#0
File: :0-0
Timestamp: 2025-07-11T11:16:05.131Z
Learning: jiridanek requested GitHub issue creation for RStudio py311 Tekton push pipelines during PR #1379 review. Issue #1384 was successfully created covering two RStudio variants (CPU and CUDA) found in manifests/base/params-latest.env, with comprehensive problem description, implementation requirements following the same pattern as other workbench pipelines, clear acceptance criteria, and proper context linking, continuing the established pattern of systematic code quality improvements through detailed issue tracking.

.. && \
make -j$(nproc) VERBOSE=1 && \
make install -j$(nproc) && \
cd ../../python && \
pip install --no-cache-dir -r requirements-build.txt && \
PYARROW_WITH_PARQUET=1 \
PYARROW_WITH_DATASET=1 \
PYARROW_WITH_FILESYSTEM=1 \
PYARROW_WITH_JSON=1 \
PYARROW_WITH_CSV=1 \
PYARROW_PARALLEL=$(nproc) \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
# Ensure wheels directory exists and has content
ls -la /tmp/wheels/; \
else \
# Create empty wheels directory for non-s390x
mkdir -p /tmp/wheels; \
fi

#######################
# runtime-datascience #
#######################
FROM cpu-base AS runtime-datascience

ARG TARGETARCH
ARG DATASCIENCE_SOURCE_CODE=runtimes/datascience/ubi9-python-3.12

LABEL name="odh-notebook-runtime-datascience-ubi9-python-3.12" \
@@ -54,17 +147,37 @@ LABEL name="odh-notebook-runtime-datascience-ubi9-python-3.12" \

WORKDIR /opt/app-root/bin

# Install Python packages from requirements.txt
USER 0
# Copy wheels from build stage (s390x only)
COPY --from=s390x-builder /tmp/wheels /tmp/wheels
RUN if [ "$TARGETARCH" = "s390x" ]; then \
pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels; \
else \
echo "Skipping wheel install for $TARGETARCH"; \
fi


# Install Python packages from pylock.toml
COPY ${DATASCIENCE_SOURCE_CODE}/pylock.toml ./
# Copy Elyra dependencies for air-gapped enviroment
COPY ${DATASCIENCE_SOURCE_CODE}/utils ./utils/

RUN echo "Installing softwares and packages" && \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml && \
# Fix permissions to support pip in Openshift environments \
RUN --mount=type=cache,target=/root/.cache/pip \
echo "Installing softwares and packages" && \
if [ "$TARGETARCH" = "s390x" ]; then \
# For s390x, we need special flags and environment variables for building packages
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
CFLAGS="-O3" CXXFLAGS="-O3" \
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
else \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
fi && \
Comment on lines +165 to +176
Contributor


⚠️ Potential issue

Ensure Cargo is on PATH during s390x pip installs

Profile scripts aren’t sourced in non-login shells; builds needing Rust (e.g., rpds-py) may fail.

-        GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
-        CFLAGS="-O3" CXXFLAGS="-O3" \
-        uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
+        GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
+        CFLAGS="-O3" CXXFLAGS="-O3" \
+        PATH="/opt/.cargo/bin:$PATH" CARGO_HOME="/opt/.cargo" \
+        uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
📝 Committable suggestion


Suggested change
RUN --mount=type=cache,target=/root/.cache/pip \
echo "Installing softwares and packages" && \
if [ "$TARGETARCH" = "s390x" ]; then \
# For s390x, we need special flags and environment variables for building packages
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
CFLAGS="-O3" CXXFLAGS="-O3" \
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
else \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
fi && \
RUN --mount=type=cache,target=/root/.cache/pip \
echo "Installing softwares and packages" && \
if [ "$TARGETARCH" = "s390x" ]; then \
# For s390x, we need special flags and environment variables for building packages
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
CFLAGS="-O3" CXXFLAGS="-O3" \
PATH="/opt/.cargo/bin:$PATH" CARGO_HOME="/opt/.cargo" \
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
else \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./pylock.toml; \
fi && \
🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 160–171,
the s390x branch runs pip builds that may require the Rust/Cargo toolchain but
does not ensure cargo is on PATH; modify the s390x branch so that before
invoking pip you export or prepend the cargo bin directory to PATH (for example
export PATH="$HOME/.cargo/bin:$PATH" or set CARGO_HOME and update PATH) so cargo
is available in the non-login shell used by RUN, then run the pip install as
before.
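
Besides exporting PATH inline as suggested, a cheap guard in the s390x branch turns a missing toolchain into an explicit failure instead of a cryptic maturin/pip error; the check itself is an assumption layered on top of the earlier /opt/.cargo install, not part of the PR:

# Sketch only: fail fast if the Rust toolchain is not visible to the non-login RUN shell.
if [ "$TARGETARCH" = "s390x" ]; then \
    export PATH="/opt/.cargo/bin:$PATH" CARGO_HOME="/opt/.cargo" && \
    command -v cargo >/dev/null || { echo "cargo not found on PATH for s390x build" >&2; exit 1; }; \
fi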

# Fix permissions to support pip in Openshift environments
chmod -R g+w /opt/app-root/lib/python3.12/site-packages && \
fix-permissions /opt/app-root -P

USER 1001

WORKDIR /opt/app-root/src