Optimize and Enable Runtime Datascience Images for IBM Z[s390x] (Python 3.11 & 3.12) #1513

Open · wants to merge 3 commits into base: main
Changes from 1 commit
2 changes: 2 additions & 0 deletions ci/cached-builds/gen_gha_matrix_jobs.py
Expand Up @@ -35,6 +35,8 @@
"runtime-minimal-ubi9-python-3.12",
"jupyter-minimal-ubi9-python-3.11",
"jupyter-minimal-ubi9-python-3.12",
"runtime-datascience-ubi9-python-3.11",
"runtime-datascience-ubi9-python-3.12",
# add more here
}

124 changes: 117 additions & 7 deletions runtimes/datascience/ubi9-python-3.11/Dockerfile.cpu
Expand Up @@ -8,8 +8,40 @@ WORKDIR /opt/app-root/bin
# OS Packages needs to be installed as root
USER 0

# Install useful OS packages
RUN dnf install -y mesa-libGL skopeo && dnf clean all && rm -rf /var/cache/yum
ARG TARGETARCH

# Install useful OS packages (and dev tools for s390x only)
RUN --mount=type=cache,target=/var/cache/dnf \
echo "Building for architecture: ${TARGETARCH}" && \
if [ "$TARGETARCH" = "s390x" ]; then \
PACKAGES="mesa-libGL skopeo gcc gcc-c++ make openssl-devel autoconf automake libtool cmake python3-devel pybind11-devel openblas-devel unixODBC-devel openssl zlib-devel"; \
else \
PACKAGES="mesa-libGL skopeo"; \
fi && \
echo "Installing: $PACKAGES" && \
dnf install -y --nogpgcheck --allowerasing --nobest $PACKAGES && \
dnf clean all && rm -rf /var/cache/yum

# Install Rust and set environment variables (s390x only)
RUN if [ "$TARGETARCH" = "s390x" ]; then \
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
Comment on lines +25 to +37
Contributor

🛠️ Refactor suggestion

Execute remote scripts only with explicit checksums
rustup-init.sh is downloaded and executed without verifying a checksum or signature. This is a classic supply-chain attack vector and violates the project’s own #1241 guidance on binary verification.

-    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
+    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
+    echo "<SHA256  expected>  rustup-init.sh" | sha256sum -c - && \

Consider pinning to the official static install script hash (updated per release) or installing the rust-toolset RPM once it appears for s390x.
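
For illustration, a checksum-verified variant could be shaped roughly as follows; the rustup version, target triple, and checksum are placeholder assumptions for this sketch (not values taken from the PR) and would need to be updated per rustup release:

ARG RUSTUP_VERSION=1.27.1
ARG RUSTUP_SHA256=<expected sha256 of the s390x rustup-init binary>
RUN if [ "$TARGETARCH" = "s390x" ]; then \
        # Fetch the versioned installer instead of the moving sh.rustup.rs script (assumed URL layout)
        curl --proto '=https' --tlsv1.2 -sSf \
          -o rustup-init "https://static.rust-lang.org/rustup/archive/${RUSTUP_VERSION}/s390x-unknown-linux-gnu/rustup-init" && \
        echo "${RUSTUP_SHA256}  rustup-init" | sha256sum -c - && \
        chmod +x rustup-init && \
        CARGO_HOME=/opt/.cargo ./rustup-init -y --no-modify-path && \
        rm -f rustup-init; \
    fi

The same pattern would apply to the 3.12 Dockerfile.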

🧰 Tools
🪛 GitHub Actions: Code static analysis

[warning] 26-26: Hadolint SC2016 info: Expressions don't expand in single quotes, use double quotes for that.

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.11/Dockerfile.cpu around lines 25 to 37,
the rustup-init.sh script is downloaded and executed without verifying its
checksum, which is a security risk. To fix this, add a step to download the
official checksum for the rustup-init.sh script and verify it matches the
downloaded file before execution. Alternatively, consider installing Rust via a
trusted package like the rust-toolset RPM for s390x once available, to avoid
executing remote scripts directly.


# Set python alternatives for s390x only
RUN if [ "$TARGETARCH" = "s390x" ]; then \
alternatives --install /usr/bin/python python /usr/bin/python3.11 1 && \
alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 && \
python --version && python3 --version; \
fi

# Other apps and tools installed as default user
USER 1001
Expand All @@ -25,11 +57,62 @@ RUN curl -L https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/oc
rm -f /tmp/openshift-client-linux.tar.gz
# Install the oc client end

##############################
# wheel-builder stage #
##############################
FROM base AS s390x-builder

USER 0
WORKDIR /tmp/build-wheels
ARG TARGETARCH

RUN echo "s390x-builder stage TARGETARCH: ${TARGETARCH}"

RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
echo "Building pyarrow wheel for s390x..." && \
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
Comment on lines +75 to +79
Contributor

⚠️ Potential issue

Add git to build-deps to prevent clone failure.

-        dnf install -y cmake make gcc-c++ pybind11-devel wget && \
+        dnf install -y cmake make gcc-c++ git pybind11-devel wget && \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
dnf install -y cmake make gcc-c++ git pybind11-devel wget && \
dnf clean all && \
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.11/Dockerfile.cpu around lines 74 to 78,
the build dependencies do not include git, which can cause the git clone command
to fail. Add git to the list of packages installed by dnf before running the git
clone command to ensure git is available during the build process.

cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DCMAKE_CXX_FLAGS="-O3 -march=z14 -mtune=z14" \
-DCMAKE_C_FLAGS="-O3 -march=z14 -mtune=z14" \
.. && \
make -j$(nproc) && \
make install && \
cd ../../python && \
pip install --no-cache-dir -U pip wheel setuptools && \
pip install --no-cache-dir -r requirements-build.txt && \
export PYARROW_PARALLEL=$(nproc) && \
export ARROW_BUILD_TYPE=release && \
CFLAGS="-O3 -march=z14 -mtune=z14" \
CXXFLAGS="-O3 -march=z14 -mtune=z14" \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
ls -la /tmp/wheels/; \
else \
echo "Not s390x, skipping wheel build" && mkdir -p /tmp/wheels; \
fi

Comment on lines +71 to +109
Contributor

🛠️ Refactor suggestion

Unpinned Arrow git clone hurts reproducibility
git clone --depth 1 https://github.com/apache/arrow.git fetches whatever HEAD happens to be at build time, making builds non-reproducible and brittle.

-        git clone --depth 1 https://github.com/apache/arrow.git && \
+        ARROW_TAG=apache-arrow-16.1.0 && \
+        git clone --branch ${ARROW_TAG} --depth 1 https://github.com/apache/arrow.git && \

Pin to a known tag/revision and add a comment when updating to a newer release.

📝 Committable suggestion


Suggested change
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
echo "Building pyarrow wheel for s390x..." && \
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DCMAKE_CXX_FLAGS="-O3 -march=z14 -mtune=z14" \
-DCMAKE_C_FLAGS="-O3 -march=z14 -mtune=z14" \
.. && \
make -j$(nproc) && \
make install && \
cd ../../python && \
pip install --no-cache-dir -U pip wheel setuptools && \
pip install --no-cache-dir -r requirements-build.txt && \
export PYARROW_PARALLEL=$(nproc) && \
export ARROW_BUILD_TYPE=release && \
CFLAGS="-O3 -march=z14 -mtune=z14" \
CXXFLAGS="-O3 -march=z14 -mtune=z14" \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
ls -la /tmp/wheels/; \
else \
echo "Not s390x, skipping wheel build" && mkdir -p /tmp/wheels; \
fi
dnf clean all && \
- git clone --depth 1 https://github.com/apache/arrow.git && \
+ ARROW_TAG=apache-arrow-16.1.0 && \
+ git clone --branch ${ARROW_TAG} --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DCMAKE_CXX_FLAGS="-O3 -march=z14 -mtune=z14" \
-DCMAKE_C_FLAGS="-O3 -march=z14 -mtune=z14" \
.. && \
make -j$(nproc) && \
make install && \
cd ../../python && \
pip install --no-cache-dir -U pip wheel setuptools && \
pip install --no-cache-dir -r requirements-build.txt && \
export PYARROW_PARALLEL=$(nproc) && \
export ARROW_BUILD_TYPE=release && \
CFLAGS="-O3 -march=z14 -mtune=z14" \
CXXFLAGS="-O3 -march=z14 -mtune=z14" \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
ls -la /tmp/wheels/; \
🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.11/Dockerfile.cpu around lines 69 to 107,
the git clone command for Apache Arrow uses the default branch HEAD, which makes
builds non-reproducible and brittle. To fix this, pin the clone to a specific
known tag or commit hash by adding a checkout command after cloning or cloning
directly with a tag reference. Also, add a comment explaining the pinned version
and instructions for updating it to maintain clarity and reproducibility.

#######################
# runtime-datascience #
#######################
FROM base AS runtime-datascience

ARG TARGETARCH
ARG DATASCIENCE_SOURCE_CODE=runtimes/datascience/ubi9-python-3.11

LABEL name="odh-notebook-runtime-datascience-ubi9-python-3.11" \
Expand All @@ -43,17 +126,44 @@ LABEL name="odh-notebook-runtime-datascience-ubi9-python-3.11" \
io.openshift.build.image="quay.io/opendatahub/workbench-images:runtime-datascience-ubi9-python-3.11"

WORKDIR /opt/app-root/bin
USER 0

# Install s390x-built wheels if available
COPY --from=s390x-builder /tmp/wheels /tmp/wheels
RUN if [ "$TARGETARCH" = "s390x" ]; then \
echo "Installing s390x wheels..." && \
WHEELS=$(find /tmp/wheels/ -name "pyarrow-*.whl") && \
if [ -n "$WHEELS" ]; then \
pip install --no-cache-dir $WHEELS && \
echo "Wheel install complete"; \
else \
echo "No pyarrow wheel found!" && exit 1; \
fi && rm -rf /tmp/wheels; \
else \
echo "Skipping wheel install on non-s390x (${TARGETARCH})"; \
fi

# Install Python packages from requirements.txt
COPY ${DATASCIENCE_SOURCE_CODE}/requirements.txt ./
# Copy Elyra dependencies for air-gapped enviroment
COPY ${DATASCIENCE_SOURCE_CODE}/utils ./utils/

RUN echo "Installing softwares and packages" && \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./requirements.txt --build-constraints=./requirements.txt && \
# Fix permissions to support pip in Openshift environments \
RUN --mount=type=cache,target=/root/.cache/pip \
echo "Installing softwares and packages" && \
if [ "$TARGETARCH" = "s390x" ]; then \
# For s390x, we need special flags and environment variables for building packages
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
CFLAGS="-O3" CXXFLAGS="-O3" \
uv pip install --strict --no-deps --no-cache --no-config --no-progress \
--verify-hashes --compile-bytecode --index-strategy=unsafe-best-match \
--requirements=./requirements.txt --build-constraints=./requirements.txt; \
else \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress \
--verify-hashes --compile-bytecode --index-strategy=unsafe-best-match \
--requirements=./requirements.txt --build-constraints=./requirements.txt; \
fi && \
chmod -R g+w /opt/app-root/lib/python3.11/site-packages && \
fix-permissions /opt/app-root -P

7 changes: 6 additions & 1 deletion runtimes/datascience/ubi9-python-3.11/Pipfile
Expand Up @@ -17,7 +17,12 @@ scikit-learn = "~=1.6.1"
scipy = "~=1.15.2"
skl2onnx = "~=1.18.0"
onnxconverter-common = "~=1.13.0" # Required for skl2onnx, as upgraded version is not compatible with protobuf
codeflare-sdk = "~=0.30.0"

# Exclude these packages on s390x architecture due to compatibility
codeflare-sdk = {version = "~=0.30.0", markers = "platform_machine != 's390x'"}
py-spy = {version = "~=0.4.0", markers = "platform_machine != 's390x'"}
ray = {version = "~=2.47.1", markers = "platform_machine != 's390x'", extras = ["data", "default"]}
pyarrow = {version = "~=21.0.0", markers = "platform_machine != 's390x'"}
Comment on lines +21 to +25
Contributor

💡 Verification agent

🧩 Analysis chain

Version coherence between architectures may diverge – tag the Arrow checkout to match ~=21.0.0.

pyarrow is excluded for s390x in the Pipfile but is re-installed later from a wheel you build from the Arrow main branch.
That wheel is almost certainly not 21.0.0 (today main is already 16+). The result is:

  • x86_64/arm64 → pyarrow==21.0.0 (from lock)
  • s390x → pyarrow==<current-HEAD> (from wheel)

Downstream notebooks that rely on consistent feature/ABI levels can now break only on s390x.

Recommend pinning the build to the same version tag that the Pipfile declares, e.g.:

- git clone --depth 1 https://github.com/apache/arrow.git
+ git clone --depth 1 --branch apache-arrow-21.0.0 https://github.com/apache/arrow.git

and keep the Pipfile/lock in sync whenever you upgrade.


🏁 Script executed:

#!/bin/bash
set -e

echo "1) Listing Dockerfiles under runtimes/datascience/ubi9-python-3.11:"
fd -t f Dockerfile runtimes/datascience/ubi9-python-3.11 || echo "No Dockerfile found in that path"

echo
echo "2) Searching for 'arrow' references in those Dockerfiles:"
for df in $(fd -t f Dockerfile runtimes/datascience/ubi9-python-3.11); do
  echo "---- $df ----"
  grep -n "arrow" "$df" || echo "  (no 'arrow' references)"
done

echo
echo "3) Searching entire repo for 'git clone' commands cloning Arrow:"
rg "git clone.*arrow" -n . || echo "No git clone arrow found in repo"

Length of output: 1602


Pin Arrow checkout to version 21.0.0 in the s390x build

The s390x Docker build (runtimes/datascience/ubi9-python-3.11/Dockerfile.cpu:76) currently does:

git clone --depth 1 https://github.com/apache/arrow.git

This pulls main (now ≫21.0.0), so:

  • x86_64/arm64 → pyarrow==21.0.0 (from Pipfile.lock)
  • s390x → pyarrow==<current-HEAD> (built wheel)

Downstream notebooks expecting a consistent ABI/feature set may break only on s390x.

Please update the clone to pin the tag matching your Pipfile:

- git clone --depth 1 https://github.com/apache/arrow.git
+ git clone --depth 1 --branch apache-arrow-21.0.0 https://github.com/apache/arrow.git

• Apply the same change in the 3.12 Dockerfile (runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu:80) when bumping pyarrow.
• Keep the Pipfile/lock and Dockerfile tag in sync on future upgrades.
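
One additional, purely illustrative guard (not part of this PR): after installing the locally built wheel in the runtime stage, assert that its version matches the Pipfile pin so any drift fails the build instead of surfacing in notebooks:

# Hypothetical check; "21.0" mirrors the current ~=21.0.0 pin and must be bumped together with the Pipfile.
RUN if [ "$TARGETARCH" = "s390x" ]; then \
        python -c "import pyarrow, sys; sys.exit(0 if pyarrow.__version__.startswith('21.0') else 1)"; \
    fi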

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.11/Pipfile lines 21 to 25, the pyarrow
package is pinned to version 21.0.0, but the s390x Docker build clones the
Apache Arrow repository without specifying a tag, causing it to use the latest
main branch instead of version 21.0.0. To fix this, update the s390x Dockerfile
(runtimes/datascience/ubi9-python-3.11/Dockerfile.cpu line 76) to clone the
Apache Arrow repository at the tag corresponding to version 21.0.0, ensuring
consistency between the Pipfile and the built wheel. Also, apply the same change
to the 3.12 Dockerfile when upgrading pyarrow in the future.
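
As a side note on the platform_machine markers used in this Pipfile hunk: whether such a marker applies on a given builder can be checked with a one-liner (illustrative only; assumes the packaging library is installed):

python3 -c "from packaging.markers import Marker; print(Marker(\"platform_machine != 's390x'\").evaluate())"

This prints False on an s390x host (so the package is skipped there) and True elsewhere.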


# DB connectors
pymongo = "~=4.11.2"
126 changes: 120 additions & 6 deletions runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
Expand Up @@ -3,13 +3,48 @@
####################
FROM registry.access.redhat.com/ubi9/python-312:latest AS base
Contributor

⚠️ Potential issue

Pin the base image – avoid :latest.

Using ubi9/python-312:latest makes rebuilds non-deterministic and may silently pull incompatible minors. Please pin to a specific digest or tag (tracked in issue #1242).
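
For illustration only (the tag and digest below are placeholders, not a vetted release), a pinned reference would take this shape:

# Replace the placeholders with the tag/digest tracked in issue #1242 before use.
FROM registry.access.redhat.com/ubi9/python-312:<pinned-tag>@sha256:<digest-of-vetted-build> AS base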

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu at line 4, the base
image is currently using the `:latest` tag, which can cause non-deterministic
builds and potential incompatibilities. Replace `ubi9/python-312:latest` with a
specific, fixed tag or digest version to ensure consistent and reproducible
builds. Check the relevant issue #1242 for the recommended pinned version to
use.


ARG TARGETARCH

WORKDIR /opt/app-root/bin

# OS Packages needs to be installed as root
USER 0

# Install useful OS packages
RUN dnf install -y mesa-libGL skopeo libxcrypt-compat && dnf clean all && rm -rf /var/cache/yum
RUN --mount=type=cache,target=/var/cache/dnf \
echo "Building for architecture: ${TARGETARCH}" && \
PACKAGES="mesa-libGL skopeo libxcrypt-compat" && \
# Additional dev tools only for s390x
if [ "$TARGETARCH" = "s390x" ]; then \
PACKAGES="$PACKAGES gcc gcc-c++ make openssl-devel autoconf automake libtool cmake python3-devel pybind11-devel openblas-devel unixODBC-devel openssl zlib-devel"; \
fi && \
if [ -n "$PACKAGES" ]; then \
dnf install -y --nogpgcheck --allowerasing --nobest $PACKAGES && \
dnf clean all && rm -rf /var/cache/yum; \
fi

# For s390x only, set ENV vars and install Rust
RUN if [ "$TARGETARCH" = "s390x" ]; then \
# Install Rust and set up environment
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
# Set environment variables
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
Comment on lines +26 to +40
Contributor

🛠️ Refactor suggestion

Download-and-execute of rustup-init.sh lacks integrity checks
Executing a remote script without verifying a checksum is a classic supply-chain risk and contradicts the repo policy tracked in issue #1241. At minimum, verify the SHA256 or import the official GPG key.

-    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
+    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
+    echo "<EXPECTED_SHA256>  rustup-init.sh" | sha256sum -c - && \
📝 Committable suggestion


Suggested change
# For s390x only, set ENV vars and install Rust
RUN if [ "$TARGETARCH" = "s390x" ]; then \
# Install Rust and set up environment
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
# Set environment variables
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
# For s390x only, set ENV vars and install Rust
RUN if [ "$TARGETARCH" = "s390x" ]; then \
# Install Rust and set up environment
mkdir -p /opt/.cargo && \
export HOME=/root && \
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o rustup-init.sh && \
echo "<EXPECTED_SHA256> rustup-init.sh" | sha256sum -c - && \
chmod +x rustup-init.sh && \
CARGO_HOME=/opt/.cargo HOME=/root ./rustup-init.sh -y --no-modify-path && \
rm -f rustup-init.sh && \
chown -R 1001:0 /opt/.cargo && \
# Set environment variables
echo 'export PATH=/opt/.cargo/bin:$PATH' >> /etc/profile.d/cargo.sh && \
echo 'export CARGO_HOME=/opt/.cargo' >> /etc/profile.d/cargo.sh && \
echo 'export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1' >> /etc/profile.d/cargo.sh; \
fi
🧰 Tools
🪛 GitHub Actions: Code static analysis

[warning] 27-27: Hadolint info SC2016: Expressions don't expand in single quotes, use double quotes for that.

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 26 to 40,
the rustup-init.sh script is downloaded and executed without verifying its
integrity, which poses a security risk. To fix this, add a step to verify the
SHA256 checksum of the downloaded rustup-init.sh file against the official
checksum before executing it. Alternatively, import and verify the script using
the official GPG key if available. Only proceed with execution if the
verification passes, otherwise fail the build.


# Set python alternatives only for s390x (not needed for other arches)
RUN if [ "$TARGETARCH" = "s390x" ]; then \
alternatives --install /usr/bin/python python /usr/bin/python3.12 1 && \
alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 && \
python --version && python3 --version; \
fi

# Other apps and tools installed as default user
USER 1001
Expand All @@ -25,11 +60,69 @@ RUN curl -L https://mirror.openshift.com/pub/openshift-v4/$(uname -m)/clients/oc
rm -f /tmp/openshift-client-linux.tar.gz
# Install the oc client end

##############################
# wheel-builder stage #
# NOTE: Only used in s390x
##############################
FROM base AS s390x-builder

ARG TARGETARCH
USER 0
WORKDIR /tmp/build-wheels

# Build pyarrow optimized for s390x
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
# Install build dependencies (shared for pyarrow and onnx)
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
# Build and collect pyarrow wheel
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
Comment on lines +81 to +84
Contributor

🛠️ Refactor suggestion

Pin Apache Arrow revision for reproducible builds.
Unpinned git clone fetches whatever HEAD is current, making image rebuilds nondeterministic.

-        git clone --depth 1 https://github.com/apache/arrow.git && \
+        ARROW_TAG=apache-arrow-16.1.0 && \
+        git clone --branch ${ARROW_TAG} --depth 1 https://github.com/apache/arrow.git && \
📝 Committable suggestion


Suggested change
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
ARROW_TAG=apache-arrow-16.1.0 && \
git clone --branch ${ARROW_TAG} --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 80 to 83,
the git clone command for Apache Arrow does not specify a commit or tag, causing
nondeterministic builds. Modify the Dockerfile to clone a specific, fixed
revision of the Apache Arrow repository by adding a checkout command after
cloning to pin the build to that commit, ensuring reproducible image builds.

-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_BUILD_TESTS=OFF \
-DARROW_BUILD_BENCHMARKS=OFF \
.. && \
make -j$(nproc) VERBOSE=1 && \
make install -j$(nproc) && \
cd ../../python && \
pip install --no-cache-dir -r requirements-build.txt && \
PYARROW_WITH_PARQUET=1 \
PYARROW_WITH_DATASET=1 \
PYARROW_WITH_FILESYSTEM=1 \
PYARROW_WITH_JSON=1 \
PYARROW_WITH_CSV=1 \
PYARROW_PARALLEL=$(nproc) \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
# Ensure wheels directory exists and has content
ls -la /tmp/wheels/; \
else \
# Create empty wheels directory for non-s390x
mkdir -p /tmp/wheels; \
fi

Comment on lines +67 to +119
Contributor

⚠️ Potential issue

git is missing – arrow clone will fail on s390x.

The builder installs cmake make gcc-c++ pybind11-devel wget but forgets git, yet immediately executes git clone (line 80).
Without git, the build will error out early and pyarrow wheels won’t be produced.

-        dnf install -y cmake make gcc-c++ pybind11-devel wget && \
+        dnf install -y cmake make gcc-c++ git pybind11-devel wget && \
📝 Committable suggestion


Suggested change
FROM base AS s390x-builder
ARG TARGETARCH
USER 0
WORKDIR /tmp/build-wheels
# Build pyarrow optimized for s390x
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
# Install build dependencies (shared for pyarrow and onnx)
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
# Build and collect pyarrow wheel
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_BUILD_TESTS=OFF \
-DARROW_BUILD_BENCHMARKS=OFF \
.. && \
make -j$(nproc) VERBOSE=1 && \
make install -j$(nproc) && \
cd ../../python && \
pip install --no-cache-dir -r requirements-build.txt && \
PYARROW_WITH_PARQUET=1 \
PYARROW_WITH_DATASET=1 \
PYARROW_WITH_FILESYSTEM=1 \
PYARROW_WITH_JSON=1 \
PYARROW_WITH_CSV=1 \
PYARROW_PARALLEL=$(nproc) \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
# Ensure wheels directory exists and has content
ls -la /tmp/wheels/; \
else \
# Create empty wheels directory for non-s390x
mkdir -p /tmp/wheels; \
fi
FROM base AS s390x-builder
ARG TARGETARCH
USER 0
WORKDIR /tmp/build-wheels
# Build pyarrow optimized for s390x
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
# Install build dependencies (shared for pyarrow and onnx)
dnf install -y cmake make gcc-c++ git pybind11-devel wget && \
dnf clean all && \
# Build and collect pyarrow wheel
git clone --depth 1 https://github.com/apache/arrow.git && \
cd arrow/cpp && \
mkdir release && cd release && \
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DARROW_PYTHON=ON \
-DARROW_PARQUET=ON \
-DARROW_ORC=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_JSON=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_BUILD_TESTS=OFF \
-DARROW_BUILD_BENCHMARKS=OFF \
.. && \
make -j$(nproc) VERBOSE=1 && \
make install -j$(nproc) && \
cd ../../python && \
pip install --no-cache-dir -r requirements-build.txt && \
PYARROW_WITH_PARQUET=1 \
PYARROW_WITH_DATASET=1 \
PYARROW_WITH_FILESYSTEM=1 \
PYARROW_WITH_JSON=1 \
PYARROW_WITH_CSV=1 \
PYARROW_PARALLEL=$(nproc) \
python setup.py build_ext --build-type=release --bundle-arrow-cpp bdist_wheel && \
mkdir -p /tmp/wheels && \
cp dist/pyarrow-*.whl /tmp/wheels/ && \
# Ensure wheels directory exists and has content
ls -la /tmp/wheels/; \
else \
# Create empty wheels directory for non-s390x
mkdir -p /tmp/wheels; \
fi
🧰 Tools
🪛 GitHub Actions: Code static analysis

[warning] 69-69: Hadolint DL3002 warning: Last USER should not be root.

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 66 to 118,
the build step for pyarrow on s390x installs several dependencies but omits git,
which causes the git clone command to fail. To fix this, add git to the list of
packages installed by dnf in the RUN command before cloning the arrow
repository, ensuring the git clone operation succeeds and the pyarrow wheel is
built correctly.

#######################
# runtime-datascience #
#######################
FROM base AS runtime-datascience

ARG TARGETARCH
ARG DATASCIENCE_SOURCE_CODE=runtimes/datascience/ubi9-python-3.12

LABEL name="odh-notebook-runtime-datascience-ubi9-python-3.12" \
Expand All @@ -44,16 +137,37 @@ LABEL name="odh-notebook-runtime-datascience-ubi9-python-3.12" \

WORKDIR /opt/app-root/bin

USER 0
# Copy wheels from build stage (s390x only)
COPY --from=s390x-builder /tmp/wheels /tmp/wheels
RUN if [ "$TARGETARCH" = "s390x" ]; then \
pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels; \
else \
echo "Skipping wheel install for $TARGETARCH"; \
fi

Comment on lines +140 to +148
Contributor

⚠️ Potential issue

Final image runs as root – switch back to user 1001.

USER 0 is set for the wheel copy, but never reset, so the resulting runtime image violates best-practice (Hadolint DL3002) and OpenShift will forcibly remap the UID anyway.

 RUN if [ "$TARGETARCH" = "s390x" ]; then \
     pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels; \
 else \
     echo "Skipping wheel install for $TARGETARCH"; \
 fi
+
+# Drop privileges
+USER 1001
📝 Committable suggestion


Suggested change
USER 0
# Copy wheels from build stage (s390x only)
COPY --from=s390x-builder /tmp/wheels /tmp/wheels
RUN if [ "$TARGETARCH" = "s390x" ]; then \
pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels; \
else \
echo "Skipping wheel install for $TARGETARCH"; \
fi
USER 0
# Copy wheels from build stage (s390x only)
COPY --from=s390x-builder /tmp/wheels /tmp/wheels
RUN if [ "$TARGETARCH" = "s390x" ]; then \
pip install --no-cache-dir /tmp/wheels/*.whl && rm -rf /tmp/wheels; \
else \
echo "Skipping wheel install for $TARGETARCH"; \
fi
# Drop privileges
USER 1001
🧰 Tools
🪛 GitHub Actions: Code static analysis

[warning] 139-139: Hadolint DL3002 warning: Last USER should not be root.

🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu around lines 139 to 147,
the Dockerfile switches to root user with USER 0 to copy and install wheels but
does not switch back to user 1001 afterward. To fix this, add a line setting
USER 1001 after the wheel installation block to ensure the final image runs as a
non-root user, complying with best practices and OpenShift requirements.

# Install Python packages from requirements.txt
COPY ${DATASCIENCE_SOURCE_CODE}/requirements.txt ./
# Copy Elyra dependencies for air-gapped enviroment
COPY ${DATASCIENCE_SOURCE_CODE}/utils ./utils/

RUN echo "Installing softwares and packages" && \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress --verify-hashes --compile-bytecode --index-strategy=unsafe-best-match --requirements=./requirements.txt --build-constraints=./requirements.txt && \
# Fix permissions to support pip in Openshift environments \
RUN --mount=type=cache,target=/root/.cache/pip \
echo "Installing softwares and packages" && \
if [ "$TARGETARCH" = "s390x" ]; then \
# For s390x, we need special flags and environment variables for building packages
GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1 \
CFLAGS="-O3" CXXFLAGS="-O3" \
uv pip install --strict --no-deps --no-cache --no-config --no-progress \
--verify-hashes --compile-bytecode --index-strategy=unsafe-best-match \
--requirements=./requirements.txt --build-constraints=./requirements.txt; \
else \
# This may have to download and compile some dependencies, and as we don't lock requirements from `build-system.requires`,
# we often don't know the correct hashes and `--require-hashes` would therefore fail on non amd64, where building is common.
uv pip install --strict --no-deps --no-cache --no-config --no-progress \
--verify-hashes --compile-bytecode --index-strategy=unsafe-best-match \
--requirements=./requirements.txt --build-constraints=./requirements.txt; \
fi && \
# Fix permissions to support pip in Openshift environments
chmod -R g+w /opt/app-root/lib/python3.12/site-packages && \
fix-permissions /opt/app-root -P

7 changes: 6 additions & 1 deletion runtimes/datascience/ubi9-python-3.12/Pipfile
Expand Up @@ -17,7 +17,12 @@ scikit-learn = "~=1.6.1"
scipy = "~=1.15.2"
skl2onnx = "~=1.18.0"
onnxconverter-common = "~=1.13.0" # Required for skl2onnx, as upgraded version is not compatible with protobuf
codeflare-sdk = "~=0.29.0"

# Exclude these packages on s390x architecture due to compatibility
codeflare-sdk = {version = "~=0.29.0", markers = "platform_machine != 's390x'"}
py-spy = {version = "~=0.4.0", markers = "platform_machine != 's390x'"}
ray = {version = "~=2.46.0", markers = "platform_machine != 's390x'", extras = ["data", "default"]}
pyarrow = {version = "~=21.0.0", markers = "platform_machine != 's390x'"}
Comment on lines +21 to +25
Contributor

💡 Verification agent

🧩 Analysis chain

Confirm pyarrow version stays in sync between Pipfile and Dockerfile install path

The pyarrow dependency is omitted on s390x via a marker, while the Dockerfile compiles & installs the wheel manually for that architecture.
Please double-check that the version baked in the Dockerfile (pyarrow-X.Y.Z-*.whl) is kept identical to the ~=21.0.0 pin here; otherwise users will see diverging behaviour across architectures.

If the Dockerfile currently hard-codes another version, either:

  1. Align the pin below and regenerate the lock, or
  2. Pass ${PYARROW_VERSION} through build-args so the two sources can never drift.

🏁 Script executed:

#!/bin/bash
set -e

# Locate Dockerfile(s) for the datascience UBI9 Python 3.12 runtime
dockerfiles=$(find runtimes/datascience/ubi9-python-3.12 -type f -name 'Dockerfile*')

echo "Checking for pyarrow version pins in the following Dockerfile(s):"
echo "$dockerfiles"
echo

# Search each Dockerfile for hard-coded pyarrow wheel installs or build args
for df in $dockerfiles; do
  echo "=== $df ==="
  grep -En "pyarrow-.*\\.whl|PYARROW_VERSION" "$df" \
    || echo "No pyarrow reference found"
  echo
done

Length of output: 725


🏁 Script executed:

#!/bin/bash
# Display the first 200 lines of the CPU Dockerfile to inspect how the pyarrow wheel is built/pinned
sed -n '1,200p' runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu

Length of output: 6576


Pin PyArrow Version in s390x Builder to Match Pipfile

The s390x-builder stage currently clones Apache Arrow’s main branch, resulting in a wheel version that will drift from the ~=21.0.0 pin in your Pipfile. To ensure consistency across architectures, update the Dockerfile to accept or define a PYARROW_VERSION build-arg and clone the corresponding tag.

Files to update:

  • runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu (wheel-builder stage)

Suggested diff:

--- a/runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
+++ b/runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
@@
 FROM base AS s390x-builder
 
+ARG PYARROW_VERSION="21.0.0"
 ARG TARGETARCH 
 USER 0
 WORKDIR /tmp/build-wheels
 
 # Build pyarrow optimized for s390x
 RUN --mount=type=cache,target=/root/.cache/pip \
     --mount=type=cache,target=/root/.cache/dnf \
     if [ "$TARGETARCH" = "s390x" ]; then \
         # Install build dependencies...
-        git clone --depth 1 https://github.com/apache/arrow.git && \
+        git clone --branch "apache-arrow-${PYARROW_VERSION}" --depth 1 https://github.com/apache/arrow.git && \
         cd arrow/cpp && \
         mkdir release && cd release && \
         cmake ... && \

With this change you can:

  1. Build with a matching version:
    docker build . --build-arg PYARROW_VERSION=21.0.0 …
  2. Avoid any drift between the marker-pinned Pipfile dependency and the manually built wheel.
🤖 Prompt for AI Agents
In runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu within the wheel-builder
stage, the Apache Arrow source is cloned from the main branch causing version
drift from the PyArrow version pinned in the Pipfile (~=21.0.0). Modify the
Dockerfile to accept a build argument PYARROW_VERSION and use it to clone the
corresponding Apache Arrow tag instead of the main branch. This ensures the
built wheel version matches the Pipfile pin and maintains consistency across
architectures.


# DB connectors
pymongo = "~=4.11.2"