Problem Description
The current s390x wheel-builder stage in jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu
clones the Apache Arrow repository from the default branch (HEAD), which creates reproducibility issues and potential compatibility problems:
- Reproducibility Risk: Building from HEAD means different builds may use different Arrow versions, leading to inconsistent wheel outputs
- Version Mismatch: The pylock.toml specifies pyarrow 20.0.0 (with s390x exclusion), but building from HEAD may produce a different version
- Build Instability: HEAD builds may include breaking changes or unstable features that could cause build failures
Affected Files:
- jupyter/datascience/ubi9-python-3.12/Dockerfile.cpu (lines ~122-173)
- Related: jupyter/datascience/ubi9-python-3.12/pylock.toml (pyarrow version specification)
Current Implementation
git clone --depth 1 https://github.com/apache/arrow.git && \
Proposed Solution
Pin the pyarrow build to match the version specified in pylock.toml:
ARG PYARROW_TAG=apache-arrow-20.0.0
# Build pyarrow optimized for s390x
RUN --mount=type=cache,target=/root/.cache/pip \
--mount=type=cache,target=/root/.cache/dnf \
if [ "$TARGETARCH" = "s390x" ]; then \
# Install build dependencies (shared for pyarrow and onnx)
dnf install -y cmake make gcc-c++ pybind11-devel wget && \
dnf clean all && \
# Build and collect pyarrow wheel
git clone --depth 1 --branch ${PYARROW_TAG} https://github.com/apache/arrow.git && \
# ... rest of build process ...
cp dist/pyarrow-20.*.whl /tmp/wheels/ && \
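The proposal above pins the tag in `PYARROW_TAG` but still hard-codes `20` in the wheel glob, so the two can drift apart on the next version bump. One possible way to keep them in sync is to derive the version from the tag. The following is a minimal sketch, assuming Arrow release tags keep the `apache-arrow-<version>` form; `arrow_tag_to_version` and `PYARROW_VERSION` are hypothetical names that do not exist in the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: strip the "apache-arrow-" prefix from a release tag.
# Fails (non-zero status) if the tag does not follow that naming scheme.
arrow_tag_to_version() {
    case "$1" in
        apache-arrow-?*) printf '%s\n' "${1#apache-arrow-}" ;;
        *) echo "unexpected Arrow tag format: $1" >&2; return 1 ;;
    esac
}

# Assumed default, mirroring the ARG proposed in this issue.
PYARROW_TAG="${PYARROW_TAG:-apache-arrow-20.0.0}"
PYARROW_VERSION="$(arrow_tag_to_version "$PYARROW_TAG")" || exit 1

# The wheel glob can then be built from the same source of truth.
echo "expecting wheel: pyarrow-${PYARROW_VERSION}-*.whl"
```

Inside the RUN step, the copy could then use `dist/pyarrow-${PYARROW_VERSION}-*.whl` instead of a literal version, though whether that extra indirection is worth it for a single pinned version is a judgment call.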
Additional Considerations
Consider enabling common codecs (LZ4, Zstd, Snappy) if image size permits, or document the feature limitations when these codecs are disabled.
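If the codecs are enabled, the Arrow C++ build exposes one CMake switch per codec (`ARROW_WITH_LZ4`, `ARROW_WITH_ZSTD`, `ARROW_WITH_SNAPPY`). The sketch below only assembles those switches so they can be toggled in one place; it assumes the wheel-builder stage configures Arrow C++ through CMake, and `arrow_codec_flags` is a hypothetical helper, not something present in the Dockerfile.

```shell
#!/bin/sh
# Hypothetical helper: print the Arrow C++ CMake switches for the three
# codecs named in this issue, all set to the given state (ON or OFF).
arrow_codec_flags() {
    state="${1:-OFF}"
    for codec in LZ4 ZSTD SNAPPY; do
        printf '%s=%s ' "-DARROW_WITH_${codec}" "$state"
    done
    printf '\n'
}

# Example: flags to append to the cmake invocation when codecs are wanted.
arrow_codec_flags ON
```

Enabling a codec typically means the corresponding library must be available at build time or built as a bundled dependency, which is where the image-size trade-off mentioned above comes from.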
Acceptance Criteria
- pyarrow build is pinned to apache-arrow-20.0.0 tag
- Build process uses consistent Arrow version across builds
- Wheel filename pattern matches expected version (pyarrow-20.*.whl)
- s390x builds remain functional
- Build reproducibility is improved
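The wheel-filename criterion above could be checked mechanically at the end of the build. This is a rough sketch rather than part of the proposed change: `check_wheels` is a hypothetical helper, and the `/tmp/wheels` directory is taken from the snippet in the Proposed Solution.

```shell
#!/bin/sh
# check_wheels DIR VERSION (hypothetical helper)
# Succeeds only if DIR contains at least one pyarrow wheel and every
# pyarrow wheel found there was built for exactly VERSION.
check_wheels() {
    dir="$1"; version="$2"; found=0
    for whl in "$dir"/pyarrow-*.whl; do
        [ -e "$whl" ] || continue            # glob matched nothing
        case "${whl##*/}" in
            "pyarrow-${version}-"*) found=1 ;;
            *) echo "unexpected wheel: $whl" >&2; return 1 ;;
        esac
    done
    [ "$found" -eq 1 ]
}
```

A call such as `check_wheels /tmp/wheels 20.0.0` placed after the `cp` step would fail the build if a HEAD-derived or otherwise mismatched wheel slipped through.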
Implementation Notes
- Add ARG PYARROW_TAG=apache-arrow-20.0.0 before the build RUN command
- Update git clone to use --branch ${PYARROW_TAG}
- Update wheel copy pattern to match version-specific naming
- Consider adding inline documentation about codec limitations
Context
PR: #2432 - s390x(jupyter/datascience): make image buildable on s390x
Review Comment: #2432 (comment)
This issue addresses build reproducibility concerns identified during the s390x architecture support implementation.