Problem Description
The s390x pyarrow build in runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu (lines 99-103) explicitly disables core Arrow codecs:
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_SNAPPY=OFF \
With these flags off, the resulting pyarrow build cannot read or write Parquet files or Arrow data compressed with LZ4, Zstd, or Snappy. These are among the most widely used compression formats, so the datascience runtime on s390x is significantly limited.
Impact Analysis
- Data Compatibility: Users cannot read Parquet files compressed with LZ4, Zstd, or Snappy (the most common compression formats)
- Runtime Failures: Applications attempting to read compressed datasets will fail with codec-related errors
- User Experience: s390x datascience runtime becomes significantly less capable than other architectures
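To confirm the impact on a given image, the codec support compiled into pyarrow can be probed directly. The sketch below assumes `pyarrow.Codec.is_available` is the check to use; `codec_report` and `CORE_CODECS` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

# The three codecs this issue is about, using pyarrow's codec names.
CORE_CODECS = ("lz4", "zstd", "snappy")


def codec_report(
    codecs: Iterable[str] = CORE_CODECS,
    is_available: Optional[Callable[[str], bool]] = None,
) -> Dict[str, bool]:
    """Map each codec name to whether the current pyarrow build supports it."""
    if is_available is None:
        # Imported lazily so the helper can also be exercised with a stub
        # on machines that do not have pyarrow installed.
        import pyarrow as pa

        is_available = pa.Codec.is_available
    return {name: bool(is_available(name)) for name in codecs}
```

Calling `codec_report()` inside the s390x image would be expected to report `False` for all three codecs with the current flags; the optional `is_available` argument only exists so the helper can be tested without pyarrow.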
Root Cause
The codecs were likely disabled to reduce build complexity or avoid extra dependencies. However, -DARROW_DEPENDENCY_SOURCE=BUNDLED is already set, so the required codec libraries should be built in-tree without additional system dependencies.
Solution
Enable the core codecs in the Arrow build configuration:
# Change from:
-DARROW_WITH_LZ4=OFF \
-DARROW_WITH_ZSTD=OFF \
-DARROW_WITH_SNAPPY=OFF \
# To:
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_WITH_SNAPPY=ON \
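To keep the flags from being switched off again later, a small check could scan the Dockerfile text for disabled core codecs. This is only a sketch: `disabled_codecs` and `FLAG_RE` are hypothetical names, and it assumes the flags appear in the `-DARROW_WITH_<CODEC>=ON|OFF` form shown above.

```python
import re
from typing import List

# Matches the three core codec flags in the form used in the Dockerfile.
FLAG_RE = re.compile(r"-DARROW_WITH_(LZ4|ZSTD|SNAPPY)=(ON|OFF)\b")


def disabled_codecs(dockerfile_text: str) -> List[str]:
    """Return the core codecs whose ARROW_WITH_* flag is set to OFF."""
    return [
        match.group(1)
        for match in FLAG_RE.finditer(dockerfile_text)
        if match.group(2) == "OFF"
    ]
```

Running it over the contents of the affected Dockerfile should return an empty list once the change is applied.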
Acceptance Criteria
- Core Arrow codecs (LZ4, Zstd, Snappy) are enabled in s390x pyarrow build
- s390x datascience runtime can successfully read Parquet files compressed with these formats
- Build time impact is acceptable (should be minimal with BUNDLED dependencies)
- No regression in build success rate for s390x architecture
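The second criterion could be checked with a small round-trip test run inside the built image. The following is a minimal sketch, not the project's actual test: `parquet_roundtrip` is a hypothetical helper, and the sample table is made up for illustration.

```python
from typing import Dict, Iterable


def parquet_roundtrip(codecs: Iterable[str] = ("lz4", "zstd", "snappy")) -> Dict[str, bool]:
    """Write and re-read a tiny Parquet file in memory for each codec."""
    # Imported lazily so that merely defining the helper does not need pyarrow.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"id": [1, 2, 3], "label": ["a", "b", "c"]})
    results: Dict[str, bool] = {}
    for codec in codecs:
        try:
            sink = pa.BufferOutputStream()
            pq.write_table(table, sink, compression=codec)
            restored = pq.read_table(pa.BufferReader(sink.getvalue()))
            results[codec] = restored.equals(table)
        except (pa.ArrowException, OSError):
            # A codec missing from the build is recorded as a failure.
            results[codec] = False
    return results
```

With the codecs enabled, every value in the returned mapping should be `True`; any `False` entry points at a codec that is still unavailable in the image.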
Files Affected
runtimes/datascience/ubi9-python-3.12/Dockerfile.cpu
Context
Identified during PR #1513 review: #1513 (comment)
The current implementation prioritizes build simplicity over runtime functionality, but enabling these codecs should not introduce significant complexity given the bundled dependency strategy.