
Commit 51c6649

[SPARK-54839][PYTHON] Upgrade the minimum version of numpy to 2.0.0
### What changes were proposed in this pull request?

Upgrade the minimum version of `numpy` to 2.0.0.

### Why are the changes needed?

NumPy 1.22 was released on Jan 1, 2022; 2.0.0 was released on Jun 16, 2024; and the latest version is 2.4.0.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #53603 from zhengruifeng/bump_numpy_mini.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
1 parent a021fec commit 51c6649

File tree

8 files changed

+11
-11
lines changed


.github/workflows/build_and_test.yml

Lines changed: 1 addition & 1 deletion

@@ -368,7 +368,7 @@ jobs:
       - name: Install Python packages (Python 3.11)
         if: (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-')) || contains(matrix.modules, 'connect') || contains(matrix.modules, 'yarn')
         run: |
-          python3.11 -m pip install 'numpy>=1.22' pyarrow pandas pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
+          python3.11 -m pip install 'numpy>=2.0.0' pyarrow pandas pyyaml scipy unittest-xml-reporting 'lxml==4.9.4' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
           python3.11 -m pip list
       # Run the tests.
       - name: Run tests

.github/workflows/maven_test.yml

Lines changed: 1 addition & 1 deletion

@@ -181,7 +181,7 @@ jobs:
       - name: Install Python packages (Python 3.11)
         if: contains(matrix.modules, 'resource-managers#yarn') || (contains(matrix.modules, 'sql#core')) || contains(matrix.modules, 'connect')
         run: |
-          python3.11 -m pip install 'numpy>=1.22' pyarrow pandas pyyaml scipy unittest-xml-reporting 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
+          python3.11 -m pip install 'numpy>=2.0.0' pyarrow pandas pyyaml scipy unittest-xml-reporting 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'zstandard==0.25.0'
           python3.11 -m pip list
       # Run the tests using script command.
       # BSD's script command doesn't support -c option, and the usage is different from Linux's one.

.github/workflows/pages.yml

Lines changed: 1 addition & 1 deletion

@@ -61,7 +61,7 @@ jobs:
       - name: Install Python dependencies
         run: |
           pip install 'sphinx==4.5.0' mkdocs 'pydata_sphinx_theme>=0.13' sphinx-copybutton nbsphinx numpydoc jinja2 markupsafe 'pyzmq<24.0.0' \
-            ipython ipython_genutils sphinx_plotly_directive 'numpy>=1.22' pyarrow 'pandas==2.3.3' 'plotly>=4.8' 'docutils<0.18.0' \
+            ipython ipython_genutils sphinx_plotly_directive 'numpy>=2.0.0' pyarrow 'pandas==2.3.3' 'plotly>=4.8' 'docutils<0.18.0' \
             'flake8==3.9.0' 'mypy==1.8.0' 'pytest==7.1.3' 'pytest-mypy-plugins==1.9.3' 'black==23.12.1' \
             'pandas-stubs==1.2.0.53' 'grpcio==1.76.0' 'grpcio-status==1.76.0' 'protobuf==6.33.0' 'grpc-stubs==1.24.11' 'googleapis-common-protos-stubs==2.2.0' \
             'sphinxcontrib-applehelp==1.0.4' 'sphinxcontrib-devhelp==1.0.2' 'sphinxcontrib-htmlhelp==2.0.1' 'sphinxcontrib-qthelp==1.0.3' 'sphinxcontrib-serializinghtml==1.1.5'

dev/requirements.txt

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 py4j>=0.10.9.9

 # PySpark dependencies (optional)
-numpy>=1.22
+numpy>=2.0.0
 pyarrow>=15.0.0
 six==1.16.0
 pandas>=2.2.0

dev/spark-test-image/python-minimum/Dockerfile

Lines changed: 2 additions & 2 deletions

@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For PySpark wi
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20250703
+ENV FULL_REFRESH_DATE=20251224

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -62,7 +62,7 @@ RUN apt-get update && apt-get install -y \
     wget \
     zlib1g-dev

-ARG BASIC_PIP_PKGS="numpy==1.22.4 pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 scipy scikit-learn coverage unittest-xml-reporting"
+ARG BASIC_PIP_PKGS="numpy==2.0.0 pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 scipy scikit-learn coverage unittest-xml-reporting"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20 protobuf"

dev/spark-test-image/python-ps-minimum/Dockerfile

Lines changed: 2 additions & 2 deletions

@@ -24,7 +24,7 @@ LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For Pandas API
 # Overwrite this label to avoid exposing the underlying Ubuntu OS version label
 LABEL org.opencontainers.image.version=""

-ENV FULL_REFRESH_DATE=20250708
+ENV FULL_REFRESH_DATE=20251224

 ENV DEBIAN_FRONTEND=noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN=true
@@ -63,7 +63,7 @@ RUN apt-get update && apt-get install -y \
     zlib1g-dev


-ARG BASIC_PIP_PKGS="pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 numpy scipy coverage unittest-xml-reporting"
+ARG BASIC_PIP_PKGS="numpy==2.0.0 pyarrow==15.0.0 pandas==2.2.0 six==1.16.0 scipy coverage unittest-xml-reporting"
 # Python deps for Spark Connect
 ARG CONNECT_PIP_PKGS="grpcio==1.76.0 grpcio-status==1.76.0 googleapis-common-protos==1.71.0 zstandard==0.25.0 graphviz==0.20 protobuf"

python/docs/source/getting_started/install.rst

Lines changed: 2 additions & 2 deletions

@@ -278,7 +278,7 @@ Installable with ``pip install "pyspark[ml]"``.
 ======= ================= ======================================
 Package Supported version Note
 ======= ================= ======================================
-`numpy` >=1.22            Required for MLlib DataFrame-based API
+`numpy` >=2.0.0           Required for MLlib DataFrame-based API
 ======= ================= ======================================

 Additional libraries that enhance functionality but are not included in the installation packages:
@@ -298,7 +298,7 @@ Installable with ``pip install "pyspark[mllib]"``.
 ======= ================= ==================
 Package Supported version Note
 ======= ================= ==================
-`numpy` >=1.22            Required for MLlib
+`numpy` >=2.0.0           Required for MLlib
 ======= ================= ==================

 Declarative Pipelines
Declarative Pipelines

python/pyspark/sql/pandas/utils.py

Lines changed: 1 addition & 1 deletion

@@ -98,7 +98,7 @@ def require_minimum_pyarrow_version() -> None:


 def require_minimum_numpy_version() -> None:
     """Raise ImportError if minimum version of NumPy is not installed"""
-    minimum_numpy_version = "1.22"
+    minimum_numpy_version = "2.0.0"

     try:
         import numpy
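The guard updated in this hunk follows a common pattern: import the package, then compare the installed version against a minimum and raise `ImportError` with a clear message. A minimal, stdlib-only sketch of that pattern (the `parse_version` and `require_minimum_version` helpers here are illustrative, not PySpark's actual internals):

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version string like '2.0.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def require_minimum_version(package: str, installed: str, minimum: str) -> None:
    """Raise ImportError when the installed version is below the minimum."""
    if parse_version(installed) < parse_version(minimum):
        raise ImportError(
            f"{package} >= {minimum} must be installed; "
            f"however, your version was {installed}."
        )


# With this PR, 1.22.x no longer satisfies the 2.0.0 floor:
require_minimum_version("numpy", "2.4.0", "2.0.0")  # passes silently
```

Comparing integer tuples rather than raw strings matters: lexically `"10.0" < "9.0"`, while `(10, 0) > (9, 0)` compares correctly. Real dependency checks usually delegate this to `packaging.version` (or PySpark's own `LooseVersion`-style helper), which also handles pre-release suffixes that this sketch does not.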

0 commit comments
