Commit 9c5ff9e

Merge pull request numpy#26505 from rgommers/update-roadmap
DOC: update the NumPy Roadmap
2 parents: fae3738 + 2a5f278

1 file changed (+129, -51 lines)

doc/neps/roadmap.rst

Lines changed: 129 additions & 51 deletions
@@ -18,25 +18,19 @@ may include (among other things) interoperability protocols, better duck typing
 support and ndarray subclass handling.
 
 The key goal is: *make it easy for code written for NumPy to also work with
-other NumPy-like projects.* This will enable GPU support via, e.g, CuPy or JAX,
+other NumPy-like projects.* This will enable GPU support via, e.g, CuPy, JAX or PyTorch,
 distributed array support via Dask, and writing special-purpose arrays (either
 from scratch, or as a ``numpy.ndarray`` subclass) that work well with SciPy,
-scikit-learn and other such packages.
+scikit-learn and other such packages. A large step forward in this area was
+made in NumPy 2.0, with adoption of and compliance with the array API standard
+(v2022.12, see :ref:`NEP47`). Future work in this direction will include
+support for newer versions of the array API standard, and adding features as
+needed based on real-world experience and needs.
 
-The ``__array_ufunc__`` and ``__array_function__`` protocols are stable, but
-do not cover the whole API. New protocols for overriding other functionality
-in NumPy are needed. Work in this area aims to bring to completion one or more
-of the following proposals:
-
-- :ref:`NEP30`
-- :ref:`NEP31`
-- :ref:`NEP35`
-- :ref:`NEP37`
-
-In addition we aim to provide ways to make it easier for other libraries to
-implement a NumPy-compatible API. This may include defining consistent subsets
-of the API, as discussed in `this section of NEP 37
-<https://numpy.org/neps/nep-0037-array-module.html#requesting-restricted-subsets-of-numpy-s-api>`__.
+In addition, the ``__array_ufunc__`` and ``__array_function__`` protocols
+fulfill a role here - they are stable and used by several downstream projects.
+They do not cover the whole API, so use of the array API standard is preferred
+for new code.
 
 
 Performance
@@ -46,17 +40,26 @@ Improvements to NumPy's performance are important to many users. We have
 focused this effort on Universal SIMD (see :ref:`NEP38`) intrinsics which
 provide nice improvements across various hardware platforms via an abstraction
 layer. The infrastructure is in place, and we welcome follow-on PRs to add
-SIMD support across all relevant NumPy functions.
+SIMD support across relevant NumPy functionality.
+
+Transitioning from C to C++, both in the SIMD infrastructure and in NumPy
+internals more widely, is in progress. We have also started to make use of
+Google Highway (see :ref:`NEP54`), and that usage is likely to expand. Work
+towards support for newer SIMD instruction sets, like SVE on arm64, is ongoing.
 
 Other performance improvement ideas include:
 
-- A better story around parallel execution.
+- A better story around parallel execution (related is support for free-threaded
+  CPython, see further down).
 - Optimizations in individual functions.
-- Reducing ufunc and ``__array_function__`` overhead.
 
 Furthermore we would like to improve the benchmarking system, in terms of coverage,
-easy of use, and publication of the results (now
-`here <https://pv.github.io/numpy-bench>`__) as part of the docs or website.
+easy of use, and publication of the results. Benchmarking PRs/branches compared
+to the `main` branch is a primary purpose, and required for PRs that are
+performance-focused (e.g., adding SIMD acceleration to a function). In
+addition, we'd like a performance overview like the one we had `here
+<https://pv.github.io/numpy-bench>`__, set up in a way that is more
+maintainable long-term.
 
 
 Documentation and website
@@ -68,69 +71,144 @@ documentation on many topics are missing or outdated. See :ref:`NEP44` for
 planned improvements. Adding more tutorials is underway in the
 `numpy-tutorials repo <https://github.com/numpy/numpy-tutorials>`__.
 
-Our website (https://numpy.org) was completely redesigned recently. We aim to
-further improve it by adding translations, more case studies and other
-high-level content, and more (see `this tracking issue <https://github.com/numpy/numpy.org/issues/266>`__).
+We also intend to make all the example code in our documentation interactive -
+work is underway to do so via ``jupyterlite-sphinx`` and Pyodide.
+
+Our website (https://numpy.org) is in good shape. Further work on expanding the
+number of languages that the website is translated in is desirable. As are
+improvements to the interactive notebook widget, through JupyterLite.
 
 
 Extensibility
 -------------
 
-We aim to make it much easier to extend NumPy. The primary topic here is to
-improve the dtype system - see :ref:`NEP41` and related NEPs linked from it.
-Concrete goals for the dtype system rewrite are:
-
-- Easier custom dtypes:
+We aim to continue making it easier to extend NumPy. The primary topic here is to
+improve the dtype system - see for example :ref:`NEP41` and related NEPs linked
+from it. In NumPy 2.0, a `new C API for user-defined dtypes <https://numpy.org/devdocs/reference/c-api/array.html#custom-data-types>`__
+was made public. We aim to encourage its usage and improve this API further,
+including support for writing a dtype in Python.
 
-  - Simplify and/or wrap the current C-API
-  - More consistent support for dtype metadata
-  - Support for writing a dtype in Python
+Ideas for new dtypes that may be developed outside of the main NumPy repository
+first, and that could potentially be upstreamed into NumPy later, include:
 
-- Allow adding (a) new string dtype(s). This could be encoded strings with
-  fixed-width storage (e.g., ``utf8`` or ``latin1``), and/or a variable length
-  string dtype. The latter could share an implementation with ``dtype=object``,
-  but be explicitly type-checked.
-  One of these should probably be the default for text data. The current
-  string dtype support is neither efficient nor user friendly.
+- A quad-precision (128-bit) dtype
+- A ``bfloat16`` dtype
+- A fixed-width string dtype which supports encodings (e.g., ``utf8`` or
+  ``latin1``)
+- A unit dtype
 
 
 User experience
 ---------------
 
 Type annotations
 ````````````````
-NumPy 1.20 adds type annotations for most NumPy functionality, so users can use
-tools like `mypy`_ to type check their code and IDEs can improve their support
+Type annotations for most NumPy functionality is complete (although some
+submodules like ``numpy.ma`` are missing return types), so users can use tools
+like `mypy`_ to type check their code and IDEs can improve their support
 for NumPy. Improving those type annotations, for example to support annotating
-array shapes and dtypes, is ongoing.
+array shapes (see `gh-16544 <https://github.com/numpy/numpy/issues/16544>`__),
+is ongoing.
 
 Platform support
 ````````````````
 We aim to increase our support for different hardware architectures. This
 includes adding CI coverage when CI services are available, providing wheels on
-PyPI for POWER8/9 (``ppc64le``), providing better build and install
-documentation, and resolving build issues on other platforms like AIX.
+PyPI for platforms that are in high enough demand (e.g., we added ``musllinux``
+ones for NumPy 2.0), and resolving build issues on platforms that we don't test
+in CI (e.g., AIX).
+
+We intend to write a NEP covering the support levels we provide and what is
+required for a platform to move to a higher tier of support, similar to
+`PEP 11 <https://peps.python.org/pep-0011/>`__.
+
+Support for free-threaded CPython
+`````````````````````````````````
+CPython 3.13 will be the first release to offer a free-threaded build (i.e.,
+a CPython build with the GIL disabled). Work is in progress to support this
+well in NumPy. After that is stable and complete, there may be opportunities to
+actually make use of the potential for performance improvements from
+free-threaded CPython, or make it easier to do so for NumPy's users.
+
+Binary size reduction
+`````````````````````
+The number of downloads of NumPy from PyPI and other platforms continues to
+increase - as of May 2024 we're at >200 million downloads/month from PyPI
+alone. Reducing the size of an installed NumPy package has many benefits:
+faster installs, lower disk space usage, smaller load on PyPI, less
+environmental impact, easier to fit more packages on top of NumPy in
+resource-constrained environments and platforms like AWS Lambda, lower latency
+for Pyodide users, and so on. We aim for significant reductions, as well as
+making it easier for end users and packagers to produce smaller custom builds
+(e.g., we added support for stripping tests before 2.1.0). See
+`gh-25737 <https://github.com/numpy/numpy/issues/25737>`__ for details.
+
+Support use of CPython's limited C API
+``````````````````````````````````````
+Use of the CPython limited C API, allowing producing ``abi3`` wheels that use
+the stable ABI and are hence independent of CPython feature releases, has
+benefits for both downstream packages that use NumPy's C API and for NumPy
+itself. In NumPy 2.0, work was done to enable using the limited C API with
+the Cython support in NumPy (see `gh-25531 <https://github.com/numpy/numpy/pull/25531`__).
+More work and testing is needed to ensure full support for downstream packages.
+
+We also want to explore what is needed for NumPy itself to use the limited
+C API - this would make testing new CPython dev and pre-release versions across
+the ecosystem easier, and significantly reduce the maintenance effort for CI
+jobs in NumPy itself.
+
+Create a header-only package for NumPy
+``````````````````````````````````````
+We have reduced the platform-dependent content in the public NumPy headers to
+almost nothing. It is now feasible to create a separate package with only
+NumPy headers and a discovery mechanism for them, in order to enable downstream
+packages to build against the NumPy C API without having NumPy installed.
+This will make it easier/cheaper to use NumPy's C API, especially on more
+niche platforms for which we don't provide wheels.
+
+
+NumPy 2.0 stabilization & downstream usage
+------------------------------------------
+
+We made a very large amount of changes (and improvements!) in NumPy 2.0. The
+release process has taken a very long time, and part of the ecosystem is still
+catching up. We may need to slow down for a while, and possibly help the rest
+of the ecosystem with adapting to the ABI and API changes.
+
+We will need to assess the costs and benefits to NumPy itself,
+downstream package authors, and end users. Based on that assessment, we need to
+come to a conclusion on whether it's realistic to do another ABI-breaking
+release again in the future or not. This will also inform the future evolution
+of our C API.
+
+
+Security
+--------
+
+NumPy is quite secure - we get only a limited number of reports about potential
+vulnerabilities, and most of those are incorrect. We have made strides with a
+documented security policy, a private disclosure method, and maintaining an
+OpenSSF scorecard (with a high score). However, we have not changed much in how
+we approach supply chain security in quite a while. We aim to make improvements
+here, for example achieving fully reproducible builds for all the build
+artifacts we publish - and providing full provenance information for them.
 
 
 Maintenance
 -----------
 
-- ``MaskedArray`` needs to be improved, ideas include:
+- ``numpy.ma`` is still in poor shape and under-maintained. It needs to be
+  improved, ideas include:
 
   - Rewrite masked arrays to not be a ndarray subclass -- maybe in a separate project?
   - MaskedArray as a duck-array type, and/or
   - dtypes that support missing values
 
-- Fortran integration via ``numpy.f2py`` requires a number of improvements, see
-  `this tracking issue <https://github.com/numpy/numpy/issues/14938>`__.
-- A backend system for ``numpy.fft`` (so that e.g. ``fft-mkl`` doesn't need to monkeypatch numpy).
 - Write a strategy on how to deal with overlap between NumPy and SciPy for ``linalg``.
-- Deprecate ``np.matrix`` (very slowly).
+- Deprecate ``np.matrix`` (very slowly) - this is feasible once the switch-over
+  from sparse matrices to sparse arrays in SciPy is complete.
 - Add new indexing modes for "vectorized indexing" and "outer indexing" (see :ref:`NEP21`).
 - Make the polynomial API easier to use.
-- Integrate an improved text file loader.
-- Ufunc and gufunc improvements, see `gh-8892 <https://github.com/numpy/numpy/issues/8892>`__
-  and `gh-11492 <https://github.com/numpy/numpy/issues/11492>`__.
 
 
 .. _`mypy`: https://mypy.readthedocs.io
