DOC: update the NumPy Roadmap

rgommers · rgommers · commit a94d2078f3db · 2024-05-22T20:51:45.000+02:00
With NumPy 2.0 RCs available, that release is feature-complete
and a lot on the roadmap is outdated. So time for a large update.

[skip actions] [skip cirrus] [skip azp]
diff --git a/doc/neps/roadmap.rst b/doc/neps/roadmap.rst
@@ -18,25 +18,19 @@ may include (among other things) interoperability protocols, better duck typing
 support and ndarray subclass handling.
 
 The key goal is: *make it easy for code written for NumPy to also work with
-other NumPy-like projects.* This will enable GPU support via, e.g, CuPy or JAX,
+other NumPy-like projects.* This will enable GPU support via, e.g., CuPy, JAX or PyTorch,
 distributed array support via Dask, and writing special-purpose arrays (either
 from scratch, or as a ``numpy.ndarray`` subclass) that work well with SciPy,
-scikit-learn and other such packages.
+scikit-learn and other such packages. A large step forward in this area was
+made in NumPy 2.0, with adoption of and compliance with the array API standard
+(v2022.12, see :ref:`NEP47`). Future work in this direction will include
+support for newer versions of the array API standard, and adding features as
+needed based on real-world experience and needs.
 
-The ``__array_ufunc__`` and ``__array_function__`` protocols are stable, but
-do not cover the whole API.  New protocols for overriding other functionality
-in NumPy are needed. Work in this area aims to bring to completion one or more
-of the following proposals:
-
-- :ref:`NEP30`
-- :ref:`NEP31`
-- :ref:`NEP35`
-- :ref:`NEP37`
-
-In addition we aim to provide ways to make it easier for other libraries to
-implement a NumPy-compatible API. This may include defining consistent subsets
-of the API, as discussed in `this section of NEP 37
-<https://numpy.org/neps/nep-0037-array-module.html#requesting-restricted-subsets-of-numpy-s-api>`__.
+In addition, the ``__array_ufunc__`` and ``__array_function__`` protocols
+fulfill a role here - they are stable and used by several downstream projects.
+They do not cover the whole API, so use of the array API standard is preferred
+for new code.
 
 
 Performance
@@ -46,17 +40,25 @@ Improvements to NumPy's performance are important to many users. We have
 focused this effort on Universal SIMD (see :ref:`NEP38`) intrinsics which
 provide nice improvements across various hardware platforms via an abstraction
 layer.  The infrastructure is in place, and we welcome follow-on PRs to add
-SIMD support across all relevant NumPy functions.
+SIMD support across relevant NumPy functionality.
+
+Transitioning from C to C++, both in the SIMD infrastructure and in NumPy
+internals more widely, is in progress. We have also started to make use of
+Google Highway (see :ref:`NEP54`), and that usage is likely to expand. Work
+towards support for newer SIMD instruction sets, like SVE on arm64, is ongoing.
 
 Other performance improvement ideas include:
 
 - A better story around parallel execution.
 - Optimizations in individual functions.
-- Reducing ufunc and ``__array_function__`` overhead.
 
 Furthermore we would like to improve the benchmarking system, in terms of coverage,
-easy of use, and publication of the results (now
-`here <https://pv.github.io/numpy-bench>`__) as part of the docs or website.
+ease of use, and publication of the results. Benchmarking PRs/branches compared
+to the ``main`` branch is a primary purpose, and required for PRs that are
+performance-focused (e.g., adding SIMD acceleration to a function). In
+addition, we'd like a performance overview like the one we had `here
+<https://pv.github.io/numpy-bench>`__, set up in a way that is more
+maintainable long-term.
 
 
 Documentation and website
@@ -68,69 +70,115 @@ documentation on many topics are missing or outdated. See :ref:`NEP44` for
 planned improvements. Adding more tutorials is underway in the
 `numpy-tutorials repo <https://github.com/numpy/numpy-tutorials>`__.
 
-Our website (https://numpy.org) was completely redesigned recently. We aim to
-further improve it by adding translations, more case studies and other
-high-level content, and more (see `this tracking issue <https://github.com/numpy/numpy.org/issues/266>`__).
+We also intend to make all the example code in our documentation interactive -
+work is underway to do so via ``jupyterlite-sphinx`` and Pyodide.
+
+Our website (https://numpy.org) is in good shape. Further work on expanding the
+number of languages that the website is translated into is desirable, as are
+improvements to the interactive notebook widget, through JupyterLite.
 
 
 Extensibility
 -------------
 
-We aim to make it much easier to extend NumPy. The primary topic here is to
-improve the dtype system - see :ref:`NEP41` and related NEPs linked from it.
-Concrete goals for the dtype system rewrite are:
-
-- Easier custom dtypes:
+We aim to continue making it easier to extend NumPy. The primary topic here is to
+improve the dtype system - see for example :ref:`NEP41` and related NEPs linked
+from it. In NumPy 2.0, a new C API for user-defined dtypes was made public. We aim
+to encourage its usage and improve this API further.
 
-  - Simplify and/or wrap the current C-API
-  - More consistent support for dtype metadata
-  - Support for writing a dtype in Python
+Ideas for new dtypes that may be developed outside of the main NumPy repository
+first, and that could potentially be upstreamed into NumPy later, include:
 
-- Allow adding (a) new string dtype(s). This could be encoded strings with
-  fixed-width storage (e.g., ``utf8`` or ``latin1``), and/or a variable length
-  string dtype. The latter could share an implementation with ``dtype=object``,
-  but be explicitly type-checked.
-  One of these should probably be the default for text data. The current
-  string dtype support is neither efficient nor user friendly.
+- A quad-precision (128-bit) dtype
+- A ``bfloat16`` dtype
+- A fixed-width string dtype which supports encodings (e.g., ``utf8`` or
+  ``latin1``)
+- A unit dtype
 
 
 User experience
 ---------------
 
 Type annotations
 ````````````````
-NumPy 1.20 adds type annotations for most NumPy functionality, so users can use
-tools like `mypy`_ to type check their code and IDEs can improve their support
+Type annotations for most NumPy functionality are complete (although some
+submodules like ``numpy.ma`` are missing return types), so users can use tools
+like `mypy`_ to type check their code and IDEs can improve their support
 for NumPy. Improving those type annotations, for example to support annotating
-array shapes and dtypes, is ongoing.
+array shapes (see `gh-16544 <https://github.com/numpy/numpy/issues/16544>`__),
+is ongoing.
 
 Platform support
 ````````````````
 We aim to increase our support for different hardware architectures. This
 includes adding CI coverage when CI services are available, providing wheels on
-PyPI for POWER8/9 (``ppc64le``), providing better build and install
-documentation, and resolving build issues on other platforms like AIX.
+PyPI for platforms that are in high enough demand (e.g., we added ``musllinux``
+ones for NumPy 2.0), and resolving build issues on platforms that we don't test
+in CI (e.g., AIX).
+
+We intend to write a NEP covering the support levels we provide and what is
+required for a platform to move to a higher tier of support, similar to
+`PEP 11 <https://peps.python.org/pep-0011/>`__.
+
+CPython 3.13 will be the first release to offer a free-threaded build (i.e.,
+a CPython build with the GIL disabled). Work is in progress to support this
+well in NumPy. After that is stable and complete, there may be opportunities to
+actually make use of the potential for performance improvements from
+free-threaded CPython, or make it easier to do so for NumPy's users.
+
+Binary size reduction
+`````````````````````
+The number of downloads of NumPy from PyPI and other platforms continues to
+increase (as of May 2024 we're at >200 million downloads/month from PyPI
+alone). Reducing the size of an installed NumPy package has many benefits:
+faster installs, lower disk space usage, smaller load on PyPI, less
+environmental impact, easier to fit more packages on top of NumPy into an AWS
+Lambda layer, lower latency for Pyodide users, and so on. We aim for
+significant reductions, as well as making it easier for end users and packagers
+to produce smaller custom builds (e.g., we added support for stripping tests
+before 2.1.0). See `gh-25737 <https://github.com/numpy/numpy/issues/25737>`__
+for details.
+
+
+NumPy 2.0 stabilization & downstream usage
+------------------------------------------
+
+We made a very large number of changes (and improvements!) in NumPy 2.0. The
+release process has taken a very long time, and part of the ecosystem is still
+catching up. We may need to slow down for a while, and possibly help the rest
+of the ecosystem with adapting to the ABI and API changes.
+
+We will need to assess the costs and benefits to NumPy itself,
+downstream package authors, and end users. Based on that assessment
+
+
+Security
+--------
+
+NumPy is quite secure - we get only a limited number of reports about potential
+vulnerabilities, and most of those are incorrect. We have made strides with a
+documented security policy, a private disclosure method, and maintaining an
+OpenSSF scorecard (with a high score). However, we have not changed much in how
+we approach supply chain security in quite a while. We aim to make improvements
+here, for example achieving fully reproducible builds for all the build
+artifacts we publish - and providing full provenance information for them.
 
 
 Maintenance
 -----------
 
-- ``MaskedArray`` needs to be improved, ideas include:
+- ``numpy.ma`` is still in poor shape and under-maintained. It needs to be
+  improved; ideas include:
 
   - Rewrite masked arrays to not be a ndarray subclass -- maybe in a separate project?
   - MaskedArray as a duck-array type, and/or
   - dtypes that support missing values
 
-- Fortran integration via ``numpy.f2py`` requires a number of improvements, see
-  `this tracking issue <https://github.com/numpy/numpy/issues/14938>`__.
-- A backend system for ``numpy.fft`` (so that e.g. ``fft-mkl`` doesn't need to monkeypatch numpy).
 - Write a strategy on how to deal with overlap between NumPy and SciPy for ``linalg``.
-- Deprecate ``np.matrix`` (very slowly).
+- Deprecate ``np.matrix`` (very slowly) - this is feasible once the switch-over
+  from sparse matrices to sparse arrays in SciPy is complete.
 - Add new indexing modes for "vectorized indexing" and "outer indexing" (see :ref:`NEP21`).
 - Make the polynomial API easier to use.
-- Integrate an improved text file loader.
-- Ufunc and gufunc improvements, see `gh-8892 <https://github.com/numpy/numpy/issues/8892>`__
-  and `gh-11492 <https://github.com/numpy/numpy/issues/11492>`__.
 
 
 .. _`mypy`: https://mypy.readthedocs.io