5 changes: 5 additions & 0 deletions docs/api_extra.rst
@@ -1108,6 +1108,11 @@ convert into an equivalent representation in one of the following frameworks:

Builtin Python ``memoryview`` for CPU-resident data.

.. cpp:class:: array_api

   An object that implements the buffer protocol and also has the
   ``__dlpack__`` and ``__dlpack_device__`` methods.

Eigen convenience type aliases
------------------------------

8 changes: 8 additions & 0 deletions docs/changelog.rst
@@ -22,6 +22,14 @@ Version TBD (not yet released)
Clang-based Intel compiler). Continuous integration tests have been added to
ensure compatibility with these compilers on an ongoing basis.

- The framework ``nb::array_api`` is now available to return an nd-array from
C++ to Python as an object that supports both the Python buffer protocol as
  well as the DLPack methods ``__dlpack__`` and ``__dlpack_device__``.
Nanobind now supports importing and exporting nd-arrays via capsules that
contain the ``DLManagedTensorVersioned`` struct, which has a flag bit
indicating the nd-array is read-only.
(PR `#1175 <https://github.com/wjakob/nanobind/pull/1175>`__).

Version 2.9.2 (Sep 4, 2025)
---------------------------

126 changes: 105 additions & 21 deletions docs/ndarray.rst
@@ -275,12 +275,19 @@ desired Python type.
- :cpp:class:`nb::tensorflow <tensorflow>`: create a ``tensorflow.python.framework.ops.EagerTensor``.
- :cpp:class:`nb::jax <jax>`: create a ``jaxlib.xla_extension.DeviceArray``.
- :cpp:class:`nb::cupy <cupy>`: create a ``cupy.ndarray``.
- :cpp:class:`nb::memview <memview>`: create a Python ``memoryview``.
- :cpp:class:`nb::array_api <array_api>`: create an object that supports the
Python buffer protocol (i.e., is accepted as an argument to ``memoryview()``)
  and also has the DLPack attributes ``__dlpack__`` and ``__dlpack_device__``
(i.e., it is accepted as an argument to a framework's ``from_dlpack()``
function).
- No framework annotation. In this case, nanobind will create a raw Python
``dltensor`` `capsule <https://docs.python.org/3/c-api/capsule.html>`__
representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata.
representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata of
a ``DLManagedTensor``.
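For context, NumPy arrays already satisfy both interfaces that an
:cpp:class:`nb::array_api <array_api>` object exposes, so the dual-protocol
behavior can be sketched in plain Python (assuming NumPy >= 1.22 for
``from_dlpack``):

```python
import numpy as np

a = np.arange(4.0)

# Buffer protocol: the object is accepted by memoryview()
mv = memoryview(a)

# DLPack: the object has __dlpack__/__dlpack_device__ and is accepted
# by a framework's from_dlpack() function
b = np.from_dlpack(a)

print(mv[1])                  # 1.0
print(a.__dlpack_device__())  # (1, 0) -> (kDLCPU, device id 0)
```

An object returned with the :cpp:class:`nb::array_api <array_api>`
annotation can be consumed through either route in the same way.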

This annotation also affects the auto-generated docstring of the function,
which in this case becomes:
which for this example becomes:

.. code-block:: python

@@ -458,6 +465,21 @@ interpreted as follows:
- :cpp:enumerator:`rv_policy::move` is unsupported and demoted to
:cpp:enumerator:`rv_policy::copy`.

Note that when a copy is returned, the copy is made by the framework, not by
nanobind itself.
For example, ``numpy.array()`` is passed the keyword argument ``copy`` with
value ``True``, or the PyTorch tensor's ``clone()`` method is immediately
called to create the copy.
This design has a couple of advantages.
First, nanobind does not have a build-time dependency on the libraries and
frameworks (NumPy, PyTorch, CUDA, etc.) that would otherwise be necessary
to perform the copy.
Second, the framework can optimize how the copy is created. Since the
framework owns the copy, it may use a custom memory allocator, over-align
the data, and so on, depending on the nd-array's size and the specific CPU,
GPU, or memory types involved.
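The framework-side copy can be illustrated from Python. The sketch below
(assuming NumPy is available) uses the same ``copy=True`` keyword argument
that nanobind passes to ``numpy.array()``:

```python
import numpy as np

a = np.arange(5.0)

# What nanobind effectively requests for a demoted/explicit copy with the
# numpy framework annotation: NumPy itself allocates and fills the copy.
b = np.array(a, copy=True)

# The copy is independent of the original buffer.
b[0] = 99.0
print(a[0])  # 0.0 -- the original data is untouched
```

Because NumPy performs the allocation, it is free to choose alignment and
allocator details, which is exactly the flexibility described above.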


.. _ndarray-temporaries:

Returning temporaries
@@ -643,26 +665,92 @@ support inter-framework data exchange, custom array types should implement the
- `__dlpack__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html#array_api.array.__dlpack__>`__ and
- `__dlpack_device__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack_device__.html#array_api.array.__dlpack_device__>`__

methods. This is easy thanks to the nd-array integration in nanobind. An example is shown below:
methods.
These, as well as the buffer protocol, are implemented in the object returned
by nanobind when specifying :cpp:class:`nb::array_api <array_api>` as the
framework template parameter.
For example:

.. code-block:: cpp

   nb::class_<MyArray>(m, "MyArray")
       // ...
       .def("__dlpack__", [](nb::kwargs kwargs) {
           return nb::ndarray<>( /* ... */ );
       })
       .def("__dlpack_device__", []() {
           return std::make_pair(nb::device::cpu::value, 0);
       });

Review thread:

Contributor Author: I don't think this works (implementing a Python method as
a lambda), or I'm missing something interesting. I don't see how the
``/* ... */`` can access the ``this`` pointer (to get a pointer to the actual
data in ``MyArray``). In the new documentation, I added member functions to
``MyArray`` and used them by name in the binding.

Owner: You can implement a lambda function that accesses self, either by
taking a C++ type as first argument, by taking a ``nb::handle`` as first
argument, or by taking ``nb::pointer_and_handle<T>`` that gives you both.

Contributor Author: Thanks! I changed the example to use a lambda, since it's
nice to show that these methods can be added in a binding without having to
change the C++ class.
   class MyArray {
       double* d;
   public:
       MyArray() { d = new double[5] { 0.0, 1.0, 2.0, 3.0, 4.0 }; }
       ~MyArray() { delete[] d; }
       double* data() const { return d; }
   };

   nb::class_<MyArray>(m, "MyArray")
       .def(nb::init<>())
       .def("array_api", [](const MyArray& self) {
           return nb::ndarray<nb::array_api, double>(self.data(), {5});
       }, nb::rv_policy::reference_internal);

which can be used as follows:

.. code-block:: pycon

>>> import my_extension
>>> ma = my_extension.MyArray()
>>> aa = ma.array_api()
>>> aa.__dlpack_device__()
(1, 0)
>>> import numpy as np
>>> x = np.from_dlpack(aa)
>>> x
array([0., 1., 2., 3., 4.])

The DLPack methods can also be provided on the class itself by implementing
``__dlpack__()`` as a wrapper function.
For example, by adding the following lines to the binding:

.. code-block:: cpp

   .def("__dlpack__", [](nb::pointer_and_handle<MyArray> self,
                         nb::kwargs kwargs) {
       using array_api_t = nb::ndarray<nb::array_api, double>;
       nb::object aa = nb::cast(array_api_t(self.p->data(), {5}),
                                nb::rv_policy::reference_internal,
                                self.h);
       nb::object max = kwargs.get("max_version", nb::none());
       return aa.attr("__dlpack__")(nb::arg("max_version") = max);
   })
   .def("__dlpack_device__", [](nb::handle /*self*/) {
       return std::make_pair(nb::device::cpu::value, 0);
   })

the class can be used as follows:

.. code-block:: pycon

>>> import my_extension
>>> ma = my_extension.MyArray()
>>> ma.__dlpack_device__()
(1, 0)
>>> import numpy as np
>>> y = np.from_dlpack(ma)
>>> y
array([0., 1., 2., 3., 4.])


The ``kwargs`` argument in the implementation of ``__dlpack__`` above can be
used to support additional parameters (e.g., to allow the caller to request a
copy). Please see the DLPack documentation for details.

The caller may or may not supply the keyword argument ``max_version``.
If it is not supplied or has the value ``None``, nanobind will return an
unversioned ``DLManagedTensor`` in a capsule named ``dltensor``.
If its value is a tuple of integers ``(major_version, minor_version)`` and the
major version is at least 1, nanobind will return a ``DLManagedTensorVersioned``
in a capsule named ``dltensor_versioned``.
Nanobind ignores other keyword arguments.
In particular, it cannot transfer the array's data to another device (such as
a GPU), nor can it make a copy of the data.
A custom class (such as ``MyArray`` above) could provide such functionality.
Often, the caller framework takes care of copying and inter-device data
transfer and does not ask the producer, ``MyArray``, to perform them.
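The ``max_version`` negotiation described above can be summarized in a small
sketch (the function name is hypothetical; the capsule names follow the
DLPack specification):

```python
def capsule_name_for(max_version):
    """Mirror nanobind's handling of the __dlpack__ max_version kwarg.

    Returns the name of the capsule the producer hands back:
    - None (or argument omitted) -> legacy DLManagedTensor in "dltensor"
    - (major, minor) with major >= 1 -> DLManagedTensorVersioned in
      "dltensor_versioned"
    """
    if max_version is None:
        return "dltensor"
    major, _minor = max_version
    return "dltensor_versioned" if major >= 1 else "dltensor"

print(capsule_name_for(None))    # dltensor
print(capsule_name_for((1, 1)))  # dltensor_versioned
print(capsule_name_for((0, 8)))  # dltensor
```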


Frequently asked questions
--------------------------
@@ -708,7 +796,3 @@ be more restrictive. Presently supported dtypes include signed/unsigned
integers, floating point values, complex numbers, and boolean values. Some
:ref:`nonstandard arithmetic types <ndarray-nonstandard>` can be supported as
well.

Nanobind can receive and return *read-only* arrays via the buffer protocol when
exchanging data with NumPy. The DLPack interface currently ignores this
annotation.
2 changes: 1 addition & 1 deletion include/nanobind/nb_defs.h
@@ -209,7 +209,7 @@
X(const X &) = delete; \
X &operator=(const X &) = delete;

#define NB_MOD_STATE_SIZE 80
#define NB_MOD_STATE_SIZE 96

// Helper macros to ensure macro arguments are expanded before token pasting/stringification
#define NB_MODULE_IMPL(name, variable) NB_MODULE_IMPL2(name, variable)
8 changes: 4 additions & 4 deletions include/nanobind/nb_lib.h
@@ -12,8 +12,8 @@ NAMESPACE_BEGIN(NB_NAMESPACE)
NAMESPACE_BEGIN(dlpack)

// The version of DLPack that is supported by libnanobind
static constexpr uint32_t major_version = 0;
static constexpr uint32_t minor_version = 0;
static constexpr uint32_t major_version = 1;
static constexpr uint32_t minor_version = 1;

// Forward declarations for types in ndarray.h (1)
struct dltensor;
@@ -289,7 +289,7 @@ NB_CORE PyObject *capsule_new(const void *ptr, const char *name,
struct func_data_prelim_base;

/// Create a Python function object for the given function record
NB_CORE PyObject *nb_func_new(const func_data_prelim_base *data) noexcept;
NB_CORE PyObject *nb_func_new(const func_data_prelim_base *f) noexcept;

// ========================================================================

@@ -481,7 +481,7 @@ NB_CORE ndarray_handle *ndarray_import(PyObject *o,
cleanup_list *cleanup) noexcept;

// Describe a local ndarray object using a DLPack capsule
NB_CORE ndarray_handle *ndarray_create(void *value, size_t ndim,
NB_CORE ndarray_handle *ndarray_create(void *data, size_t ndim,
const size_t *shape, PyObject *owner,
const int64_t *strides,
dlpack::dtype dtype, bool ro,
10 changes: 8 additions & 2 deletions include/nanobind/ndarray.h
@@ -18,11 +18,16 @@

NAMESPACE_BEGIN(NB_NAMESPACE)

/// dlpack API/ABI data structures are part of a separate namespace
/// DLPack API/ABI data structures are part of a separate namespace.
NAMESPACE_BEGIN(dlpack)

enum class dtype_code : uint8_t {
Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6
Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6,
Float8_E3M4 = 7, Float8_E4M3 = 8, Float8_E4M3B11FNUZ = 9,
Float8_E4M3FN = 10, Float8_E4M3FNUZ = 11, Float8_E5M2 = 12,
Float8_E5M2FNUZ = 13, Float8_E8M0FNU = 14,
Float6_E2M3FN = 15, Float6_E3M2FN = 16,
Float4_E2M1FN = 17
};

struct device {
@@ -86,6 +91,7 @@ NB_FRAMEWORK(tensorflow, 3, "tensorflow.python.framework.ops.EagerTensor");
NB_FRAMEWORK(jax, 4, "jaxlib.xla_extension.DeviceArray");
NB_FRAMEWORK(cupy, 5, "cupy.ndarray");
NB_FRAMEWORK(memview, 6, "memoryview");
NB_FRAMEWORK(array_api, 7, "ArrayLike");

NAMESPACE_BEGIN(device)
NB_DEVICE(none, 0); NB_DEVICE(cpu, 1); NB_DEVICE(cuda, 2);
2 changes: 2 additions & 0 deletions src/nb_internals.cpp
@@ -168,6 +168,8 @@ PyTypeObject *nb_meta_cache = nullptr;
static const char* interned_c_strs[pyobj_name::string_count] {
"value",
"copy",
"clone",
"array",
"from_dlpack",
"__dlpack__",
"max_version",
9 changes: 6 additions & 3 deletions src/nb_internals.h
@@ -426,6 +426,8 @@ struct pyobj_name {
enum : int {
value_str = 0, // string "value"
copy_str, // string "copy"
clone_str, // string "clone"
array_str, // string "array"
from_dlpack_str, // string "from_dlpack"
dunder_dlpack_str, // string "__dlpack__"
max_version_str, // string "max_version"
@@ -490,11 +492,12 @@ inline void *inst_ptr(nb_inst *self) {
}

template <typename T> struct scoped_pymalloc {
scoped_pymalloc(size_t size = 1) {
ptr = (T *) PyMem_Malloc(size * sizeof(T));
scoped_pymalloc(size_t size = 1, size_t extra_bytes = 0) {
// Tip: construct objects in the extra bytes using placement new.
ptr = (T *) PyMem_Malloc(size * sizeof(T) + extra_bytes);
if (!ptr)
fail("scoped_pymalloc(): could not allocate %llu bytes of memory!",
(unsigned long long) size);
(unsigned long long) (size * sizeof(T) + extra_bytes));
}
~scoped_pymalloc() { PyMem_Free(ptr); }
T *release() {