5 changes: 5 additions & 0 deletions docs/api_extra.rst
@@ -1108,6 +1108,11 @@ convert into an equivalent representation in one of the following frameworks:

Builtin Python ``memoryview`` for CPU-resident data.

.. cpp:class:: array_api

   An object that implements the buffer protocol and also has the
   ``__dlpack__`` and ``__dlpack_device__`` methods.

Eigen convenience type aliases
------------------------------

8 changes: 8 additions & 0 deletions docs/changelog.rst
@@ -22,6 +22,14 @@ Version TBD (not yet released)
Clang-based Intel compiler). Continuous integration tests have been added to
ensure compatibility with these compilers on an ongoing basis.

- The framework ``nb::array_api`` is now available to return an nd-array from
C++ to Python as an object that supports both the Python buffer protocol as
  well as the DLPack methods ``__dlpack__`` and ``__dlpack_device__``.
Nanobind now supports importing and exporting nd-arrays via capsules that
contain the ``DLManagedTensorVersioned`` struct, which has a flag bit
indicating the nd-array is read-only.
(PR `#1175 <https://github.com/wjakob/nanobind/pull/1175>`__).

Version 2.9.2 (Sep 4, 2025)
---------------------------

126 changes: 105 additions & 21 deletions docs/ndarray.rst
@@ -275,12 +275,19 @@ desired Python type.
- :cpp:class:`nb::tensorflow <tensorflow>`: create a ``tensorflow.python.framework.ops.EagerTensor``.
- :cpp:class:`nb::jax <jax>`: create a ``jaxlib.xla_extension.DeviceArray``.
- :cpp:class:`nb::cupy <cupy>`: create a ``cupy.ndarray``.
- :cpp:class:`nb::memview <memview>`: create a Python ``memoryview``.
- :cpp:class:`nb::array_api <array_api>`: create an object that supports the
Python buffer protocol (i.e., is accepted as an argument to ``memoryview()``)
  and also has the DLPack attributes ``__dlpack__`` and ``__dlpack_device__``
(i.e., it is accepted as an argument to a framework's ``from_dlpack()``
function).
- No framework annotation. In this case, nanobind will create a raw Python
``dltensor`` `capsule <https://docs.python.org/3/c-api/capsule.html>`__
representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata.
representing the `DLPack <https://github.com/dmlc/dlpack>`__ metadata of
a ``DLManagedTensor``.
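For context, NumPy arrays already satisfy both interfaces that an
:cpp:class:`nb::array_api <array_api>` object exposes, so the dual-protocol
behavior can be sketched in plain Python (assuming NumPy >= 1.22 for
``from_dlpack``):

```python
import numpy as np

a = np.arange(4.0)

# Buffer protocol: the object is accepted by memoryview()
mv = memoryview(a)

# DLPack: the object has __dlpack__/__dlpack_device__ and is accepted
# by a framework's from_dlpack() function
b = np.from_dlpack(a)

print(mv[1])                  # 1.0
print(a.__dlpack_device__())  # (1, 0) -> (kDLCPU, device id 0)
```

An object returned with the :cpp:class:`nb::array_api <array_api>`
annotation can be consumed through either route in the same way.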

This annotation also affects the auto-generated docstring of the function,
which in this case becomes:
which for this example becomes:

.. code-block:: python

@@ -458,6 +465,21 @@ interpreted as follows:
- :cpp:enumerator:`rv_policy::move` is unsupported and demoted to
:cpp:enumerator:`rv_policy::copy`.

Note that when a copy is returned, the copy is made by the framework, not by
nanobind itself.
For example, ``numpy.array()`` is passed the keyword argument ``copy`` with
value ``True``, or the PyTorch tensor's ``clone()`` method is immediately
called to create the copy.
This design has a couple of advantages.
First, nanobind does not have a build-time dependency on the libraries and
frameworks (NumPy, PyTorch, CUDA, etc.) that would otherwise be necessary
to perform the copy.
Second, the framework can optimize how the copy is created. Since the
framework owns the copy, it may use a custom memory allocator, over-align
the data, and so on, depending on the nd-array's size and the specific CPU,
GPU, or memory types involved.
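The framework-side copy can be illustrated from Python. The sketch below
(assuming NumPy is available) uses the same ``copy=True`` keyword argument
that nanobind passes to ``numpy.array()``:

```python
import numpy as np

a = np.arange(5.0)

# What nanobind effectively requests for a demoted/explicit copy with the
# numpy framework annotation: NumPy itself allocates and fills the copy.
b = np.array(a, copy=True)

# The copy is independent of the original buffer.
b[0] = 99.0
print(a[0])  # 0.0 -- the original data is untouched
```

Because NumPy performs the allocation, it is free to choose alignment and
allocator details, which is exactly the flexibility described above.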


.. _ndarray-temporaries:

Returning temporaries
@@ -643,26 +665,92 @@ support inter-framework data exchange, custom array types should implement the
- `__dlpack__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html#array_api.array.__dlpack__>`__ and
- `__dlpack_device__ <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack_device__.html#array_api.array.__dlpack_device__>`__

methods. This is easy thanks to the nd-array integration in nanobind. An example is shown below:
methods.
These, as well as the buffer protocol, are implemented in the object returned
by nanobind when specifying :cpp:class:`nb::array_api <array_api>` as the
framework template parameter.
For example:

.. code-block:: cpp

   nb::class_<MyArray>(m, "MyArray")
       // ...
       .def("__dlpack__", [](nb::kwargs kwargs) {
           return nb::ndarray<>( /* ... */ );
       })
       .def("__dlpack_device__", []() {
           return std::make_pair(nb::device::cpu::value, 0);
       });

Review thread:

Contributor Author: I don't think this works (implementing a Python method as
a lambda), or I'm missing something interesting. I don't see how the
``/* ... */`` can access the ``this`` pointer (to get a pointer to the actual
data in ``MyArray``). In the new documentation, I added member functions to
``MyArray`` and used them by name in the binding.

Owner: You can implement a lambda function that accesses self, either by
taking a C++ type as first argument, by taking a ``nb::handle`` as first
argument, or by taking ``nb::pointer_and_handle<T>`` that gives you both.

Contributor Author: Thanks! I changed the example to use a lambda, since it's
nice to show that these methods can be added in a binding without having to
change the C++ class.
   class MyArray {
       double* d;
   public:
       MyArray() { d = new double[5] { 0.0, 1.0, 2.0, 3.0, 4.0 }; }
       ~MyArray() { delete[] d; }
       double* data() const { return d; }
   };

   nb::class_<MyArray>(m, "MyArray")
       .def(nb::init<>())
       .def("array_api", [](const MyArray& self) {
           return nb::ndarray<nb::array_api, double>(self.data(), {5});
       }, nb::rv_policy::reference_internal);

which can be used as follows:

.. code-block:: pycon

>>> import my_extension
>>> ma = my_extension.MyArray()
>>> aa = ma.array_api()
>>> aa.__dlpack_device__()
(1, 0)
>>> import numpy as np
>>> x = np.from_dlpack(aa)
>>> x
array([0., 1., 2., 3., 4.])

The DLPack methods can also be provided on the class itself by implementing
``__dlpack__()`` as a wrapper function.
For example, by adding the following lines to the binding:

.. code-block:: cpp

   .def("__dlpack__", [](nb::pointer_and_handle<MyArray> self,
                         nb::kwargs kwargs) {
       using array_api_t = nb::ndarray<nb::array_api, double>;
       nb::object aa = nb::cast(array_api_t(self.p->data(), {5}),
                                nb::rv_policy::reference_internal,
                                self.h);
       nb::object max = kwargs.get("max_version", nb::none());
       return aa.attr("__dlpack__")(nb::arg("max_version") = max);
   })
   .def("__dlpack_device__", [](nb::handle /*self*/) {
       return std::make_pair(nb::device::cpu::value, 0);
   })

the class can be used as follows:

.. code-block:: pycon

>>> import my_extension
>>> ma = my_extension.MyArray()
>>> ma.__dlpack_device__()
(1, 0)
>>> import numpy as np
>>> y = np.from_dlpack(ma)
>>> y
array([0., 1., 2., 3., 4.])


The ``kwargs`` argument in the implementation of ``__dlpack__`` above can be
used to support additional parameters (e.g., to allow the caller to request a
copy). Please see the DLPack documentation for details.

The caller may or may not supply the keyword argument ``max_version``.
If it is not supplied or has the value ``None``, nanobind will return an
unversioned ``DLManagedTensor`` in a capsule named ``dltensor``.
If its value is a tuple of integers ``(major_version, minor_version)`` and the
major version is at least 1, nanobind will return a ``DLManagedTensorVersioned``
in a capsule named ``dltensor_versioned``.
Nanobind ignores other keyword arguments.
In particular, it cannot transfer the array's data to another device (such as
a GPU), nor can it make a copy of the data.
A custom class (such as ``MyArray`` above) could provide such functionality.
Often, the caller framework takes care of copying and inter-device data
transfer and does not ask the producer, ``MyArray``, to perform them.
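The ``max_version`` negotiation described above can be summarized in a small
sketch (the function name is hypothetical; the capsule names follow the
DLPack specification):

```python
def capsule_name_for(max_version):
    """Mirror nanobind's handling of the __dlpack__ max_version kwarg.

    Returns the name of the capsule the producer hands back:
    - None (or argument omitted) -> legacy DLManagedTensor in "dltensor"
    - (major, minor) with major >= 1 -> DLManagedTensorVersioned in
      "dltensor_versioned"
    """
    if max_version is None:
        return "dltensor"
    major, _minor = max_version
    return "dltensor_versioned" if major >= 1 else "dltensor"

print(capsule_name_for(None))    # dltensor
print(capsule_name_for((1, 1)))  # dltensor_versioned
print(capsule_name_for((0, 8)))  # dltensor
```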


Frequently asked questions
--------------------------
@@ -708,7 +796,3 @@ be more restrictive. Presently supported dtypes include signed/unsigned
integers, floating point values, complex numbers, and boolean values. Some
:ref:`nonstandard arithmetic types <ndarray-nonstandard>` can be supported as
well.

Nanobind can receive and return *read-only* arrays via the buffer protocol when
exchanging data with NumPy. The DLPack interface currently ignores this
annotation.
2 changes: 1 addition & 1 deletion include/nanobind/nb_defs.h
@@ -209,7 +209,7 @@
X(const X &) = delete; \
X &operator=(const X &) = delete;

#define NB_MOD_STATE_SIZE 80
#define NB_MOD_STATE_SIZE 96

// Helper macros to ensure macro arguments are expanded before token pasting/stringification
#define NB_MODULE_IMPL(name, variable) NB_MODULE_IMPL2(name, variable)
8 changes: 4 additions & 4 deletions include/nanobind/nb_lib.h
@@ -12,8 +12,8 @@ NAMESPACE_BEGIN(NB_NAMESPACE)
NAMESPACE_BEGIN(dlpack)

// The version of DLPack that is supported by libnanobind
static constexpr uint32_t major_version = 0;
static constexpr uint32_t minor_version = 0;
static constexpr uint32_t major_version = 1;
static constexpr uint32_t minor_version = 1;

// Forward declarations for types in ndarray.h (1)
struct dltensor;
@@ -289,7 +289,7 @@ NB_CORE PyObject *capsule_new(const void *ptr, const char *name,
struct func_data_prelim_base;

/// Create a Python function object for the given function record
NB_CORE PyObject *nb_func_new(const func_data_prelim_base *data) noexcept;
NB_CORE PyObject *nb_func_new(const func_data_prelim_base *f) noexcept;

// ========================================================================

@@ -481,7 +481,7 @@ NB_CORE ndarray_handle *ndarray_import(PyObject *o,
cleanup_list *cleanup) noexcept;

// Describe a local ndarray object using a DLPack capsule
NB_CORE ndarray_handle *ndarray_create(void *value, size_t ndim,
NB_CORE ndarray_handle *ndarray_create(void *data, size_t ndim,
const size_t *shape, PyObject *owner,
const int64_t *strides,
dlpack::dtype dtype, bool ro,
10 changes: 8 additions & 2 deletions include/nanobind/ndarray.h
@@ -18,11 +18,16 @@

NAMESPACE_BEGIN(NB_NAMESPACE)

/// dlpack API/ABI data structures are part of a separate namespace
/// DLPack API/ABI data structures are part of a separate namespace.
NAMESPACE_BEGIN(dlpack)

enum class dtype_code : uint8_t {
Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6
Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6,
Float8_E3M4 = 7, Float8_E4M3 = 8, Float8_E4M3B11FNUZ = 9,
Float8_E4M3FN = 10, Float8_E4M3FNUZ = 11, Float8_E5M2 = 12,
Float8_E5M2FNUZ = 13, Float8_E8M0FNU = 14,
Float6_E2M3FN = 15, Float6_E3M2FN = 16,
Float4_E2M1FN = 17
};

struct device {
@@ -86,6 +91,7 @@ NB_FRAMEWORK(tensorflow, 3, "tensorflow.python.framework.ops.EagerTensor");
NB_FRAMEWORK(jax, 4, "jaxlib.xla_extension.DeviceArray");
NB_FRAMEWORK(cupy, 5, "cupy.ndarray");
NB_FRAMEWORK(memview, 6, "memoryview");
NB_FRAMEWORK(array_api, 7, "ArrayLike");

NAMESPACE_BEGIN(device)
NB_DEVICE(none, 0); NB_DEVICE(cpu, 1); NB_DEVICE(cuda, 2);
2 changes: 2 additions & 0 deletions src/nb_internals.cpp
@@ -168,6 +168,8 @@ PyTypeObject *nb_meta_cache = nullptr;
static const char* interned_c_strs[pyobj_name::string_count] {
"value",
"copy",
"clone",
"array",
"from_dlpack",
"__dlpack__",
"max_version",
9 changes: 6 additions & 3 deletions src/nb_internals.h
@@ -426,6 +426,8 @@ struct pyobj_name {
enum : int {
value_str = 0, // string "value"
copy_str, // string "copy"
clone_str, // string "clone"
array_str, // string "array"
from_dlpack_str, // string "from_dlpack"
dunder_dlpack_str, // string "__dlpack__"
max_version_str, // string "max_version"
@@ -490,11 +492,12 @@ inline void *inst_ptr(nb_inst *self) {
}

template <typename T> struct scoped_pymalloc {
scoped_pymalloc(size_t size = 1) {
ptr = (T *) PyMem_Malloc(size * sizeof(T));
scoped_pymalloc(size_t size = 1, size_t extra_bytes = 0) {
// Tip: construct objects in the extra bytes using placement new.
ptr = (T *) PyMem_Malloc(size * sizeof(T) + extra_bytes);
if (!ptr)
fail("scoped_pymalloc(): could not allocate %llu bytes of memory!",
(unsigned long long) size);
(unsigned long long) (size * sizeof(T) + extra_bytes));
}
~scoped_pymalloc() { PyMem_Free(ptr); }
T *release() {