Add support to ndarray for DLPack version 1 #1175
Conversation
Hi @hpkfft, this looks great! Here is a first batch of comments from me. I feel like this change also needs some documentation. If I have a project using `nb::ndarray`, what do I need to do to benefit from the new interfaces? Can I opt out? What are the implications for compatibility? These questions are relevant both for code accepting DLPack-capable objects and for code returning them.
Thanks!
```cpp
if (framework == numpy::value) {
    try {
        static PyObject* const array_str = PyUnicode_FromString("array");
#if PY_VERSION_HEX < 0x03090000
```
Curious what's going on here. Is this a performance optimization? Why is it needed? Should we instead improve `operator()` to dispatch the call more efficiently?
Yes, using `PyObject_VectorcallMethod()` directly is only done as a performance optimization. It's faster to customize the call site and use static objects (e.g., the pre-made tuple `copy_tpl` for the `kwnames` argument).

I don't see how to improve `operator()` in general, since it has to work at any call site. In other words, it must create a tuple of keyword names at runtime. I suppose there's a way to do something similar to JIT compiling, but that's beyond the scope of this PR.

Philosophically, I think it is OK to use the low-level Python C API from within nanobind itself to squeeze out that last drop of performance.
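For illustration, here is a minimal sketch of the pattern being described (not the actual nanobind code): a method call issued through the vectorcall protocol with cached, interned objects. The function name `call_array_copy` and its arguments are hypothetical.

```cpp
#include <Python.h>

// Roughly equivalent to the Python expression mod.array(obj, copy=True).
// The method name and the kwnames tuple are created once and reused;
// interning lets the callee match the keyword name by pointer comparison.
PyObject *call_array_copy(PyObject *mod, PyObject *obj) {
    static PyObject *const array_str = PyUnicode_InternFromString("array");
    static PyObject *const copy_tpl =
        PyTuple_Pack(1, PyUnicode_InternFromString("copy"));

    // args[0] is the receiver; nargsf counts the receiver plus the
    // positional arguments. Keyword argument values follow at the end.
    PyObject *args[3] = { mod, obj, Py_True };
    return PyObject_VectorcallMethod(array_str, args, /* nargsf = */ 2,
                                     /* kwnames = */ copy_tpl);
}
```

Note that `PyObject_VectorcallMethod()` only exists on Python 3.9 and newer, which is what the `PY_VERSION_HEX` guard in the quoted snippet is about.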
```cpp
nb::class_<MyArray>(m, "MyArray")
    // ...
    .def("__dlpack__", [](nb::kwargs kwargs) {
        return nb::ndarray<>( /* ... */);
```
I don't think this works (implementing a Python method as a lambda), or I'm missing something interesting. I don't see how the `/* ... */` can access the `this` pointer (to get a pointer to the actual data in `MyArray`). In the new documentation, I added member functions to `MyArray` and used them by name in the binding.
You can implement a lambda function that accesses `self`, either by taking a C++ type as the first argument, by taking an `nb::handle` as the first argument, or by taking an `nb::pointer_and_handle<T>` that gives you both.
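A minimal sketch of the first variant, assuming hypothetical `data()` and `size()` accessors on `MyArray` for a one-dimensional array of doubles:

```cpp
nb::class_<MyArray>(m, "MyArray")
    // ...
    // The first lambda parameter binds to the C++ instance, so the body can
    // reach the underlying storage without adding member-function bindings.
    // (The kwargs, e.g. max_version, are ignored in this sketch.)
    .def("__dlpack__", [](MyArray &self, nb::kwargs kwargs) {
        return nb::ndarray<double>(self.data(), { self.size() });
    });
```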
Thanks! I changed the example to use a lambda, since it's nice to show that these methods can be added in a binding without having to change the C++ class.

Is this still a draft PR?
docs/api_extra.rst (outdated)

```rst
Builtin Python ``memoryview`` for CPU-resident data.

.. cpp:class:: arrayapi
```
How about `array_api`?
I slightly prefer `arrayapi`, without the underscore, copying the style of `memview`. In the example code, I think `using arrayapi_t = nb::ndarray<nb::arrayapi, double>;` looks a bit nicer than `using array_api_t = nb::ndarray<nb::array_api, double>;`. But I'd be happy to change it if you prefer, so don't hesitate to say so.
I prefer `array_api`, since these are separate words (always written with a space in public communications). For `memview`, I did not put in a separator because even the Python type does not use one. (Though arguably I should have written the long version `memoryview` to be 100% consistent, oh well...)
Ah, I probably did not do that because it was already taken.
Done.
include/nanobind/ndarray.h (outdated)

```diff
 enum class dtype_code : uint8_t {
-    Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6
+    Int = 0, UInt = 1, Float = 2, Bfloat = 4, Complex = 5, Bool = 6,
+    Float8_e3m4 = 7, Float8_e4m3 = 8, Float8_e4m3b11fnuz = 9,
```
Minor: I would prefer the letters to be uppercase, e.g. `Float8_E4M3`.
Nothing. No. Only goodness. When nanobind imports a DLPack-capable object, it first tries to call the object's `__dlpack__` method requesting a versioned capsule; if that fails, it falls back to the unversioned protocol. In the case of a versioned capsule, a flag bit can be set to indicate that the tensor is read-only. Nanobind honors this and creates a read-only nd-array.

On export, it depends on the framework:

- Tensorflow is unchanged: it is handed an unversioned capsule, as before.
- NumPy is unchanged: nanobind first makes a new wrapper object and then has NumPy convert it to an array.
- PyTorch, JAX, and CuPy: nanobind creates a new versioned capsule and passes it to the framework's `from_dlpack()`.
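For reference, the read-only flag mentioned above lives in the versioned struct from the public DLPack 1.x header; the helper name here is hypothetical:

```cpp
#include <dlpack/dlpack.h>

// DLPack >= 1.0 carries per-tensor flags in DLManagedTensorVersioned;
// DLPACK_FLAG_BITMASK_READ_ONLY marks the underlying memory as read-only.
bool tensor_is_read_only(const DLManagedTensorVersioned *mt) {
    return (mt->flags & DLPACK_FLAG_BITMASK_READ_ONLY) != 0;
}
```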
Beautiful, thank you for this clarification. I guess there could be a performance cost when we try to import a tensor from an older framework that doesn't support versioned capsules (due to calling `__dlpack__` multiple times), correct? But I suppose the impact of that should diminish over time.
One more potential optimization opportunity: do you think it would be possible to use the static object table to reduce all of these costly API calls and string comparisons to a few pointer comparisons? (This is from the function that checks whether an object is an ndarray.)

```cpp
PyObject *name = nb_type_name((PyObject *) tp);
check(name, "Could not obtain type name! (1)");
const char *tp_name = PyUnicode_AsUTF8AndSize(name, nullptr);
check(tp_name, "Could not obtain type name! (2)");

bool result =
    // PyTorch
    strcmp(tp_name, "torch.Tensor") == 0 ||
    // XLA
    strcmp(tp_name, "jaxlib.xla_extension.ArrayImpl") == 0 ||
    // Tensorflow
    strcmp(tp_name, "tensorflow.python.framework.ops.EagerTensor") == 0 ||
    // Cupy
    strcmp(tp_name, "cupy.ndarray") == 0;
```
Yes, if the producing framework does not support versioned capsules, `__dlpack__` ends up being called twice.

Yes.

I don't think it would help. The problem is that the pointer comparison only pays off when it succeeds. When CPython compares two strings for equality, it first compares the pointers and falls back to comparing the contents only if the pointers differ. This short-circuiting is good for keyword arguments, since the pointer comparison is cheap and should be expected to succeed: keyword argument names used across API boundaries ought to be interned by both sides (in order to support this optimization). [but see footnote 1]

Now, consider comparing type names. If the result will be false (the object is not one of the known framework types), then the pointer compare will also be false, and we'll have to do the full string comparison anyway. The frameworks should implement `__dlpack__`, so the common path will be fast.

[footnote 1] The current (and past) release of NumPy does not intern the keyword argument names it passes, so for NumPy the pointer comparison fails and the fallback string comparison is what actually runs.
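For illustration, a minimal sketch of the identity-then-content comparison described above (the helper name is hypothetical; error handling is omitted):

```cpp
#include <Python.h>

// Cheap pointer test first; fall back to the O(n) content comparison only
// when the two names are not the same (e.g., both interned) object.
bool names_equal(PyObject *a, PyObject *b) {
    if (a == b)
        return true;
    return PyUnicode_Compare(a, b) == 0;
}
```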
My assumption was that the Python type construction will intern type and module names so that pointer equality is legal.
That doesn't seem to be the case. Using Python 3.11.2 and adding the following to [...], I get [...] when running [...].
This commit adds support for the struct ``DLManagedTensorVersioned`` as defined by DLPack version 1. It also adds the ndarray framework ``nb::array_api``, which returns an object that provides the buffer interface as well as the two DLPack methods ``__dlpack__()`` and ``__dlpack_device__()``.
This PR adds support for DLPack version 1 and adds the ndarray framework `nb::arrayapi`, which returns an object that provides the buffer interface and has the two DLPack methods `__dlpack__()` and `__dlpack_device__()`.

Given the following: [...]

I measure performance as follows: [...]

using Python 3.14 and [...]