Suppose we have a Rust API which takes a callback function:
```rust
struct Foo {}

fn do_thing(callback: impl FnOnce(&mut Foo) -> Result<()>) -> Result<()> {
    let mut foo = get_foo();
    // No `?` here: the cleanup below must run even if the callback fails.
    let result = callback(&mut foo);
    // cleanup
    result
}
```

Because of the `&mut Foo` reference (and similarly for immutable `&Foo` references), this currently doesn't map well to PyO3 with a Python callback. The main way to do this in the PyO3 API would be to wrap the `Foo` callback argument into an owned value, something like this:
```rust
use pyo3::prelude::*;

#[pyclass]
struct PyFoo(Foo);

#[pyfunction]
fn do_thing_python(callback: Bound<'_, PyAny>) -> PyResult<()> {
    let foo = PyFoo(get_foo());
    // Discard the callback's return value, but keep any Python exception.
    let result = callback.call1((foo,)).map(|_| ());
    // cleanup
    result
}
```

This is sort of fine in the example above, but completely falls apart if `Foo` cannot be owned. Existing workarounds include using `struct PyFoo(Arc<Mutex<Foo>>)`, which costs an allocation plus locking on every access. This gets even more complex if `Foo` contains any lifetimes.
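For concreteness, the `Arc<Mutex<Foo>>` workaround looks roughly like the sketch below. It is plain Rust without PyO3 so that it stands alone: `PyFoo` is a stand-in for the `#[pyclass]` wrapper, a Rust closure stands in for the Python callback, and `Foo::counter`, `bump` and `do_thing_shared` are made-up names for illustration.

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Foo {
    counter: u32, // hypothetical field, only here to have something to mutate
}

/// Stand-in for the `#[pyclass]` wrapper: it owns a share of `Foo`.
#[derive(Clone)]
struct PyFoo(Arc<Mutex<Foo>>);

impl PyFoo {
    /// Every access pays for a lock.
    fn bump(&self) -> u32 {
        let mut foo = self.0.lock().unwrap();
        foo.counter += 1;
        foo.counter
    }
}

/// The Rust side must already hold `Foo` inside an `Arc<Mutex<_>>`.
fn do_thing_shared(foo: &Arc<Mutex<Foo>>, callback: impl FnOnce(PyFoo)) {
    callback(PyFoo(Arc::clone(foo)));
}
```

Note that nothing ties the handle to the callback's duration: if the callback stashes the `PyFoo` somewhere, it can keep mutating `Foo` long after `do_thing_shared` has returned.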
Credit for the original idea goes to huggingface/tokenizers#1890, which uses a pattern I'd like to explore here within PyO3. The idea that the huggingface team have come up with is essentially that:
- the Rust reference `&mut Foo` is erased into `*mut Foo`, which is stored in the `PyFoo` (underneath some guarding abstraction)
- the guarding abstraction only allows conversion of the `Foo` pointer back into `&mut Foo` while the callback is running
- once the callback is exited, the `PyFoo` object can continue to exist, but attempts to access it will fail gracefully
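A minimal sketch of the three points above, in plain Rust so that it stands alone. This is my own illustration, not the code from the tokenizers PR: `FooGuard`, `with_mut`, `do_thing_scoped` and `Foo::counter` are hypothetical names, an `Arc<FooGuard>` plays the role of the Python-owned `PyFoo`, and a Rust closure plays the role of the Python callback.

```rust
use std::ptr;
use std::sync::{Arc, Mutex};

#[derive(Default)]
struct Foo {
    counter: u32, // hypothetical field for the demo
}

/// Holds the erased `*mut Foo`; the pointer is null once the callback has exited.
struct FooGuard {
    ptr: Mutex<*mut Foo>,
}

impl FooGuard {
    /// Runs `f` on the borrowed `Foo`, or returns `None` if the borrow has
    /// expired (or if the guard is already in use, e.g. re-entrantly).
    fn with_mut<R>(&self, f: impl FnOnce(&mut Foo) -> R) -> Option<R> {
        let slot = self.ptr.try_lock().ok()?;
        let raw: *mut Foo = *slot;
        // SAFETY: the pointer is only non-null while `do_thing_scoped` keeps
        // the original `&mut Foo` alive and untouched, and the mutex is held
        // for the whole call to `f`, so the access is exclusive.
        let foo = unsafe { raw.as_mut()? };
        Some(f(foo))
    }
}

/// Nulls out the pointer when dropped, so this also happens on unwind.
struct Invalidate(Arc<FooGuard>);

impl Drop for Invalidate {
    fn drop(&mut self) {
        let mut slot = self.0.ptr.lock().unwrap_or_else(|e| e.into_inner());
        *slot = ptr::null_mut();
    }
}

fn do_thing_scoped<R>(foo: &mut Foo, callback: impl FnOnce(Arc<FooGuard>) -> R) -> R {
    let guard = Arc::new(FooGuard { ptr: Mutex::new(foo as *mut Foo) });
    let _invalidate = Invalidate(Arc::clone(&guard));
    callback(guard)
}
```

The closure passed to `with_mut` is generic over the reference's lifetime, so the `&mut Foo` itself cannot be smuggled out; only the guard can escape, and it is inert once the callback returns. A real `#[pyclass]` version would additionally need to justify `Send`/`Sync` for the raw pointer, which this sketch does not attempt.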
I've wanted this sort of pattern downstream in pydantic too (which has plenty of Python callbacks). I suspect this pops up fairly frequently across the ecosystem. I think there's value in attempting to make a well-tested implementation of this pattern in PyO3 so that it can be re-used across the ecosystem.
I think the broad features of it are something like this:
- `&T` references can be implemented using something like a `RwLock<*const T>` which allows any number of readers. As soon as the callback exits, a write is done to null out the pointer and readers fail gracefully after that.
- `&mut T` references can be implemented using similar, maybe `RwLock<*mut T>`, still allowing any number of readers or ONE writer (as per Rust rules of aliasing). When the callback exits, again the pointer is nulled out with a write and future readers & writers will fail gracefully.
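Roughly what I have in mind for the `&mut T` case, as a plain-Rust sketch with hypothetical names (`Scoped`, `with_ref`, `with_mut`, `scope_mut`). It assumes `T: 'static` to sidestep the lifetime question, and uses an `Arc` handle plus a Rust closure in place of the Python object and callback. The `&T` case would be the same thing with `*const T` and only `with_ref`.

```rust
use std::ptr;
use std::sync::{Arc, RwLock};

/// Handle around an erased `&mut T`; the pointer is null after the scope ends.
struct Scoped<T: 'static> {
    ptr: RwLock<*mut T>,
}

impl<T: 'static> Scoped<T> {
    /// Shared access: takes the read lock, so any number of readers may coexist.
    fn with_ref<R>(&self, f: impl FnOnce(&T) -> R) -> Option<R> {
        let slot = self.ptr.read().ok()?;
        let raw: *mut T = *slot;
        // SAFETY: non-null only while the original borrow is live; the read
        // lock excludes writers for the duration of `f`.
        let value = unsafe { raw.as_ref()? };
        Some(f(value))
    }

    /// Exclusive access: takes the write lock, so there is exactly one writer.
    fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> Option<R> {
        let slot = self.ptr.write().ok()?;
        let raw: *mut T = *slot;
        // SAFETY: as above, and the write lock excludes all other access.
        let value = unsafe { raw.as_mut()? };
        Some(f(value))
    }
}

/// Nulls out the pointer with a write when dropped (including on unwind).
struct Invalidate<T: 'static>(Arc<Scoped<T>>);

impl<T: 'static> Drop for Invalidate<T> {
    fn drop(&mut self) {
        let mut slot = self.0.ptr.write().unwrap_or_else(|e| e.into_inner());
        *slot = ptr::null_mut();
    }
}

fn scope_mut<T: 'static, R>(value: &mut T, callback: impl FnOnce(Arc<Scoped<T>>) -> R) -> R {
    let handle = Arc::new(Scoped { ptr: RwLock::new(value as *mut T) });
    let _invalidate = Invalidate(Arc::clone(&handle));
    callback(handle)
}
```

Because the invalidating write has to wait for outstanding readers and writers, a handle in use on another thread delays the callback's exit rather than being left with a dangling pointer. Whether blocking there is acceptable is one of the things to explore.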
The hardest bit I can foresee is what to do if `T` is not `'static`, i.e. itself contains references. I think if `T<'a>` is covariant in the lifetime, it's probably possible to safely handle `&T<'a>` references.
I think `&mut T<'a>` may not be possible, because `&mut T` is invariant in `T`, so the lifetime inside `T` cannot be shortened behind the mutable reference.
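To illustrate the variance point with a small sketch (`Token` and `shorten` are made-up names): the compiler accepts shortening the inner lifetime behind a shared reference, but rejects the same signature for a mutable reference.

```rust
/// Hypothetical borrowed type; it is covariant in `'a`.
struct Token<'a> {
    text: &'a str,
}

/// Compiles: `&T` is covariant in `T`, so `Token<'long>` can be viewed as
/// `Token<'short>` behind a shared reference.
fn shorten<'short, 'long: 'short>(t: &'short Token<'long>) -> &'short Token<'short> {
    t
}

// The `&mut` analogue is rejected by the compiler, because `&mut T` is
// invariant in `T`:
//
// fn shorten_mut<'short, 'long: 'short>(
//     t: &'short mut Token<'long>,
// ) -> &'short mut Token<'short> {
//     t
// }
//
// If it were allowed, the caller could store a `&'short str` into a value
// that other code still believes holds a `&'long str`.
```

So a handle for `&T<'a>` could plausibly hand out references with a fresh, shorter lifetime, while a handle for `&mut T<'a>` has no equally simple story.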
... exploration needed. I'm quite keen to have a go unless someone else is keen to run with this.