Thread-safe wrapping of Rust references inside Python callbacks #5664

Suppose we have a Rust API which takes a callback function:

```rust
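struct Foo {}

// Placeholder error type so the snippet compiles on its own.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn get_foo() -> Foo {
    Foo {} // placeholder
}

fn do_thing(callback: impl FnOnce(&mut Foo) -> Result<()>) -> Result<()> {
    let mut foo = get_foo();
    // No `?` here: cleanup should still run if the callback fails.
    let result = callback(&mut foo);
    // cleanup
    result
}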
```

Because of the `&mut Foo` reference (and similarly for `&Foo` shared references), this doesn't currently map well to PyO3 with a Python callback. The main way to do this with the PyO3 API would be to wrap the `Foo` callback argument in an owned value, something like this:

```rust
use pyo3::prelude::*;

#[pyclass]
struct PyFoo(Foo);

#[pyfunction]
fn do_thing_python(callback: Bound<'_, PyAny>) -> PyResult<()> {
    // `Foo` is moved into a new Python object owned by the interpreter.
    let foo = PyFoo(get_foo());
    let result = callback.call1((foo,));
    // cleanup
    result.map(|_| ())
}
```

This is more or less fine in the example above, but it falls apart completely if `Foo` cannot be owned, e.g. when all we have is a `&mut Foo` borrowed from somewhere else. Existing workarounds include using `struct PyFoo(Arc<Mutex<Foo>>)`, which costs an allocation plus a lock on every access. This gets even more complex if `Foo` contains any lifetimes.
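For contrast, here is a PyO3-free sketch of the `Arc<Mutex<Foo>>` workaround. `do_thing_shared`, `record`, the `calls` field and the `demo` function are hypothetical, and `PyFoo` here is a plain Rust struct standing in for the `#[pyclass]`. The point is that `Foo` must be moved in by value, allocated behind an `Arc`, and locked on every access.

```rust
use std::sync::{Arc, Mutex};

#[derive(Default)]
pub struct Foo {
    pub calls: u32,
}

/// Pure-Rust stand-in for `#[pyclass] struct PyFoo(Arc<Mutex<Foo>>)`.
#[derive(Clone)]
pub struct PyFoo(Arc<Mutex<Foo>>);

impl PyFoo {
    /// Every method call from the "Python" side has to take the lock.
    pub fn record(&self) {
        self.0.lock().unwrap().calls += 1;
    }
}

/// `foo` is taken by value: this workaround cannot start from a borrowed `&mut Foo`.
pub fn do_thing_shared(foo: Foo, callback: impl FnOnce(PyFoo)) -> u32 {
    let shared = Arc::new(Mutex::new(foo)); // extra heap allocation
    callback(PyFoo(Arc::clone(&shared)));
    // cleanup: the callback may still hold a clone, so `Foo` stays shared and locked.
    // (Bound to a local so the guard is dropped before `shared`.)
    let calls = shared.lock().unwrap().calls;
    calls
}

/// Usage: the "callback" calls into `Foo` twice through the shared handle.
pub fn demo() -> u32 {
    do_thing_shared(Foo::default(), |py_foo| {
        py_foo.record();
        py_foo.record();
    })
}
```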

There is a pattern, originally from https://github.com/huggingface/tokenizers/issues/1890, which I'd like to explore here within PyO3. The idea the Hugging Face team came up with is essentially this:
- the Rust reference `&mut Foo` is erased into `*mut Foo`, which is stored in the `PyFoo` (underneath some guarding abstraction)
- the guarding abstraction only allows conversion of the `Foo` pointer back into `&mut Foo` while the callback is running
- once the callback is exited, the `PyFoo` object can continue to exist, but attempts to access it will fail gracefully

I've wanted this sort of pattern downstream in `pydantic` too (which has plenty of Python callbacks), and I suspect it pops up fairly frequently across the ecosystem. I think there's value in attempting a well-tested implementation of this pattern in PyO3 so that everyone can reuse it.

I think the broad features of it are something like this:
- `&T` references can be implemented using something like a `RwLock<*const T>`, which allows any number of readers. As soon as the callback exits, a write is done to null out the pointer, and readers fail gracefully after that.
- `&mut T` references can be implemented using something similar, maybe `RwLock<*mut T>`, still allowing any number of readers or ONE writer (as per Rust's aliasing rules). When the callback exits, the pointer is again nulled out with a write, and future readers and writers fail gracefully.
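The `&T` bullet above might look roughly like this as a PyO3-free sketch. `ScopedRef`, `scoped_ref`, `Expired` and `demo` are hypothetical names; the assumption is that sharing `&T` across threads needs `T: Sync`, so the raw-pointer wrapper only claims `Send`/`Sync` under that bound. The demo uses `std::thread::scope` so several threads read through one handle while the callback is running.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical error: the callback that created the handle has exited.
#[derive(Debug, PartialEq)]
pub struct Expired;

// Readers only ever get `&T`, so sharing the pointer across threads is
// reasonable exactly when `T: Sync`.
struct ConstPtr<T>(*const T);
unsafe impl<T: Sync> Send for ConstPtr<T> {}
unsafe impl<T: Sync> Sync for ConstPtr<T> {}

pub struct ScopedRef<T> {
    inner: Arc<RwLock<ConstPtr<T>>>,
}

impl<T> ScopedRef<T> {
    /// Any number of readers may hold the read lock at the same time.
    pub fn read<R>(&self, f: impl FnOnce(&T) -> R) -> Result<R, Expired> {
        let guard = self.inner.read().unwrap_or_else(|e| e.into_inner());
        if guard.0.is_null() {
            return Err(Expired);
        }
        // SAFETY: non-null only while `scoped_ref` still holds the `&T` borrow.
        Ok(f(unsafe { &*guard.0 }))
    }
}

pub fn scoped_ref<T, R>(value: &T, callback: impl FnOnce(ScopedRef<T>) -> R) -> R {
    struct Invalidate<T>(Arc<RwLock<ConstPtr<T>>>);
    impl<T> Drop for Invalidate<T> {
        fn drop(&mut self) {
            // The write lock waits for all in-flight readers, then nulls out.
            let mut guard = self.0.write().unwrap_or_else(|e| e.into_inner());
            guard.0 = std::ptr::null();
        }
    }
    let inner = Arc::new(RwLock::new(ConstPtr(value as *const T)));
    let _invalidate = Invalidate(Arc::clone(&inner));
    callback(ScopedRef { inner })
}

pub fn demo() -> (i32, Result<usize, Expired>) {
    let data: Vec<i32> = vec![1, 2, 3, 4];
    let (total, escaped) = scoped_ref(&data, |handle| {
        // Four threads read through the same handle concurrently.
        let total: i32 = thread::scope(|s| {
            let mut workers = Vec::new();
            for _ in 0..4 {
                workers.push(s.spawn(|| handle.read(|v| v.iter().sum::<i32>()).unwrap()));
            }
            workers.into_iter().map(|w| w.join().unwrap()).sum()
        });
        (total, handle)
    });
    // After the callback, the escaped handle reports `Expired`.
    (total, escaped.read(|v| v.len()))
}
```

The `&mut T` variant would store `*mut T` instead and add a `write` method that takes `RwLock::write` and hands out `&mut T`, with the `Sync` bound on the wrapper adjusted accordingly (`T: Send + Sync`).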

The hardest bit I can foresee is what to do if `T` is not `'static`, i.e. itself contains references. I think if `T<'a>` is covariant in the lifetime, it's probably possible to safely handle `&T<'a>` references.

I think `&mut T<'a>` may not be possible, because `&mut T` is invariant in `T`.
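The variance argument can be checked directly with the compiler. In this sketch a hypothetical `Foo<'a>` holds only a `&'a str`, so it is covariant in `'a` and a shared reference to it can be re-viewed with a shorter lifetime, while the `&mut` equivalent is rejected. It only illustrates the variance rules; it does not implement the lifetime erasure itself.

```rust
/// Hypothetical non-'static type: covariant in `'a` because it only holds `&'a str`.
pub struct Foo<'a> {
    pub name: &'a str,
}

/// Compiles: `&'b Foo<'a>` may be viewed as `&'b Foo<'b>`, because `Foo<'a>`
/// is covariant in `'a` and `&T` is covariant in `T`.
pub fn shorten<'a: 'b, 'b>(foo: &'b Foo<'a>) -> &'b Foo<'b> {
    foo
}

// The `&mut` analogue is rejected, because `&mut T` is invariant in `T`:
//
// fn shorten_mut<'a: 'b, 'b>(foo: &'b mut Foo<'a>) -> &'b mut Foo<'b> {
//     foo // error: lifetime may not live long enough
// }
//
// If it were allowed, a caller could store a `&'b str` into `foo.name` that
// dangles once `'b` ends, while the owner still believes it has a `Foo<'a>`.

pub fn demo() -> String {
    let owned = String::from("foo");
    let foo = Foo { name: &owned };
    let short = shorten(&foo);
    short.name.to_uppercase()
}
```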

... exploration needed. I'm quite keen to have a go unless someone else is keen to run with this.
