Improve documentation for PyCapsule #5820

@tliron

I humbly suggest adding something like this to the PyCapsule documentation. Or, indeed it might be worth an entry in the PyO3 user guide (which currently does not have a chapter on capsules):

A common use case is creating a capsule in one Python extension and reading it in another. This has caveats beyond the obvious need for both extensions to use the same memory structure for the value.

Specifically, you also need to make sure that your type's functionality does not rely on Rust's global state (i.e. static variables, including lazily initialized ones), because that state will exist independently for each dynamically loaded Python extension. Such a dependency is neither undefined behavior nor an error, and yet it can lead to subtle behavioral bugs.
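To make the duplicated-state point concrete, here is a minimal sketch that needs only the standard library. The two modules `provider_ext` and `receiver_ext` are hypothetical stand-ins for two separately built extensions, and the fixed secret values are assumptions chosen to keep the example deterministic. In a real build the same `static` from a shared dependency would simply be compiled into each library once.

```rust
// Hypothetical stand-in for the extension that creates the capsule.
mod provider_ext {
    use std::sync::OnceLock;

    // This library's private copy of the "global" secret.
    static SECRET: OnceLock<u64> = OnceLock::new();

    pub fn secret() -> u64 {
        // Assumption: a real crate might pick this at random on first use.
        // A fixed value keeps the sketch deterministic.
        *SECRET.get_or_init(|| 0x1111_2222_3333_4444)
    }
}

// Hypothetical stand-in for the extension that reads the capsule.
mod receiver_ext {
    use std::sync::OnceLock;

    // A separate copy of the same "global", initialized independently.
    static SECRET: OnceLock<u64> = OnceLock::new();

    pub fn secret() -> u64 {
        *SECRET.get_or_init(|| 0x5555_6666_7777_8888)
    }
}

fn main() {
    // Within one library the global is stable...
    assert_eq!(provider_ext::secret(), provider_ext::secret());
    // ...but the two libraries do not agree with each other.
    assert_ne!(provider_ext::secret(), receiver_ext::secret());
    println!("provider sees {:#x}", provider_ext::secret());
    println!("receiver sees {:#x}", receiver_ext::secret());
}
```

Nothing here is unsafe or erroneous. The two copies just never see each other, which is why the resulting bugs are so quiet.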

Instead of relying on Rust's static variables you can use Python's global state, which indeed will be truly global for all loaded extensions. For example, global capsules can be stored as attributes in the "builtins" module and then loaded with [PyCapsule::import].

Does this seem obvious to you? Great. :) But let me tell you about how I spent a few days debugging an issue before coming to the a-ha moment.

In a capsule I am passing a data structure that includes a hashmap implementation (behind an Arc). Everything worked fine until I started experimenting with alternative BuildHasher implementations. That's when things became weird. Switching to aHash was fine, but both rapidhash and foldhash broke the behavior of the hashmap in some uses. After some debugging I realized that the latter two hashers were somehow generating the "wrong" hashes. Or, rather, the "providing" Python library had different results from the "receiving" Python library.

But ... why would this be the case? I dove into the mechanics of capsules and even experimented with Pin. Were these two hashes somehow sensitive to being cast into and out of pointers? But why would that happen? What made them special?

Then it dawned on me that the Python extension use case involves multiple global states. That's what makes these two hashers different: they rely on global state internally, and each dynamically loaded instance ends up using a different secrets table. aHash's implementation is self-contained, so there are no issues.
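The difference can be sketched with a toy seeded hasher. This is a simplified model under stated assumptions, not the actual implementation of aHash, rapidhash, or foldhash: `SeededFnv`, `LibraryGlobals`, `CarriedSeed`, and both helper functions are hypothetical names, and `LibraryGlobals` merely models each loaded library's private copy of a global secret.

```rust
use std::hash::{BuildHasher, Hasher};

// Toy FNV-1a style hasher whose starting state is a seed.
struct SeededFnv(u64);

impl Hasher for SeededFnv {
    fn write(&mut self, bytes: &[u8]) {
        for b in bytes {
            self.0 ^= u64::from(*b);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Hypothetical model of one loaded library's private global secret.
struct LibraryGlobals {
    secret: u64,
}

// Global-dependent hashing: the seed comes from whichever library runs the code.
fn global_dependent_hash(lib: &LibraryGlobals, key: &str) -> u64 {
    let mut hasher = SeededFnv(lib.secret);
    hasher.write(key.as_bytes());
    hasher.finish()
}

// Self-contained hashing: the seed is a field, so it travels with the value
// (for example, inside the data structure a capsule points to).
#[derive(Clone, Copy)]
struct CarriedSeed {
    seed: u64,
}

impl BuildHasher for CarriedSeed {
    type Hasher = SeededFnv;

    fn build_hasher(&self) -> SeededFnv {
        SeededFnv(self.seed)
    }
}

fn carried_hash(state: &CarriedSeed, key: &str) -> u64 {
    let mut hasher = state.build_hasher();
    hasher.write(key.as_bytes());
    hasher.finish()
}

fn main() {
    let provider = LibraryGlobals { secret: 0xA5A5_A5A5_A5A5_A5A5 };
    let receiver = LibraryGlobals { secret: 0x5A5A_5A5A_5A5A_5A5A };

    // Same key, different library: the hashes disagree, so lookups would miss.
    assert_ne!(
        global_dependent_hash(&provider, "key"),
        global_dependent_hash(&receiver, "key")
    );

    // The seed was chosen once by the provider and is stored in the value,
    // so either side computes the same hash from it.
    let shared = CarriedSeed { seed: provider.secret };
    let seen_by_provider = carried_hash(&shared, "key");
    let seen_by_receiver = carried_hash(&shared, "key");
    assert_eq!(seen_by_provider, seen_by_receiver);
}
```

A hashmap built with the global-dependent variant places its entries according to the provider's secret, and the receiver then looks for them according to its own, which matches the symptom described above.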

Until I delved into their source code I would not have known this. The conclusion is that capsules should not be used naïvely, especially when relying on 3rd-party code. It's not so obvious after all, right? ;)

(You have permission to use my documentation text verbatim if interested.)
