@honzasp honzasp commented Jul 8, 2025

In the PyCapsule C API, the caller has to specify the correct name of the PyCapsule to get the pointer stored in the capsule. This is intended as a safety check: casting void* to an arbitrary pointer is inherently quite dangerous and the capsule name provides the only (albeit imperfect) way to check that the pointer has the expected type.

However, the Rust API exposed by pyo3 didn't require the capsule name to get the pointer or reference stored in the capsule:

trait PyCapsuleMethods {
    fn pointer(&self) -> *mut c_void;
    unsafe fn reference<T>(&self) -> &'py T;
    fn is_valid(&self) -> bool;
}

This PR changes these methods to be more in line with the C API and thus harder to misuse:

trait PyCapsuleMethods {
    fn pointer(&self, name: Option<&CStr>) -> PyResult<NonNull<c_void>>;
    unsafe fn reference<T>(&self, name: Option<&CStr>) -> PyResult<&'py T>;
    fn is_valid(&self, name: Option<&CStr>) -> bool;
}

This is a breaking change, but I believe that the benefit of increased safety is worth the cost of API breakage.
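
To illustrate the check that the proposed signatures mirror, here is a minimal sketch that uses only the standard library. `ToyCapsule`, `NameMismatch` and `make_capsule` are hypothetical names made up for this example and are not part of PyO3; the comparison rule (an unnamed capsule only matches a `None` name, a named capsule only matches an equal name) follows the behavior of `PyCapsule_GetPointer` in the C API.

```rust
use std::ffi::{c_void, CStr, CString};
use std::ptr::NonNull;

/// Toy stand-in for a capsule: an opaque pointer tagged with an optional name.
/// (Hypothetical type for illustration; this is not PyO3's `PyCapsule`.)
struct ToyCapsule {
    name: Option<CString>,
    ptr: NonNull<c_void>,
}

/// Error returned when the requested name does not match the stored one.
#[derive(Debug, PartialEq)]
struct NameMismatch;

impl ToyCapsule {
    /// Hands out the pointer only if `name` equals the capsule's name.
    fn pointer_checked(&self, name: Option<&CStr>) -> Result<NonNull<c_void>, NameMismatch> {
        if self.name.as_deref() == name {
            Ok(self.ptr)
        } else {
            Err(NameMismatch)
        }
    }
}

/// Builds a capsule named "arrow_array_stream" that points at a static `u32`.
fn make_capsule() -> ToyCapsule {
    static VALUE: u32 = 7;
    ToyCapsule {
        name: Some(CString::new("arrow_array_stream").unwrap()),
        ptr: NonNull::from(&VALUE).cast::<c_void>(),
    }
}
```

A caller that passes the wrong name (or `None` for a named capsule) gets an error instead of a pointer that it would then cast to the wrong type.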

@honzasp honzasp force-pushed the pycapsule branch 2 times, most recently from 3255558 to 774676a Compare July 8, 2025 10:52
@honzasp honzasp changed the title Align the PyCapsule API with the original C API Align the PyCapsule API with the original C API, making it safer Jul 8, 2025
@honzasp honzasp force-pushed the pycapsule branch 5 times, most recently from 4b5a69c to 48fad84 Compare July 8, 2025 12:58
@honzasp
Author

honzasp commented Jul 21, 2025

Hi @davidhewitt, I apologize for bugging you, but does this PR have any chance of being merged? I'm happy to make changes to address any concerns.

@davidhewitt
Member

Sorry for the delayed review here.

I am somewhat neutral on this PR. I understand the motivation for it; the topic was last discussed in #2485, where I noted that pybind11 made the same choice not to validate the capsule name: #2485 (comment)

So it seemed fine for us to make the same choice as pybind11: even if potentially not correct, it was justifiable and avoided a change of behaviour.

The original design for the capsule C API seems to come from https://bugs.python.org/issue5630. See also https://discuss.python.org/t/rationale-for-changing-a-capsules-name-or-destructor/48504, which discusses the fact that a capsule's name can be changed after creation, and what that may or may not mean for safety.

One major plus compared to when #2485 was introduced is that c"" literals now make it easy to pass &CStr arguments for the name. So it is a lot more workable to pass a name statically than it used to be.

I would be open to hearing other maintainers' views on this PR.

If we choose to move forward with this, we should introduce new methods and deprecate the existing ones, as I think the existing behaviour is not sufficiently wrong that we need to break all users immediately upon release.

@Icxolu
Contributor

Icxolu commented Jul 25, 2025

I think I have a similar stance here. While not strictly opposed to this change, I'm also not really convinced that it significantly improves safety. I think we are already a bit safer because these methods are on a statically typed Bound<PyCapsule> object, so a few failure cases are already prevented by that. The name is already checked at construction time, so I'm not sure if it's worth checking each call. (I guess it's also possible to downcast to a capsule, but I don't think there is a sane way to use it for anything, because it is unknown what's inside.) If there are concerns about which capsule is used, that feels to me like it's always a bug, and as such it would be better served by an assertion in that code (as something went terribly wrong) rather than an error (I'm not sure what a reasonable way to recover would look like here).

@honzasp
Author

honzasp commented Jul 28, 2025

Thank you for your responses and for the additional context!

> The name is already checked at construction time, so I'm not sure if it's worth checking each call.

The motivation for checking the name comes when you receive the PyCapsule from code that is not under your control, not when you construct the capsule yourself. In this case the name of the capsule is the only check that you can do to verify that the opaque pointer in the capsule has the expected type.

If the API requires the programmer to provide the name when getting the pointer, it is a strong nudge to check the name of the capsule before working with the pointer. I think that in Rust, it's better to preserve the nudge, rather than going to some lengths to remove it.


The particular use case that I had in mind when opening this PR was the Arrow PyCapsule interface, which specifies a PyCapsule name for each object in the Arrow C Interface. For example, the C interface type ArrowArrayStream is stored in a PyCapsule with the name "arrow_array_stream". This interface also defines methods that Python objects can implement to be exportable into the C interface; for example, a Python object can implement the method __arrow_c_stream__(), which should return a PyCapsule that contains an ArrowArrayStream.

In practice, the code can look like this:

// get the PyCapsule from a Python object that is exportable to an Arrow stream
let py_stream: &Bound<'py, PyAny> = ...;
let stream_capsule = py_stream.call_method0(intern!(py, "__arrow_c_stream__"))?
    .downcast_into::<PyCapsule>()?;

// get the pointer from the capsule, checking that the capsule has the expected name.
// note that with the changes proposed in this PR, the programmer is nudged to do the check
let stream_ptr = stream_capsule.pointer(Some(c"arrow_array_stream"))?;
let stream_ptr = stream_ptr.as_ptr() as *mut arrow::ffi_stream::FFI_ArrowArrayStream;

I did some sleuthing to find out why pybind11 also bypasses the safety nudge:

  1. In the very first pybind11 commit, the C++ capsule implemented a cast operator (!) into an arbitrary pointer, which passed NULL as the name to PyCapsule_GetPointer, so it only worked with unnamed capsules.
  2. Then a PR added an option to create a named capsule, and it also bypassed the name check, so the implicit cast operator now returned a pointer regardless of the capsule name. There doesn't seem to be any discussion of this behavior in the PR; this approach appears to have been chosen because it was the smallest change at the moment.
  3. Later another PR added a method get_pointer() with the same behavior as the cast.

So it seems that pybind11 ended up bypassing the safety check not by a deliberate choice, but because it was the path of least resistance. I believe that in PyO3, the best lesson that we can learn from pybind11 in this regard is to reintroduce the safety nudge before it's too late :)

@Icxolu
Contributor

Icxolu commented Jul 28, 2025

Thanks for the example. I have a much better idea about how this can be used now. In my head I only had the very simple examples of importing a known capsule, not that you could obtain them dynamically like this and still do something useful with them. In this scenario the API makes a lot more sense, and an incorrect name should also just be a regular error and not an assertion like I was thinking previously.

With this I think this gets a 👍 from me.


Alternatively, I have this (maybe slightly crazy) idea of using typestate for this, but I have no idea if it would be worth it (or even possible within PyO3). Nevertheless, I might as well write it down 🙃

We could have something like PyCapsule<Unverified> and PyCapsule<Verified>, where only the unverified one can be extracted/downcasted. It could then be turned into the verified one by checking the name and the verified one would give access to the pointer. Importing or creating would then directly give you a verified one.

@honzasp
Author

honzasp commented Aug 1, 2025

Hmm, I don't think that using a typestate for the capsule would be very useful for most use cases:

  1. When you are given a PyCapsule, you typically check the name and get the pointer just once. With the pointer in hand, there is little reason to continue using the capsule instead of using the pointer directly.
  2. The name of the PyCapsule is not immutable and can change together with the pointer, so defensive code should check the capsule's name every time it extracts the pointer.
  3. If the typestate is just binary Unverified/Verified, it does not carry information about what was verified: is the capsule an "arrow_array", "arrow_schema" or "arrow_array_stream"? We could use the expected pointer type as the typestate, but in that case we can just as well work with the pointer itself (point 1).
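
For reference, the Unverified/Verified idea discussed above could look roughly like the following std-only sketch. All names here (`ToyCapsule`, `Unverified`, `Verified`, `from_untrusted`, `verify`) are hypothetical and chosen for illustration; nothing like this exists in PyO3.

```rust
use std::ffi::{c_void, CStr, CString};
use std::marker::PhantomData;
use std::ptr::NonNull;

/// Marker types for the two states (hypothetical names).
struct Unverified;
struct Verified;

/// Toy capsule whose type parameter records whether the name was checked.
struct ToyCapsule<State> {
    name: Option<CString>,
    ptr: NonNull<c_void>,
    _state: PhantomData<State>,
}

impl ToyCapsule<Unverified> {
    /// What a downcast from untrusted code would yield: no pointer access yet.
    fn from_untrusted(name: Option<CString>, ptr: NonNull<c_void>) -> Self {
        ToyCapsule { name, ptr, _state: PhantomData }
    }

    /// Checks the name; on success the capsule moves to the `Verified` state,
    /// on failure the unverified capsule is handed back unchanged.
    fn verify(self, expected: Option<&CStr>) -> Result<ToyCapsule<Verified>, Self> {
        if self.name.as_deref() == expected {
            Ok(ToyCapsule { name: self.name, ptr: self.ptr, _state: PhantomData })
        } else {
            Err(self)
        }
    }
}

impl ToyCapsule<Verified> {
    /// Only a verified capsule exposes its pointer.
    fn pointer(&self) -> NonNull<c_void> {
        self.ptr
    }
}
```

The sketch also makes points 2 and 3 visible: the `Verified` marker is only trustworthy as long as the name cannot change after `verify()`, and the marker itself does not record which name was checked.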


codspeed-hq bot commented Sep 26, 2025

CodSpeed Performance Report

Merging #5229 will not alter performance

Comparing KeyrockEU:pycapsule (0e81a79) with main (6a6ed99)

Summary

✅ 98 untouched

@honzasp
Author

honzasp commented Sep 26, 2025

@davidhewitt @Icxolu I reworked the PR to add new methods (suffixed with _checked) and deprecate the existing methods without removing them. (I marked the deprecations as since = "0.27.0", not sure if that's correct?)
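
As a rough illustration of this deprecate-and-add pattern, here is a std-only sketch. `ToyCapsule` is a hypothetical type and the `note` text is made up for this example; only the `_checked` suffix and `since = "0.27.0"` come from the description above, and the actual PyO3 signatures may differ.

```rust
use std::ffi::{c_void, CStr, CString};

/// Toy capsule (hypothetical type, not PyO3's `PyCapsule`).
struct ToyCapsule {
    name: Option<CString>,
    ptr: *mut c_void,
}

impl ToyCapsule {
    /// Old accessor: still available, but callers get a deprecation warning.
    #[deprecated(since = "0.27.0", note = "use `pointer_checked()` instead")]
    fn pointer(&self) -> *mut c_void {
        self.ptr
    }

    /// New accessor: returns `None` unless `name` matches the capsule's name.
    fn pointer_checked(&self, name: Option<&CStr>) -> Option<*mut c_void> {
        if self.name.as_deref() == name {
            Some(self.ptr)
        } else {
            None
        }
    }
}
```

Existing callers of `pointer()` keep compiling and only see a warning, which gives them a release cycle to migrate before the old method is removed.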

Member

@davidhewitt davidhewitt left a comment

Thanks, I think the API proposed here is the right one. Due to the potential soundness issue identified, I'm going to push to this PR to adjust the reference API, merge it, and get on with preparing a new release. Please forgive me for taking over.

Probably when we get to 0.29 and remove the deprecated methods I will rename the _checked methods to drop the suffix, as I think it'll be good enough without it and read better.

unsafe { &*self.pointer().cast() }
}

unsafe fn reference_checked<T>(&self, name: Option<&CStr>) -> PyResult<&'py T> {
Member

Staring at this, it looks like a potential use-after-free due to lifetime extension: the 'py lifetime can potentially be too long.

e.g.

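Python::attach(|py| -> PyResult<()> {
    let data = PyCapsule::new(py, "foobar".to_string(), None)?;
    // `reference()` returns `&'py String`, which is not tied to the borrow of `data`
    let string_ref: &String = unsafe { data.reference() };
    // dropping the last handle lets the capsule free the `String`
    drop(data);
    // use `string_ref` here for UAF
    Ok(())
});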

Looks like that is good justification to land this API and probably backport it to 0.26.

We at least get the slight "relief" that .reference() is already an unsafe fn with an easily broken invariant on the Python side (user could provide any old data and even modify it while this reference is up), but it's still not great.
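
One general way to rule out this kind of lifetime extension is to tie the returned reference to the borrow of the handle instead of a longer, independent lifetime. The sketch below shows that idea with plain std types; `ToyCapsule` and `string_len` are hypothetical, and this is an assumption about a possible shape of a fix, not a description of what PyO3 ended up doing.

```rust
/// Toy owner of heap data, standing in for a capsule (hypothetical type).
struct ToyCapsule<T> {
    value: Box<T>,
}

impl<T> ToyCapsule<T> {
    /// The returned reference borrows from `&self`, so it cannot outlive
    /// the handle it was obtained from.
    fn reference(&self) -> &T {
        &self.value
    }
}

/// Uses the reference while the handle is still alive, then drops the handle.
fn string_len() -> usize {
    let data = ToyCapsule { value: Box::new("foobar".to_string()) };
    let string_ref: &String = data.reference();
    // Moving `drop(data)` above the next line is rejected by the borrow
    // checker (error E0505), because `string_ref` still borrows `data`.
    let len = string_ref.len();
    drop(data);
    len
}
```

With a free-standing lifetime such as `'py` on the return type, the compiler has no way to connect the reference to `data`, which is why the snippet in the review comment compiles.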

Member

See #5474

Author

@honzasp honzasp Sep 28, 2025

Maybe it will be best to keep only the pointer_checked() API, and let the user dereference the pointer as they see fit? The reference()/reference_checked() method does not provide much in terms of ergonomics (literally saves the user a single *) and makes strong but quite implicit safety assumptions.

Member

I was wondering about something similar; I am very tempted to remove it. If users complain during the deprecation period, we can always reconsider later.

use std::ffi::{CStr, CString};
use std::ptr::{self, NonNull};

/// Represents a Python Capsule
Member

We should probably add some information here about the name checking, I will do that.
