@emmatyping emmatyping commented Nov 14, 2025

This PR sketches a plausible approach for introducing Rust as an optional build dependency for CPython. A few changes are necessary to accomplish this:

  • Configure detects cargo
  • makesetup is updated to handle Rust extension modules
  • A Cargo workspace is introduced for all CPython modules.
  • Bindings of the C API via bindgen are generated in Modules/cpython-sys. This isn't super well integrated into the Makefile right now. It requires running make regen-rust-wrapper-h to generate the wrapper first, then you can run make normally.
  • An example extension module _base64 is added, which has a single function standard_b64encode
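
For readers unfamiliar with what `standard_b64encode` has to compute, here is a minimal, dependency-free sketch of standard base64 encoding in Rust. It is not the code from this PR: the names `ALPHABET`, `encoded_len`, and `b64encode` are hypothetical, and the real module additionally has to deal with the C API (argument parsing, buffer handling, allocating the result object).

```rust
/// The standard base64 alphabet (RFC 4648, section 4).
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Number of output bytes for `input_len` input bytes, including padding.
fn encoded_len(input_len: usize) -> usize {
    (input_len + 2) / 3 * 4
}

/// Encode `input` as padded standard base64.
/// Hypothetical helper for illustration only.
fn b64encode(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(encoded_len(input.len()));
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = u32::from(chunk[0]);
        let b1 = u32::from(chunk.get(1).copied().unwrap_or(0));
        let b2 = u32::from(chunk.get(2).copied().unwrap_or(0));
        let n = (b0 << 16) | (b1 << 8) | b2;

        // Emit four 6-bit symbols; missing input bytes become '=' padding.
        out.push(ALPHABET[((n >> 18) & 0x3f) as usize]);
        out.push(ALPHABET[((n >> 12) & 0x3f) as usize]);
        out.push(if chunk.len() > 1 {
            ALPHABET[((n >> 6) & 0x3f) as usize]
        } else {
            b'='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[(n & 0x3f) as usize]
        } else {
            b'='
        });
    }
    out
}
```

Computing the output length up front matters for an extension module, because the result object is typically allocated at its final size before any bytes are written into it.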

There are also a lot of other caveats and limitations. I tested this with gcc and clang on a Linux system, with a GIL-ful build. There's no Mac or Windows support yet, or support for other platforms, but I believe Mac, WASI, Android, and iOS may not be too hard since they share the same build system as Linux.

Windows will require a different approach. I'm not sure yet on how exactly we should do that.

Rough series of commands to get things working:
Install Rust and make sure cargo is available.

./configure --with-rust-base64
make regen-rust-wrapper-h
make -j$(nproc)

@emmatyping emmatyping changed the title from "MVP of Rust in CPython" to "Proof of Concept of Rust in CPython" Nov 17, 2025
emmatyping and others added 6 commits November 16, 2025 23:59
This commit updates the build system to automatically detect cargo and
enable/disable _base64 without needing to pass a flag. If cargo is unavailable, _base64 is disabled.

It also updates cpython-sys to use a hand written header (which is what
Linux seems to do) and splits off the parser bindings to be handled in
the future (since the files are included differently).
There are still some issues with compilation, but those can be sorted out in a future PR.
Also make PyObject an UnsafeCell<ffi::_object> so it can be passed around by & reference
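
The last commit message above is easier to follow with a small illustration. The sketch below shows why wrapping the raw object in `UnsafeCell` lets code hold plain `&` references while the object is still mutated through a raw pointer (for example, by a reference-count update). `RawObject` and `incref` are hypothetical stand-ins for the bindgen-generated `ffi::_object` and `Py_INCREF`; the actual definitions in the PR may differ.

```rust
use std::cell::UnsafeCell;

/// Hypothetical stand-in for the bindgen-generated `ffi::_object`.
#[repr(C)]
struct RawObject {
    ob_refcnt: isize,
}

/// Wrapper in the spirit of the commit message: interior mutability is
/// declared to the compiler, so `&PyObject` does not promise immutability.
#[repr(transparent)]
struct PyObject(UnsafeCell<RawObject>);

impl PyObject {
    /// A mutable raw pointer is obtainable from a shared reference.
    fn as_ptr(&self) -> *mut RawObject {
        self.0.get()
    }
}

/// Hypothetical stand-in for `Py_INCREF` (single-threaded, non-atomic).
unsafe fn incref(obj: *mut RawObject) {
    unsafe { (*obj).ob_refcnt += 1 };
}

/// Mutate the object through two coexisting shared references.
fn demo_shared_incref() -> isize {
    let obj = PyObject(UnsafeCell::new(RawObject { ob_refcnt: 1 }));
    let a = &obj;
    let b = &obj;
    unsafe {
        incref(a.as_ptr());
        incref(b.as_ptr());
        (*obj.as_ptr()).ob_refcnt
    }
}
```

Without the `UnsafeCell`, writing through a pointer derived from `&RawObject` would be undefined behavior in Rust, even though the same pattern is routine in C.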

@alex alex left a comment

A few thoughts. I'm very excited!

}

let input_len = view_len as usize;
let input = unsafe { slice::from_raw_parts(buffer.as_ptr(), input_len) };

Unfortunately this is technically unsound. See https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/.

In pyca/cryptography we just accept the issue (https://github.com/pyca/cryptography/blob/main/src/rust/src/buf.rs#L114-L122), but I think it's worth a comment.

Owner Author

@emmatyping emmatyping Nov 26, 2025

I suppose to be fully safe we'd need to copy the input buffer if the reference count was >1?

Even that probably isn't enough, since new references could be made concurrently. Yeah I guess we should just add a comment here.

@alex

https://docs.rs/pyo3/latest/pyo3/buffer/struct.PyBuffer.html#method.as_slice is how we model it in pyo3 -- we use a type that does atomic reads.

Like I said, in pyca/cryptography we just accept the issue. (Someone somewhere along the line expressed interest in a PEP to improve the buffer interface for this, but it never happened.)

Owner Author

Yeah I think this is reinforcing to me that we should focus on safe stable API abstractions first then build on those. I'll probably start working on those in another branch to have something to propose.
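
To make the "atomic reads" idea from this thread concrete, here is a rough sketch that copies bytes out of possibly shared memory without ever forming a `&[u8]` over it. This is not PyO3's implementation and not code from this PR: `snapshot_shared` and `demo_snapshot` are hypothetical names, and the approach assumes that viewing the bytes as `AtomicU8` is acceptable for the memory in question. Whether that fully removes the formal data race when other code writes non-atomically is debatable, so treat it as an illustration of the shape of the workaround rather than a proof of soundness.

```rust
use std::slice;
use std::sync::atomic::{AtomicU8, Ordering};

/// Copy `len` bytes starting at `ptr` using relaxed atomic loads.
///
/// Safety (assumed contract): `ptr` must be non-null and valid for reads
/// of `len` bytes for the duration of the call.
unsafe fn snapshot_shared(ptr: *const u8, len: usize) -> Vec<u8> {
    // `AtomicU8` has the same size and alignment as `u8`, so the cast
    // does not change the layout of the viewed memory.
    let cells = unsafe { slice::from_raw_parts(ptr.cast::<AtomicU8>(), len) };
    cells.iter().map(|c| c.load(Ordering::Relaxed)).collect()
}

/// Exercise the helper on ordinary heap memory standing in for a buffer.
fn demo_snapshot() -> Vec<u8> {
    let mut backing = vec![1u8, 2, 3, 4];
    let len = backing.len();
    let ptr = backing.as_mut_ptr() as *const u8;
    unsafe { snapshot_shared(ptr, len) }
}
```

The cost is a full copy of the input. That is the trade-off discussed in this thread: either copy defensively, or accept the issue and document it in a comment.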

}
return Err(());
}
let dest = unsafe { slice::from_raw_parts_mut(dest_ptr.cast::<u8>(), output_len) };

I don't think this is sound -- PyBytes_FromStringAndSize isn't guaranteed to initialize the values, and &[u8] is required to be initialized bytes.

We need to either use MaybeUninit or to initialize the values.
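
A minimal sketch of the `MaybeUninit` option, assuming the destination is a raw pointer to freshly allocated, possibly uninitialized bytes. The names `fill_uninit`, `write_output`, and `demo_uninit` are hypothetical, and a `Vec` with spare capacity stands in for the bytes object's storage.

```rust
use std::mem::MaybeUninit;
use std::slice;

/// Write `src` into `dest` without ever reading the old contents of `dest`.
fn fill_uninit(dest: &mut [MaybeUninit<u8>], src: &[u8]) {
    assert_eq!(dest.len(), src.len());
    for (slot, byte) in dest.iter_mut().zip(src) {
        slot.write(*byte);
    }
}

/// Hypothetical version of the flagged line: view the destination as
/// `MaybeUninit<u8>` instead of `u8`, so no initialized-ness is claimed.
///
/// Safety (assumed contract): `dest_ptr` must be valid for writes of
/// `output_len` bytes and not aliased for the duration of the call.
unsafe fn write_output(dest_ptr: *mut u8, output_len: usize, encoded: &[u8]) {
    let dest = unsafe {
        slice::from_raw_parts_mut(dest_ptr.cast::<MaybeUninit<u8>>(), output_len)
    };
    fill_uninit(dest, encoded);
}

/// Use a Vec's uninitialized spare capacity as the stand-in allocation.
fn demo_uninit() -> Vec<u8> {
    let encoded = b"aGVsbG8=";
    let mut storage: Vec<u8> = Vec::with_capacity(encoded.len());
    unsafe {
        write_output(storage.as_mut_ptr(), encoded.len(), encoded);
        // Every byte in 0..len has been written, so the length can be set.
        storage.set_len(encoded.len());
    }
    storage
}
```

The other option mentioned above, initializing the values first (for example by zero-filling the destination before creating the slice), keeps the `&mut [u8]` type at the cost of one extra pass over the output.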

edition = "2024"

[dependencies]
cpython-sys ={ path = "../cpython-sys" }

I think we should specify this as a workspace dep for simplicity.

Owner Author

Agreed!

@@ -0,0 +1,2 @@
[toolchain]

Do we want to specify a default toolchain?

What we do for pyca/cryptography is specify an MSRV and then people can bring whatever version of Rust they like. (And in CI we test on MSRV, possible-next-MSRV, stable, beta, and nightly.)

Owner Author

Yeah, we should just specify an MSRV, this should probably be removed now. Thanks for the review!

5 participants