Conversation

@robsdedude
Contributor

In free-threaded Python, to_vec needs to make sure it runs inside a critical section so that no other Python thread can mutate the bytearray underneath it, which would cause UB.

See also #4736

Unfortunately, it seems I can't write proper tests for this, as Python 3.13t is not yet part of the test matrix. I'm aware that support for testing with 3.13 and 3.13t is still in its early stages and that, for instance, virtualenv does not yet support it.

@robsdedude robsdedude changed the title Towards soundness of PyByteArrayMethods::to_vec Towards soundness of PyByteArray::to_vec Nov 29, 2024
@robsdedude robsdedude marked this pull request as ready for review November 29, 2024 09:28
Member

@davidhewitt davidhewitt left a comment

Thanks for the PR! We actually do have tests running for the free-threaded build; I would have been unhappy to declare support without them! Similarly, I have had virtualenv working just fine with 3.13t (haven't tried Windows, though).

I think we could write a test which spawns a thread that attempts to invalidate the data (maybe by writing to it using py.run or PySequenceMethods::set_slice) and confirms that the data read back is the original data, not the conflicting data (the write should hopefully now block on either the GIL or the critical section, depending on the build).

Co-authored-by: David Hewitt <[email protected]>
@davidhewitt davidhewitt mentioned this pull request Nov 29, 2024
@robsdedude
Contributor Author

robsdedude commented Nov 29, 2024

@davidhewitt I tried to write a test running bytearray.extend in one thread while reading the bytearray with to_vec() in another thread, and found that I was able to read inconsistent data (more precisely, partially uninitialized memory) regardless of whether the critical section change was in place or not. Digging deeper, I'm not surprised. If you look at the C implementation of bytearray, you'll see that no critical section is used throughout the whole file. All the memcpy and memmove calls are unprotected 😕

Not sure where to go from here.

However, no matter how hard I tried, I couldn't get it to segfault. So maybe there's something more to it that I'm not aware of.

@davidhewitt
Member

I think it's no surprise that it's hard to segfault; you'd have to do something like turn the uninitialized read into a cast on the bytes to create a structure in an invalid state.

Nevertheless, invalid reads alone are a clear security issue. This problem clearly gets a lot worse in free-threaded Python. My knee-jerk reaction is to make all bytearray methods in PyO3 unsafe.

cc @ngoldbaum @colesbury is there any upstream opinion on how to handle bytearray objects on the free threaded build?

@ngoldbaum
Contributor

I can't find any discussion about bytearray and free-threading in the CPython issue tracker; you may want to file an issue, especially if you can make a pure-Python reproducer using the threading module. There are still lots of thread safety issues in CPython itself, and we should make sure they all get tracked as we run into them.
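A reproducer along those lines might look like the following sketch, which uses only the threading module: one thread keeps growing a bytearray with extend() while another takes snapshots. On a free-threaded build the snapshot may contain bytes that were never written; on GIL-enabled builds it should find nothing. All sizes and round counts here are illustrative assumptions, not tuned values.

```python
# Hedged sketch of a pure-Python reproducer for the bytearray race.
import threading

def run_race(rounds: int = 10, size: int = 1 << 14) -> bool:
    """Return True if an inconsistent snapshot was ever observed."""
    saw_inconsistency = False
    for _ in range(rounds):
        ba = bytearray(b"\x01" * size)
        stop = threading.Event()

        def writer() -> None:
            # extend() resizes the buffer and then copies into it; the
            # race window is between those two steps.
            while not stop.is_set() and len(ba) < (1 << 20):
                ba.extend(ba)

        t = threading.Thread(target=writer)
        t.start()
        try:
            for _ in range(20):
                snapshot = bytes(ba)  # analogous to to_vec()
                # every byte should be 0x01 if the view was consistent
                if snapshot.count(1) != len(snapshot):
                    saw_inconsistency = True
        finally:
            stop.set()
            t.join()
    return saw_inconsistency
```

Like any race-condition reproducer, this is probabilistic: a clean run proves nothing, only an inconsistent snapshot is conclusive.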

@robsdedude
Contributor Author

python/cpython#127472

@davidhewitt
Member

Thanks for that. I'm a bit unsure what the way forward here is. Without upstream also using critical sections, as you observe, adding the single section here seems a bit moot. We cannot change our API in a patch release, so I think the likely path at the moment is that we make all the methods unsafe in PyO3 0.24?

@alex
Contributor

alex commented Dec 3, 2024 via email

@ngoldbaum
Contributor

Hopefully a future Python release will fix the thread safety issues you identified and we can at least make the free-threaded build have similar guarantees compared with the GIL-enabled build.

@robsdedude
Contributor Author

robsdedude commented Dec 4, 2024

so I think the likely path at the moment is that we make all the methods unsafe in PyO3 0.24?

I can see arguments for and against that; I'm slightly gravitating towards not doing it, though. The way I see it, PyO3's memory safety stands and falls with the soundness of the linked Python implementation. You have to assume that it's sound. If you don't, every PyO3 API would be unsafe (which I guess is Rust's standpoint, as every FFI call is unsafe). So in this particular case, to_vec would be safe if CPython were sound (i.e., used critical sections around bytearray operations). In that sense, PyO3 is as safe here as pure Python is, and maybe that should be the criterion for whether an API is marked safe/unsafe in PyO3.

Just my 2 cents and you're much deeper into the world of this wonderful crate so ofc. it's up to you to decide 😇

adding the single section here seems a bit moot

I guess it is 🫤 Feel free to close the PR ⚰️

@robsdedude
Contributor Author

It seems a fix in CPython is being worked on 👀
python/cpython#129108

If that gets merged, I'll pick up this PR again.

@ngoldbaum
Contributor

bytearray is now thread-safe, but the fixes are going to have to wait until 3.14 I think. There are a number of thread-safety fixes that only live in the main branch of CPython.

Once 3.14b1 comes out, we'll merge the open PR adding 3.14 support to PyO3 and then I think we can move this forward. We'd probably also need to do a little thinking about how to handle 3.13 having known thread safety issues, if anything.

@robsdedude
Contributor Author

but the fixes are going to have to wait until 3.14 I think

You are right, I guess I was a bit too eager 😅 unless the fixes get backported to 3.13.

We'd probably also need to do a little thinking about how to handle 3.13 having known thread safety issues, if anything.

Not sure there's much to be done. a) 3.13t is an experimental CPython build, so users can't expect a fully working and stable piece of software; b) PyO3 is just as memory-unsafe here as plain Python, so one could argue "PyO3 is not making it any worse, so it's good enough from our perspective". However, a note in the API docs about this might be appropriate.

@davidhewitt
Member

With 3.14 now well into the beta phase, I think we should proceed with merging this. @robsdedude do you have any remnants of that test you previously wrote?

@davidhewitt davidhewitt mentioned this pull request Jul 19, 2025
@robsdedude
Contributor Author

I had a quick look and it seems I do indeed have some things left on my machine. Let me see if I can polish it up and push it.

@robsdedude
Contributor Author

robsdedude commented Jul 20, 2025

While doing so, I realized that it's incredibly hard to provoke the data race in to_vec. Here's what would need to happen (and I had it happen once while running the test for many minutes):

  • Thread t1 manipulates a bytearray b. I opted for b.extend(b)
  • Thread t2 calls b.to_vec such that
    • extend already increased the size, but didn't finish initializing the data
    • therefore, unsafe { self.as_bytes() }.to_vec() starts to copy that now bigger byte slice
    • however, to_vec will likely (just like extend) process the data sequentially. So for to_vec to read inconsistent state, it would need to catch up with how far extend has gotten in initializing the data.

Here's an alternative suggestion that will be easier to test:
Instead of testing the full to_vec implementation, how about the test just checks the assumption that a critical section is sufficient for soundness? That is, the test would start a critical section and assert that b.as_bytes() gives a consistent view of the data.
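That invariant can be modeled in pure Python, with a threading.Lock standing in for CPython's per-object critical section (an assumption for illustration; the real test would acquire the actual critical section through PyO3). As long as both the writer and the reader hold the "critical section", every snapshot must be consistent:

```python
# Model of the proposed test: a Lock plays the role of the critical
# section, bytes(ba) plays the role of as_bytes().to_vec().
import threading

lock = threading.Lock()
ba = bytearray(b"\x07" * 1024)
done = threading.Event()

def mutator() -> None:
    while not done.is_set():
        with lock:  # writer enters the "critical section"
            ba.extend(b"\x07" * 1024)
            if len(ba) > (1 << 20):
                del ba[1024:]  # shrink back so memory stays bounded

t = threading.Thread(target=mutator)
t.start()
try:
    for _ in range(200):
        with lock:  # reader holds it too, so its view is consistent
            snapshot = bytes(ba)
        # every byte must still be 0x07 — mutation cannot be observed mid-flight
        assert snapshot.count(7) == len(snapshot), "inconsistent view"
finally:
    done.set()
    t.join()
```

The assertion is deterministic here because both sides serialize on the lock, which is exactly the property the critical-section test wants to establish.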

Member

@davidhewitt davidhewitt left a comment

Thanks, this looks great to me. Just two small refinements, then let's merge.

let mut handles = [&mut handle1, &mut handle2, &mut handle3];

let t0 = std::time::Instant::now();
while t0.elapsed() < Duration::from_secs(10) {
Member

It might be helpful to reduce this to 0.5 secs (as long as that's still sufficient iterations), else this will have a noticeable impact on CI times.

Contributor Author

@robsdedude robsdedude Jul 20, 2025

Yeah, this is a good point. The problem with race conditions is that they're inherently non-deterministic, so there is no definitive number of "sufficient iterations". This is 100% a trade-off between CI time and how likely you want the test to be to catch the issue. Further, reducing this number to the smallest value that still satisfies my feeling of sufficient accuracy on my machine is absolutely no guarantee that this number will be a good fit for other machines (e.g., the CI runners).

That being said, on my machine it usually takes 1-2 seconds for it to fail against CPython 3.13t. But that number ofc. also depends on how fast a machine can actually run the iterations. So maybe I should just put a cap on the number of iterations and call it a day. Again, on my machine, each iteration amounts to just shy of 0.2 seconds.

Contributor Author

So after a fair bit of fiddling, I saw one pass that took 17 rounds for the bug to occur. So I pinned the max number of iterations to 25 (adding some margin of error). This, however, still amounts to roughly 5 seconds. 🤷 I don't think this can be reasonably reduced without accepting that the test might not actually test what it sets out to test :/
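Combining the wall-clock budget with an iteration cap could look like this sketch (the 25-round / 5-second defaults merely mirror the numbers discussed above; everything here is illustrative, not the PR's actual test harness):

```python
# Hedged sketch of bounding a racy test by both wall-clock time and an
# iteration cap, so fast machines stop early and slow machines cannot
# hang CI. `check` returns True once the race has been caught.
import time

def run_bounded(check, max_rounds: int = 25, budget_s: float = 5.0) -> bool:
    t0 = time.monotonic()
    for _ in range(max_rounds):
        if time.monotonic() - t0 > budget_s:
            break  # out of time; give up without failing the harness
        if check():
            return True  # the race was observed
    return False

# A check that never trips runs all rounds within budget and reports False.
assert run_bounded(lambda: False, max_rounds=3, budget_s=1.0) is False
assert run_bounded(lambda: True, max_rounds=3) is True
```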

Contributor

Does it reproduce with smaller test data? 200 MiB seems like a lot.

Contributor Author

@robsdedude robsdedude Jul 20, 2025

Same tradeoff. The smaller the size, the faster the extend and the lower the chance of intercepting it and catching the race condition. I'll try to see if I can lower the size further without lowering the accuracy too much. Again: all of it will be tailored to my specific machine; I have no idea how well the numbers I end up with will generalize.

Member

As long as there are enough iterations that eventually a CI run would trigger it (doesn't need to be every CI run), then I'm happy. If CI running all month would have essentially zero chance of triggering it, then I'd think it wasn't enough :)

Member

@davidhewitt davidhewitt left a comment

Thanks for following up, I think this is good to merge now 👍

@davidhewitt davidhewitt added this pull request to the merge queue Jul 23, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 23, 2025
}

// CPython 3.13t is unsound => test fails
#[cfg(any(Py_3_14, not(all(Py_3_13, Py_GIL_DISABLED))))]
Contributor

also needs to be skipped on emscripten

this failure mode happens often enough and wastes human time - it's probably worth adding a linter that checks that all tests using threads are disabled on platforms that don't support spawning threads.

Contributor Author

it's probably worth adding a linter

Do you mean upstream in clippy or something else?

Contributor

I'm not sure - can you write custom clippy lints like that? If you can, that's probably the easiest way. You might also be able to do it with a sufficiently advanced regex...

Also just to be clear my comment above wasn't an ask for you to do that, if it came across that way 😄

Member

@davidhewitt davidhewitt Jul 25, 2025

I don't think we can write custom lints, so regex might be the most feasible option albeit painful. I might just pretend the lint is harder to write than just losing the time repeatedly for now 🙈
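For what it's worth, the "sufficiently advanced regex" could be as small as a script that splits test files on #[test] attributes and flags thread-spawning tests lacking a platform guard. Everything below (the guard pattern, the attribute layout, the function name) is a hypothetical sketch, not PyO3's actual CI tooling:

```python
# Regex-level lint sketch: flag Rust #[test] fns that spawn threads but
# are not gated behind a cfg excluding emscripten. Patterns here are
# illustrative assumptions.
import re

GUARD = re.compile(r'#\[cfg\([^)]*not\(target_os\s*=\s*"emscripten"\)')
SPAWN = re.compile(r"thread::spawn")

def unguarded_thread_tests(source: str) -> list:
    """Return names of #[test] fns that spawn threads without the guard."""
    offenders = []
    # Split the file at each #[test] attribute; crude on purpose.
    for chunk in re.split(r"(?=#\[test\])", source)[1:]:
        name = re.search(r"fn\s+(\w+)", chunk)
        if name and SPAWN.search(chunk) and not GUARD.search(chunk):
            offenders.append(name.group(1))
    return offenders

sample = '''
#[test]
fn ungated() { std::thread::spawn(|| ()); }

#[test]
#[cfg(not(target_os = "emscripten"))]
fn gated() { std::thread::spawn(|| ()); }
'''
assert unguarded_thread_tests(sample) == ["ungated"]
```

Note the sketch only sees a cfg written after the #[test] attribute; handling arbitrary attribute order is exactly where a regex approach starts to hurt.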

@ngoldbaum ngoldbaum added this pull request to the merge queue Jul 24, 2025
Merged via the queue into PyO3:main with commit aa0fa4c Jul 24, 2025
43 of 46 checks passed
@robsdedude robsdedude deleted the byte-array-soundness branch July 24, 2025 18:31