Towards soundness of PyByteArray::to_vec #4742
In free-threaded Python, to_vec needs to make sure to run inside a critical section so that no other Python thread is mutating the bytearray, causing UB. See also PyO3#4736
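For background on why concurrent mutation during a read is dangerous, note that CPython itself already refuses to resize a bytearray while its buffer is exported. A minimal pure-Python illustration (not PyO3 code; the critical section in this PR guards against the analogous hazard where the mutator is a concurrent thread and no such check can run):

```python
ba = bytearray(b"hello")
view = memoryview(ba)  # exports the bytearray's internal buffer

try:
    ba.extend(b" world")  # resizing while a buffer is exported is rejected
except BufferError as exc:
    print("refused:", exc)

view.release()
ba.extend(b" world")  # after releasing the view, resizing succeeds
print(ba)
```

On the free-threaded build there is no equivalent guard between a Rust-side read of the backing storage and a Python thread calling, say, `extend`, which is exactly the window `to_vec` needs to close.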
Thanks for the PR! We actually do have tests running for the free-threaded build, I would have been unhappy to declare support running without them! Similarly I have had virtualenv working just fine with 3.13t (haven't tried windows, though).
I think we could write a test which spawns a thread which does something to attempt to invalidate the data (maybe write to it using py.run or PySequenceMethods::set_slice) and confirm that the data read is the original data inserted, not the conflicting data (which should hopefully now block on either the GIL or the critical section depending on the build).
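A pure-Python sketch of that test shape (hypothetical code, not the PR's actual Rust test): a writer thread flips the bytearray between two complete payloads while the main thread repeatedly snapshots it, the analogue of PyByteArray::to_vec. On a GIL build each slice assignment and each bytes() copy is a single C-level call, so no torn read should be observed; on an unprotected free-threaded build the assertion could fail.

```python
import threading

N = 1024
ba = bytearray(b"A" * N)
stop = threading.Event()

def writer():
    # Alternate between two complete payloads; each whole-slice assignment
    # is one C-level call, so it is effectively atomic under the GIL.
    while not stop.is_set():
        ba[:] = b"B" * N
        ba[:] = b"A" * N

t = threading.Thread(target=writer)
t.start()
try:
    for _ in range(2000):
        snapshot = bytes(ba)  # the Python-level analogue of to_vec
        assert snapshot in (b"A" * N, b"B" * N), "torn read observed"
finally:
    stop.set()
    t.join()
print("no torn reads observed")
```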
Co-authored-by: David Hewitt <[email protected]>
@davidhewitt I tried to write a test along those lines. Not sure where to go from here, however: no matter how hard I tried, I couldn't get it to segfault. So maybe there's something more to it that I'm not aware of.
I think it's no surprise that it's hard to segfault; you'd have to do something like turn the uninitialized read into a cast on the bytes to create a structure in an invalid state. Nevertheless, invalid reads alone are a clear security issue. This problem clearly gets a lot worse in free-threaded Python. My knee-jerk reaction is to make all bytearray methods in PyO3 unsafe. cc @ngoldbaum @colesbury is there any upstream opinion on how to handle bytearray objects on the free-threaded build?
I can't find any discussion about bytearray and free-threading in the CPython issue tracker; you may want to file an issue, especially if you can make a pure-Python reproducer.
Thanks for that. I'm a bit unsure what the way forward here is. Without upstream also using critical sections, as you observe, adding the single section here seems a bit moot. I think we cannot change our API in a patch release, so I think the likely path at the moment is that we make all the methods unsafe in PyO3 0.24?
Seems right
…On Tue, Dec 3, 2024, 9:54 AM David Hewitt wrote:

> Thanks for that. I'm a bit unsure what the way forward here is. Without upstream also using critical sections, as you observe, adding the single section here seems a bit moot. I think we cannot change our API in a patch release so I think the likely path at the moment is that we make all the methods unsafe in PyO3 0.24?
Hopefully a future Python release will fix the thread safety issues you identified and we can at least give the free-threaded build guarantees similar to the GIL-enabled build.
I can see arguments for and against that. I'm slightly gravitating towards not doing it, though. The way I see it is that PyO3's memory safety stands and falls with the soundness of the linked Python implementation. You have to assume that it's sound; if you don't, every PyO3 API would be unsafe (which I guess is Rust's standpoint, as every FFI call is unsafe). Just my 2 cents, and you're much deeper into the world of this wonderful crate, so ofc. it's up to you to decide 😇

I guess it is 🫤 Feel free to close the PR ⚰️
It seems a fix in CPython is being worked on 👀 If that gets merged, I'll pick up this PR again.
bytearray is now thread-safe, but the fixes are going to have to wait until 3.14, I think. There are a number of thread-safety fixes that only live in 3.14. Once 3.14b1 comes out, we'll merge the open PR adding 3.14 support to PyO3 and then I think we can move this forward. We'd probably also need to do a little thinking about how to handle 3.13 having known thread safety issues, if anything.
You are right, guess I was a bit too eager 😅 (unless the fixes get backported to 3.13).
Not sure there's much to be done. a) 3.13t is an experimental CPython build, so users can't expect a fully working and stable piece of software, and b) PyO3 is just as memory-unsafe here as plain Python, so one could argue "PyO3 is not making it any worse, so it's good enough from our perspective". However, a note in the API docs around this might be appropriate.
With 3.14 now well into the beta phase, I think we should proceed with merging this. @robsdedude do you have any remnants of that test you previously wrote?
I had a quick look and it seems I do indeed have some things left on my machine. Let me see if I can polish it up and push it.
While doing so, I realized that it's incredibly hard to provoke the data race in to_vec itself. Here's an alternative suggestion that will be easier to test:
Thanks, this looks great to me. Just two small refinements, then let's merge.
src/types/bytearray.rs
Outdated
```rust
let mut handles = [&mut handle1, &mut handle2, &mut handle3];
// ...
let t0 = std::time::Instant::now();
while t0.elapsed() < Duration::from_secs(10) {
```
It might be helpful to reduce this to 0.5 secs (as long as that's still sufficient iterations), else this will have a noticeable impact on CI times.
Yeah, this is a good point. The problem with race conditions is that they're inherently non-deterministic, so there is no definitive number of "sufficient iterations". This is 100% a trade-off between CI time and how likely you want the test to be to catch the issue. Further, reducing this number to the smallest possible value on my machine such that it still satisfies my feeling of sufficient accuracy is absolutely no guarantee that this number will be a good fit for other machines (e.g., the CI runners).
That being said, on my machine it usually takes 1-2 seconds for it to fail against CPython 3.13t. But that number of course also depends on how fast a machine can actually run the iterations. So maybe I should just put a cap on the number of iterations and call it a day. Again, on my machine, each iteration amounts to just shy of 0.2 seconds.
So after a fair bit of fiddling, I saw one pass that took 17 rounds for the bug to occur. So I pinned the max number of iterations to 25 (adding some margin of error). This, however, still amounts to roughly 5 seconds. 🤷 I don't think this can be reasonably reduced without accepting that the test might not actually test what it sets out to test :/
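Combining both caps, iterations and wall-clock time, can be sketched like this (illustrative Python, not the actual Rust test; the numbers are the machine-dependent ones discussed above, and attempt_round is a made-up stand-in for one reader/writer race attempt):

```python
import time

MAX_ITERS = 25       # hard cap on rounds (assumed, with margin of error)
TIME_BUDGET = 5.0    # seconds; stop early even if MAX_ITERS is not reached

def attempt_round() -> bool:
    # Placeholder for one round of the real race attempt.
    return False  # pretend the race was not triggered this round

deadline = time.monotonic() + TIME_BUDGET
rounds = 0
triggered = False
for _ in range(MAX_ITERS):
    if time.monotonic() >= deadline:
        break  # out of time budget
    rounds += 1
    if attempt_round():
        triggered = True
        break
print(rounds, triggered)
```

Whichever limit is hit first bounds the cost, so a fast machine burns at most MAX_ITERS rounds and a slow one at most TIME_BUDGET seconds.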
Does it reproduce with smaller test data? 200 MiB seems like a lot.
Same trade-off. The smaller the size, the faster the extend, and the lower the chance of intercepting it and catching the race condition. I'll try to see if I can lower the size further without lowering the accuracy too much. Again: all of it will be tailored to my specific machine. I have no idea how well the numbers I'll end up with will generalize.
As long as there's enough iterations that eventually a CI run would trigger it (doesn't need to be every CI run), then I'm happy. If CI running all month would have zero chance of triggering it, then I'd think it wasn't enough :)
Thanks for following up, I think this is good to merge now 👍
src/types/bytearray.rs
Outdated
```rust
}
// ...
// CPython 3.13t is unsound => test fails
#[cfg(any(Py_3_14, not(all(Py_3_13, Py_GIL_DISABLED))))]
```
also needs to be skipped on emscripten
this failure mode happens often enough and wastes human time - it's probably worth adding a linter checking that all tests using threads are disabled on platforms that don't support spawning threads.
> it's probably worth adding a linter
Do you mean upstream in clippy or something else?
I'm not sure - can you write custom clippy lints like that? If you can that's probably the easiest way. You might also be able to do it with a sufficiently advanced regex...
Also just to be clear my comment above wasn't an ask for you to do that, if it came across that way 😄
I don't think we can write custom lints, so regex might be the most feasible option albeit painful. I might just pretend the lint is harder to write than just losing the time repeatedly for now 🙈
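For what it's worth, a sufficiently simple version of that regex check could look like this (a hypothetical sketch, not PyO3's actual CI tooling; the patterns and function name are assumptions):

```python
import re

# Flag Rust test sources that spawn threads without a cfg gate mentioning
# a target that cannot spawn threads (wasm32 / emscripten).
SPAWN_RE = re.compile(r"std::thread::spawn")
GATE_RE = re.compile(
    r'#\[cfg\([^)]*(?:target_arch\s*=\s*"wasm32"|target_os\s*=\s*"emscripten")'
)

def thread_test_is_gated(source: str) -> bool:
    """True if the source spawns no threads, or spawns them behind a gate."""
    if not SPAWN_RE.search(source):
        return True
    return bool(GATE_RE.search(source))

gated = '#[cfg(not(target_arch = "wasm32"))]\nfn t() { std::thread::spawn(|| {}); }'
ungated = "fn t() { std::thread::spawn(|| {}); }"
print(thread_test_is_gated(gated), thread_test_is_gated(ungated))
```

This is deliberately coarse (it doesn't check that the gate actually covers the spawning test), but it would catch the common failure mode of forgetting the gate entirely.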
Unfortunately it seems I can't write proper tests for this as Python 3.13t is not yet part of the test matrix. I'm aware that support for testing with 3.13 and 3.13t is still in its early stages and, for instance, virtualenv does not yet support it.