Skip to content

Conversation

colesbury
Copy link
Contributor

@colesbury colesbury commented Aug 13, 2025

There was a deadlock originally seen by Memray when a daemon thread enabled or disabled profiling while the interpreter was shutting down. I think this could also happen with garbage collection, but I haven't seen that in practice.

The daemon thread could be hung while trying acquire the global rwmutex that prevents overlapping global and per-interpreter stop-the-world events. Since it already held the main interpreter's stop-the-world lock, it also deadlocked the main thread, which is trying to perform interpreter finalization.

Swap the order of lock acquisition to prevent this deadlock. Additionally, refactor _PyParkingLot_Park so that the global buckets hashtable is left in a clean state if the thread is hung in PyEval_AcquireThread.

There was a deadlock originally seen by Memray when a daemon thread
enabled or disabled profiling while the interpreter was shutting down.
I think this could also happen with garbage collection, but I haven't
seen that in practice.

The daemon thread could be hung while trying acquire the global rwmutex
that prevents overlapping global and per-interpreter stop-the-world events.
Since it already held the main interpreter's stop-the-world lock, it
also deadlocked the main thread, which is trying to perform interpreter
finalization.

Swap the order of lock acquisition to prevent this deadlock.
Additionally, refactor `_PyParkingLot_Park` so that the global buckets
hashtable is left in a clean state if the thread is hung in
`PyEval_AcquireThread`.
@gpshead gpshead added the needs backport to 3.14 bugs and security fixes label Sep 14, 2025
@colesbury colesbury marked this pull request as ready for review September 15, 2025 10:34
Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'm not a huge fan of using PyEval_AcquireThread/PyEval_ReleaseThread over _PyThreadState_Attach/_PyThreadState_Detach, but since those were there already I'm not going to bother complaining about it.

@colesbury colesbury merged commit 90fe325 into python:main Sep 16, 2025
85 of 87 checks passed
@miss-islington-app
Copy link

Thanks @colesbury for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖 I'm not a witch! I'm not a witch!

@colesbury colesbury deleted the gh-137433-stop-the-world-daemon-deadlock branch September 16, 2025 08:22
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 16, 2025
…ythongh-137735)

There was a deadlock originally seen by Memray when a daemon thread
enabled or disabled profiling while the interpreter was shutting down.
I think this could also happen with garbage collection, but I haven't
seen that in practice.

The daemon thread could be hung while trying acquire the global rwmutex
that prevents overlapping global and per-interpreter stop-the-world events.
Since it already held the main interpreter's stop-the-world lock, it
also deadlocked the main thread, which is trying to perform interpreter
finalization.

Swap the order of lock acquisition to prevent this deadlock.
Additionally, refactor `_PyParkingLot_Park` so that the global buckets
hashtable is left in a clean state if the thread is hung in
`PyEval_AcquireThread`.
(cherry picked from commit 90fe325)

Co-authored-by: Sam Gross <[email protected]>
@bedevere-app
Copy link

bedevere-app bot commented Sep 16, 2025

GH-138965 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Sep 16, 2025
encukou pushed a commit that referenced this pull request Oct 7, 2025
…gh-137735) (GH-138965)

There was a deadlock originally seen by Memray when a daemon thread
enabled or disabled profiling while the interpreter was shutting down.
I think this could also happen with garbage collection, but I haven't
seen that in practice.

The daemon thread could be hung while trying acquire the global rwmutex
that prevents overlapping global and per-interpreter stop-the-world events.
Since it already held the main interpreter's stop-the-world lock, it
also deadlocked the main thread, which is trying to perform interpreter
finalization.

Swap the order of lock acquisition to prevent this deadlock.
Additionally, refactor `_PyParkingLot_Park` so that the global buckets
hashtable is left in a clean state if the thread is hung in
`PyEval_AcquireThread`.

(cherry picked from commit 90fe325)

Co-authored-by: Sam Gross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants