Description
Feature or enhancement
Background
The gc.collect() documentation states:
The effect of calling gc.collect() while the interpreter is already performing a collection is undefined.
In practice, gc.collect() returns immediately if a collection is already running. This covers both re-entrant calls from the same thread (i.e., from object finalizers) and calls made while the GC happens to be running in another thread. Note that even with the GIL, the GC isn't completely atomic: running tp_clear and object finalizers (i.e., __del__ functions) can release the GIL and allow another thread to run concurrently.
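A minimal sketch of the re-entrant case, assuming a CPython build where finalizers run inside the collection: the Node class and the nested_results list are illustrative names, not part of any API. The finalizer of a self-referencing object calls gc.collect() while the outer collection is still in progress and records what the nested call returns.

```python
import gc

# Values returned by gc.collect() when called from inside a finalizer.
nested_results = []


class Node:
    """Illustrative object that keeps itself alive through a reference cycle."""

    def __init__(self):
        self.me = self  # self-reference: only the cycle collector can free this

    def __del__(self):
        # This runs while the outer collection is still in progress,
        # so the call below is a re-entrant gc.collect().
        nested_results.append(gc.collect())


Node()        # unreachable immediately, but kept alive by its own cycle
gc.collect()  # outer collection: finds the cycle and runs Node.__del__
print(nested_results)
```

On the CPython versions this sketch was written against, the nested call reports 0 collected objects, which matches the "returns immediately" behavior described above. The documentation still calls the effect undefined, so code should not rely on that value.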
Motivation
Both CPython and third-party packages often use gc.collect() in tests that check object lifetimes, to ensure that cycles are collected and, in the free threading build, that any delayed reference count operations are processed.
The problem is that these tests can be flaky if multiple threads are used, because if one thread happens to be running the GC, then the gc.collect() in another thread returns immediately without doing anything. This happens more frequently in the free threading build because there are more opportunities for interleaving between threads and because of things like biased reference counting that can lead to delayed object destructors.
Proposal
Add a keyword-only argument wait_if_running to gc.collect(). The default value is wait_if_running=False, which preserves the current behavior. If wait_if_running=True:
- If the GC is running in a different thread, then the call will wait until that GC finishes and then run the GC.
- If the GC is running in the caller's thread, then the call will raise an exception so as to avoid deadlock.
cc @hawkinsp
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response