Skip to content

Commit 350c58b

Browse files
authored
GH-135552: Make the GC clear weakrefs later (GH-136189)
Fix a bug caused by the garbage collector clearing weakrefs too early. The weakrefs in the ``tp_subclasses`` dictionary are needed in order to correctly invalidate type caches (for example, by calling ``PyType_Modified()``). Clearing weakrefs before calling finalizers causes the caches to not be correctly invalidated. That can cause crashes since the caches can refer to invalid objects. Defer the clearing of weakrefs without callbacks until after finalizers are executed.
1 parent deb385a commit 350c58b

File tree

9 files changed

+318
-141
lines changed

9 files changed

+318
-141
lines changed

InternalDocs/garbage_collector.md

Lines changed: 16 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -329,15 +329,16 @@ Once the GC knows the list of unreachable objects, a very delicate process start
329329
with the objective of completely destroying these objects. Roughly, the process
330330
follows these steps in order:
331331

332-
1. Handle and clear weak references (if any). Weak references to unreachable objects
333-
are set to `None`. If the weak reference has an associated callback, the callback
334-
is enqueued to be called once the clearing of weak references is finished. We only
335-
invoke callbacks for weak references that are themselves reachable. If both the weak
336-
reference and the pointed-to object are unreachable we do not execute the callback.
337-
This is partly for historical reasons: the callback could resurrect an unreachable
338-
object and support for weak references predates support for object resurrection.
339-
Ignoring the weak reference's callback is fine because both the object and the weakref
340-
are going away, so it's legitimate to say the weak reference is going away first.
332+
1. Handle weak references with callbacks (if any). If the weak reference has
333+
an associated callback, the callback is enqueued to be called after the weak
334+
reference is cleared. We only invoke callbacks for weak references that
335+
are themselves reachable. If both the weak reference and the pointed-to
336+
object are unreachable we do not execute the callback. This is partly for
337+
historical reasons: the callback could resurrect an unreachable object
338+
and support for weak references predates support for object resurrection.
339+
Ignoring the weak reference's callback is fine because both the object and
340+
the weakref are going away, so it's legitimate to say the weak reference is
341+
going away first.
341342
2. If an object has legacy finalizers (`tp_del` slot) move it to the
342343
`gc.garbage` list.
343344
3. Call the finalizers (`tp_finalize` slot) and mark the objects as already
@@ -346,7 +347,12 @@ follows these steps in order:
346347
4. Deal with resurrected objects. If some objects have been resurrected, the GC
347348
finds the new subset of objects that are still unreachable by running the cycle
348349
detection algorithm again and continues with them.
349-
5. Call the `tp_clear` slot of every object so all internal links are broken and
350+
5. Clear any weak references that still refer to unreachable objects. The
351+
`wr_object` attribute for these weakrefs are set to `None`. Note that some
352+
of these weak references maybe have been newly created during the running of
353+
finalizers in step 3. Also, clear any weak references that are part of the
354+
unreachable set.
355+
6. Call the `tp_clear` slot of every object so all internal links are broken and
350356
the reference counts fall to 0, triggering the destruction of all unreachable
351357
objects.
352358

Lib/test/test_finalization.py

Lines changed: 30 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -174,7 +174,7 @@ def test_simple(self):
174174
gc.collect()
175175
self.assert_del_calls(ids)
176176
self.assert_survivors([])
177-
self.assertIs(wr(), None)
177+
self.assertIsNone(wr())
178178
gc.collect()
179179
self.assert_del_calls(ids)
180180
self.assert_survivors([])
@@ -188,12 +188,12 @@ def test_simple_resurrect(self):
188188
gc.collect()
189189
self.assert_del_calls(ids)
190190
self.assert_survivors(ids)
191-
self.assertIsNot(wr(), None)
191+
self.assertIsNotNone(wr())
192192
self.clear_survivors()
193193
gc.collect()
194194
self.assert_del_calls(ids)
195195
self.assert_survivors([])
196-
self.assertIs(wr(), None)
196+
self.assertIsNone(wr())
197197

198198
@support.cpython_only
199199
def test_non_gc(self):
@@ -265,7 +265,7 @@ def test_simple(self):
265265
gc.collect()
266266
self.assert_del_calls(ids)
267267
self.assert_survivors([])
268-
self.assertIs(wr(), None)
268+
self.assertIsNone(wr())
269269
gc.collect()
270270
self.assert_del_calls(ids)
271271
self.assert_survivors([])
@@ -276,19 +276,24 @@ def test_simple_resurrect(self):
276276
s = SelfCycleResurrector()
277277
ids = [id(s)]
278278
wr = weakref.ref(s)
279+
wrc = weakref.ref(s, lambda x: None)
279280
del s
280281
gc.collect()
281282
self.assert_del_calls(ids)
282283
self.assert_survivors(ids)
283-
# XXX is this desirable?
284-
self.assertIs(wr(), None)
284+
# This used to be None because weakrefs were cleared before
285+
# calling finalizers. Now they are cleared after.
286+
self.assertIsNotNone(wr())
287+
# A weakref with a callback is still cleared before calling
288+
# finalizers.
289+
self.assertIsNone(wrc())
285290
# When trying to destroy the object a second time, __del__
286291
# isn't called anymore (and the object isn't resurrected).
287292
self.clear_survivors()
288293
gc.collect()
289294
self.assert_del_calls(ids)
290295
self.assert_survivors([])
291-
self.assertIs(wr(), None)
296+
self.assertIsNone(wr())
292297

293298
def test_simple_suicide(self):
294299
# Test the GC is able to deal with an object that kills its last
@@ -301,11 +306,11 @@ def test_simple_suicide(self):
301306
gc.collect()
302307
self.assert_del_calls(ids)
303308
self.assert_survivors([])
304-
self.assertIs(wr(), None)
309+
self.assertIsNone(wr())
305310
gc.collect()
306311
self.assert_del_calls(ids)
307312
self.assert_survivors([])
308-
self.assertIs(wr(), None)
313+
self.assertIsNone(wr())
309314

310315

311316
class ChainedBase:
@@ -378,18 +383,27 @@ def check_non_resurrecting_chain(self, classes):
378383

379384
def check_resurrecting_chain(self, classes):
380385
N = len(classes)
386+
def dummy_callback(ref):
387+
pass
381388
with SimpleBase.test():
382389
nodes = self.build_chain(classes)
383390
N = len(nodes)
384391
ids = [id(s) for s in nodes]
385392
survivor_ids = [id(s) for s in nodes if isinstance(s, SimpleResurrector)]
386393
wrs = [weakref.ref(s) for s in nodes]
394+
wrcs = [weakref.ref(s, dummy_callback) for s in nodes]
387395
del nodes
388396
gc.collect()
389397
self.assert_del_calls(ids)
390398
self.assert_survivors(survivor_ids)
391-
# XXX desirable?
392-
self.assertEqual([wr() for wr in wrs], [None] * N)
399+
for wr in wrs:
400+
# These values used to be None because weakrefs were cleared
401+
# before calling finalizers. Now they are cleared after.
402+
self.assertIsNotNone(wr())
403+
for wr in wrcs:
404+
# Weakrefs with callbacks are still cleared before calling
405+
# finalizers.
406+
self.assertIsNone(wr())
393407
self.clear_survivors()
394408
gc.collect()
395409
self.assert_del_calls(ids)
@@ -491,7 +505,7 @@ def test_legacy(self):
491505
self.assert_del_calls(ids)
492506
self.assert_tp_del_calls(ids)
493507
self.assert_survivors([])
494-
self.assertIs(wr(), None)
508+
self.assertIsNone(wr())
495509
gc.collect()
496510
self.assert_del_calls(ids)
497511
self.assert_tp_del_calls(ids)
@@ -507,13 +521,13 @@ def test_legacy_resurrect(self):
507521
self.assert_tp_del_calls(ids)
508522
self.assert_survivors(ids)
509523
# weakrefs are cleared before tp_del is called.
510-
self.assertIs(wr(), None)
524+
self.assertIsNone(wr())
511525
self.clear_survivors()
512526
gc.collect()
513527
self.assert_del_calls(ids)
514528
self.assert_tp_del_calls(ids * 2)
515529
self.assert_survivors(ids)
516-
self.assertIs(wr(), None)
530+
self.assertIsNone(wr())
517531

518532
def test_legacy_self_cycle(self):
519533
# Self-cycles with legacy finalizers end up in gc.garbage.
@@ -527,11 +541,11 @@ def test_legacy_self_cycle(self):
527541
self.assert_tp_del_calls([])
528542
self.assert_survivors([])
529543
self.assert_garbage(ids)
530-
self.assertIsNot(wr(), None)
544+
self.assertIsNotNone(wr())
531545
# Break the cycle to allow collection
532546
gc.garbage[0].ref = None
533547
self.assert_garbage([])
534-
self.assertIs(wr(), None)
548+
self.assertIsNone(wr())
535549

536550

537551
if __name__ == "__main__":

Lib/test/test_gc.py

Lines changed: 24 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -309,6 +309,26 @@ def func():
309309
self.assertRegex(stdout, rb"""\A\s*func=None""")
310310
self.assertFalse(stderr)
311311

312+
def test_datetime_weakref_cycle(self):
313+
# https://github.com/python/cpython/issues/132413
314+
# If the weakref used by the datetime extension gets cleared by the GC (due to being
315+
# in an unreachable cycle) then datetime functions would crash (get_module_state()
316+
# was returning a NULL pointer). This bug is fixed by clearing weakrefs without
317+
# callbacks *after* running finalizers.
318+
code = """if 1:
319+
import _datetime
320+
class C:
321+
def __del__(self):
322+
print('__del__ called')
323+
_datetime.timedelta(days=1) # crash?
324+
325+
l = [C()]
326+
l.append(l)
327+
"""
328+
rc, stdout, stderr = assert_python_ok("-c", code)
329+
self.assertEqual(rc, 0)
330+
self.assertEqual(stdout.strip(), b'__del__ called')
331+
312332
@refcount_test
313333
def test_frame(self):
314334
def f():
@@ -658,9 +678,8 @@ def callback(ignored):
658678
gc.collect()
659679
self.assertEqual(len(ouch), 2) # else the callbacks didn't run
660680
for x in ouch:
661-
# If the callback resurrected one of these guys, the instance
662-
# would be damaged, with an empty __dict__.
663-
self.assertEqual(x, None)
681+
# The weakref should be cleared before executing the callback.
682+
self.assertIsNone(x)
664683

665684
def test_bug21435(self):
666685
# This is a poor test - its only virtue is that it happened to
@@ -1335,6 +1354,7 @@ def setUp(self):
13351354
def tearDown(self):
13361355
gc.disable()
13371356

1357+
@unittest.skipIf(Py_GIL_DISABLED, "requires GC generations or increments")
13381358
def test_bug1055820c(self):
13391359
# Corresponds to temp2c.py in the bug report. This is pretty
13401360
# elaborate.
@@ -1410,6 +1430,7 @@ def callback(ignored):
14101430
self.assertEqual(x, None)
14111431

14121432
@gc_threshold(1000, 0, 0)
1433+
@unittest.skipIf(Py_GIL_DISABLED, "requires GC generations or increments")
14131434
def test_bug1055820d(self):
14141435
# Corresponds to temp2d.py in the bug report. This is very much like
14151436
# test_bug1055820c, but uses a __del__ method instead of a weakref

Lib/test/test_io.py

Lines changed: 18 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -808,7 +808,12 @@ def test_closefd_attr(self):
808808
def test_garbage_collection(self):
809809
# FileIO objects are collected, and collecting them flushes
810810
# all data to disk.
811-
with warnings_helper.check_warnings(('', ResourceWarning)):
811+
#
812+
# Note that using warnings_helper.check_warnings() will keep the
813+
# file alive due to the `source` argument to warn(). So, use
814+
# catch_warnings() instead.
815+
with warnings.catch_warnings():
816+
warnings.simplefilter("ignore", ResourceWarning)
812817
f = self.FileIO(os_helper.TESTFN, "wb")
813818
f.write(b"abcxxx")
814819
f.f = f
@@ -1809,7 +1814,11 @@ def test_garbage_collection(self):
18091814
# C BufferedReader objects are collected.
18101815
# The Python version has __del__, so it ends into gc.garbage instead
18111816
self.addCleanup(os_helper.unlink, os_helper.TESTFN)
1812-
with warnings_helper.check_warnings(('', ResourceWarning)):
1817+
# Note that using warnings_helper.check_warnings() will keep the
1818+
# file alive due to the `source` argument to warn(). So, use
1819+
# catch_warnings() instead.
1820+
with warnings.catch_warnings():
1821+
warnings.simplefilter("ignore", ResourceWarning)
18131822
rawio = self.FileIO(os_helper.TESTFN, "w+b")
18141823
f = self.tp(rawio)
18151824
f.f = f
@@ -2158,7 +2167,11 @@ def test_garbage_collection(self):
21582167
# all data to disk.
21592168
# The Python version has __del__, so it ends into gc.garbage instead
21602169
self.addCleanup(os_helper.unlink, os_helper.TESTFN)
2161-
with warnings_helper.check_warnings(('', ResourceWarning)):
2170+
# Note that using warnings_helper.check_warnings() will keep the
2171+
# file alive due to the `source` argument to warn(). So, use
2172+
# catch_warnings() instead.
2173+
with warnings.catch_warnings():
2174+
warnings.simplefilter("ignore", ResourceWarning)
21622175
rawio = self.FileIO(os_helper.TESTFN, "w+b")
21632176
f = self.tp(rawio)
21642177
f.write(b"123xxx")
@@ -4080,7 +4093,8 @@ def test_garbage_collection(self):
40804093
# C TextIOWrapper objects are collected, and collecting them flushes
40814094
# all data to disk.
40824095
# The Python version has __del__, so it ends in gc.garbage instead.
4083-
with warnings_helper.check_warnings(('', ResourceWarning)):
4096+
with warnings.catch_warnings():
4097+
warnings.simplefilter("ignore", ResourceWarning)
40844098
rawio = self.FileIO(os_helper.TESTFN, "wb")
40854099
b = self.BufferedWriter(rawio)
40864100
t = self.TextIOWrapper(b, encoding="ascii")
Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
Fix a bug caused by the garbage collector clearing weakrefs too early. The
2+
weakrefs in the ``tp_subclasses`` dictionary are needed in order to correctly
3+
invalidate type caches (for example, by calling ``PyType_Modified()``).
4+
Clearing weakrefs before calling finalizers causes the caches to not be
5+
correctly invalidated. That can cause crashes since the caches can refer to
6+
invalid objects. Defer the clearing of weakrefs without callbacks until after
7+
finalizers are executed.

Modules/_datetimemodule.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -214,7 +214,7 @@ clear_current_module(PyInterpreterState *interp, PyObject *expected)
214214
if (PyDict_GetItemRef(dict, INTERP_KEY, &ref) < 0) {
215215
goto error;
216216
}
217-
if (ref != NULL) {
217+
if (ref != NULL && ref != Py_None) {
218218
PyObject *current = NULL;
219219
int rc = PyWeakref_GetRef(ref, &current);
220220
/* We only need "current" for pointer comparison. */

Modules/gc_weakref.txt

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,16 @@
11
Intro
22
=====
33

4+
**************************************************************************
5+
Note: this document was written long ago, before PEP 442 (safe object
6+
finalization) was implemented. While that has changed some things, this
7+
document is still mostly accurate. Just note that the rules being discussed
8+
here apply to the unreachable set of objects *after* non-legacy finalizers
9+
have been called. Also, the clearing of weakrefs has been changed to happen
10+
later in the collection (after running finalizers but before tp_clear is
11+
called).
12+
**************************************************************************
13+
414
The basic rule for dealing with weakref callbacks (and __del__ methods too,
515
for that matter) during cyclic gc:
616

0 commit comments

Comments
 (0)