
_json: stale markers entries on error paths in encoder_listencode_obj #146152

@raminfp

Bug report

Bug description:

encoder_listencode_obj() in Modules/_json.c calls PyDict_SetItem to register
an object in the circular-reference tracking dict (markers), but skips the
matching PyDict_DelItem on all three error paths that follow the registration.
Each failure therefore leaves a strong reference to the object inside markers,
and that reference persists for as long as markers itself is alive.

Modules/_json.c - encoder_listencode_obj()

Root cause

```c
/* L1444 */ PyDict_SetItem(s->markers, ident, obj);   // strong ref added

/* L1449 */ newobj = PyObject_CallOneArg(s->defaultfn, obj);
/* L1450 */ if (newobj == NULL) {
/* L1452 */     return -1;   // PATH A - no PyDict_DelItem
            }

/* L1455 */ if (_Py_EnterRecursiveCall("...")) {
/* L1458 */     return -1;   // PATH B - no PyDict_DelItem
            }

/* L1460 */ rv = encoder_listencode_obj(s, writer, newobj, newline_indent);
/* L1464 */ if (rv) {
/* L1466 */     return -1;   // PATH C - no PyDict_DelItem
            }

/* L1469 */ PyDict_DelItem(s->markers, ident);  // only reached on success
```
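The same control flow can be modelled in a few lines of Python. This is a simplified sketch, not CPython code: encode_fallback, encode_nested and failing_default are hypothetical names, markers is a plain dict keyed by id(), and PATH B and PATH C are collapsed into the single encode_nested() call.

```python
class Obj:
    pass


def encode_fallback(obj, markers, default, encode_nested):
    """Hypothetical model of the default() fallback in encoder_listencode_obj()."""
    ident = id(obj)
    if ident in markers:
        raise ValueError("Circular reference detected")
    markers[ident] = obj            # like PyDict_SetItem: markers now holds a strong ref
    newobj = default(obj)           # PATH A: if default() raises, the del below is skipped
    result = encode_nested(newobj)  # PATHs B/C: a failure while encoding newobj also skips it
    del markers[ident]              # like PyDict_DelItem: only reached on success
    return result


def failing_default(o):
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


markers = {}
obj = Obj()
try:
    encode_fallback(obj, markers, failing_default, repr)
except TypeError:
    pass
print(len(markers))  # 1: the failed call left obj registered

ok_markers = {}
encoded = encode_fallback(Obj(), ok_markers, lambda o: [1], repr)
print(encoded, len(ok_markers))  # [1] 0: on success the entry is removed again
```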

Impact

PATH A - default() raises

After the failure, markers still holds a strong reference to obj, so
del obj does not free it for as long as markers is alive.

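```python
import _json, json, gc, sys

def failing_default(o):
    # default() that always raises, like JSONEncoder.default for unknown types
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

markers = {}
# Argument order: markers, default, encoder, indent, key_separator,
# item_separator, sort_keys, skipkeys, allow_nan
enc = _json.make_encoder(markers, failing_default,
    json.encoder.encode_basestring_ascii, None, ': ', ', ', False, False, True)

class Obj: pass
obj = Obj()

rc_before = sys.getrefcount(obj)   # 2
try:
    enc(obj, 0)
except TypeError:
    pass
rc_after = sys.getrefcount(obj)    # 3: elevated by markers
assert rc_after == rc_before + 1

del obj
gc.collect()
# obj is still alive: the stale markers entry prevents it from being freed
assert isinstance(next(iter(markers.values())), Obj)
```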
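The claim that del obj does not free obj while markers is alive can also be observed directly with a weak reference. The sketch below is a model rather than a call into _json: a plain dict stands in for markers and the entry is added by hand, the way PyDict_SetItem(s->markers, ident, obj) would add it, so the result does not depend on whether the running build already has a fix.

```python
import gc
import weakref


class Obj:
    pass


markers = {}                   # plain dict standing in for the encoder's markers
obj = Obj()
markers[id(obj)] = obj         # models PyDict_SetItem(s->markers, ident, obj)
ref = weakref.ref(obj)         # observes obj without keeping it alive

del obj
gc.collect()
alive_while_registered = ref() is not None
print(alive_while_registered)  # True: the stale entry keeps obj alive

markers.clear()                # dropping the entry (or the whole dict) releases it
gc.collect()
print(ref() is None)           # True: obj has now been freed
```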

PATH B - RecursionError

Each recursion level adds one entry to markers, and none of them is removed
when the RecursionError unwinds the stack. This path is reachable through the
public json.dumps() API.

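```python
import json

class Obj: pass

class R(json.JSONEncoder):
    # default() returns a fresh Obj every time, so encoding never terminates
    def default(self, o): return Obj()

try:
    json.dumps(Obj(), cls=R)
except RecursionError:
    pass
# ~158,000 Obj instances were held alive in markers during the call
# (the exact count depends on the interpreter's recursion limit)
```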
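The per-level accumulation can be reproduced deterministically with a hypothetical model that replaces the interpreter's recursion limit with a small explicit depth cap (MAX_DEPTH and encode are assumptions for illustration, not CPython names). Like the C code, each level registers its object and calls default() before the recursion check, so every level, including the one that fails, leaves one entry behind.

```python
class Obj:
    pass


MAX_DEPTH = 50  # hypothetical stand-in for the interpreter's recursion limit


def encode(obj, markers, default, depth=0):
    """Hypothetical model of the default() fallback recursing on its own result."""
    markers[id(obj)] = obj      # register (PyDict_SetItem)
    newobj = default(obj)       # default() succeeds and returns a new object
    if depth >= MAX_DEPTH:      # models _Py_EnterRecursiveCall failing (PATH B)
        raise RecursionError("modelled recursion limit reached")
    result = encode(newobj, markers, default, depth + 1)
    del markers[id(obj)]        # never reached: every level exits via the exception
    return result


markers = {}
try:
    encode(Obj(), markers, lambda o: Obj())
except RecursionError:
    pass
print(len(markers))  # 51: one stale entry per level (depths 0 through MAX_DEPTH)
```

Because every registered Obj stays alive inside markers, no id() can be reused during the run, which is why the count grows by exactly one per level.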

PATH C - recursive encoding of default()'s return value fails

One failed nested encoding leaves 3 stale entries: the original object,
the dict returned by default(), and the unencodable value inside it.

```python
import _json, json, sys

class Obj: pass
class Inner: pass

def fn(o):
    # Obj is converted to a dict whose value is itself unencodable
    if isinstance(o, Obj):
        return {"key": Inner()}
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

markers = {}
enc = _json.make_encoder(markers, fn,
    json.encoder.encode_basestring_ascii, None, ': ', ', ', False, False, True)

obj = Obj()
try:
    enc(obj, 0)
except TypeError:
    pass

assert len(markers) == 3         # Obj, dict, Inner: all leaked
assert sys.getrefcount(obj) > 2  # refcount elevated by markers
```

Expected behavior

PyDict_DelItem(s->markers, ident) should be called before every return -1
in the else-branch (the default() fallback), mirroring the cleanup that
encoder_listencode_dict() and encoder_listencode_list() already perform on
their success paths.
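A minimal sketch of the expected behavior, again as a hypothetical pure-Python model rather than the actual patch (encode_fallback, failing_default and nested are illustrative names): the entry is removed in a finally clause, so it disappears on success and on every error path alike.

```python
class Obj:
    pass


def encode_fallback(obj, markers, default, encode_nested):
    """Hypothetical model of the default() fallback with cleanup on every exit path."""
    ident = id(obj)
    if ident in markers:
        # Raised before registering, so the enclosing call's entry is left alone.
        raise ValueError("Circular reference detected")
    markers[ident] = obj               # register (PyDict_SetItem)
    try:
        newobj = default(obj)          # PATH A can raise here
        return encode_nested(newobj)   # PATHs B and C can raise here
    finally:
        del markers[ident]             # unregister on success and on every error path


def failing_default(o):
    raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")


markers = {}
try:
    encode_fallback(Obj(), markers, failing_default, repr)
except TypeError:
    pass
print(len(markers))  # 0: the failed call no longer leaves an entry behind


# A default() that returns its own argument still trips the circular-reference check.
def nested(x):
    return encode_fallback(x, markers, lambda y: y, nested)


try:
    nested(Obj())
    cycle_detected = False
except ValueError:
    cycle_detected = True
print(cycle_detected, len(markers))  # True 0
```

The circular-reference check deliberately stays outside the try block: a detected cycle must not delete the entry that belongs to the enclosing call. C has no finally, so in Modules/_json.c the equivalent is a PyDict_DelItem (or a jump to a shared cleanup label) before each return -1.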

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Metadata

Assignees: No one assigned

Labels: extension-modules (C modules in the Modules dir), type-bug (An unexpected behavior, bug, or error)

Status: No status

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests