Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Oct 16, 2025

In the PR we untrack frozen tuples for the normal constructors. There are a few methods shared between the set and frozenset (for example set_intersection in setobject.c) where we have not added the untracking. (this is possible, but I am not sure this is worthwhile to do).

Here is a small script to test the idea:

import gc
import time
from statistics import mean

number_of_iterations = 20
number_of_gc_iterations = 50

deltas = []

gc.disable()
gc.collect()
for kk in range(number_of_iterations):
    t0 = time.perf_counter()
    for jj in range(number_of_gc_iterations):
        gc.collect()
    dt = time.perf_counter() - t0
    deltas.append(dt)
print(f"time per collection: mean {1e3 * mean(deltas) / number_of_iterations:.3f} [ms], min {1e3 * min(deltas) / number_of_iterations:.3f} [ms]")

sets = [frozenset([ii]) for ii in range(10_000)]
deltas = []
print("---")
gc.disable()
gc.collect()
for kk in range(number_of_iterations):
    t0 = time.perf_counter()
    for jj in range(number_of_gc_iterations):
        gc.collect()
    dt = time.perf_counter() - t0
    deltas.append(dt)
print(f"time per collection: mean {1e3 * mean(deltas) / number_of_iterations:.3f} [ms], min {1e3 * min(deltas) / number_of_iterations:.3f} [ms]")

#%% Show statistics of frozen containers

gc.collect()

def candidate(obj):
    return all(not gc.is_tracked(x) for x in obj)

for immutable_type in (tuple, frozenset):
    number_of_objects_tracked = 0
    number_of_candidates = 0
    number_of_immutable_candidates = 0

    for obj in gc.get_objects():
        number_of_objects_tracked += 1
        if type(obj) is immutable_type:
            number_of_candidates += 1
            # print(f"{type(obj)} = {obj}")
            if candidate(obj):
                number_of_immutable_candidates += 1

    print(f"type {immutable_type}")
    print(f"  {number_of_objects_tracked=}")
    print(f"  {number_of_candidates=}")
    print(f"  {number_of_immutable_candidates=}")

It measures the performance of garbage collection, and outputs some statistics for the numbers of frozen containers.

Main:

time per collection: mean 1.311 [ms], min 1.301 [ms]
---
time per collection: mean 2.467 [ms], min 2.272 [ms]
type <class 'tuple'>
  number_of_objects_tracked=18330
  number_of_candidates=546
  number_of_immutable_candidates=1
type <class 'frozenset'>
  number_of_objects_tracked=18330
  number_of_candidates=10059
  number_of_immutable_candidates=10057

PR

time per collection: mean 1.285 [ms], min 1.251 [ms]
---
time per collection: mean 1.424 [ms], min 1.396 [ms]
type <class 'tuple'>
  number_of_objects_tracked=8273
  number_of_candidates=546
  number_of_immutable_candidates=6
type <class 'frozenset'>
  number_of_objects_tracked=8273
  number_of_candidates=2
  number_of_immutable_candidates=0

Co-authored-by: Mikhail Efimov <[email protected]>
@sergey-miryanov
Copy link
Contributor

sergey-miryanov commented Oct 17, 2025

Maybe it is worth to change tp_alloc for something like:

PyObject *
PyFrozenSet_Alloc(PyTypeObject *type, Py_ssize_t nitems)
{
    PyObject *obj = PyType_GenericAlloc(type, nitems);
    if (obj == NULL) {
        return NULL;
    }

    _PyFrozenSet_MaybeUntrack(obj);
    return obj;
}

@eendebakpt
Copy link
Contributor Author

Maybe it is worth to change tp_alloc for something like:

The tp_alloc is used in make_new_set, which in turn is called by make_new_set. The last one is used set_intersection which modifies a frozenset. So adding _PyFrozenSet_MaybeUntrack to tp_alloc would mean we have to add a _PyFrozenSet_MaybeTrack to the end of set_intersection. This is a complication I do not want to tackle (certainly not in this PR).

Copy link
Contributor

@sergey-miryanov sergey-miryanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me.

PyObject *set = NULL, *empty_tuple=NULL, *tracked_object;


tracked_object = PyImport_ImportModule("sys");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just create empty list or dict here? Importing module seems too heavy for testing purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree a module seems heavy, but we cannot add a list or dict to a set. A custom class or namedtuple would also do, but they require more code to setup. But any suggestion for a simple-to-create hashable object tracked by the GC is welcome.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ough, I forgot about the requirement to be a hashable. Thanks!
WDYT about exposing function for PySet_Add to _testcapi and write tests in python like I did this https://github.com/python/cpython/pull/140132/files#diff-70eaebed435342e02ba8f7f5a84e4eebd552438ce6ac2765e80abb5514bdea03R134?

Then you can write test like:

class Test:
    pass

fs = pyset_add(Test())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the reply to victor below. Just adding pyset_add will not work. We could add a _testcapi.test_pyset_add(tracked_item), but then we have the test functionality spread over both the python and c side.

@vstinner
Copy link
Member

Would it be possible to write tests in Python rather than in C?

@eendebakpt
Copy link
Contributor Author

Would it be possible to write tests in Python rather than in C?

I tried, but it is not easy. We have to expose PySet_Add (frozenset().add does not exist on the python side). I added pyset_add on the _testcapi module (with pyset_add just calling PySet_Add). But running this on a frozenset from the python side does not work: when calling _testcapi.pyset_add(frozen_set, item) there too many references to the frozen_set and PySet_Add will fail with an internal error here:

PyErr_BadInternalCall();

And when calling _testcapi.pyset_add(frozenset(), item) we do not have the frozenset available to test whether tracking has been enabled.

@sergey-miryanov
Copy link
Contributor

And when calling _testcapi.pyset_add(frozenset(), item) we do not have the frozenset available to test whether tracking has been enabled.

IIUC, if you return the first argument from pyset_add then you can test it on the python side.

@eendebakpt
Copy link
Contributor Author

And when calling _testcapi.pyset_add(frozenset(), item) we do not have the frozenset available to test whether tracking has been enabled.

IIUC, if you return the first argument from pyset_add then you can test it on the python side.

Ok, I gave it another try. The first attempt failed, but by using the vectorcall convention I can keep the reference count at 1 also from the Python side.

Py_ssize_t pos = 0;
setentry *entry;
while (set_next((PySetObject *)op, &pos, &entry)) {
if (_PyObject_GC_MAY_BE_TRACKED(entry->key)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should use faster _PyType_IS_GC(Py_TYPE(entry->key)) as in maybe_tracked from Objects/tupleobject.c?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure performance matters a lot, but I would prefer to have it consistent with what is used in tupleobject.c. Unless there are objections, I will change the implementation to use the maybe_tracked.

return make_new_set(type, iterable);
}

void
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment to explain the purpose of this function with a link to the GitHub issue.

void
_PyFrozenSet_MaybeUntrack(PyObject *op)
{
if (op == NULL || !PyFrozenSet_CheckExact(op)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not untracking frozenset subtypes?

Copy link
Member

@efimov-mikhail efimov-mikhail Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can make a cycle using frozenset subtype:

>>> class F(frozenset):
...     pass
...     
>>> f = F([1,2,3])
>>> f.cycle = f

So, we need to track them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok. In this case, please add a comment explaining that :-)

# Test the PySet_Add c-api for frozenset objects
assert _testcapi.pyset_add(frozenset(), 1) == frozenset([1])
frozen_set = frozenset()
self.assertRaises(SystemError, _testcapi.pyset_add, frozen_set, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why the second test fails, whereas the first succeed.

Copy link
Contributor Author

@eendebakpt eendebakpt Oct 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second test fails because the argument frozen_set is not uniquely referenced. The error is raised here:

(!PyFrozenSet_Check(anyset) || !_PyObject_IsUniquelyReferenced(anyset))) {

I will add a comment to the test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants