Skip to content

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Apr 8, 2025

This roughly follows what was done for dictobject to make a lock-free lookup operation. The set contains operation scales much better when used from multiple-threads. The frozenset contains performance seems unchanged (as already lock-free).

Summary of changes:

  • refactor set_lookkey() into set_do_lookup() which now takes a function pointer that does the entry comparison. This is similar to dictobject and do_lookup(). In an optimized build, the comparison function is inlined and there should be no performance cost to this.
  • change set_do_lookup to return a status separately from the entry value.
  • add set_compare_frozenset() and use if the object is a frozenset. For the free-threaded build, this avoids some overhead (locking, atomic operations, incref/decref on key)
  • use FT_ATOMIC_* macros as needed for atomic loads and stores
  • use a deferred free on the set table array, if shared (only on free-threaded build, normal build always does an immediate free)
  • for free-threaded build, use explicit for loop to zero the table, rather than memcpy().
  • when mutating the set, assign so->table to NULL while the change is a happening. Assign the real table array after the change is done.

Free-threading scaling benchmark results from the attached scripts (result for 8 cores in parallel). This is a modified version of the ftscalingbenchmark.py script.

base this PR
dict_contains 9.1x faster 9.4x faster
tuple_contains 8.0x faster 7.6x faster
list_contains 7.9x faster 7.8x faster
frozenset_contains 7.8x faster 7.8x faster
frozenset_contains_dunder 7.6x faster 7.7x faster
set_contains 2.6x slower 7.7x faster
set_contains_dunder 2.8x slower 6.1x faster
shallow_copy 1.8x faster 1.8x faster
deepcopy 4.2x faster 4.5x faster

ftscaling_set.py.txt

@nascheme nascheme changed the title Add lock-free set contains implemention GH-132657: Add lock-free set contains implementation Jul 12, 2025
@nascheme nascheme added performance Performance or resource usage topic-free-threading 🔨 test-with-buildbots Test PR w/ buildbots; report in status section labels Jul 12, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 70a1c1f 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F132290%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jul 12, 2025
@nascheme nascheme marked this pull request as ready for review August 13, 2025 17:35
@nascheme nascheme requested a review from rhettinger as a code owner August 13, 2025 17:35
@rhettinger rhettinger removed their request for review August 13, 2025 20:42
@nascheme nascheme requested a review from eendebakpt September 30, 2025 22:57
@nascheme nascheme requested a review from eendebakpt September 30, 2025 22:57
@nascheme
Copy link
Member Author

Hello @eendebakpt, since you have recently worked on the setobject.c source, perhaps you are able to take another look at this. I think the free-threaded scaling improvement is significant. I've merge with the main branch and updated the benchmark results.

NUM_LOOPS = 20

s = frozenset()
def make_set():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the make_set here in a thread? It just assigns the same frozenset to s each time.

Maybe change make_set so that it creates a different frozenset each time?

#define SET_MARK_SHARED(so) _PyObject_GC_SET_SHARED(so)

static void
ensure_shared_on_read(PySetObject *so)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this is identical to the ensure_shared_on_read from dictobject.c. The SET_DICT_SHARED would correspond to SET_SET_SHARED, which is an odd name so here it is SET_MARK_SHARED.

FT_ATOMIC_STORE_SSIZE_RELAXED(a->hash, -1);
FT_ATOMIC_STORE_SSIZE_RELAXED(b->hash, -1);
}
if (!SET_IS_SHARED(b) && SET_IS_SHARED(a)) {
Copy link
Contributor

@eendebakpt eendebakpt Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_swap_bodies is only called in two locations in this file. For both cases the second argument b is a newly created temporary which is discarded afterwards. So we could replace part (all?) of these branches with asserts. Part of the other work (e.g. copying back to b) might not be needed for the same reason.

Py_BEGIN_CRITICAL_SECTION(so);
rv = set_contains_lock_held(so, key);
Py_END_CRITICAL_SECTION();
return set_contains_entry(so, key, hash);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return set_contains_entry(so, key, hash);

{
int status;
setentry *entry;
if (PyFrozenSet_CheckExact(so)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this path can be taken. set_lookkey_threadsafe is only called in set_contains_entry. The set_contains_entry is only called in _PySet_Contains. _PySet_Contains is called in set___contains___impl and in bytecodes.c _CONTAINS_OP_SET (the latter can have frozenset as input)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants