Skip to content

Conversation

@yoney
Copy link
Contributor

@yoney yoney commented Oct 21, 2025

In the fbthrift-python benchmarks, we observed a performance regression when comparing Python 3.14t to 3.14, particularly during Thrift struct initialization and deserialization. The regression was linked to the use of the PySet_Add() API for populating set and frozenset objects, where a critical section is always applied—even when the set is uniquely referenced during the initialization phase. By skipping the critical section for uniquely referenced frozenset, we were able to reduce the regression.

In a micro-benchmark specifically targeting PySet_Add(), we observed a 19% performance improvement on Intel Broadwell systems and around 27% improvement on ARM/GH200 systems.

cc: @mpage @colesbury @Yhg1s

@mpage mpage requested review from Yhg1s, colesbury and mpage October 21, 2025 23:32
@yoney yoney marked this pull request as ready for review October 21, 2025 23:58
@yoney yoney requested a review from rhettinger as a code owner October 21, 2025 23:58
Copy link
Member

@Yhg1s Yhg1s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a new item: https://devguide.python.org/core-team/committing/#how-to-add-a-news-entry

It'd also be good to file an issue to describe the problem and the performance improvement this change gives, for future reference. The PR title should be updated to mention the issue.

@yoney yoney changed the title Optimize PySet_Add() for uniquely referenced sets in free-threading gh-140476: Optimize PySet_Add() for uniquely referenced sets in free-threading Oct 22, 2025
@yoney yoney changed the title gh-140476: Optimize PySet_Add() for uniquely referenced sets in free-threading gh-140476: Optimize PySet_Add() for frozenset in free-threading Oct 23, 2025
@yoney
Copy link
Contributor Author

yoney commented Oct 23, 2025

I made a couple of changes and narrowed the optimisation down to frozenset only. This is because the API could be used for sets that are either global variables or members of another wrapper. In such cases, the object can be accessed by multiple threads, even if it is uniquely referenced. I was thinking about the global variables, but thanks to @kumaraditya303 for pointing out the scenario involving wrappers as well.

The global variable or wrapper scenario is also relevant for frozenset. However, API documentation specifies a special condition.

Also works with frozenset instances (like PyTuple_SetItem() it can be used to fill in the values of brand new frozensets before they are exposed to other code)

If the condition "before they are exposed to other code" is sufficient to proceed with his optimisation, then we still have some wins (19% Intel, 27% ARM benchmarked after changes). Otherwise, I am going to close this PR and look for alternative.

@colesbury @sergey-miryanov, I haven’t included the double _PyObject_IsUniquelyReferenced() to handle the cases where hashing could introduce another reference and pass it to another thread for frozenset. This is essentially exposing the frozenset to other code, which is what the API documentation addresses.

However, it might be a good idea to move the hashing outside of the critical_section in general, to minimize the time spent in the critical section. I can do it in a separate PR (or this) if that sounds reasonable to you.

@Yhg1s
Copy link
Member

Yhg1s commented Oct 24, 2025

Global variables are in fact not a concern, because those must be a separate reference (otherwise the set could be destroyed during PySet_Add, which can call arbitrary Python code that would delete the global). Other borrowed references are a concern, though (and not for the first time... Raah.)

@sergey-miryanov sergey-miryanov dismissed their stale review October 25, 2025 02:47

IIUC, we have changed the semantics of the PySet_Add. For the sets we now can call it if have more than one reference, is this by intention?

@sergey-miryanov
Copy link
Contributor

Oh, it is good. Sorry for the noise.

@yoney yoney closed this Oct 28, 2025
@yoney yoney deleted the ft_set_frozenset branch October 28, 2025 15:51
@yoney yoney restored the ft_set_frozenset branch October 28, 2025 15:55
@yoney
Copy link
Contributor Author

yoney commented Oct 28, 2025

Sorry, that was a mistake. I was trying to clean up branches.

@yoney yoney reopened this Oct 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants