gh-140476: Optimize PySet_Add() for frozenset in free-threading #140440
This deserves a news item: https://devguide.python.org/core-team/committing/#how-to-add-a-news-entry
It'd also be good to file an issue to describe the problem and the performance improvement this change gives, for future reference. The PR title should be updated to mention the issue.
I made a couple of changes and narrowed the optimisation down to The global variable or wrapper scenario is also relevant for
If the condition "before they are exposed to other code" is sufficient to proceed with this optimisation, then we still have some wins (19% Intel, 27% ARM benchmarked after changes). Otherwise, I am going to close this PR and look for an alternative. @colesbury @sergey-miryanov, I haven’t included the double However, it might be a good idea to move the hashing outside of the
Global variables are in fact not a concern, because those must be a separate reference (otherwise the set could be destroyed during PySet_Add, which can call arbitrary Python code that would delete the global). Other borrowed references are a concern, though (and not for the first time... Raah.)
IIUC, we have changed the semantics of PySet_Add. For sets, we can now call it even if there is more than one reference. Is this intentional?
Oh, it is good. Sorry for the noise.
Sorry, that was a mistake. I was trying to clean up branches.
In the fbthrift-python benchmarks, we observed a performance regression when comparing Python 3.14t to 3.14, particularly during Thrift struct initialization and deserialization. The regression was linked to the use of the `PySet_Add()` API for populating `set` and `frozenset` objects, where a critical section is always applied, even when the set is uniquely referenced during the initialization phase. By skipping the critical section for uniquely referenced `frozenset`, we were able to reduce the regression.

In a micro-benchmark specifically targeting `PySet_Add()`, we observed a 19% performance improvement on Intel Broadwell systems and around 27% improvement on ARM/GH200 systems.

cc: @mpage @colesbury @Yhg1s
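For readers unfamiliar with the idea, here is a rough, standalone toy model of the fast path described in this PR: take the lock only when the object might be visible to other code. This is not the CPython patch. Every name here (`toy_set`, `toy_set_add`, `lock_acquisitions`, and so on) is hypothetical, a plain `atomic_flag` spin lock stands in for the per-object critical section, and a plain `int` stands in for the reference count.

```c
/* Toy model only: none of these names exist in CPython. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CAPACITY 8

typedef struct {
    int refcount;           /* stand-in for the object's reference count */
    bool frozen;            /* true models frozenset, false models set */
    atomic_flag lock;       /* stand-in for the per-object critical section */
    int items[TOY_CAPACITY];
    size_t used;
    int lock_acquisitions;  /* instrumentation: how often the lock was taken */
} toy_set;

static void toy_set_init(toy_set *s, bool frozen) {
    s->refcount = 1;        /* freshly created: only the creator holds it */
    s->frozen = frozen;
    atomic_flag_clear(&s->lock);
    s->used = 0;
    s->lock_acquisitions = 0;
}

/* Insert without any locking. Returns 0 on success (duplicates are
 * silently ignored) and -1 when the toy table is full. */
static int toy_add_unlocked(toy_set *s, int key) {
    for (size_t i = 0; i < s->used; i++) {
        if (s->items[i] == key) {
            return 0;
        }
    }
    if (s->used == TOY_CAPACITY) {
        return -1;
    }
    s->items[s->used++] = key;
    return 0;
}

static int toy_set_add(toy_set *s, int key) {
    /* Fast path (assumption of this sketch): a frozen set with exactly one
     * reference cannot be observed by any other thread, so no lock is needed. */
    if (s->frozen && s->refcount == 1) {
        return toy_add_unlocked(s, key);
    }

    /* Slow path: the object may be shared, so serialize the mutation. */
    while (atomic_flag_test_and_set(&s->lock)) {
        /* spin until the lock is released */
    }
    s->lock_acquisitions++;
    int rc = toy_add_unlocked(s, key);
    atomic_flag_clear(&s->lock);
    return rc;
}
```

In this model, a freshly initialized frozen `toy_set` never touches the lock while it is being populated, whereas a mutable one, or a frozen one whose `refcount` has grown past 1, always goes through the slow path. The borrowed-reference concern raised in the conversation is exactly the case this toy cannot see: a second user of the object that does not bump `refcount`.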