
Conversation

@mdboom
Contributor

@mdboom mdboom commented Mar 20, 2025

Back in 2013, it was determined that caching the result of tuple_hash did not provide any significant speedup.

However, a lot has changed since then, and in a recent experiment that added a tuple hash cache back in, the mdp benchmark improved by 86%.

Admittedly, there was no measurable improvement on any other benchmark, but there also seems to be no downside, including for memory usage as measured with max_rss.
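
For orientation, the shape of the change is roughly the following (a sketch, not the exact diff):

```c
/* Sketch: PyTupleObject gains an ob_hash slot that caches the computed
 * hash, with -1 meaning "not computed yet". */
typedef struct {
    PyObject_VAR_HEAD
    Py_hash_t ob_hash;      /* cached hash of the tuple, or -1 */
    PyObject *ob_item[1];   /* the items follow, as before */
} PyTupleObject;
```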

@rhettinger rhettinger removed their request for review March 20, 2025 22:22
    if (np == NULL)
        return NULL;

    _PyTuple_RESET_HASH_CACHE(np);
Member


Is there any reason not to just move this into tuple_alloc() itself, instead of repeating it everywhere tuple_alloc() is called?

Contributor Author


I don't personally have an objection. Most *_alloc functions only do allocation, not initialization, so I didn't want to confuse that. But the cached hash is kind of a special case...
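
Concretely, doing it once in tuple_alloc would look something like this (a sketch; the real function also handles the freelist and the empty singleton):

```c
/* Sketch: reset the hash cache in one place rather than at every call
 * site. _PyTuple_RESET_HASH_CACHE stores the -1 "not computed" sentinel
 * into ob_hash; allocation details are simplified here. */
static PyTupleObject *
tuple_alloc(Py_ssize_t size)
{
    PyTupleObject *op = PyObject_GC_NewVar(PyTupleObject, &PyTuple_Type, size);
    if (op == NULL) {
        return NULL;
    }
    _PyTuple_RESET_HASH_CACHE(op);
    return op;
}
```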

    PyTupleObject *v = _PyTuple_CAST(op);

    // For the empty singleton, we don't need to dereference the pointer
    if (op == (PyObject *)&_Py_SINGLETON(tuple_empty)) {
Member


You could set the hash of () statically in pycore_runtime_init.h, then this test would be unnecessary.

Contributor Author


Yes, but interestingly, this is measurably faster -- it doesn't have to chase the pointer.

Contributor Author


Seems like the difference here is in the noise, so might as well do this at build time and skip this extra branch.
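
At build time, the suggestion amounts to something like this in pycore_runtime_init.h (a rough sketch; the initializer layout and the _PyTuple_EMPTY_HASH placeholder are illustrative, and the actual constant differs between 32- and 64-bit builds):

```c
/* Sketch: give the static empty-tuple singleton its hash up front so
 * tuple_hash never needs to special-case it at runtime.
 * _PyTuple_EMPTY_HASH stands in for the precomputed value of hash(()). */
#define _PyTuple_EMPTY_INIT \
    { \
        .ob_base = _PyVarObject_HEAD_INIT(&PyTuple_Type, 0), \
        .ob_hash = _PyTuple_EMPTY_HASH, \
    }
```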

{
    PyTupleObject *v = _PyTuple_CAST(op);

    if (v->ob_hash != -1) {
Contributor


The loads of ob_hash need relaxed memory ordering, i.e. use FT_ATOMIC_LOAD_SSIZE_RELAXED.

Contributor Author


Still learning the free-threaded conventions. Can you explain why (or point me at an explanation)? Since the write is atomic and we don't care if there is a race (worst case, the hash gets recomputed), and the thread sanitizer CI isn't complaining, I don't see why this would be necessary -- but also I'm new to this stuff so maybe I'm missing something.

Member


The FT_... macros translate to plain loads/stores on the default build, so there is no overhead there.

It is necessary because accessing a field from multiple threads without any kind of synchronization is undefined behavior.
Even on the free-threaded build, the RELAXED part means there is no synchronization at the hardware level on most platforms; it just convinces the compiler and sanitizers that this is OK.
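
With those macros, the cached-hash fast path looks roughly like this (a sketch; tuple_hash_compute is a stand-in for the existing hashing loop):

```c
Py_hash_t hash = FT_ATOMIC_LOAD_SSIZE_RELAXED(v->ob_hash);
if (hash != -1) {
    return hash;                       /* cache hit */
}
hash = tuple_hash_compute(v);          /* stand-in for the xxHash-based loop */
FT_ATOMIC_STORE_SSIZE_RELAXED(v->ob_hash, hash);
/* If two threads race here, both compute the same value, so the
 * duplicate store is harmless. */
return hash;
```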

@bedevere-app

bedevere-app bot commented Mar 25, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@mdboom
Contributor Author

mdboom commented Mar 25, 2025

I have made the requested changes; please review again

@bedevere-app

bedevere-app bot commented Mar 25, 2025

Thanks for making the requested changes!

@picnixz, @kumaraditya303: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from picnixz March 25, 2025 14:05
@brandtbucher
Member

Now that all of these places have been replaced with a function call to _PyTuple_Recycle, is that sufficient, or would you prefer that these comments remain?

We should probably just remove the old comments and keep one in _PyTuple_Recycle, in my opinion.

@vstinner
Member

This PR changes 102 files, which makes it hard to review. Most files are generated by Argument Clinic. Would it be possible to have a first PR which only adds the ob_hash member? That way the second PR would only contain the interesting changes, making it easier to review.

@mdboom
Contributor Author

mdboom commented Mar 26, 2025

This PR changes 102 files, which makes it hard to review. Most files are generated by Argument Clinic. Would it be possible to have a first PR which only adds the ob_hash member? That way the second PR would only contain the interesting changes, making it easier to review.

GitHub seems to be doing the right thing by hiding the generated changes. Is that not happening for you?

I'd be happy to rearrange the commits, but I don't think it makes sense to merge a first PR if the second one may not be approved. Let me know, as that would also invalidate the existing review comments on this PR.

@mdboom mdboom merged commit 8614f86 into python:main Mar 27, 2025
50 checks passed
diegorusso pushed a commit to diegorusso/cpython that referenced this pull request Apr 1, 2025
* pythongh-131525: Cache the result of tuple_hash

* Fix debug builds

* Add blurb

* Fix formatting

* Pre-compute empty tuple singleton

* Mostly set the cache within tuple_alloc

* Fixes for TSAN

* Pre-compute empty tuple singleton

* Fix for 32-bit platforms

* Assert that op != NULL in _PyTuple_RESET_HASH_CACHE

* Use FT_ATOMIC_STORE_SSIZE_RELAXED macro

* Update Include/internal/pycore_tuple.h

Co-authored-by: Bénédikt Tran <[email protected]>

* Fix alignment

* atomic load

* Update Objects/tupleobject.c

Co-authored-by: Chris Eibl <[email protected]>

---------

Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: Chris Eibl <[email protected]>
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
facebook-github-bot pushed a commit to facebook/fbthrift that referenced this pull request Sep 25, 2025
Summary: Use `offsetof()` to find the offset of the `PyTupleObject`'s `ob_item` member, since upstream CPython could [add more fields](python/cpython#131529) in the future.

Reviewed By: prakashgayasen, islamismailov

Differential Revision: D83114258

fbshipit-source-id: f30225b70d07638c5092e2e022d11eb278a7c3ee
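
The layout-independent lookup described above is essentially the following (a sketch; tuple_items_ptr is an illustrative helper name):

```c
#include <stddef.h>   /* offsetof */
#include <Python.h>

/* Ask the compiler for ob_item's offset instead of assuming it sits
 * immediately after the PyVarObject header, so fields added upstream
 * (such as ob_hash) don't invalidate the computation. */
static PyObject **
tuple_items_ptr(PyObject *tuple)
{
    return (PyObject **)((char *)tuple + offsetof(PyTupleObject, ob_item));
}
```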
facebook-github-bot pushed a commit to facebook/hhvm that referenced this pull request Sep 25, 2025