Skip to content

Conversation

@nascheme
Copy link
Member

@nascheme nascheme commented Jun 3, 2025

This is a slightly modified version of GH-132381 but based on the 'main' branch instead of 3.13. I originally closed GH-132381 because Sam had some concerns about the approach. However, on re-visting it, I still think this change is worth considering. It helps the non-interned string lookup case and should also help in general due to less contention on TYPE_LOCK.

If the type lookup cache (mcache) is missed, do not acquire TYPE_LOCK unless we need to assign a type version. We actually don't need to lock in order to do the MRO attribute lookup.

This avoids lock contention outlined in GH-132380. Performance results based on running LibCST on the numpy source:

Time [s] Config
35.9 Main branch, free-threaded build
23.6 Above config, with this PR applied
18.2 Main branch, free-threaded with -X gil=1 (multi-process)

In a previous version of this PR, Sam was concerned about the addition of the _PyType_GetMRO() call, potentially causing refcount contention. I don't think that's a problem since the previous code immediately called Py_INCREF() on the mro tuple and so this should make it no worse in that respect.

Sam also made the comment:

I think this splits up the (slow path) lookup and update of the MRO cache into two separate lock acquisitions. That doesn't seem safe to me. I think that would allow interleavings in which we permanently replace a good cache entry with a stale entry.

The lock is only held when assigning the type version. The MRO lookup and cache update are done without a lock. It could happen that we add an entry to the cache with an out-of-date (already stale) type version number. That has the downside of potentially replacing a good entry from the cache with a useless one. However, I think this case is so rare that it doesn't matter in practice and it is more efficient not to handle this case. The replacement is not permanent. The next time the "good" item is looked up, it will be re-added to the cache, replacing the stale entry. So, unless the type version is changing on every lookup, the addition of the stale entry is temporary.

We could handle it explicitly by re-checking the current type version after the find_name_in_mro() call but I think that would overall decrease performance. I didn't try benchmarking this though. The cache miss path is already pretty slow so probably it won't hurt much to re-check the version.

pyperformance results

This attached script shows a behavioral difference due to this change. Since the type objects are allowed to be mutated while the MRO lookup is happening, you can get an AttributeError running this script while in previous versions I don't think that was possible. At least, not without having __hash__ or similar methods that mutate the type objects.

race_mro_lookup.py.txt

If the type lookup cache (mcache) is missed, do not acquire TYPE_LOCK
unless we need to assign a type version.  We actually don't need to lock
in order to do the MRO attribute lookup.
@nascheme
Copy link
Member Author

To avoid the refcount contention when getting the MRO tuple, I added a PyUnstable_Object_EnableDeferredRefcount() call on it. That improves multi-threaded performance quite a bit, based on the LibCST test. The time goes from 23.6 seconds to 19.3, pretty close to the multi-process time now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant