
Conversation

@mansidhall-CAW mansidhall-CAW commented Mar 28, 2025


📝 OVERVIEW

  • Main purpose: Optimize functools.lru_cache for concurrency by allowing parallel execution of the cached function's body without holding the internal lock, while maintaining cache consistency.
  • Type of changes: Refactoring and performance optimization.

🛠️ TECHNICAL DETAILS

  • Atomic operations for hit/miss counters: Replaced direct increments of hits and misses with FT_ATOMIC_ADD_SSIZE, which expands to _Py_atomic_add_ssize when the GIL is disabled. This makes counter updates thread-safe without taking the cache lock (see the macro sketch after this list).
  • Lock release during function execution: Introduced a scoped release pattern (the Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS macro pair in C) that temporarily releases the cache's internal lock while the decorated function's body executes.
  • Bounded cache logic restructuring: Moved bounded LRU cache management (e.g., evicting entries when capacity is exceeded) into lock-protected functions (_lru_cache_bounded_evict, _lru_cache_bounded_add) to ensure thread safety during critical path operations.
  • Thread-safe state access: cache_info() and cache_clear() now use atomic loads/stores (_Py_atomic_load_ssize, _Py_atomic_store_ssize) to read/write counters and state variables without requiring the lock.
  • No new dependencies/API changes: No external dependencies or public API changes. Internal C API functions (e.g., _Py_atomic_*) are leveraged for atomic operations.
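
To make the counter mechanism concrete, here is a sketch of how such a free-threading wrapper macro can select between an atomic operation and a plain increment. It is modeled on CPython's pycore_pyatomic_ft_wrappers.h; the exact macro bodies here are an approximation, not code lifted from this patch.

```c
/* Sketch of free-threading atomic wrappers, approximating
 * pycore_pyatomic_ft_wrappers.h. On free-threaded builds the macros
 * expand to real atomic operations; on GIL builds they fall back to
 * plain reads/writes, which the GIL already serializes. */
#ifdef Py_GIL_DISABLED
#  define FT_ATOMIC_ADD_SSIZE(value, n)    (void)_Py_atomic_add_ssize(&(value), (n))
#  define FT_ATOMIC_LOAD_SSIZE(value)      _Py_atomic_load_ssize(&(value))
#  define FT_ATOMIC_STORE_SSIZE(value, n)  _Py_atomic_store_ssize(&(value), (n))
#else
#  define FT_ATOMIC_ADD_SSIZE(value, n)    ((value) += (n))
#  define FT_ATOMIC_LOAD_SSIZE(value)      (value)
#  define FT_ATOMIC_STORE_SSIZE(value, n)  ((value) = (n))
#endif

/* In the wrappers, a miss then becomes a lock-free counter bump:
 *     FT_ATOMIC_ADD_SSIZE(self->misses, 1);
 */
```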

🏗️ ARCHITECTURAL IMPACT

  • Locking pattern modification: The cache lock is now released during the function's execution phase, decoupling the function's runtime from lock contention (see the sketch after this list).
  • Atomic operations as a new abstraction: Atomic counters reduce reliance on coarse-grained locks for state updates, introducing a finer-grained synchronization mechanism.
  • Coupling reduction: Thread-safe state access via atomic operations reduces dependencies on the cache lock for read/write operations, improving modularity.
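
A minimal sketch of this hold/release/reacquire shape, in simplified C. PyMutex_Lock/PyMutex_Unlock and the mutex field stand in for whatever locking the patch actually uses, and cache_lookup/cache_insert are hypothetical helpers, so read this as an illustration of the locking pattern rather than the patch's code.

```c
/* Illustrative only: `mutex`, cache_lookup() and cache_insert() are
 * assumed names, not the patch's actual fields and helpers.
 * Error handling beyond NULL checks is elided. */
static PyObject *
lru_call_sketch(lru_cache_object *self, PyObject *args, PyObject *kwds)
{
    PyObject *key = lru_cache_make_key(self->kwd_mark, args, kwds, self->typed);
    if (key == NULL) {
        return NULL;
    }

    PyMutex_Lock(&self->mutex);                  /* hold only for the lookup */
    PyObject *result = cache_lookup(self, key);  /* returns new ref or NULL */
    PyMutex_Unlock(&self->mutex);
    if (result != NULL) {
        FT_ATOMIC_ADD_SSIZE(self->hits, 1);
        Py_DECREF(key);
        return result;
    }

    FT_ATOMIC_ADD_SSIZE(self->misses, 1);
    result = PyObject_Call(self->func, args, kwds);  /* lock NOT held here */

    if (result != NULL) {
        PyMutex_Lock(&self->mutex);              /* reacquire only to insert */
        int err = cache_insert(self, key, result);
        PyMutex_Unlock(&self->mutex);
        if (err < 0) {
            Py_CLEAR(result);
        }
    }
    Py_DECREF(key);
    return result;
}
```

Note the design consequence: because the lock is dropped around PyObject_Call, two threads can race on the same key and both execute the function, so the insert path must tolerate an entry that already exists.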

🚀 IMPLEMENTATION HIGHLIGHTS

  • Key algorithms/data structures:
    • Atomic counters: Used for hits and misses to avoid lock contention during increment operations.
    • Scoped lock release: The wrapper drops the GIL and cache lock for the duration of the decorated function's execution, reacquiring them afterward.
    • Bounded LRU eviction: Restructured into lock-protected functions for safe entry insertion/removal in bounded caches (see the bounded-cache sketch after this list).
  • Performance considerations:
    • Reduces lock contention, enabling parallel execution of the cached function in multi-threaded scenarios.
    • Atomic operations may introduce minor overhead but avoid the cost of acquiring/releasing the lock for counter updates.
  • Security implications: Proper use of atomic operations and lock management avoids race conditions in counter updates and cache state transitions.
  • Error handling: The release/reacquire pattern restores the lock/GIL even if the decorated function raises an exception, preventing deadlocks.
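
For the bounded path, the point is that all list and dictionary surgery stays inside the lock while the miss itself runs outside it. Below is a hedged sketch of that split; the two function names follow the description above, but their bodies, the mutex field, and the unlink/discard/insert helpers are assumptions rather than the patch's actual code.

```c
/* Hedged sketch: `mutex` and the helpers below are illustrative
 * names; only the lock-protected structure matters. */
static void
_lru_cache_bounded_evict(lru_cache_object *self)
{
    /* Caller must hold self->mutex. Drop least-recently-used entries
     * until the cache fits within maxsize again. */
    while (PyDict_GET_SIZE(self->cache) > self->maxsize) {
        lru_list_elem *oldest = self->root.next;  /* LRU end of the ring */
        lru_unlink_node(oldest);                  /* hypothetical helper */
        lru_discard_node(self, oldest);           /* hypothetical helper */
    }
}

static int
_lru_cache_bounded_add(lru_cache_object *self, PyObject *key, PyObject *result)
{
    PyMutex_Lock(&self->mutex);
    int err = lru_insert_most_recent(self, key, result);  /* hypothetical */
    if (err == 0) {
        _lru_cache_bounded_evict(self);
    }
    PyMutex_Unlock(&self->mutex);
    return err;
}
```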

📜 Changes

| File | Change Type | Changes | Summary |
|------|-------------|---------|---------|
| Misc/NEWS.d/next/Library/2025-03-26-10-56-22.gh-issue-131757.pFRdmN.rst | Added | +1/-0 | NEWS entry: functools.lru_cache now executes the cached function's body without holding the cache's internal lock, so parallel threads can run the decorated function concurrently while cache consistency is maintained. |
| Modules/_functoolsmodule.c | Modified | +92/-37 | Replaces direct hit/miss increments with FT_ATOMIC_ADD_SSIZE (_Py_atomic_add_ssize when the GIL is disabled) in uncached_lru_cache_wrapper, infinite_lru_cache_wrapper, and bounded_lru_cache_wrapper; restructures bounded cache handling into lock-protected functions; cache_info/cache_clear now use atomic loads/stores for thread-safe state access. |

```diff
 PyObject *result;

-self->misses++;
+FT_ATOMIC_ADD_SSIZE(self->misses, 1);
```

The non-bounded path incorrectly increments misses and calls the function before checking the cache. This bypasses the cache entirely, leading to incorrect behavior. The cache check must occur before calling the function.

SUGGESTION:

```diff
- FT_ATOMIC_ADD_SSIZE(self->misses, 1);
- result = PyObject_Call(self->func, args, kwds);
+ /* Restore the original order: build the key and check the cache
+  * first; on a hit increment hits and return the cached value, and
+  * only on a miss increment misses and call the function. */
```

@mansidhall-CAW (Author)

```json
{
  "lastReviewedCommitId": "34f7fe7d3e46506b4d670b8712573b9c20c3999b",
  "reviewedCommitIds": [
    "34f7fe7d3e46506b4d670b8712573b9c20c3999b"
  ]
}
```
