CachingSession: Mostly fix a race condition #1423
Conversation
A race can happen when multiple calls with different statements (none of which are in the cache) are executed at the same time, with the cache full.

It is possible that all of them will remove the same entry from the cache and then add their own entries. The size of the cache will then exceed self.max_capacity, so we will never remove from the cache again.

The first fix I thought of was to change "==" to "<=". With that fix, we would still be able to remove elements from the map after the race happens. Note that we only remove an element when inserting a new element, so this alone would never decrease the cache overflow unless 2 cache misses with the same query happen at the same time. The memory leak would still happen, just much more slowly.

So in addition to that, I changed "if cache full: remove" to "while cache full: remove". In some cases this could lead to queries getting evicted for no reason, hurting performance. There is also a slim possibility of some thread getting starved for a bit. Those issues seem improbable, so maybe it is better to risk them than the memory leak? I'm honestly not sure. Maybe we should remove the loop and only keep "<=".

I also thought of checking the cache size in the happy path, but len() is not a simple operation in dashmap - it goes through the shards and sums their lengths. I didn't benchmark it, but I'm afraid it could hurt performance.

Maybe in the future we should research how dedicated cache libraries approach this?

Fixes: #1420

Pre-review checklist

- I added relevant tests for new features and bug fixes.
  - Writing tests for that seems quite difficult...
- I have provided docstrings for the public items that I want to introduce.
- I have adjusted the documentation in ./docs/source/.
- I added appropriate `Fixes:` annotations to the PR description.
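To make the described race concrete, here is a minimal standalone repro sketch. It is hypothetical: it uses dashmap directly with a toy capacity-1 map instead of the driver's actual CachingSession, and it inlines the old "evict only when len() == max_capacity" logic for illustration.

```rust
use std::sync::Arc;
use std::thread;

use dashmap::DashMap;

fn main() {
    // Toy stand-in for the statement cache: capacity 1, already full.
    let cache: Arc<DashMap<&'static str, ()>> = Arc::new(DashMap::new());
    let max_capacity = 1;
    cache.insert("old", ());

    // Two concurrent cache misses with different statements.
    let handles: Vec<_> = ["a", "b"]
        .into_iter()
        .map(|key| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                // Old logic: evict only when len() is exactly at capacity.
                if cache.len() == max_capacity {
                    // Both threads can observe a full cache and remove
                    // the same victim entry.
                    cache.remove("old");
                }
                cache.insert(key, ());
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }

    // If the race hit, len() is now 2 > max_capacity, and the `==` check
    // never fires again, so the overflow is permanent.
    println!("len = {}", cache.len());
}
```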
Pull Request Overview
This PR fixes a race condition in the CachingSession that could lead to a memory leak. The issue occurs when multiple threads execute cache misses simultaneously on a full cache, potentially causing all threads to remove the same cache entry but add different ones, resulting in cache overflow that would never be cleaned up.
- Changed the cache size check from equality (`==`) to less-than-or-equal (`<=`) to handle overflow conditions
- Replaced the single cache eviction with a while loop to ensure proper cache size management after race conditions
- Added detailed comments explaining the race condition and potential trade-offs of the solution
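A sketch of the resulting insert path: this assumes the cache is a dashmap::DashMap guarded by a plain max_capacity field; the struct, method, and victim-selection strategy are illustrative rather than the driver's actual code.

```rust
use dashmap::DashMap;

// Hypothetical, simplified stand-in for the driver's statement cache.
struct StatementCache {
    cache: DashMap<String, String>,
    max_capacity: usize,
}

impl StatementCache {
    fn insert(&self, key: String, value: String) {
        // `<=` instead of `==`, and a loop instead of a single eviction:
        // even if racing inserts already pushed len() past max_capacity,
        // later inserts keep draining the overflow.
        while self.max_capacity <= self.cache.len() {
            // Pick an arbitrary victim. Bind the cloned key first, so the
            // shard read lock held by the iterator is released before
            // `remove` takes a write lock on the same shard.
            let victim = self.cache.iter().next().map(|entry| entry.key().clone());
            match victim {
                Some(victim_key) => {
                    self.cache.remove(&victim_key);
                }
                // Capacity 0 or an unexpectedly empty map: nothing to evict.
                None => break,
            }
        }
        self.cache.insert(key, value);
    }
}
```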
// If we don't have a loop here, then this overflow would never disappear during typical
// operation of caching session.
// The loop has downsides: it could evict more entries than strictly necessary, or starve
// some thread for a bit. If this becomes a problem then maye we should research how
There's a typo in the comment: 'maye' should be 'maybe'.
Suggested change:

- // some thread for a bit. If this becomes a problem then maye we should research how
+ // some thread for a bit. If this becomes a problem then maybe we should research how
See the following report for details: cargo semver-checks output
Probably it's worth throwing a warning when the cache is full.
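For illustration, such a warning might look like the sketch below; it assumes the tracing crate is used for logging, and the helper name and message are hypothetical.

```rust
use tracing::warn;

// Hypothetical helper: called when an insert finds the cache over capacity,
// e.g. at the top of the eviction loop.
fn warn_on_overflow(len: usize, max_capacity: usize) {
    if len > max_capacity {
        warn!(
            len,
            max_capacity,
            "prepared statement cache exceeded its capacity; evicting extra entries"
        );
    }
}
```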