-
Notifications
You must be signed in to change notification settings - Fork 925
Use Arc in HotHDiffBufferCache to avoid holding cache lock #8176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: unstable
Are you sure you want to change the base?
Conversation
beacon_node/store/src/state_cache.rs
Outdated
return Some(buffer); | ||
} | ||
metrics::inc_counter_vec(&metrics::STORE_BEACON_HDIFF_BUFFER_CACHE_MISS, HOT_METRIC); | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could consider deleting the once cell in this case, as we've created an entry and then left it empty. Alternatively we could return it for the caller to fill in once they load the state from disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's problematic to populate the LRU cache with empty cells, it can cause it to evict valuable entries for no reason. I have this commit where we only insert if necessary without much more complexity b6bd9b3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even better, I think we can just remove the OnceCell
. I did this in this commit: a3d5354.
The reason being, the buffer cache is only meant to contain states prior to finalization, so we should not be priming it after loading a full state (which would be post finalization). In future we may alter this paradigm, but for now, just an Arc
is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lighthouse/beacon_node/store/src/state_cache.rs
Lines 58 to 66 in a3d5354
/// Cache of hdiff buffers for hot states. | |
/// | |
/// This cache only keeps buffers prior to the finalized state, which are required by the | |
/// hierarchical state diff scheme to construct newer unfinalized states. | |
/// | |
/// The cache always retains the hdiff buffer for the most recent snapshot so that even if the | |
/// cache capacity is 1, this snapshot never needs to be loaded from disk. | |
#[derive(Debug)] | |
pub struct HotHDiffBufferCache { |
commit b1816d0 Author: Michael Sproul <[email protected]> Date: Thu Oct 9 13:15:48 2025 +1100 Use OnceCell in HotHDiffBufferCache
beacon_node/store/src/state_cache.rs
Outdated
let timer = | ||
metrics::start_timer_vec(&metrics::BEACON_HDIFF_BUFFER_CLONE_TIME, HOT_METRIC); | ||
let result = Some(buffer.clone()); | ||
drop(timer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to drop timer explicitly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinionated, but it's easier to read to me and less lines using if else
. I have the diff in this commit 9209c57
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess technically not. The timer is meant to measure just the clone, which is why it's written like this. With your change, we'll also measure the function post-amble where all of the local variables get dropped. However this should be completely negligible in terms of time, so I think it's fine
beacon_node/store/src/state_cache.rs
Outdated
if self.hdiff_buffers.len() != self.hdiff_buffers.cap().get() { | ||
self.hdiff_buffers.put(state_root, (slot, buffer)); | ||
self.hdiff_buffers | ||
.put(state_root, Arc::new(OnceCell::with_value((slot, buffer)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition here
- thread 1: get or insert: insert empty one
- thread 1: clone new once cell
- thread 1: start computing buffer
- thread 2: put buffer, replacing previous created once cell and wasting its compute
- thread 1: completes computing the buffer and the once cell is dropped
I don't feel it's a serious issue but noting it here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone with removal of OnceCell, I think?
beacon_node/store/src/state_cache.rs
Outdated
return Some(buffer); | ||
} | ||
metrics::inc_counter_vec(&metrics::STORE_BEACON_HDIFF_BUFFER_CACHE_MISS, HOT_METRIC); | ||
None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's problematic to populate the LRU cache with empty cells, it can cause it to evict valuable entries for no reason. I have this commit where we only insert if necessary without much more complexity b6bd9b3
Issue Addressed
Partly addresses:
Proposed Changes
Use an
Arc
to avoid holding thestate_cache
lock whileHDiffBuffer::from_state
runs. This conversion function can take up to 1s, so it's good if we can release the lock while it runs. This can unblock other threads waiting on the lock.