michaelsproul
Member

@michaelsproul michaelsproul commented Oct 9, 2025

Issue Addressed

Partly addresses:

Proposed Changes

Use an Arc to avoid holding the state_cache lock while HDiffBuffer::from_state runs. This conversion function can take up to 1s, so releasing the lock before it runs unblocks other threads that are waiting on the lock.
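
A minimal sketch of the pattern described above, using only the standard library. The `State`, `Buffer`, and `Cache` types and the `buffer_for` method are hypothetical stand-ins, not Lighthouse's actual definitions; the point is only that cloning an `Arc` under the lock is cheap, so the expensive conversion can run after the guard is dropped.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for a beacon state (not the real Lighthouse type).
struct State {
    balances: Vec<u64>,
}

// Hypothetical stand-in for an hdiff buffer.
struct Buffer {
    bytes: Vec<u8>,
}

impl Buffer {
    // Stand-in for an expensive conversion such as `HDiffBuffer::from_state`.
    fn from_state(state: &State) -> Self {
        let bytes = state
            .balances
            .iter()
            .flat_map(|b| b.to_le_bytes())
            .collect();
        Buffer { bytes }
    }
}

// Hypothetical cache: states are stored behind `Arc` so they can be shared cheaply.
struct Cache {
    states: Mutex<HashMap<u64, Arc<State>>>,
}

impl Cache {
    fn buffer_for(&self, root: u64) -> Option<Buffer> {
        // Hold the lock only long enough to clone the `Arc` (a refcount bump).
        let guard = self.states.lock().unwrap();
        let state = guard.get(&root).cloned();
        drop(guard); // lock released before the slow part

        // The expensive conversion runs without the lock held.
        let state = state?;
        Some(Buffer::from_state(&state))
    }
}
```

Without the `Arc`, the conversion would need a reference into the map, which would keep the guard alive for the whole conversion.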

@michaelsproul added the optimization, database, and tree-states labels Oct 9, 2025
            return Some(buffer);
        }
        metrics::inc_counter_vec(&metrics::STORE_BEACON_HDIFF_BUFFER_CACHE_MISS, HOT_METRIC);
        None
Member Author

Could consider deleting the once cell in this case, as we've created an entry and then left it empty. Alternatively we could return it for the caller to fill in once they load the state from disk.

Collaborator

I think it's problematic to populate the LRU cache with empty cells, as it can cause the cache to evict valuable entries for no reason. I have a commit where we only insert if necessary, without much more complexity: b6bd9b3
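
To illustrate the eviction concern, here is a rough sketch with a hand-rolled bounded cache. `BoundedCache` is a hypothetical simplification (oldest-first eviction, capacity assumed to be at least 1), not the `lru` crate and not the code in the commit above; an empty placeholder is modelled as `None`.

```rust
use std::collections::VecDeque;

// Hypothetical bounded cache that evicts the oldest entry when full.
// Assumes `cap >= 1`. A value of `None` models an empty placeholder cell.
struct BoundedCache {
    cap: usize,
    entries: VecDeque<(u64, Option<String>)>,
}

impl BoundedCache {
    fn new(cap: usize) -> Self {
        BoundedCache { cap, entries: VecDeque::new() }
    }

    // Insert or replace `key`, evicting the oldest entry if the cache is full.
    fn put(&mut self, key: u64, value: Option<String>) {
        self.entries.retain(|entry| entry.0 != key);
        if self.entries.len() == self.cap {
            self.entries.pop_front();
        }
        self.entries.push_back((key, value));
    }

    // Read-only lookup: a miss leaves the cache untouched.
    fn get(&self, key: u64) -> Option<&String> {
        self.entries
            .iter()
            .find(|entry| entry.0 == key)
            .and_then(|entry| entry.1.as_ref())
    }
}
```

With a capacity of 1, calling `put(2, None)` on a miss for key 2 pushes out a filled entry for key 1, whereas a plain `get(2)` miss leaves it in place.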

Member Author

Even better, I think we can just remove the OnceCell. I did this in this commit: a3d5354.

The reason being, the buffer cache is only meant to contain states prior to finalization, so we should not be priming it after loading a full state (which would be post finalization). In future we may alter this paradigm, but for now, just an Arc is enough.

Member Author

/// Cache of hdiff buffers for hot states.
///
/// This cache only keeps buffers prior to the finalized state, which are required by the
/// hierarchical state diff scheme to construct newer unfinalized states.
///
/// The cache always retains the hdiff buffer for the most recent snapshot so that even if the
/// cache capacity is 1, this snapshot never needs to be loaded from disk.
#[derive(Debug)]
pub struct HotHDiffBufferCache {

@jimmygchen jimmygchen mentioned this pull request Oct 10, 2025
2 tasks
jimmygchen added a commit to jimmygchen/lighthouse that referenced this pull request Oct 10, 2025
commit b1816d0
Author: Michael Sproul
Date:   Thu Oct 9 13:15:48 2025 +1100

    Use OnceCell in HotHDiffBufferCache
    let timer =
        metrics::start_timer_vec(&metrics::BEACON_HDIFF_BUFFER_CLONE_TIME, HOT_METRIC);
    let result = Some(buffer.clone());
    drop(timer);
Collaborator

Is it necessary to drop timer explicitly here?

Collaborator

Opinionated, but to me it's easier to read and takes fewer lines using if/else. I have the diff in this commit: 9209c57

Member Author

I guess technically not. The timer is meant to measure just the clone, which is why it's written like this. With your change, we'll also measure the function post-amble where all of the local variables get dropped. However this should be completely negligible in terms of time, so I think it's fine
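
A small sketch of why the explicit `drop` narrows the measurement, assuming the metrics timer records its observation when it is dropped. `ScopedTimer` and `clone_timed` are hypothetical names for illustration, not the real metrics API.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::time::{Duration, Instant};

// Hypothetical guard: records the elapsed time into `sink` when dropped.
struct ScopedTimer {
    start: Instant,
    sink: Rc<Cell<Option<Duration>>>,
}

impl ScopedTimer {
    fn start(sink: Rc<Cell<Option<Duration>>>) -> Self {
        ScopedTimer { start: Instant::now(), sink }
    }
}

impl Drop for ScopedTimer {
    fn drop(&mut self) {
        self.sink.set(Some(self.start.elapsed()));
    }
}

// Returns the copy, plus whether the timer had already recorded before returning.
fn clone_timed(data: &[u8], sink: &Rc<Cell<Option<Duration>>>) -> (Vec<u8>, bool) {
    let timer = ScopedTimer::start(Rc::clone(sink));
    let result = data.to_vec(); // the operation being measured
    drop(timer); // stop the clock here, before any later work in the function

    // Anything from here on is not included in the recorded duration.
    let recorded_before_return = sink.get().is_some();
    (result, recorded_before_return)
}
```

Without the explicit `drop`, the guard would live until the end of the function and the recorded time would also cover the drops of the other local variables.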

      if self.hdiff_buffers.len() != self.hdiff_buffers.cap().get() {
    -     self.hdiff_buffers.put(state_root, (slot, buffer));
    +     self.hdiff_buffers
    +         .put(state_root, Arc::new(OnceCell::with_value((slot, buffer))));
Collaborator

There's a race condition here

  • thread 1: get or insert: insert empty one
  • thread 1: clone new once cell
  • thread 1: start computing buffer
  • thread 2: put buffer, replacing the previously created once cell and wasting thread 1's compute
  • thread 1: completes computing the buffer and the once cell is dropped

I don't feel it's a serious issue but noting it here
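
The interleaving above can be replayed deterministically on a single thread. This is only a sketch: it uses `std::sync::OnceLock` and a `HashMap` as stand-ins for the once cell and the LRU cache, and `replay_race` and `SharedCell` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock};

type SharedCell = Arc<OnceLock<String>>;

// Replays the steps in order and returns:
// (is thread 1's cell still the cached one?, cached value, thread 1's value).
fn replay_race() -> (bool, String, String) {
    let mut cache: HashMap<u64, SharedCell> = HashMap::new();

    // thread 1: get-or-insert an empty cell, then clone its Arc.
    let t1_cell: SharedCell = cache
        .entry(7)
        .or_insert_with(|| Arc::new(OnceLock::new()))
        .clone();

    // thread 2: put a filled cell under the same key, replacing thread 1's cell.
    let t2_cell: SharedCell = Arc::new(OnceLock::new());
    t2_cell.set("from thread 2".to_string()).unwrap();
    cache.insert(7, t2_cell);

    // thread 1: finishes computing and fills its own, now orphaned, cell.
    t1_cell.set("from thread 1".to_string()).unwrap();

    let same = Arc::ptr_eq(&t1_cell, &cache[&7]);
    let cached = cache[&7].get().unwrap().clone();
    let orphaned = t1_cell.get().unwrap().clone();
    (same, cached, orphaned)
}
```

Thread 1's result ends up in a cell that the cache no longer references, so it is freed once thread 1 drops its `Arc`.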

Member Author

Gone with removal of OnceCell, I think?

@michaelsproul changed the title from "Use OnceCell in HotHDiffBufferCache" to "Use Arc in HotHDiffBufferCache to avoid holding cache lock" Oct 13, 2025
Labels

database, optimization (Something to make Lighthouse run more efficiently.), tree-states (Ongoing state and database overhaul)

2 participants