
Commit 918121e

Fix bugs in rebasing of states prior to finalization (#7849)
Attempt to fix this error reported by `beaconcha.in` on their Hoodi archive nodes:

> {"code":500,"message":"UNHANDLED_ERROR: DBError(CacheBuildError(BeaconState(MilhouseError(OutOfBoundsIterFrom { index: 1199549, len: 1060000 }))))","stacktraces":[]}

There are only a handful of places where we call `iter_from`. This one is safe by construction (the check immediately prior ensures `self.pubkeys.len()` is not out of bounds):

https://github.com/sigp/lighthouse/blob/cfb1f7331064b758c6786e4e1dc15507af5ff5d1/beacon_node/beacon_chain/src/validator_pubkey_cache.rs#L84-L90

This one should also be safe, and the indexes used here would not be as large as the ones in the reported error:

https://github.com/sigp/lighthouse/blob/cfb1f7331064b758c6786e4e1dc15507af5ff5d1/consensus/state_processing/src/per_epoch_processing/single_pass.rs#L365-L368

That leaves one remaining usage, which must be the culprit:

https://github.com/sigp/lighthouse/blob/cfb1f7331064b758c6786e4e1dc15507af5ff5d1/consensus/types/src/beacon_state.rs#L2109-L2113

This indexing relies on the invariant that `self.pubkey_cache().len() <= self.validators.len()`. We mostly maintain that invariant, except in `rebase_caches_on` (fixed in this PR).

The other bug is that we were calling `rebase_on_finalized` for all "hot" states, which, post-v7.1.0, include states prior to the split that are required by the hdiff grid. This is how we end up calling something like `genesis_state.rebase_on(&split_state)`, which then corrupts the pubkey cache of the genesis state with the newer, larger pubkey cache from the split state.
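To make the failure mode concrete, here is a toy model of the invariant. It assumes, as the commit message implies, that the linked `beacon_state.rs` code extends the pubkey cache by iterating validators starting at index `pubkey_cache().len()`, and that `iter_from` fails when the start index exceeds the list length. `ToyList`, `ToyError` and `update_pubkey_cache` are hypothetical stand-ins, not Lighthouse or milhouse APIs:

```rust
/// Hypothetical stand-in for milhouse's `OutOfBoundsIterFrom` error.
#[derive(Debug, PartialEq)]
pub enum ToyError {
    OutOfBoundsIterFrom { index: usize, len: usize },
}

/// Hypothetical stand-in for the persistent list of validators (pubkeys as u64s).
pub struct ToyList {
    items: Vec<u64>,
}

impl ToyList {
    pub fn new(items: Vec<u64>) -> Self {
        ToyList { items }
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    /// Iterate from `index` onwards; assumed to error when `index > len`
    /// (starting exactly at `len` yields an empty iterator).
    pub fn iter_from(&self, index: usize) -> Result<std::slice::Iter<'_, u64>, ToyError> {
        if index > self.items.len() {
            return Err(ToyError::OutOfBoundsIterFrom { index, len: self.items.len() });
        }
        Ok(self.items[index..].iter())
    }
}

/// Append any validators not yet cached. The start index is the cache length,
/// so this only works while `cache.len() <= validators.len()` holds.
pub fn update_pubkey_cache(cache: &mut Vec<u64>, validators: &ToyList) -> Result<(), ToyError> {
    let start = cache.len();
    for pubkey in validators.iter_from(start)? {
        cache.push(*pubkey);
    }
    Ok(())
}
```

Under this reading, the reported error corresponds to a cache holding 1199549 entries attached to a state with only 1060000 validators: once the cache is longer than the validator list, the start index is out of bounds.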
1 parent 80ba0b1 commit 918121e


2 files changed (+14, -3 lines)


beacon_node/store/src/state_cache.rs

Lines changed: 6 additions & 1 deletion
```diff
@@ -186,8 +186,13 @@ impl<E: EthSpec> StateCache<E> {
         state: &mut BeaconState<E>,
         spec: &ChainSpec,
     ) -> Result<(), Error> {
+        // Do not attempt to rebase states prior to the finalized state. This method might be called
+        // with states on the hdiff grid prior to finalization, as part of the reconstruction of
+        // some later unfinalized state.
         if let Some(finalized_state) = &self.finalized_state {
-            state.rebase_on(&finalized_state.state, spec)?;
+            if state.slot() >= finalized_state.state.slot() {
+                state.rebase_on(&finalized_state.state, spec)?;
+            }
         }
         Ok(())
     }
```
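The effect of the slot guard in `state_cache.rs` can be sketched with a heavily simplified model. `ToyState`, `ToyStateCache` and `rebase_on` below are hypothetical: each state is reduced to a slot, a validator count and a pubkey cache length, and `rebase_on` reuses the pre-fix copy condition from `rebase_caches_on` to show that the guard alone keeps the genesis state's cache intact:

```rust
/// Hypothetical, heavily simplified state: slot, validator count, pubkey cache length.
#[derive(Clone, Debug, PartialEq)]
pub struct ToyState {
    pub slot: u64,
    pub num_validators: usize,
    pub pubkey_cache_len: usize,
}

/// Hypothetical cache holding an optional finalized (split) state.
pub struct ToyStateCache {
    pub finalized_state: Option<ToyState>,
}

impl ToyStateCache {
    /// Mirrors the guard added in this commit: only states at or after the
    /// finalized slot are rebased. Returns whether a rebase happened.
    pub fn rebase_on_finalized(&self, state: &mut ToyState) -> bool {
        if let Some(finalized) = &self.finalized_state {
            if state.slot >= finalized.slot {
                rebase_on(state, finalized);
                return true;
            }
        }
        false
    }
}

/// Stand-in for `rebase_on`, using the *pre-fix* copy condition, which does not
/// check the base cache length against this state's validator count.
pub fn rebase_on(state: &mut ToyState, base: &ToyState) {
    if state.pubkey_cache_len < base.pubkey_cache_len && state.pubkey_cache_len < state.num_validators {
        state.pubkey_cache_len = base.pubkey_cache_len;
    }
}
```

For example, with a split state at slot 1000 holding 200 cached pubkeys, a genesis state with 100 validators is left alone by the guarded path, whereas calling `rebase_on` on it directly gives it a 200-entry cache and breaks the invariant.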

consensus/types/src/beacon_state.rs

Lines changed: 8 additions & 2 deletions
```diff
@@ -2535,11 +2535,17 @@ impl<E: EthSpec> BeaconState<E> {
 
     pub fn rebase_caches_on(&mut self, base: &Self, spec: &ChainSpec) -> Result<(), Error> {
         // Use pubkey cache from `base` if it contains superior information (likely if our cache is
-        // uninitialized).
+        // uninitialized). Be careful not to use a cache which has *more* validators than expected,
+        // as other code expects `self.pubkey_cache().len() <= self.validators.len()`.
         let num_validators = self.validators().len();
         let pubkey_cache = self.pubkey_cache_mut();
         let base_pubkey_cache = base.pubkey_cache();
-        if pubkey_cache.len() < base_pubkey_cache.len() && pubkey_cache.len() < num_validators {
+
+        let current_cache_is_incomplete = pubkey_cache.len() < num_validators;
+        let base_cache_is_compatible = base_pubkey_cache.len() <= num_validators;
+        let base_cache_is_superior = base_pubkey_cache.len() > pubkey_cache.len();
+
+        if current_cache_is_incomplete && base_cache_is_compatible && base_cache_is_superior {
             *pubkey_cache = base_pubkey_cache.clone();
         }
 
```
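The change to `rebase_caches_on` boils down to a predicate over three lengths. The sketch below lifts the old and new conditions from the diff into free functions; the function names are hypothetical, and plain `usize` lengths stand in for the real caches:

```rust
/// Pre-fix condition (the removed line): the base cache's length is never
/// compared against this state's validator count.
pub fn should_copy_old(current_len: usize, base_len: usize, num_validators: usize) -> bool {
    current_len < base_len && current_len < num_validators
}

/// Post-fix condition: the three named booleans from the diff, combined.
pub fn should_copy_new(current_len: usize, base_len: usize, num_validators: usize) -> bool {
    let current_cache_is_incomplete = current_len < num_validators;
    let base_cache_is_compatible = base_len <= num_validators;
    let base_cache_is_superior = base_len > current_len;
    current_cache_is_incomplete && base_cache_is_compatible && base_cache_is_superior
}
```

For a genesis-like state with 100 validators and an empty cache, `should_copy_old(0, 200, 100)` accepts a 200-entry base cache while `should_copy_new(0, 200, 100)` rejects it. The `base_cache_is_compatible` check guarantees that whenever a copy happens, the resulting cache length is at most `num_validators`.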