feat: enforce "one BlockComponent per CompletedRange" + replay #575

ksn6 merged 6 commits into anza-xyz:master
ledger/src/blockstore_processor.rs
Outdated
```rust
if bank.parent_slot() >= first_alpenglow_slot {
    let mut verifier = bank.block_component_verifier.write().unwrap();
    for marker in slot_components.iter().filter_map(|bc| bc.as_marker()) {
        verifier.on_marker(
```
How will this work for UpdateParent? I'm thinking we should keep the ordering relative to the entries when processing
UpdateParent won't have any processing in block_component_verifier.rs; all UpdateParent-related checks will happen on shred ingest.
> I'm thinking we should keep the ordering relative to the entries when processing
not sure I follow
ah right, that includes killing this bank if needed right?
And for the second point, maybe it doesn't matter for the current BlockMarkers e.g. header / footer, but if we had new block markers that changed some sort of bank state it seems the state change would be non-deterministic based on how the completed ranges filled in?
> ah right, that includes killing this bank if needed right?
correct; it's unclear whether we'll want the direct invocation to kill the bank in blockstore, although blockstore would certainly initiate some sort of signal to kill the bank
> And for the second point, maybe it doesn't matter for the current BlockMarkers e.g. header / footer, but if we had new block markers that changed some sort of bank state it seems the state change would be non-deterministic based on how the completed ranges filled in?
it's case-dependent, though we could abstractly consider:

- ensuring that `BlockComponentVerifier`'s state is identical irrespective of the order in which block components are added (this property is currently true)
- placing bank-modifying logic in the `finish()` function, which purely operates on `BlockComponentVerifier`'s state and is currently invoked right before the bank is frozen
this is unless we don't want to just process the new bank-modifying block component on shred ingest
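The order-independence property in the first bullet can be illustrated with a minimal, self-contained sketch (the types here are hypothetical simplifications, not the real crate's): because the verifier's state is just a set of observations, applying markers in any order yields the same final state, and duplicates are rejected whenever they arrive.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum MarkerKind {
    Header,
    Footer,
}

#[derive(Debug, Default, PartialEq)]
struct VerifierState {
    // Set-valued state: insertion order cannot affect the final value.
    seen: BTreeSet<MarkerKind>,
}

impl VerifierState {
    /// Record a marker; error on a duplicate (e.g., a double header).
    fn on_marker(&mut self, m: MarkerKind) -> Result<(), &'static str> {
        if !self.seen.insert(m) {
            return Err("duplicate marker");
        }
        Ok(())
    }
}

/// Apply markers in the given order and return the resulting state.
fn apply_all(order: &[MarkerKind]) -> Result<VerifierState, &'static str> {
    let mut state = VerifierState::default();
    for &m in order {
        state.on_marker(m)?;
    }
    Ok(state)
}
```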
My suggestion would be to process the block marker inline, also ensures that we don't have to wait until the end to throw away a block if some marker is invalid
not sure what you mean by "process the block marker inline" - do you mean "online"?
if so, `BlockComponentVerifier` accomplishes this via `on_marker()`, which is invoked in `blockstore_processor.rs`; i.e., at the first sight of a mistake (e.g., a double header / footer), we error out
Yeah, sorry - online.
Basically what I'm saying is: assume you have FEC sets that look like this:

BlockMarker1 -> EntryBatch1 -> BlockMarker2 -> EntryBatch2

It seems we could have all sorts of execution orderings depending on the completed data ranges at the time. I would expect to process these in the order that they appear in the block, but:
- If I receive these FEC sets one at a time, this is true
- If I receive all 4 FEC sets, the code as written would process BM1 -> BM2 -> EB1 -> EB2

Again, maybe this is not an issue because order doesn't matter for the current BlockMarkers, but I think it's a brittle assumption. For example, I'm in the process of adding a new BlockMarker for migration that stores the Alpenglow genesis information in an onchain account - if EB1 interacts with this account, then we could have divergent behavior
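The in-order processing being argued for can be sketched as follows (illustrative types only, not the real crate's): components are consumed in the order they appear in the block, so markers and entry batches interleave deterministically regardless of which FEC sets arrived together.

```rust
// Hypothetical simplified component type for illustration.
enum BlockComponent {
    Marker(&'static str),
    EntryBatch(Vec<u64>),
}

/// Process components strictly in block order, interleaving markers
/// and entry batches instead of draining all markers first.
fn process_in_order(components: &[BlockComponent]) -> Vec<String> {
    let mut log = Vec::new();
    for component in components {
        match component {
            BlockComponent::Marker(name) => log.push(format!("marker:{name}")),
            BlockComponent::EntryBatch(entries) => {
                log.push(format!("entries:{}", entries.len()))
            }
        }
    }
    log
}
```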
nice, makes sense! just fixed this
akhi3030 left a comment
I haven't wrapped my head around the entire PR, but some minor feedback.
for readers: to address #575 (comment) easily, we'll be modifying solana-foundation/solana-improvement-documents#337 to enforce "one `BlockComponent` per `CompletedRange`"
```rust
    })
    .map_err(|e| {
let components =
    BlockComponent::from_bytes_multiple(&payload).map_err(|e| {
```
We can entirely remove `from_bytes_multiple` here and just go with `from_bytes`. This PR is getting large, though - will do this in a separate PR, as indicated in the TODO below.
```rust
.block_component_processor
    .read()
    .unwrap()
    .finish(migration_status)
```
nit: I don't like the name finish, it sounds like you are trying to finish migration. But you are really checking whether the block format is consistent with migration status right?
given a block, `BlockComponentProcessor` accomplishes two things:

- returns an error, resulting in the block being marked dead, if the block violates an invariant; sample invariants include # of block headers == 1, # of block footers == 1, etc. a follow-up PR will enforce the clock checks, e.g.
- soon, will apply state changes that certain block components opine on; e.g., update account rewards, the clock sysvar, etc.

the `finish()` function reasons about the sufficient statistics gathered by the `BlockComponentProcessor` during accumulation in `confirm_slot`, and helps accomplish the two goals above. for now, all we do is enforce that both a block header and footer are present.

right now, the specific invariants we enforce depend on `migration_status`, which is why we need to pass `migration_status` in; this being said, I think the name is reasonable, since we'll eventually be throwing `migration_status` away entirely.
It is okay for now, but it gets tricky when we apply state changes I guess.
```rust
    })
})?;

// Enforce that completed ranges have precisely one BlockComponent.
```
nit: can you clarify how the caller can ensure this is true? And it should be specified in the doc as well.
once this PR lands, I'll be modifying SIMD-0337 to ensure that this statement is true

in a follow-up PR, I'll be replacing the block component parsing with parsing a single component rather than a `Vec<BlockComponent>` and ensuring that the length is 1
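The interim check described above - parse a list, then insist on exactly one component per completed range - could be sketched like this (illustrative types, not the real parser):

```rust
/// Accept a parsed list of components and require exactly one,
/// matching the "one BlockComponent per CompletedRange" invariant.
fn single_component(components: Vec<&'static str>) -> Result<&'static str, String> {
    match components.as_slice() {
        [only] => Ok(*only),
        other => Err(format!("expected exactly 1 BlockComponent, got {}", other.len())),
    }
}
```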
```rust
// this assumption.
if components.len() != 1 {
    return Err(BlockstoreError::InvalidShredData(Box::new(
        bincode::ErrorKind::Custom(format!(
```
nit: Is it worth introducing a new error type rather than fitting into this one?
this code is going away in a follow-up PR; see #575 (comment)
```rust
/// completed_ranges = [..., (s_i..e_i), (s_i+1..e_i+1), ...]
/// Then, the following statements are true:
///     s_i < e_i == s_i+1 < e_i+1
fn get_slot_components_in_block(
```
nit: explain the difference from `get_slot_entries_in_block` maybe?
could you clarify? the return type for getting slot components is `Result<Vec<BlockComponent>>` while that of entries is `Result<Vec<Entry>>`; I think the comments elaborate a bit further on this
I mean, we now seem to have almost identical comments between the two functions; it would be good to mention the difference, but I'm fine either way.
```rust
fn get_slot_entries_in_block(
    &self,
    slot: Slot,
    completed_ranges: &CompletedRanges,
```
Why are we changing `CompletedRanges` to `&CompletedRanges` here?
`get_slot_components_with_shred_info` returns an owned `completed_ranges`, but also needs to invoke `get_slot_components_in_block`, so `get_slot_components_in_block` must take in `&CompletedRanges`

`get_slot_entries_in_block` invokes `get_slot_components_in_block`, so we might as well pass in a ref rather than an owned `CompletedRanges`, since an owned version here is unnecessarily restrictive
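The borrowing argument can be illustrated with a minimal sketch (hypothetical simplified types): because the inner helper takes a reference, the caller can use the ranges and still return them as an owned value afterwards.

```rust
// Simplified stand-in for the real CompletedRanges type.
type CompletedRanges = Vec<(u32, u32)>;

/// Inner helper: only needs read access, so it takes a reference.
fn get_in_block(ranges: &CompletedRanges) -> usize {
    ranges.len()
}

/// Outer caller: borrows the ranges for the helper call,
/// then still returns ownership of them to its own caller.
fn with_shred_info() -> (CompletedRanges, usize) {
    let ranges: CompletedRanges = vec![(0, 3), (4, 7)];
    let n = get_in_block(&ranges); // borrow ends here
    (ranges, n)                    // ownership is still available
}
```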
```rust
// Verify and process block components (e.g., header, footer) before freezing
// Only verify blocks that were replayed from blockstore (not leader blocks)
if !is_leader_block {
```
Is there ever a case I should pay attention to my own block_components?
For example, a standby sending out block in my name, putting in block component I don't agree with?
no reason to do this - if a validator wishes to modify block component dissemination, they should be doing so either in the block creation loop or in broadcast

could you clarify what you mean by "standby sending out a block in my name"?
Let's say I run two validators A and B, A is B's standby, now I'm switching identity so A becomes the active one, but it's still processing blocks produced by B, in this case do I need to process the block_components for the blocks owned by "me"?
```rust
if self.has_footer {
    return Err(BlockComponentProcessorError::MultipleBlockFooters);
}
```
Where do we plan to put the parsed content?
wdym by the parsed content?
eventually, `on_footer` will be the place where we process rewards, address the clock, etc.
I mean, the certs in blocks - in the future, how do I access them? `self.bank.block_components()...`?
```diff
 self.num_slots_broadcasted += 1;

-true
+self.migration_status.is_alpenglow_enabled()
```
nit: the above code seems not too related to the `send_header` calculation.
it is related, right? we only send headers if alpenglow is enabled
```rust
}

impl BlockComponentProcessor {
    pub fn finish(
```
So rn we have to wait until the block has completed replay in order to do these checks.
A suggestion to make this more online:

- add an `on_entry_batch` which is called right before `confirm_slot_entries`
- `confirm_slot` has a `MigrationStatus` available for you to pass in to `on_marker`/`on_entry_batch`
- if `on_entry_batch` is called before `self.has_header` is set, we immediately know there's no header and can die
- if `on_entry_batch` is called with `slot_is_full` and `!self.has_footer`, we immediately know there's no footer and can die

So `BlockComponentProcessor` processes entry batches and markers online, and we don't need to separately call `finish` in replay. lmk if that works
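The suggested online checks could be sketched as follows (hypothetical names mirroring the suggestion; the real API may differ): `on_entry_batch` fails fast if no header has been seen yet, or if the slot is full and no footer has arrived.

```rust
#[derive(Debug, PartialEq)]
enum OnlineError {
    MissingHeader,
    MissingFooter,
}

#[derive(Default)]
struct Processor {
    has_header: bool,
    has_footer: bool,
}

impl Processor {
    /// Called right before entries are confirmed: die immediately
    /// instead of waiting for a post-replay finish() pass.
    fn on_entry_batch(&self, slot_is_full: bool) -> Result<(), OnlineError> {
        if !self.has_header {
            // An entry batch arrived before any header: block is invalid.
            return Err(OnlineError::MissingHeader);
        }
        if slot_is_full && !self.has_footer {
            // The slot is complete but no footer was ever seen.
            return Err(OnlineError::MissingFooter);
        }
        Ok(())
    }
}
```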
this is a great suggestion; in the interest of parallelization, though, I'd like to implement this in a follow-up PR (I promise to get this out today!)
```rust
    Ok(())
}

pub fn on_marker(
```
Don't feel too strongly about this, but for the transaction processor we invoke through the bank rather than exposing it to `blockstore_processor` directly, e.g.:

Lines 3278 to 3283 in 2a55cd1

Could be nice to follow the same style for marker processing; then `blockstore_processor` doesn't have to know anything about finding the parent bank or grabbing the write lock.
Makes it easier once we get to implementing the footer logic; by invoking through the bank, the code might look cleaner for updating accounts and BLS verification.
yep, makes sense - could we do this in a follow-up PR rather than in this one?
#### Problem and Summary of Changes

Right now, we need to complete replay to verify that `BlockComponent`s in a block satisfy a number of invariants (e.g., exactly one header, exactly one footer, etc.). @AshwinSekar found a clever way to avoid this: #575 (comment). Implement the above.
```rust
if !is_leader_block {
    if let Err(err) = bank
        .block_component_processor
        .read()
        .unwrap()
        .finish(migration_status)
    {
        warn!("Block component processing failed for slot {bank_slot}: {err:?}",);
        let root = bank_forks.read().unwrap().root();
        Self::mark_dead_slot(
            blockstore,
            bank,
            root,
            &BlockstoreProcessorError::BlockComponentProcessor(err),
            rpc_subscriptions,
            slot_status_notifier,
            progress,
            duplicate_slots_to_repair,
            ancestor_hashes_replay_update_sender,
            purge_repair_slot_counter,
            &mut tbft_structs,
        );
        continue;
    }
}
```
is there any reason not to move this check right after https://github.com/ksn6/alpenglow/blob/abb42d57cf609c64823b53ef39888e7aee001a0b/core/src/replay_stage.rs#L3606-L3639 ?

That way all the failure checks are directly blocked one after another, and we don't have to reason about whether logic in between, like https://github.com/ksn6/alpenglow/blob/abb42d57cf609c64823b53ef39888e7aee001a0b/core/src/replay_stage.rs#L3675-L3691, is necessary.

Could even group the failure checks into a single function `check_block_failures`
ashwin came up with this nice idea to entirely eliminate the finish() function; we pushed this a few days ago:
We parse `BlockComponent`s in replay and enforce the following checks:
- Every block has a block header
- Every block has a block footer
Problem and Summary of Changes

We:
- enforce "one `BlockComponent` per `CompletedRange`"
- introduce `BlockComponentProcessor`, allowing us to parse `BlockComponent`s and enforce consensus rules in replay