
feat: enforce "one BlockComponent per CompletedRange" + replay #575

Merged
ksn6 merged 6 commits into anza-xyz:master from ksn6:parse-block-components-during-replay
Nov 10, 2025

feat: enforce "one BlockComponent per CompletedRange" + replay#575
ksn6 merged 6 commits intoanza-xyz:masterfrom
ksn6:parse-block-components-during-replay

Conversation

@ksn6 (Contributor) commented Oct 30, 2025

Problem and Summary of Changes

We:

  • Enforce a bijection between BlockComponents and CompletedRanges
  • Introduce BlockComponentProcessor, allowing us to parse BlockComponents and enforce consensus rules in replay
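As a rough illustration of the first bullet (hypothetical types and a made-up byte format; the real parsing lives in blockstore), the enforced invariant means that deserializing the payload of one CompletedRange must yield exactly one BlockComponent:

```rust
// Illustrative sketch only: the names mirror the PR, but the serialization
// (1-byte tag + 1-byte length + body) is an invented stand-in.
#[derive(Debug, PartialEq)]
enum BlockComponent {
    Marker(Vec<u8>),
    EntryBatch(Vec<u8>),
}

// Parse zero or more length-prefixed components out of a payload.
fn from_bytes_multiple(payload: &[u8]) -> Result<Vec<BlockComponent>, String> {
    let mut components = Vec::new();
    let mut i = 0;
    while i < payload.len() {
        let tag = payload[i];
        let len = *payload.get(i + 1).ok_or("truncated length")? as usize;
        let body = payload
            .get(i + 2..i + 2 + len)
            .ok_or("truncated body")?
            .to_vec();
        components.push(match tag {
            0 => BlockComponent::Marker(body),
            1 => BlockComponent::EntryBatch(body),
            t => return Err(format!("unknown tag {t}")),
        });
        i += 2 + len;
    }
    Ok(components)
}

// The invariant this PR enforces: one CompletedRange => one BlockComponent.
fn parse_completed_range(payload: &[u8]) -> Result<BlockComponent, String> {
    let mut components = from_bytes_multiple(payload)?;
    if components.len() != 1 {
        return Err(format!(
            "expected exactly 1 BlockComponent per CompletedRange, got {}",
            components.len()
        ));
    }
    Ok(components.remove(0))
}

fn main() {
    // One marker (tag 0, body [7]) in one completed range: accepted.
    assert_eq!(
        parse_completed_range(&[0, 1, 7]),
        Ok(BlockComponent::Marker(vec![7]))
    );
    // Two components packed into one range: rejected.
    assert!(parse_completed_range(&[0, 1, 7, 1, 0]).is_err());
    println!("ok");
}
```

In this model the `components.len() != 1` check plays the role of the enforcement the PR adds at shred-ingest time.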

@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch from cac4269 to 02ace98 on October 31, 2025 19:05
@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch from 1543b08 to 5ffab6a on November 1, 2025 04:40
@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch from fd1c1c9 to 6713039 on November 4, 2025 03:21
if bank.parent_slot() >= first_alpenglow_slot {
    let mut verifier = bank.block_component_verifier.write().unwrap();
    for marker in slot_components.iter().filter_map(|bc| bc.as_marker()) {
        verifier.on_marker(
Contributor:

How will this work for UpdateParent? I'm thinking we should keep the ordering relative to the entries when processing

Contributor (Author):

UpdateParent won't have any processing in block_component_verifier.rs; all UpdateParent-related checks will happen on shred ingest.

I'm thinking we should keep the ordering relative to the entries when processing

not sure I follow

Contributor:

ah right, that includes killing this bank if needed right?

And for the second point, maybe it doesn't matter for the current BlockMarkers e.g. header / footer, but if we had new block markers that changed some sort of bank state it seems the state change would be non-deterministic based on how the completed ranges filled in?

Contributor (Author):

ah right, that includes killing this bank if needed right?

correct; it's unclear whether we'll want the direct invocation to kill the bank in blockstore, although blockstore would certainly initiate some sort of signal to kill the bank

And for the second point, maybe it doesn't matter for the current BlockMarkers e.g. header / footer, but if we had new block markers that changed some sort of bank state it seems the state change would be non-deterministic based on how the completed ranges filled in?

it's case-dependent, though we could abstractly consider:

  • ensuring that BlockComponentVerifier's state is identical irrespective of the order in which block components are added (this property is currently true)

  • placing bank-modifying logic in the finish() function, which purely operates on BlockComponentVerifier's state and is currently invoked right before the bank is frozen

this is all unless we just want to process the new bank-modifying block component on shred ingest instead

Contributor:

My suggestion would be to process the block marker inline, also ensures that we don't have to wait until the end to throw away a block if some marker is invalid

Contributor (Author):

not sure what you mean by "process the block marker inline" - do you mean "online?"

if so, BlockComponentVerifier accomplishes this via on_marker(), which is invoked in blockstore_processor.rs; i.e., at the first sight of a mistake (e.g., a double header / footer), we error out

Contributor:

Yeah sorry, online.
Basically what I'm saying is: assume you have FEC sets that look like this:

BlockMarker1 -> EntryBatch1 -> BlockMarker2 -> EntryBatch2

It seems we could have all sorts of execution orderings depending on the completed data ranges at the time.
I would expect to process these in the order that they appear in the block, but:

  • If I receive these FEC sets one at a time, this is true
  • If I receive all 4 FEC sets, the code as written would process BM1 -> BM2 -> EB1 -> EB2

Again, maybe this is not an issue because order doesn't matter for the current BlockMarkers, but I think it's a brittle assumption. For example, I'm in the process of adding a new BlockMarker for migration that stores the Alpenglow genesis information in an onchain account - if EB1 interacts with this account, then we could have divergent behavior
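A toy model of this concern (hypothetical types, not PR code): if a marker mutates bank state that a later entry batch reads, the two schedules above observe different state:

```rust
// Stand-in for a block component: markers write state, entry batches read it.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Component {
    Marker(u64), // sets some bank state
    EntryBatch,  // reads the current bank state
}

// Process components in a given order; record what each entry batch observes.
fn observed_states(schedule: &[Component]) -> Vec<u64> {
    let mut state = 0;
    let mut seen = Vec::new();
    for c in schedule {
        match c {
            Component::Marker(v) => state = *v,
            Component::EntryBatch => seen.push(state),
        }
    }
    seen
}

fn main() {
    use Component::*;
    // Block order: BM1 -> EB1 -> BM2 -> EB2
    let in_order = [Marker(1), EntryBatch, Marker(2), EntryBatch];
    // Markers drained first: BM1 -> BM2 -> EB1 -> EB2
    let markers_first = [Marker(1), Marker(2), EntryBatch, EntryBatch];
    assert_eq!(observed_states(&in_order), vec![1, 2]);
    assert_eq!(observed_states(&markers_first), vec![2, 2]); // divergent!
    println!("ok");
}
```

The divergence in the second assertion is exactly the non-determinism being flagged here.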

Contributor (Author):

nice, makes sense! just fixed this

@akhi3030 (Contributor) left a comment:

I haven't wrapped my head around the entire PR yet, but here's some minor feedback.

@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch 4 times, most recently from e6c7b66 to 9b5361d on November 7, 2025 05:28
@ksn6 ksn6 enabled auto-merge (squash) on November 7, 2025 05:32
@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch 3 times, most recently from e65561a to 12982ff on November 8, 2025 21:13
@ksn6 (Author) commented Nov 8, 2025

for readers, to address #575 (comment) easily, we'll be modifying solana-foundation/solana-improvement-documents#337 to enforce that one CompletedRange exactly corresponds to one BlockComponent

this is equivalent to saying that the DATA_COMPLETE_SHRED flag on shreds serves as a delimiter between BlockComponents

@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch from d7950fa to 2daccd0 on November 9, 2025 01:05
})
.map_err(|e| {
let components =
    BlockComponent::from_bytes_multiple(&payload).map_err(|e| {
Contributor (Author):

We can entirely remove from_bytes_multiple here and just go with from_bytes. This PR is getting large, though - will do this in a separate PR, as indicated in the below TODO.

@ksn6 ksn6 changed the title feat: parse BlockComponents during replay feat: "one BlockComponent per CompletedRange" + replay Nov 9, 2025
@ksn6 ksn6 changed the title feat: "one BlockComponent per CompletedRange" + replay feat: enforce "one BlockComponent per CompletedRange" + introduce BlockComponentProcessor Nov 9, 2025
@ksn6 ksn6 changed the title feat: enforce "one BlockComponent per CompletedRange" + introduce BlockComponentProcessor feat: enforce "one BlockComponent per CompletedRange" + replay Nov 9, 2025
.block_component_processor
.read()
.unwrap()
.finish(migration_status)
Contributor:

nit: I don't like the name finish - it sounds like you are trying to finish migration. But you are really checking whether the block format is consistent with the migration status, right?

Contributor (Author):

given a block, BlockComponentProcessor accomplishes two things:

  • returns an error, resulting in the block being marked dead, if the block violates an invariant; sample invariants include # of block headers == 1, # of block footers == 1, etc.; a follow-up PR will enforce additional checks, e.g., the clock checks

  • soon, apply state changes that certain block components opine on; e.g., update account rewards, the clock sysvar, etc.

the finish() function reasons about the sufficient statistics gathered by the BlockComponentProcessor during accumulation in confirm_slot, and helps accomplish the two goals above. for now, all we do is enforce that both a block header and a block footer are present.

right now, the specific invariants we enforce depend on migration_status, which is why we need to pass migration_status in; this being said, I think the name is reasonable, since we'll eventually throw migration_status away entirely.
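A minimal sketch of the finish() contract described above (names modeled on the PR; the Missing* variants and the migration-status handling are illustrative assumptions):

```rust
// Hedged sketch, not the real implementation: accumulate markers during
// replay, then check block-level invariants (exactly one header, exactly
// one footer) before the bank is frozen.
#[derive(Debug, PartialEq)]
enum BlockComponentProcessorError {
    MultipleBlockHeaders,
    MultipleBlockFooters,
    MissingBlockHeader, // hypothetical variant for this sketch
    MissingBlockFooter, // hypothetical variant for this sketch
}

#[derive(Default)]
struct BlockComponentProcessor {
    has_header: bool,
    has_footer: bool,
}

impl BlockComponentProcessor {
    // Online checks: fail at the first sight of a duplicate marker.
    fn on_header(&mut self) -> Result<(), BlockComponentProcessorError> {
        if self.has_header {
            return Err(BlockComponentProcessorError::MultipleBlockHeaders);
        }
        self.has_header = true;
        Ok(())
    }

    fn on_footer(&mut self) -> Result<(), BlockComponentProcessorError> {
        if self.has_footer {
            return Err(BlockComponentProcessorError::MultipleBlockFooters);
        }
        self.has_footer = true;
        Ok(())
    }

    // End-of-replay check: both markers must have been seen.
    fn finish(&self) -> Result<(), BlockComponentProcessorError> {
        if !self.has_header {
            return Err(BlockComponentProcessorError::MissingBlockHeader);
        }
        if !self.has_footer {
            return Err(BlockComponentProcessorError::MissingBlockFooter);
        }
        Ok(())
    }
}

fn main() {
    let mut p = BlockComponentProcessor::default();
    assert_eq!(p.finish(), Err(BlockComponentProcessorError::MissingBlockHeader));
    p.on_header().unwrap();
    p.on_footer().unwrap();
    assert!(p.finish().is_ok());
    assert_eq!(p.on_footer(), Err(BlockComponentProcessorError::MultipleBlockFooters));
    println!("ok");
}
```

An error anywhere in this flow corresponds to the block being marked dead in replay.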

Contributor:

It is okay for now, but it gets tricky when we apply state changes I guess.

})
})?;

// Enforce that completed ranges have precisely one BlockComponent.
Contributor:

nit: can you clarify how the caller can ensure this is true? And it should be specified in the doc as well.

Contributor (Author):

once this PR lands, I'll be modifying SIMD-0337 to ensure that this statement is true

in a follow-up PR, I'll be replacing the block component parsing with parsing a single component rather than a Vec<BlockComponent> and ensuring that the length is 1

// this assumption.
if components.len() != 1 {
    return Err(BlockstoreError::InvalidShredData(Box::new(
        bincode::ErrorKind::Custom(format!(
Contributor:

nit: Is it worth introducing new error type rather than fitting into this one?

Contributor (Author):

this code is going away in a follow-up PR; see #575 (comment)

/// completed_ranges = [..., (s_i..e_i), (s_{i+1}..e_{i+1}), ...]
/// Then, the following statements are true:
/// s_i < e_i == s_{i+1} < e_{i+1}
fn get_slot_components_in_block(
Contributor:

nit: explain the difference from get_slot_entries_in_block maybe?

Contributor (Author):

could you clarify? the return type for getting slot components is Result<Vec<BlockComponent>>, while that of entries is Result<Vec<Entry>>; I think the comments elaborate a bit further on this

Contributor:

I mean, we now seem to have almost identical comments between the two functions; it would be good to mention the difference, but I'm fine either way.

fn get_slot_entries_in_block(
    &self,
    slot: Slot,
    completed_ranges: &CompletedRanges,
Contributor:

Why are we changing CompletedRanges to &CompletedRanges here?

Contributor (Author):

get_slot_components_with_shred_info returns an owned completed_ranges, but also needs to invoke get_slot_components_in_block, so get_slot_components_in_block must take in &CompletedRanges

get_slot_entries_in_block invokes get_slot_components_in_block, so we might as well pass in a ref rather than an owned CompletedRanges, since an owned version here is unnecessarily restrictive
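A toy illustration of the borrowing argument (stand-in types and function names, not the blockstore API): taking &CompletedRanges lets a caller that owns the ranges feed both helpers without giving ownership away:

```rust
// Stand-in types for this sketch; the real CompletedRanges lives in blockstore.
type CompletedRange = std::ops::Range<u32>;
type CompletedRanges = Vec<CompletedRange>;

// Both helpers borrow, so one owned CompletedRanges can feed both calls.
fn components_in_block(ranges: &CompletedRanges) -> usize {
    ranges.len()
}

fn entries_in_block(ranges: &CompletedRanges) -> usize {
    // Delegation to the components path is free precisely because we
    // borrow rather than take ownership.
    components_in_block(ranges)
}

fn main() {
    let ranges: CompletedRanges = vec![0..3, 4..7];
    let c = components_in_block(&ranges);
    let e = entries_in_block(&ranges); // ranges still owned by the caller
    assert_eq!((c, e), (2, 2));
    println!("ok");
}
```

If the functions took CompletedRanges by value instead, the caller would need a clone for the second call.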


// Verify and process block components (e.g., header, footer) before freezing
// Only verify blocks that were replayed from blockstore (not leader blocks)
if !is_leader_block {
Contributor:

Is there ever a case I should pay attention to my own block_components?

For example, a standby sending out block in my name, putting in block component I don't agree with?

Contributor (Author):

no reason to do this - if a validator wishes to modify block component dissemination, they should be doing so either in block creation loop or in broadcast

could you clarify what you mean by "standby sending out block in my name?"

Contributor:

Let's say I run two validators A and B, A is B's standby, now I'm switching identity so A becomes the active one, but it's still processing blocks produced by B, in this case do I need to process the block_components for the blocks owned by "me"?

if self.has_footer {
    return Err(BlockComponentProcessorError::MultipleBlockFooters);
}

Contributor:

Where do we plan to put the parsed content?

Contributor (Author):

wdym by the parsed content?

eventually, on_footer will be the place where we process rewards, address the clock, etc.

Contributor:

I mean, the certs in blocks, in the future how do I access it?
self.bank.block_components().... ?

self.num_slots_broadcasted += 1;

true
self.migration_status.is_alpenglow_enabled()
Contributor:

nit: the above code seems to not be too related to send_header calculation.

Contributor (Author):

it is related, right? we only send headers if alpenglow is enabled

}

impl BlockComponentProcessor {
    pub fn finish(
Contributor:

So rn we have to wait until the block has completed replay in order to do these checks.
A suggestion to make this more online:

  • add an on_entry_batch which is called right before confirm_slot_entries
  • confirm_slot has a MigrationStatus available for you to pass in to on_marker / on_entry_batch
  • If on_entry_batch is called before self.has_header, we immediately know there's no header and can die
  • If on_entry_batch is called with slot_is_full and !self.has_footer, we immediately know there's no footer and can die

So BlockComponentProcessor processes entry batches and markers online, and we don't need to separately call finish in replay. lmk if that works
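A sketch of the online flow suggested above (hypothetical signatures; migration-status plumbing omitted): on_entry_batch lets the processor die at the first out-of-order component instead of waiting for a final finish() call:

```rust
// Hedged sketch of the suggestion, not the real follow-up implementation.
#[derive(Debug, PartialEq)]
enum OnlineError {
    EntryBatchBeforeHeader,
    SlotFullWithoutFooter,
}

#[derive(Default)]
struct Processor {
    has_header: bool,
    has_footer: bool,
}

impl Processor {
    fn on_header(&mut self) {
        self.has_header = true;
    }

    fn on_footer(&mut self) {
        self.has_footer = true;
    }

    // Called right before the entry batch is confirmed: fail as early
    // as possible instead of deferring to a finish() at freeze time.
    fn on_entry_batch(&self, slot_is_full: bool) -> Result<(), OnlineError> {
        if !self.has_header {
            return Err(OnlineError::EntryBatchBeforeHeader);
        }
        if slot_is_full && !self.has_footer {
            return Err(OnlineError::SlotFullWithoutFooter);
        }
        Ok(())
    }
}

fn main() {
    let mut p = Processor::default();
    // Entry batch arrives before any header: die immediately.
    assert_eq!(p.on_entry_batch(false), Err(OnlineError::EntryBatchBeforeHeader));
    p.on_header();
    assert!(p.on_entry_batch(false).is_ok());
    // Slot is full but no footer was seen: die immediately.
    assert_eq!(p.on_entry_batch(true), Err(OnlineError::SlotFullWithoutFooter));
    p.on_footer();
    assert!(p.on_entry_batch(true).is_ok());
    println!("ok");
}
```

With these checks there is no separate end-of-replay verification step, which is the point of the suggestion.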

Contributor (Author):

this is a great suggestion; in the interest of parallelization, though, I'd like to implement this in a follow-up PR (I promise to get this out today!)

Ok(())
}

pub fn on_marker(
Contributor:

Don't feel too strongly about this but for the transaction processor we invoke through bank rather than exposing to blockstore_processor directly, e.g:

alpenglow/runtime/src/bank.rs

Lines 3278 to 3283 in 2a55cd1

let sanitized_output = self
    .transaction_processor
    .load_and_execute_sanitized_transactions(
        self,
        sanitized_txs,
        check_results,

Could be nice to follow the same style for marker processing, and then blockstore_processor doesn't have to know anything about finding the parent bank or grabbing the write lock.

Contributor:

Makes it easier once we get to implementing the footer logic, by invoking through bank the code might look cleaner for updating accounts and bls verification.

Contributor (Author):

yep, makes sense - could we do this in a follow-up PR rather than in this one?

@ksn6 ksn6 force-pushed the parse-block-components-during-replay branch from 78afa38 to abb42d5 on November 10, 2025 16:52
@ksn6 ksn6 merged commit 7437655 into anza-xyz:master Nov 10, 2025
7 checks passed
@ksn6 ksn6 deleted the parse-block-components-during-replay branch November 10, 2025 17:35
ksn6 added a commit that referenced this pull request Nov 10, 2025
#### Problem and Summary of Changes
Right now, we need to complete replay to verify that `BlockComponent`s
in a block satisfy a number of invariants (e.g., exactly one header,
exactly one footer, etc.).

@AshwinSekar found a clever way to avoid this:
#575 (comment).

Implement the above.
Comment on lines +3695 to +3720
if !is_leader_block {
    if let Err(err) = bank
        .block_component_processor
        .read()
        .unwrap()
        .finish(migration_status)
    {
        warn!("Block component processing failed for slot {bank_slot}: {err:?}");
        let root = bank_forks.read().unwrap().root();
        Self::mark_dead_slot(
            blockstore,
            bank,
            root,
            &BlockstoreProcessorError::BlockComponentProcessor(err),
            rpc_subscriptions,
            slot_status_notifier,
            progress,
            duplicate_slots_to_repair,
            ancestor_hashes_replay_update_sender,
            purge_repair_slot_counter,
            &mut tbft_structs,
        );
        continue;
    }
}

@carllin (Contributor) commented Nov 10, 2025:

is there any reason not to move this check right after https://github.com/ksn6/alpenglow/blob/abb42d57cf609c64823b53ef39888e7aee001a0b/core/src/replay_stage.rs#L3606-L3639

That way all the failures are directly blocked one after another and don't have to reason about whether logic in between like https://github.com/ksn6/alpenglow/blob/abb42d57cf609c64823b53ef39888e7aee001a0b/core/src/replay_stage.rs#L3675-L3691 is necessary

Could even group the failure checks into a single function check_block_failures

Contributor (Author):

ashwin came up with this nice idea to entirely eliminate the finish() function; we pushed this a few days ago:

#594

AshwinSekar pushed a commit to AshwinSekar/alpenglow that referenced this pull request Feb 23, 2026
We parse `BlockComponent`s in replay and enforce the following checks:

- Every block has a block header
- Every block has a block footer
AshwinSekar pushed a commit to AshwinSekar/alpenglow that referenced this pull request Feb 23, 2026
…za-xyz#594)

Right now, we need to complete replay to verify that `BlockComponent`s
in a block satisfy a number of invariants (e.g., exactly one header,
exactly one footer, etc.).

@AshwinSekar found a clever way to avoid this:
anza-xyz#575 (comment).

Implement the above.