
Conversation

@frisitano (Collaborator)

Overview

This PR outlines the additional crates, components, interfaces, and data structures that will be used for Milestones 3 and 4, providing a holistic view of how the rollup node should be structured. Concrete implementations will follow in future PRs.

@greged93 (Contributor) left a comment

Just some thoughts related to the codec types and the input to the derivation pipeline.

Comment on lines +5 to +6
/// This is used as input for the derivation pipeline. All data remains in its raw serialized form.
/// The data is then deserialized, enriched and processed in the derivation pipeline.
Contributor

I have thought a little more about this: don't you think we also need to have the calldata decoding inside the pipeline? This way the pipeline receives only the calldata of the commit transaction, decodes the batch header and chunks, then proceeds to decoding the chunk data and fetching blob data for the transactions if needed.

Collaborator Author

I think that the way we should authenticate the batch data is by computing the batch hash in the pipeline. Why do you think we should decode the calldata in the pipeline?

Contributor

So is it because we commit to the batch in the contract via its hash that you would skip decoding the calldata?

Collaborator Author

The public input to the prover is the batch hash, so it's primarily driven by proving requirements. The batch hash acts as the commitment to the batch data.
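
As a rough illustration of what that authentication could look like inside the pipeline (the real preimage layout is codec-version specific, so the function, error type, and field choices here are assumptions, not this PR's design):

use alloy_primitives::{keccak256, B256};

/// Hypothetical error type for the sketch below.
#[derive(Debug)]
pub enum BatchAuthError {
    HashMismatch { expected: B256, got: B256 },
}

/// Sketch: authenticate raw batch data against the on-chain commitment by
/// recomputing its hash inside the pipeline. The actual preimage (header
/// fields, chunk hashes, etc.) is defined by the codec version.
pub fn authenticate_batch(raw_batch_header: &[u8], committed_hash: B256) -> Result<(), BatchAuthError> {
    let computed = keccak256(raw_batch_header);
    if computed == committed_hash {
        Ok(())
    } else {
        Err(BatchAuthError::HashMismatch { expected: committed_hash, got: computed })
    }
}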

Contributor

I've been thinking about this a little bit. I think the most future-proof and maintainable way is to not expose any details about the internals of a Codec version, how to read it (e.g. some calldata is required), or the batch structure (e.g. chunks and the skipped-message bitmap).

Otherwise we will end up with lots of structs that represent different versions of the same thing, e.g. BatchInputDataV1, BatchInputDataV2. This basically leaks implementation details of the pipeline into the rest of the node. It would also require major restructuring if we have a new version where BatchInput does not really apply anymore.

From a high level, the derivation pipeline should just take some L1 inputs and produce PayloadAttributes. What exactly happens in between, whether there are batches, chunks, or bundles, should not matter from the perspective of the rollup node overall.

If pipeline behavior is similar enough, we can add different versions to the same pipeline instance with some type/version switches here and there. But this is already starting to get messy in l2geth.

If they diverge too much, I think it would be best to have a separate pipeline implementation for each specific version. That includes reading calldata and blobs, and creating a "batch" and "chunks" if those are the entities within the specific codec/protocol version.
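
A minimal sketch of what that boundary could look like, assuming the node only ever sees opaque L1 inputs and derived payload attributes (the trait and names below are illustrative, not part of this PR):

/// Hypothetical version-agnostic pipeline boundary: opaque L1 inputs in,
/// payload attributes out. Batches, chunks, and bundles stay internal to the
/// codec-specific implementation.
pub trait DerivationPipeline {
    /// Opaque L1 data this pipeline version knows how to interpret
    /// (commit calldata, blobs, L1 messages, ...).
    type L1Input;
    /// The derived output, e.g. the ScrollPayloadAttributes used elsewhere in this PR.
    type Payload;

    /// Feed new L1 data into the pipeline.
    fn handle_l1_input(&mut self, input: Self::L1Input);
    /// Drain the next derived payload, if one is ready.
    fn next_payload(&mut self) -> Option<Self::Payload>;
}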

Comment on lines +12 to +27
pub struct Batch {
/// The index of the batch.
pub index: u64,
/// The total number of L1 messages popped before this batch.
pub total_l1_messages_popped_before: u64,
/// The hash of the parent batch.
pub parent_hash: B256,
/// The chunks in the batch.
pub chunks: Vec<Chunk>,
/// The hash of the L1 message queue before the batch.
pub prev_l1_message_queue_hash: B256,
/// The hash of the L1 message queue after the batch.
pub post_l1_message_queue_hash: B256,
/// The block commitments in the batch.
pub blocks: Vec<BlockCommitment>,
}
Contributor

I think we need to confirm with @jonastheis, but AFAIU the types in da.go are not used for decoding but for the construction of the blob and/or calldata. I think we should be using DAChunkRawTx for the chunk and daBatchV0, V1, V3, and V7.

Collaborator Author

Good catch. Let's wait for input from @jonastheis, and I will also review it myself.

Contributor

That is correct, the types in da.go are only required for creating batches/blobs, their calldata, and preparation for the provers.

I think that every rollup node should ultimately support batch submission to increase decentralization and provide an easy fail-over in case the centralized operator is down, or as a supporting mechanism for subcommitments (so that a node is able to autonomously commit the data it has seen). For now we can ignore this part though.

From the Codec interface, only the following is required:
[screenshot: the required subset of Codec interface methods]

Some of these methods like DecodeDAChunksRawTx and DecodeTxsFromBlob are already technical debt because they are only needed in older versions. V7 uses DecodeBlob.
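
For illustration, a Rust-side mirror of that decoding-only subset might look roughly like this (names and signatures are assumptions, not the da-codec API):

/// Hypothetical error type for the sketch below.
#[derive(Debug)]
pub struct CodecError;

/// Sketch of the decoding-only subset a derivation pipeline needs from a codec.
pub trait DaCodec {
    /// Decoded representation of a committed batch; version specific.
    type Batch;
    /// Decoded blob payload (the V7-style DecodeBlob path).
    type BlobPayload;

    /// Decode the commit-transaction calldata into a batch.
    fn decode_batch(&self, calldata: &[u8]) -> Result<Self::Batch, CodecError>;
    /// Decode a blob into its payload, replacing the older chunk/tx decoding paths.
    fn decode_blob(&self, blob: &[u8]) -> Result<Self::BlobPayload, CodecError>;
}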

Comment on lines +7 to +41
/// A pipeline for processing batch inputs and producing scroll payloads.
#[derive(Debug)]
pub struct Pipeline;

impl Pipeline {
/// Creates a new [`Pipeline`] instance.
pub fn new() -> Self {
Pipeline
}

/// Handles a batch input.
pub fn handle_batch_input(&mut self, _batch_input: BatchInput) {
// Handle the batch input.
todo!()
}

/// Gets the next scroll payload.
pub fn next(&mut self) -> Option<ScrollPayloadAttributes> {
// Get the next scroll payload.
todo!()
}
}

impl Stream for Pipeline {
type Item = ScrollPayloadAttributes;

fn poll_next(
self: std::pin::Pin<&mut Self>,
cx: &mut std::task::Context<'_>,
) -> std::task::Poll<Option<Self::Item>> {
let this = self.get_mut();

todo!()
}
}
@frisitano (Collaborator Author), Mar 4, 2025

We should refactor this interface as we discussed. @greged93, I have assigned #2 to you. We should redesign it, taking into account the insights provided by @Thegaram. We don't need a full implementation now, but an API skeleton would be good.

Contributor

Sure, I will provide some pseudocode and let Peter review as well.

Contributor

This is more network configuration, no?

Collaborator Author

Yes, we should probably move this.

impl Indexer {
/// Handles an event from the L1.
pub async fn handle_l1_event(&mut self, event: L1Event) {
match event {
Contributor

CommitBatch and L1Message are Scroll-specific events, while the others are general L1 "events". Is this correct?

What about Revert and Finalize events for batches?

@frisitano (Collaborator Author), Mar 6, 2025

Yes, we will need to index Finalize to determine the rollup finality status.
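
For illustration, the indexer's event type could grow along these lines (variant names and fields below are assumptions, not the types in this PR):

use alloy_primitives::B256;

/// Hypothetical shape of the L1 event type, extended with the rollup-contract
/// events discussed here.
#[derive(Debug)]
pub enum L1Event {
    /// A batch was committed on L1; raw commit data is kept opaque here.
    CommitBatch { batch_index: u64, batch_hash: B256 },
    /// A previously committed batch was reverted.
    RevertBatch { batch_index: u64, batch_hash: B256 },
    /// A batch (or bundle) was finalized; drives the rollup finality status.
    FinalizeBatch { batch_index: u64, batch_hash: B256 },
    /// An L1 -> L2 message was enqueued in the message queue.
    L1Message { queue_index: u64, message_hash: B256 },
    /// General L1 chain progress, not Scroll specific.
    NewBlock { number: u64, hash: B256 },
    /// An L1 reorg back to the given block number.
    Reorg { new_head: u64 },
}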

Contributor

I'm not sure it's such a good idea to add the rollup events here. Whenever a new version adds an event, changes an event's intrinsic meaning, or even removes one, we will still need to support all of them here to be able to sync from genesis.

I still think we should try to encapsulate all the derivation specifics as much as possible and push them into a specific derivation pipeline implementation. We could have derivation pipeline implementations for different versions of the protocol/codec.

This way the general node framework stays the same:

  • notify of L1 updates (and maybe supply data in a very general way, but this is a bit tricky as we don't know what exact data is required, so we need to let the derivation pipeline specify which events and data it is interested in without decoding the data; see the sketch below)
  • the derivation pipeline does all the specifics
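
A rough sketch of that idea, assuming plain (address, topic) filters are enough to express the pipeline's interest (the trait below is illustrative, not part of this PR):

use alloy_primitives::{Address, B256};

/// Hypothetical interface: the pipeline declares which L1 logs it cares about
/// as plain (address, topic0) filters, so the node can forward raw, undecoded data.
pub trait L1Subscriber {
    /// Contract addresses and event signatures this pipeline version needs.
    fn l1_filters(&self) -> Vec<(Address, B256)>;

    /// Receive the raw log data for a matching event; decoding stays internal.
    fn on_l1_log(&mut self, address: Address, topics: Vec<B256>, data: Vec<u8>);
}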

/// The withdrawal root of the block commitment.
pub withdraw_root: B256,
/// The prover row consumption of the block commitment.
pub row_consumption: RowConsumption,
Contributor

IIUC, RowConsumption is not needed anymore after EuclidV1.

Collaborator Author

@greged93 keep this in mind as you work on the derivation pipeline.

@frisitano closed this Mar 13, 2025
@frisitano mentioned this pull request Apr 2, 2025