Primitives, Components and Interfaces Design #29
Conversation
greged93 left a comment
just some thoughts related to the codec types and the input to the derivation pipeline
```rust
/// This is used as input for the derivation pipeline. All data remains in its raw serialized form.
/// The data is then deserialized, enriched and processed in the derivation pipeline.
```
I have thought a little more about this: don't you think we also need to have the calldata decoding inside the pipeline? This way the pipeline receives only the calldata of the commit transaction, decodes the batch header and chunks, then proceeds to decoding the chunk data and fetching the blob data for transactions if needed.
I think that the way we should authenticate the batch data is by computing the batch hash in the pipeline. Why do you think we should decode the calldata in the pipeline?
So because we commit to the batch in the contract via its hash, is that why you would skip decoding the calldata?
The public input to the prover is the batch hash, so this is primarily driven by proving requirements. The batch hash acts as the commitment to the batch data.
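As a rough illustration of that authentication step (the names and the exact hash preimage here are assumptions, not the PR's code): the pipeline recomputes the hash of the raw committed data and checks it against the hash taken from the rollup contract.

```rust
use alloy_primitives::{keccak256, B256};

/// Error returned when the recomputed batch hash does not match the
/// hash committed on L1.
#[derive(Debug)]
pub struct BatchHashMismatch {
    pub expected: B256,
    pub computed: B256,
}

/// Authenticates raw batch data against the hash committed in the rollup contract.
/// `raw_batch_header` is the serialized header taken from the commit transaction
/// and `committed_hash` comes from the commit event. The real batch hash preimage
/// is codec-version specific; plain keccak256 over the header bytes is used here
/// only for illustration.
pub fn authenticate_batch(
    raw_batch_header: &[u8],
    committed_hash: B256,
) -> Result<(), BatchHashMismatch> {
    let computed = keccak256(raw_batch_header);
    if computed == committed_hash {
        Ok(())
    } else {
        Err(BatchHashMismatch { expected: committed_hash, computed })
    }
}
```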
I've been thinking about this a little bit. I think the most future-proof and maintainable approach is to not expose any details about the internals of a Codec version, the way it is read (e.g. some calldata is required) or the batch structure (e.g. chunks and the skipped-message bitmap).
Otherwise we will end up with lots of structs that represent different versions of the same thing, e.g. BatchInputDataV1, BatchInputDataV2. This basically leaks implementation details of the pipeline into the rest of the node. It would also require major restructuring if a new version arrives where BatchInput no longer applies.
From a high level, the derivation pipeline should just take some L1 inputs and produce PayloadAttributes. Whatever happens in between, whether there are batches, chunks or bundles, should not matter from the perspective of the rollup node overall.
If pipeline behavior is similar enough we can add different versions to the same pipeline instance with some type/version switches here and there. But this is already starting to get messy in l2geth:
- a type switch for creating different "raw" data, comparable to `BatchInputV1` and `BatchInputV2`; as you can see, commit data is required
- for reading the commit data we need to support multiple different function signatures etc.: https://github.com/scroll-tech/go-ethereum/blob/40ebbd6491f2a3f604d835a37c7519df643de113/rollup/l1/reader.go#L386
- another type switch to distinguish how to handle certain rollup events or different revert events

If the versions diverge too much, I think it would be best to have a separate pipeline implementation for a specific version. That includes reading calldata and blobs and creating a "batch" and "chunks" if those are the entities within the specific codec/protocol version.
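A minimal sketch of what this encapsulation could look like, with all names hypothetical rather than taken from the PR: the node hands the pipeline raw L1 data and only ever sees payload attributes come out, while each codec/protocol version hides its batch/chunk handling behind the trait.

```rust
/// Raw, undecoded L1 data handed to the pipeline (hypothetical shape).
pub struct L1Input {
    /// Calldata of the commit transaction, left unparsed.
    pub calldata: Vec<u8>,
    /// Blob sidecars referenced by the commit transaction, if any.
    pub blobs: Vec<Vec<u8>>,
}

/// Version-agnostic view of a derivation pipeline: raw L1 inputs in,
/// payload attributes out. Batches, chunks, bundles and the skipped-message
/// bitmap stay internal to each implementation.
pub trait DerivationPipeline {
    /// Payload attributes type produced by the pipeline
    /// (`ScrollPayloadAttributes` in this PR).
    type Attributes;

    /// Feed raw L1 data without the node decoding any of it.
    fn handle_l1_input(&mut self, input: L1Input);

    /// Pull the next derived payload, if one is ready.
    fn next_attributes(&mut self) -> Option<Self::Attributes>;
}

/// A codec-version-specific implementation then lives entirely behind the trait.
pub struct PipelineV7;

impl DerivationPipeline for PipelineV7 {
    // Placeholder; the real node would use `ScrollPayloadAttributes` here.
    type Attributes = ();

    fn handle_l1_input(&mut self, _input: L1Input) {
        // Decode calldata, fetch and decode blobs, build batches/chunks internally.
    }

    fn next_attributes(&mut self) -> Option<Self::Attributes> {
        None
    }
}
```

With something like this, a pipeline version that needs commit calldata and one that reads a single blob would differ only in their internals.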
```rust
pub struct Batch {
    /// The index of the batch.
    pub index: u64,
    /// The total number of L1 messages popped before this batch.
    pub total_l1_messages_popped_before: u64,
    /// The hash of the parent batch.
    pub parent_hash: B256,
    /// The chunks in the batch.
    pub chunks: Vec<Chunk>,
    /// The hash of the L1 message queue before the batch.
    pub prev_l1_message_queue_hash: B256,
    /// The hash of the L1 message queue after the batch.
    pub post_l1_message_queue_hash: B256,
    /// The block commitments in the batch.
    pub blocks: Vec<BlockCommitment>,
}
```
I think we need to confirm with @jonastheis, but afaiu the types in da.go are not used for decoding but for the construction of the blob and/or calldata. I think we should be using DAChunkRawTx for the chunk and daBatchV0, V1, V3 and V7 for the batch.
Good catch. Let's wait for input from @jonastheis, and I will also review myself.
That is correct, the types in da.go are only required for creating batches/blobs, their calldata and the preparation for the provers.
I think that every rollup node should ultimately support batch submission to increase decentralization and to provide an easy fail-over in case the centralized operator is down, or as a supporting mechanism for subcommitments (so that a node is able to commit the data it has seen autonomously). For now we can ignore this part though.
From the Codec interface only the following is required:

Some of these methods, like DecodeDAChunksRawTx and DecodeTxsFromBlob, are already technical debt because they are only needed in older versions. V7 uses DecodeBlob.
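For reference, a rough Rust rendering of that decode-only subset (the method names mirror the Go Codec methods mentioned above; the signatures and helper types are assumptions, not the da-codec API):

```rust
// Helper types below are placeholders, not the da-codec types.
pub struct RawChunk(pub Vec<u8>);
pub struct RawTx(pub Vec<u8>);
pub struct DecodedBlob(pub Vec<u8>);
pub struct CodecError;

/// Decode-only view of a codec version, loosely mirroring the Go `Codec`
/// methods referenced above.
pub trait CodecDecode {
    /// Counterpart of `DecodeDAChunksRawTx`: only needed by older codec versions.
    fn decode_da_chunks_raw_tx(&self, chunk_bytes: &[u8]) -> Result<Vec<RawChunk>, CodecError>;

    /// Counterpart of `DecodeTxsFromBlob`: only needed by older codec versions.
    fn decode_txs_from_blob(&self, blob: &[u8]) -> Result<Vec<RawTx>, CodecError>;

    /// Counterpart of `DecodeBlob`: the entry point used from V7 onwards.
    fn decode_blob(&self, blob: &[u8]) -> Result<DecodedBlob, CodecError>;
}
```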
```rust
/// A pipeline for processing batch inputs and producing scroll payloads.
#[derive(Debug)]
pub struct Pipeline;

impl Pipeline {
    /// Creates a new [`Pipeline`] instance.
    pub fn new() -> Self {
        Pipeline
    }

    /// Handles a batch input.
    pub fn handle_batch_input(&mut self, _batch_input: BatchInput) {
        // Handle the batch input.
        todo!()
    }

    /// Gets the next scroll payload.
    pub fn next(&mut self) -> Option<ScrollPayloadAttributes> {
        // Get the next scroll payload.
        todo!()
    }
}

impl Stream for Pipeline {
    type Item = ScrollPayloadAttributes;

    fn poll_next(
        self: std::pin::Pin<&mut Self>,
        cx: &mut std::task::Context<'_>,
    ) -> std::task::Poll<Option<Self::Item>> {
        let this = self.get_mut();

        todo!()
    }
}
```
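For context, a small usage sketch of how the node might drive this pipeline once the Stream impl is filled in (futures' StreamExt is assumed; this is not part of the PR):

```rust
use futures::StreamExt;

/// Push raw batch inputs in, await payload attributes out.
/// `Pipeline` and `BatchInput` are the types from this PR.
async fn drive_pipeline(mut pipeline: Pipeline, batch_inputs: Vec<BatchInput>) {
    for input in batch_inputs {
        pipeline.handle_batch_input(input);
    }
    // Use the `Stream` interface explicitly, since `Pipeline` also has an
    // inherent `next` method.
    while let Some(attributes) = StreamExt::next(&mut pipeline).await {
        // Forward `attributes` to the engine API / payload builder.
        let _ = attributes;
    }
}
```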
sure, I will provide some pseudocode and let Peter review as well.
This is more network configuration, no?
Yes, we should probably move this.
```rust
impl Indexer {
    /// Handles an event from the L1.
    pub async fn handle_l1_event(&mut self, event: L1Event) {
        match event {
```
CommitBatch and L1Message are events specific to Scroll, and the others are general L1 "events". Is this correct?
What about Revert and Finalize events for batches?
Yes, we will need to index Finalize to determine the rollup finality status.
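To illustrate, the indexed event set could end up looking roughly like this once finalization and reverts are included; the variant names and payloads are assumptions, not the PR's definitions, and the general L1 events are omitted:

```rust
use alloy_primitives::B256;

/// Rollup-specific events the indexer consumes from L1 (illustrative shape).
pub enum L1Event {
    /// A new batch was committed to the rollup contract (Scroll-specific).
    CommitBatch { batch_index: u64, batch_hash: B256 },
    /// An L1 -> L2 message was enqueued (Scroll-specific).
    L1Message { queue_index: u64 },
    /// A committed batch was finalized; indexed to track rollup finality status.
    Finalize { batch_index: u64, batch_hash: B256 },
    /// A previously committed batch was reverted.
    Revert { batch_index: u64 },
}
```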
I'm not sure it is such a good idea to add the rollup events here.
When a new version arrives: if we add events, change an event's intrinsic meaning, or even remove an event, we will always need to support all of this here to be able to sync from genesis.
I still think we should try to encapsulate all the derivation specifics as much as possible and push them into a specific derivation pipeline implementation. We could have derivation pipeline implementations for different versions of the protocol/codec.
This way the general node framework stays the same:
- notify of L1 updates (and maybe supply data in a very general way; this is a bit tricky because we don't know what exact data is required, so we need to enable the derivation pipeline to specify what events and data it is interested in without decoding the data, see the sketch below)
- the derivation pipeline does all the specifics
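One possible way to let the pipeline declare its interest without the node decoding anything is a filter the pipeline exposes; this is purely a sketch with hypothetical names:

```rust
use alloy_primitives::{Address, B256};

/// Describes the L1 data a derivation pipeline implementation wants to
/// receive, so the node can watch logs and fetch calldata/blobs without
/// understanding their contents.
pub struct L1InterestFilter {
    /// Contract addresses whose logs should be forwarded.
    pub addresses: Vec<Address>,
    /// Event signature hashes (topic0) the pipeline cares about.
    pub topics: Vec<B256>,
    /// Whether the calldata of matching transactions should be forwarded raw.
    pub needs_calldata: bool,
    /// Whether blob sidecars referenced by matching transactions are needed.
    pub needs_blobs: bool,
}

pub trait L1DataInterest {
    /// Queried by the node once (or per protocol upgrade); matching raw data
    /// is then simply forwarded to the pipeline.
    fn l1_interest(&self) -> L1InterestFilter;
}
```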
```rust
    /// The withdrawal root of the block commitment.
    pub withdraw_root: B256,
    /// The prover row consumption of the block commitment.
    pub row_consumption: RowConsumption,
```
iiuc RowConsumption is not needed anymore after EuclidV1
@greged93 keep this in mind as you work on the derivation pipeline.
Overview
This PR outlines the additional crates, components, interfaces, and data structures that will be used for Milestones 3 and 4. This will provide a holistic view of how the rollup node should be structured. We can then provide the concrete implementations in future PRs.