feat(l2): enable distributed proving and multi-batch verification #6158
verification in the proof sender and OnChainProposer contracts.

The proof coordinator now tracks batch-to-prover assignments with timestamps using an in-memory map behind std::sync::Mutex (two-phase lock pattern to avoid holding locks across .await points). When a prover requests work, the coordinator assigns the first unassigned or timed-out batch, allowing multiple provers to work on different batches concurrently. Assignments are cleaned up when all proof types arrive for a batch.

The L1 proof sender collects all consecutive proven batches starting from last_verified_batch + 1 and sends them in a single verifyBatches() contract call. It falls back to single-batch verifyBatch() when only one batch is ready (for backward compatibility) and to per-batch sending on invalid-proof reverts, to isolate the failing batch.

Both OnChainProposer contracts (standard and based) extract shared verification logic into _verifyBatchInternal() and add a verifyBatches() function that loops over it. The based contract now also enforces sequential batch verification (batchNumber == lastVerifiedBatch + 1), fixing a pre-existing gap. Timelock forwarding is added for verifyBatches.

New CLI flag: --proof-coordinator.prover-timeout (default 600s) controls how long a batch assignment may be held before it is considered stale and reassigned.
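The proof sender's collection step (consecutive proven batches starting from last_verified_batch + 1) can be sketched roughly as below. This is a minimal illustration, not the PR's code: the function name `collect_consecutive_proven`, the `BTreeSet<u64>` standing in for the rollup store, and the upper bound at the last committed batch are all assumptions made for the example.

```rust
use std::collections::BTreeSet;

/// Returns the run of consecutive, fully proven batch numbers that starts at
/// `last_verified_batch + 1` and stops at the first gap or after
/// `last_committed_batch`.
///
/// `proven` is a hypothetical stand-in for the rollup store: it holds the
/// batch numbers for which every required proof is already stored.
pub fn collect_consecutive_proven(
    last_verified_batch: u64,
    last_committed_batch: u64,
    proven: &BTreeSet<u64>,
) -> Vec<u64> {
    let mut batches = Vec::new();
    let mut next = last_verified_batch + 1;
    while next <= last_committed_batch && proven.contains(&next) {
        batches.push(next);
        next += 1;
    }
    batches
}
```

Stopping at the first gap matters because the contract only accepts the batch immediately after the last verified one, so a proven batch beyond a gap cannot be sent yet.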
🤖 Kimi Code Review
Security & Correctness Review
Critical Issues
Logic Issues
Performance & Gas Optimization
Code Quality
Minor Issues
The PR introduces important batch verification functionality but needs attention to race conditions and state consistency.
Automated review by Kimi (Moonshot AI) · custom prompt

🤖 Codex Code Review
Found a couple of correctness issues to address; rest looks reasonable.
Findings
Notes
If you want, I can propose a concrete patch for the …
Automated review by OpenAI Codex · custom prompt

🤖 Claude Code Review
Indeed, the based contract is missing the L2 message removal logic. This appears to be a pre-existing issue that wasn't introduced by this PR, but it's worth noting. Actually, this is NOT introduced by this PR - let me verify if it existed before: The diff shows this is just a refactoring of existing …
Positive Observations
Recommendations Summary
Must fix before merge:
Should fix before merge:
Nice to have:
Automated review by Claude (Anthropic) · custom prompt

Lines of code report
Total lines added:
Detailed view
…Timelock in favor of using verifyBatches() for both single and multi-batch verification. The proof sender now always calls verifyBatches (with a single-element array when only one batch is ready). Extracted send_verify_batches_tx as a helper to avoid async recursion in the fallback path that isolates invalid proofs.
for every batch in a multi-batch verifyBatches call, preserving compatibility with systems that listen for per-batch events.
4d416be to c693938
instead of referencing a previous hotfix that won't exist after merge.
…n the proof sender to deduplicate the pattern that detects and deletes invalid proofs from the store across the multi-batch fallback and single-batch paths.
single batch via verifyBatches, handling invalid proof deletion and verify tx hash storage. Used in the multi-batch fallback loop.
branches on batch_count > 1 internally, removing the duplicated error pattern matching.
… into a single assignments map. The assignment Instant is now used for both timeout detection and proving time metrics, removing the need for a separate SystemTime-based map and the tokio::sync::Mutex dependency.
…ordinator skips fully-proved batches when scanning for work to assign. This removes the redundant check from handle_request since next_batch_to_assign now guarantees that returned batches are missing at least one proof.
…undant database lookup in handle_request, since the input was already fetched and validated during the assignment scan.
…nstead of just batch_number, so that e.g. a RISC0 prover requesting work doesn't block an SP1 prover from working on the same batch. Adds prover_type to the BatchRequest protocol message and a prover_type() method to the ProverBackend trait so each backend identifies itself when requesting assignments.
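A rough sketch of assignment tracking keyed by (batch_number, prover_type) follows. It is an illustration under stated assumptions, not the coordinator's actual code: the `Assignments` type, its `assign` and `complete` methods, and the two-variant `ProverType` enum are hypothetical, and `now` is passed in explicitly only to keep the example deterministic.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical subset of prover types, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ProverType {
    RISC0,
    SP1,
}

/// Tracks which (batch, prover type) pairs are assigned and since when.
pub struct Assignments {
    inner: Mutex<HashMap<(u64, ProverType), Instant>>,
    timeout: Duration,
}

impl Assignments {
    pub fn new(timeout: Duration) -> Self {
        Self { inner: Mutex::new(HashMap::new()), timeout }
    }

    /// Assigns the first batch in `candidates` that is unassigned or timed
    /// out for this prover type. The lock is held only for scan and insert.
    pub fn assign(&self, candidates: &[u64], prover_type: ProverType, now: Instant) -> Option<u64> {
        let mut map = self.inner.lock().ok()?;
        for &batch in candidates {
            let key = (batch, prover_type);
            let available = match map.get(&key) {
                None => true,
                Some(assigned_at) => now.duration_since(*assigned_at) >= self.timeout,
            };
            if available {
                map.insert(key, now);
                return Some(batch);
            }
        }
        None
    }

    /// Clears the assignment when the proof arrives and returns the elapsed
    /// time, so the same `Instant` can also feed a proving-time metric.
    pub fn complete(&self, batch: u64, prover_type: ProverType, now: Instant) -> Option<Duration> {
        let mut map = self.inner.lock().ok()?;
        map.remove(&(batch, prover_type)).map(|start| now.duration_since(start))
    }
}
```

Because the key includes the prover type, an SP1 assignment for a batch leaves the same batch available to a RISC0 prover, which is the behaviour this commit describes.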
… todo!() for ZisK and OpenVM backends (no ProverType variant exists for them yet) and call self.backend.prover_type() directly instead of storing it as a field.
…nder, which was broken by the addition of prover_type to the protocol message.
…tisfy clippy::collapsible_if lint, which was failing the Lint L2 CI check.
…sting steps, configuration reference, and how batch assignment, prover timeout, and multi-batch verification work.
ProofCoordinator: the field is used again in PR #6157 and is preserved to simplify merge conflicts.
…n gas
metric, and fix invalid_proof_type to match both error codes and full messages.
The require(firstBatchNumber == lastVerifiedBatch + 1) in verifyBatches was
redundant since _verifyBatchInternal already enforces sequential verification
on every iteration. The batch verification gas metric was lost during the
multi-batch refactor and is now restored. The invalid_proof_type function now
matches the standard contract's error codes ("00c", "00e", "00g") in addition
to the based contract's full error messages.
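Why the upfront require was redundant can be seen in a small model. The sketch below is written in Rust purely for illustration (the real logic is Solidity and also verifies the proofs); the `Proposer` struct, its method names, and the error strings are hypothetical.

```rust
/// Minimal model of the contract state, for illustration only.
pub struct Proposer {
    pub last_verified_batch: u64,
}

impl Proposer {
    /// Plays the role of `_verifyBatchInternal`: reject any batch that is
    /// not exactly the next one, then advance the counter.
    fn verify_batch_internal(&mut self, batch_number: u64) -> Result<(), String> {
        if batch_number != self.last_verified_batch + 1 {
            return Err(format!("batch {batch_number} is not the next batch to verify"));
        }
        self.last_verified_batch = batch_number;
        Ok(())
    }

    /// Plays the role of `verifyBatches`: there is no separate check on
    /// `first_batch_number`, because the first loop iteration already
    /// performs the same comparison.
    pub fn verify_batches(&mut self, first_batch_number: u64, count: u64) -> Result<(), String> {
        let snapshot = self.last_verified_batch;
        for i in 0..count {
            if let Err(e) = self.verify_batch_internal(first_batch_number + i) {
                // An EVM revert undoes all state changes; emulate that here.
                self.last_verified_batch = snapshot;
                return Err(e);
            }
        }
        Ok(())
    }
}
```

With the state at batch 3, a call starting at batch 4 succeeds, while a call starting at batch 9 or batch 3 fails on its first iteration, exactly as an upfront check would have rejected it.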
assignment scan in the proof coordinator. The multi-batch path now records amortized gas (total_gas / batch_count) for each batch number so the Grafana dashboard (grouped by batch_number) gets a data point for every batch instead of only the first. The single-batch fallback path also records gas now. The inner assignment scan in next_batch_to_assign is bounded by the map size to prevent unbounded iteration under the lock.
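The amortized gas recording described in this commit (total_gas / batch_count, one data point per batch) amounts to something like the sketch below. The function name `amortized_gas` and its return shape are assumptions for the example; the actual metric setter is not shown in this conversation.

```rust
/// Splits the gas used by one multi-batch verification transaction evenly
/// across the batches it covered, returning one (batch_number, gas) pair
/// per batch. Integer division drops any remainder.
pub fn amortized_gas(first_batch: u64, batch_count: u64, total_gas: u64) -> Vec<(u64, u64)> {
    if batch_count == 0 {
        // Nothing was verified, and dividing by zero would panic.
        return Vec::new();
    }
    let per_batch = total_gas / batch_count;
    (first_batch..first_batch + batch_count)
        .map(|batch_number| (batch_number, per_batch))
        .collect()
}
```

For example, three batches starting at 10 that were verified in a transaction using 900,000 gas would each be recorded with 300,000.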
…fana dashboard from an XY chart to a table showing batch_number, tx_gas, and tx_hash columns. Batches verified in the same multi-batch transaction now share the same tx_gas and tx_hash values, making it clear which batches were grouped together. The single-batch fallback path also records the metric.
…Proposer contract, matching the interface declaration and the based contract variant.
commit hash doesn't match and no batches are available. Previously the response was flattened into a generic empty BatchResponse, losing the signal that tells the prover its binary is outdated.
showing batch_number on X axis and verification gas on Y with tx_hash visible in the tooltip. Existing panels in the row are narrowed from w=8 to w=6 to fit the new panel alongside the Batch Verification Gas table.
since the Verification Gas by Batch xychart now shows the same information with batch_number on X, gas on Y, and tx_hash in the tooltip. Remaining panels in the row are restored to w=8 each.
…just invalid proof reverts. Previously an out-of-gas or calldata-too-large failure would propagate without fallback, causing an infinite retry loop since the same unbounded batch count would be collected on the next tick.
storage, checkpoint cleanup, latest_sent_batch_proof update) happens exactly once per batch inside send_batches_proof_to_contract. Extract finalize_batch_proof helper, rename verify_and_send_proof to verify_and_send_proofs, and widen the multi-batch fallback to trigger on any error so the sequencer never gets stuck retrying an unbounded batch count that exceeds gas or calldata limits.
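The widened fallback (retry batch by batch on any multi-batch error, not only invalid-proof reverts) can be sketched as follows. This is a simplified synchronous model under stated assumptions: `send_with_fallback` is a hypothetical name, and the `send` closure stands in for the transaction helper, which in the PR is async and also handles finalization.

```rust
/// Sends `batches` in one call; if that fails for any reason, retries each
/// batch on its own so an oversized or partly invalid bundle cannot stall
/// progress. Returns the batches that were successfully sent.
///
/// `send` is a hypothetical stand-in for the transaction helper: it takes
/// the batch numbers to verify together and reports success or an error.
pub fn send_with_fallback<F>(batches: &[u64], mut send: F) -> Vec<u64>
where
    F: FnMut(&[u64]) -> Result<(), String>,
{
    if batches.is_empty() {
        return Vec::new();
    }
    if send(batches).is_ok() {
        return batches.to_vec();
    }
    if batches.len() == 1 {
        // The failed call was already a single-batch send.
        return Vec::new();
    }
    let mut sent = Vec::new();
    for batch in batches {
        // Verification is sequential, so stop at the first batch that fails.
        if send(std::slice::from_ref(batch)).is_err() {
            break;
        }
        sent.push(*batch);
    }
    sent
}
```

Falling back on every error means a gas-limit or calldata-size failure shrinks the next attempt to one batch per transaction instead of retrying the same unbounded bundle on each tick.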
…mplemented!), make send_batches_proof_to_contract private, add natspec to based contract's verifyBatches noting it has no access control, update all L2 docs replacing verifyBatch references with verifyBatches, move distributed_proving.md from docs/prover/ to docs/l2/fundamentals/ since it describes the interaction between proof coordinator, proof sender, and provers rather than prover internals, and restructure the doc to be explanation-first with the testing guide at the end.
🤖 Kimi Code Review
Review Summary
This PR introduces significant improvements to the L2 proving system, focusing on distributed proving and multi-batch verification. The changes are well-structured and follow good practices, but there are a few areas that need attention.
Critical Issues
Security Concerns
Performance & Best Practices
Minor Issues
Recommendations
The PR is well-architected and the changes are necessary for distributed proving. The issues identified are manageable and don't block merging, but should be addressed in follow-up PRs.
Automated review by Kimi (Moonshot AI) · custom prompt

🤖 Codex Code Review
Main concerns
Other notes
If you want, I can scan the repo for any remaining …
Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review
Now let me create a comprehensive review. Based on my analysis, I'll provide detailed feedback on the PR:

PR #6158 Code Review: Enable Distributed Proving and Multi-Batch Verification

Executive Summary
This PR implements two major features: distributed proving (allowing multiple provers to work concurrently) and multi-batch verification (reducing L1 gas costs). The changes are substantial but well-structured. I've identified several critical issues that need addressing, along with recommendations for improvements.

🔴 Critical Issues

1. Mutex Poisoning Recovery is Insufficient
Location: …
The code handles poisoned mutex with simple error returns, but doesn't attempt recovery. In a production blockchain system, this could halt all proving operations:

    let mut assignments = self.assignments.lock().map_err(|_| {
        ProofCoordinatorError::Custom("Assignment lock poisoned".to_string())
    })?;

Risk: If any thread panics while holding the lock, all future proof assignments fail permanently.
Recommendation: Use …

2. Race Condition in Proof Deletion
Location: …
The …

    async fn try_delete_invalid_proof(&self, message: &str, batch_number: u64) -> Result<(), ProofSenderError> {
        if let Some(proof_type) = Self::invalid_proof_type(message) {
            warn!("Deleting invalid {proof_type:?} proof for batch {batch_number}");
            self.rollup_store
                .delete_proof_by_batch_and_type(batch_number, proof_type)
                .await?;
        }
        Ok(())
    }

Risk: A prover could submit a new proof between the error detection and deletion, causing a valid proof to be incorrectly deleted.
Recommendation: Add batch version tracking or check the proof timestamp before deletion.

3. Based Contract Sequential Check Missing Pre-Verification
Location: …
The based contract enforces sequential verification but only inside …

    require(
        batchNumber == lastVerifiedBatch + 1,
        "OnChainProposer: batch already verified"
    );

Risk: If …
Recommendation: Add an upfront check in …

    require(
        firstBatchNumber == lastVerifiedBatch + 1,
        "OnChainProposer: first batch must be next to verify"
    );
Greptile Overview
Greptile Summary
This PR enables (1) distributed proving by having the proof coordinator assign different batches to different provers (tracked in-memory with per-(batch, prover_type) timestamps + timeout), and (2) multi-batch L1 verification by having the proof sender bundle consecutive proven batches into a single …
On the contract side, both standard and based …
Two correctness issues in the new coordinator/sender logic should be addressed before merge: the coordinator can return “no work” too early when the first chosen candidate batch lacks prover input for a commit hash, and the sender’s multi-batch calldata builder currently takes only the first …
Confidence Score: 2/5
| Filename | Overview |
|---|---|
| cmd/ethrex/l2/options.rs | Adds proof coordinator prover timeout CLI/env config and wires it into SequencerConfig. |
| crates/blockchain/metrics/l2/metrics.rs | Adds tx_hash label to batch_verification_gas metric and updates setter signature accordingly. |
| crates/l2/common/src/prover.rs | Extends ProofData::BatchRequest to include prover_type and updates builder; BatchProof::calldata remains Vec (relevant to sender encoding bug). |
| crates/l2/contracts/src/l1/OnChainProposer.sol | Adds verifyBatches and factors verification into _verifyBatchInternal; removes verifyBatch entrypoint. |
| crates/l2/contracts/src/l1/based/OnChainProposer.sol | Adds verifyBatches and sequential verification requirement; factors logic into _verifyBatchInternal. |
| crates/l2/sequencer/l1_proof_sender.rs | Implements multi-batch verifyBatches sending with fallback; introduces calldata construction that takes only first Value from proof.calldata() (must-fix). |
| crates/l2/sequencer/proof_coordinator.rs | Implements distributed assignment with timeouts; new next_batch_to_assign returns None on first missing input which can hide later available work (must-fix). |
| metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json | Dashboard updated: remove redundant table and add tx_hash tooltip/xychart changes for verification gas. |
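The coordinator issue flagged in the table above (returning no work on the first candidate with missing input) comes down to whether the scan stops or continues. The sketch below shows the continuing variant; it is an assumption-laden illustration, with `first_batch_with_input` as a hypothetical name and a `HashSet<u64>` standing in for the store lookup by batch number and commit hash.

```rust
use std::collections::HashSet;

/// Returns the first candidate batch that has prover input available,
/// skipping candidates whose input is missing instead of giving up.
///
/// `has_input` is a hypothetical stand-in for the store lookup keyed by
/// batch number and commit hash.
pub fn first_batch_with_input(candidates: &[u64], has_input: &HashSet<u64>) -> Option<u64> {
    candidates
        .iter()
        .copied()
        .find(|batch| has_input.contains(batch))
}
```

A scan that returned `None` as soon as batch 4 lacked input would hide batches 5 and 6 even when their inputs exist; `find` keeps going until a usable candidate appears or the list is exhausted.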
Sequence Diagram
sequenceDiagram
participant Prover as Prover Client
participant PC as ProofCoordinator
participant Store as Rollup Store
participant PS as L1ProofSender
participant TL as Timelock (optional)
participant OCP as OnChainProposer
Prover->>PC: BatchRequest(commit_hash, prover_type)
PC->>Store: get_latest_sent_batch_proof()
PC->>PC: choose candidate batch (assignments + timeout)
PC->>Store: get_prover_input_by_batch_and_version(batch, commit_hash)
alt input exists and proof not already stored
PC-->>Prover: BatchResponse(batch_number, input, format)
Prover->>PC: ProofSubmit(batch_number, batch_proof)
PC->>Store: store_proof_by_batch_and_type(batch_number, prover_type, proof)
PC-->>Prover: ProofSubmitACK(batch_number)
else no work / wrong version
PC-->>Prover: EmptyBatchResponse or NoBatchForVersion
end
PS->>OCP: get_last_verified_batch()
PS->>OCP: get_last_committed_batch()
PS->>Store: collect consecutive proofs from last_verified+1
alt multiple consecutive batches ready
PS->>TL: verifyBatches(firstBatch, proofs[]...) (if timelock configured)
TL->>OCP: verifyBatches(firstBatch, proofs[]...)
else single batch ready
PS->>TL: verifyBatches(batch, [proof], [proof], [proof])
TL->>OCP: verifyBatches(...)
end
alt tx fails for multi-batch
PS->>PS: fallback: send each batch individually
end
PS->>Store: store_verify_tx_by_batch(batch, tx_hash)
PS->>Store: set_latest_sent_batch_proof(batch)
PS->>PS: remove checkpoint(batch-1)
32 files reviewed, 2 comments
Pull request overview
This PR implements distributed proving and multi-batch L1 verification for the ethrex L2. The proof coordinator now assigns different batches to different provers concurrently (keyed by batch number and prover type), with assignment timeout handling. The L1 proof sender collects consecutive proven batches and submits them in a single verifyBatches() transaction, falling back to single-batch sending on any error. Both OnChainProposer contracts (standard and based) have been updated with a new verifyBatches function and refactored internal verification logic. Metrics now include tx_hash labels to track which batches were verified together.
Changes:
- Proof coordinator tracks in-memory batch assignments per (batch_number, prover_type) with timeout-based reassignment
- L1 proof sender collects consecutive proven batches and sends them in a single verifyBatches() call, with automatic fallback to single-batch sending on errors
- Smart contracts add verifyBatches() accepting arrays of proofs, with internal _verifyBatchInternal() helper; based contract now enforces sequential verification
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| crates/l2/sequencer/proof_coordinator.rs | Implements distributed batch assignment with timeout tracking and per-prover-type assignments |
| crates/l2/sequencer/l1_proof_sender.rs | Adds multi-batch collection and verification with fallback logic and invalid proof detection |
| crates/l2/contracts/src/l1/OnChainProposer.sol | Adds verifyBatches() and extracts _verifyBatchInternal() helper |
| crates/l2/contracts/src/l1/based/OnChainProposer.sol | Adds verifyBatches() with no access control, adds sequential verification check |
| crates/l2/contracts/src/l1/Timelock.sol | Updates interface to call verifyBatches() |
| crates/l2/contracts/src/l1/interfaces/*.sol | Interface updates for verifyBatches() |
| crates/l2/prover/src/backend/*.rs | Adds prover_type() method to trait and implementations |
| crates/l2/prover/src/prover.rs | Passes prover_type in BatchRequest |
| crates/l2/common/src/prover.rs | Adds prover_type field to BatchRequest |
| crates/l2/tee/quote-gen/src/sender.rs | Adds ProverType::TDX to BatchRequest |
| crates/blockchain/metrics/l2/metrics.rs | Adds tx_hash label to batch_verification_gas metric |
| cmd/ethrex/l2/options.rs | Adds prover-timeout CLI option |
| docs/l2/fundamentals/distributed_proving.md | New comprehensive documentation for distributed proving |
| docs/** | Updates references from verifyBatch to verifyBatches |
| metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json | Updates dashboard to show tx_hash in tooltip |
in all external functions (OnChainProposer, based OnChainProposer, Timelock, and their interfaces). This avoids unnecessary calldata-to-memory copies for the proof arrays, saving gas proportional to proof data size. Consistent with verifyBatchesAligned which already used calldata. Also add distributed_proving.md to SUMMARY.md so the mdbook linkcheck passes.
Motivation
The proof coordinator currently assigns the same batch to every prover that requests work, meaning only one prover can be active at a time. This is a bottleneck when multiple provers are available. Additionally, the proof sender verifies one batch per L1 transaction even when multiple proofs are ready, wasting gas on separate transactions.
Description
Proof Coordinator (distributed batch assignment):
- …Arc<std::sync::Mutex<HashMap<(u64, ProverType), Instant>>> (two-phase lock pattern: brief mutex for scan+assign, storage validation outside lock)
- --proof-coordinator.prover-timeout (default 600s, env ETHREX_PROOF_COORDINATOR_PROVER_TIMEOUT) controls stale assignment timeout

L1 Proof Sender (multi-batch verification):
- …last_verified_batch + 1 and send them in a single verifyBatches() transaction
- …verifyBatches() (with a single-element array when only one batch is ready)

OnChainProposer contracts (standard + based):
- …_verifyBatchInternal() to avoid code duplication
- …verifyBatches(uint256, bytes[], bytes[], bytes[]) that loops over _verifyBatchInternal()
- …calldata instead of memory for proof array parameters in external functions, avoiding unnecessary calldata-to-memory copies (consistent with verifyBatchesAligned)
- …_getPublicInputsFromCommitment called before lastVerifiedBatch update
- …batchNumber == lastVerifiedBatch + 1), fixing a pre-existing gap
- …verifyBatches

Metrics & Grafana:
- …tx_hash label to batch_verification_gas metric so batches verified in the same multi-batch tx share the same gas value and tx hash

Aligned mode is unchanged: it already supports multi-batch via L1ProofVerifier.

Note: The prover_type() method on ProverBackend and the prover_type field in BatchRequest overlap with #6157. OpenVM and ZisK use unimplemented!() since they are not yet enabled as L2 backends. Whichever PR lands first, the other will resolve on rebase.

Checklist
- …STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.