feat(l2): enable distributed proving and multi-batch verification #6158

avilagaston9 · 2026-02-09T15:08:58Z

Motivation

The proof coordinator currently assigns the same batch to every prover that requests work, meaning only one prover can be active at a time. This is a bottleneck when multiple provers are available. Additionally, the proof sender verifies one batch per L1 transaction even when multiple proofs are ready, wasting gas on separate transactions.

Description

Proof Coordinator — distributed batch assignment:

Track in-flight batch assignments with timestamps using Arc<std::sync::Mutex<HashMap<(u64, ProverType), Instant>>> (two-phase lock pattern: brief mutex for scan+assign, storage validation outside lock)
When a prover requests work, assign the first unassigned or timed-out batch — different provers get different batches
Clean up assignments when all proof types arrive for a batch or when batches are verified on-chain
New CLI flag --proof-coordinator.prover-timeout (default 600s, env ETHREX_PROOF_COORDINATOR_PROVER_TIMEOUT) controls stale assignment timeout

L1 Proof Sender — multi-batch verification:

Collect all consecutive proven batches from last_verified_batch + 1 and send them in a single verifyBatches() transaction
Always uses verifyBatches() (with a single-element array when only one batch is ready)
On any multi-batch error, fall back to per-batch sending — this prevents the sequencer from getting stuck on gas limit or calldata size issues (see L2: Add a configurable cap to the number of batches per verifyBatches transaction #6173 for adding a proper cap)
On invalid proof revert during single-batch fallback, delete the offending proof from the store
Invalid proof detection matches both full error messages (based contract) and error codes (standard contract); see L2: normalize error codes between OnChainProposer contracts #6098 for normalizing these across contracts

OnChainProposer contracts (standard + based):

Extract shared verification logic into _verifyBatchInternal() to avoid code duplication
Add verifyBatches(uint256, bytes[], bytes[], bytes[]) that loops over _verifyBatchInternal()
Use calldata instead of memory for proof array parameters in external functions, avoiding unnecessary calldata-to-memory copies (consistent with verifyBatchesAligned)
Critical ordering preserved: _getPublicInputsFromCommitment called before lastVerifiedBatch update
Based contract now enforces sequential verification (batchNumber == lastVerifiedBatch + 1), fixing a pre-existing gap
Timelock and interface updates for verifyBatches

Metrics & Grafana:

Add tx_hash label to batch_verification_gas metric so batches verified in the same multi-batch tx share the same gas value and tx hash
New "Verification Gas by Batch" xychart panel (batch_number on X, gas on Y, tx_hash in tooltip)

Aligned mode is unchanged — it already supports multi-batch via L1ProofVerifier.

Note: The prover_type() method on ProverBackend and the prover_type field in BatchRequest overlap with #6157. OpenVM and ZisK use unimplemented!() since they are not yet enabled as L2 backends. Whichever PR lands first, the other will resolve on rebase.

Checklist

Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

verification in the proof sender and OnChainProposer contracts. The proof coordinator now tracks batch-to-prover assignments with timestamps using an in-memory map behind std::sync::Mutex (two-phase lock pattern to avoid holding locks across .await points). When a prover requests work, the coordinator assigns the first unassigned or timed-out batch, allowing multiple provers to work on different batches concurrently. Assignments are cleaned up when all proof types arrive for a batch. The L1 proof sender collects all consecutive proven batches starting from last_verified_batch + 1 and sends them in a single verifyBatches() contract call. Falls back to single-batch verifyBatch() for one batch (backward compat) and to per-batch sending on invalid proof reverts to isolate the failing batch. Both OnChainProposer contracts (standard and based) extract shared verification logic into _verifyBatchInternal() and add a verifyBatches() function that loops over it. The based contract now also enforces sequential batch verification (batchNumber == lastVerifiedBatch + 1), fixing a pre-existing gap. Timelock forwarding added for verifyBatches. New CLI flag: --proof-coordinator.prover-timeout (default 600s) controls how long before a batch assignment is considered stale and reassigned.

github-actions · 2026-02-09T15:09:36Z

🤖 Kimi Code Review

Security & Correctness Review

Critical Issues

Race Condition in Proof Assignment (crates/l2/sequencer/proof_coordinator.rs:147-213)
- The next_batch_to_assign function has a TOCTOU race between checking if a batch has prover input and assigning it. A prover could receive an assignment for a batch that gets deleted/updated between the check and assignment.
- Fix: Move the prover input check inside the locked section or use a database transaction.
Missing Re-entrancy Protection (crates/l2/contracts/src/l1/OnChainProposer.sol:404-471)
- The _verifyBatchInternal function updates state (lastVerifiedBatch) after external calls to proof verifiers. While the current verifiers may not be re-entrant, this pattern is dangerous.
- Fix: Use re-entrancy guards or follow checks-effects-interactions pattern more strictly.

Logic Issues

Incorrect Batch Assignment Logic (crates/l2/sequencer/proof_coordinator.rs:167-183)
- The loop logic for finding the next batch to assign is flawed. It increments batch even when a batch is assigned but not timed out, potentially skipping valid batches.
- Fix: The loop should continue scanning, not break, when a batch is assigned but not timed out.
Array Length Mismatch Handling (crates/l2/contracts/src/l1/OnChainProposer.sol:450-456)
- The verifyBatches function only checks array lengths match each other, but doesn't validate against the actual number of batches being verified.
- Fix: Add explicit validation that all arrays have the same length as the batch range.

Performance & Gas Optimization

Inefficient Batch Verification (crates/l2/contracts/src/l1/OnChainProposer.sol:450-471)
- The loop in verifyBatches could be more gas efficient by caching firstBatchNumber + i in a local variable.
- Fix: Use uint256 currentBatch = firstBatchNumber + i to avoid repeated addition.
Unnecessary State Updates (crates/l2/sequencer/proof_coordinator.rs:348-355)
- The proof coordinator removes assignments from the HashMap even when all proofs exist, but this check runs for every proof submission.
- Fix: Only run the completeness check when a proof is actually submitted, not when checking existence.

Code Quality

Inconsistent Error Handling (crates/l2/sequencer/l1_proof_sender.rs:189-213)
- The fallback mechanism for invalid proofs in send_batches_proof_to_contract silently continues on individual failures, which could mask systematic issues.
- Fix: Log which specific batch failed and consider rate limiting the fallback.
Magic Numbers (cmd/ethrex/l2/options.rs:781)
- The default timeout of 600000ms (10 minutes) is hardcoded in multiple places.
- Fix: Define as a named constant, e.g., const DEFAULT_PROVER_TIMEOUT_MS: u64 = 600_000;

Minor Issues

Documentation Mismatch (crates/l2/contracts/src/l1/OnChainProposer.sol:355-361)
- The comment mentions "subsequent block" but should say "subsequent batch".
Redundant Checks (crates/l2/sequencer/l1_proof_sender.rs:240-255)
- The check for ready_batches.is_empty() is redundant since the subsequent match handles the empty case.

The PR introduces important batch verification functionality but needs attention to race conditions and state consistency.

Automated review by Kimi (Moonshot AI) · custom prompt

github-actions · 2026-02-09T15:10:57Z

🤖 Codex Code Review

Found a couple of correctness issues to address; rest looks reasonable.

Findings

Potential starvation / skipped batches (correctness): crates/l2/sequencer/proof_coordinator.rs in next_batch_to_assign. The new logic assigns only the base batch (latest_sent + 1) and returns Ok(None) if there’s no prover input for that batch, without scanning forward to the next batch that does have input (previous loop did). This can stall proving indefinitely if the first batch lacks input for the requested commit_hash but later batches are ready. Suggest restoring a forward scan similar to the previous while loop (and likely contains_batch gating) before assigning.
Path: crates/l2/sequencer/proof_coordinator.rs (function next_batch_to_assign).
Behavior regression on version mismatch (correctness / operability): crates/l2/sequencer/proof_coordinator.rs in handle_request. When next_batch_to_assign returns None, you now send empty_batch_response instead of no_batch_for_version. This removes the explicit signal to provers that the requested commit_hash is not available, which can keep them idling on a bad version. If the intention is to keep the old behavior, consider returning no_batch_for_version when the assigned batch exists but has no input for that version.
Path: crates/l2/sequencer/proof_coordinator.rs (function handle_request).

Notes

Multi-batch verification path in crates/l2/sequencer/l1_proof_sender.rs looks consistent. The checkpoint cleanup loop removes batch-1 for each batch in the sent range, which matches the prior single-batch cleanup behavior.
The Solidity refactor into _verifyBatchInternal and the new verifyBatches entrypoints appear correct and maintain sequencing/gas logic. Emitting BatchVerified once at the end is fine as long as that’s the desired event granularity.

If you want, I can propose a concrete patch for the next_batch_to_assign scan logic and the response behavior.

Automated review by OpenAI Codex · custom prompt

github-actions · 2026-02-09T15:11:50Z

🤖 Claude Code Review

Indeed, the based contract is missing the L2 message removal logic. This appears to be a pre-existing issue that wasn't introduced by this PR, but it's worth noting. Actually, this is NOT introduced by this PR - let me verify if it existed before:

The diff shows this is just a refactoring of existing verifyBatch logic into _verifyBatchInternal, so if L2 message removal was missing before, it's still missing now. This is not a new issue introduced by this PR.

Positive Observations

Good mutex discipline: The proof coordinator correctly uses std::sync::Mutex instead of tokio::sync::Mutex and never holds the lock across await points.
Sequential verification preserved: The contracts correctly enforce batchNumber == lastVerifiedBatch + 1 both at the start of verifyBatches and within _verifyBatchInternal, preventing gaps.
Proper state ordering: The contracts maintain the critical ordering of calling _getPublicInputsFromCommitment BEFORE updating lastVerifiedBatch, as documented.
Fallback mechanism: The intelligent fallback from multi-batch to single-batch on invalid proof helps isolate bad proofs.
Backward compatibility: Single-batch path is preserved for compatibility.
Assignment cleanup: The code properly cleans up assignments when all proofs are complete (line 348-350, though needs better error handling).

Recommendations Summary

Must fix before merge:

Fix lock poisoning error handling (docs: add milestones #1, feat: add fmt, test and clippy to CI #11)
Add maximum batch count limit (build: add Github actions boilerplate #3)
Fix assignment cleanup on storage error (feat: add basic RPC api. #5)

Should fix before merge:
4. Improve event emission for verifyBatches (#4)
5. Move checkpoint cleanup after tx confirmation (#7)
6. Clarify fallback behavior in comments (#6)

Nice to have:
7. Consistent error messages (#9)
8. Better documentation on timeout behavior (#10)
9. Validate timeout configuration (#12)

Automated review by Claude (Anthropic) · custom prompt

github-actions · 2026-02-09T15:12:01Z

Lines of code report

Total lines added: 182
Total lines removed: 0
Total lines changed: 182

Detailed view

+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/options.rs                 | 1081  | +11  |
+-------------------------------------------------+-------+------+
| ethrex/crates/blockchain/metrics/l2/metrics.rs  | 282   | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/common/src/prover.rs           | 196   | +6   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/exec.rs     | 67    | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/mod.rs      | 96    | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/openvm.rs   | 103   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/risc0.rs    | 131   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/sp1.rs      | 182   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/zisk.rs     | 158   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/prover.rs           | 186   | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/configs.rs           | 108   | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/l1_proof_sender.rs   | 593   | +126 |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/proof_coordinator.rs | 445   | +19  |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/tee/quote-gen/src/sender.rs    | 86    | +1   |
+-------------------------------------------------+-------+------+

…Timelock in favor of using verifyBatches() for both single and multi-batch verification. The proof sender now always calls verifyBatches (with a single-element array when only one batch is ready). Extracted send_verify_batches_tx as a helper to avoid async recursion in the fallback path that isolates invalid proofs.

for every batch in a multi-batch verifyBatches call, preserving compatibility with systems that listen for per-batch events.

instead of referencing a previous hotfix that won't exist after merge.

…n the proof sender to deduplicate the pattern that detects and deletes invalid proofs from the store across the multi-batch fallback and single-batch paths.

single batch via verifyBatches, handling invalid proof deletion and verify tx hash storage. Used in the multi-batch fallback loop.

branches on batch_count > 1 internally, removing the duplicated error pattern matching.

… into a single assignments map. The assignment Instant is now used for both timeout detection and proving time metrics, removing the need for a separate SystemTime-based map and the tokio::sync::Mutex dependency.

…ordinator skips fully-proved batches when scanning for work to assign. This removes the redundant check from handle_request since next_batch_to_assign now guarantees that returned batches are missing at least one proof.

…undant database lookup in handle_request, since the input was already fetched and validated during the assignment scan.

…nstead of just batch_number, so that e.g. a RISC0 prover requesting work doesn't block an SP1 prover from working on the same batch. Adds prover_type to the BatchRequest protocol message and a prover_type() method to the ProverBackend trait so each backend identifies itself when requesting assignments.

… todo!() for ZisK and OpenVM backends (no ProverType variant exists for them yet) and call self.backend.prover_type() directly instead of storing it as a field.

…nder, which was broken by the addition of prover_type to the protocol message.

…tisfy clippy::collapsible_if lint, which was failing the Lint L2 CI check.

…sting steps, configuration reference, and how batch assignment, prover timeout, and multi-batch verification work.

ProofCoordinator: the field is used again in PR #6157 and is preserved to simplify merge conflicts.

…n gas metric, and fix invalid_proof_type to match both error codes and full messages. The require(firstBatchNumber == lastVerifiedBatch + 1) in verifyBatches was redundant since _verifyBatchInternal already enforces sequential verification on every iteration. The batch verification gas metric was lost during the multi-batch refactor and is now restored. The invalid_proof_type function now matches the standard contract's error codes ("00c", "00e", "00g") in addition to the based contract's full error messages.

assignment scan in the proof coordinator. The multi-batch path now records amortized gas (total_gas / batch_count) for each batch number so the Grafana dashboard (grouped by batch_number) gets a data point for every batch instead of only the first. The single-batch fallback path also records gas now. The inner assignment scan in next_batch_to_assign is bounded by the map size to prevent unbounded iteration under the lock.

…fana dashboard from an XY chart to a table showing batch_number, tx_gas, and tx_hash columns. Batches verified in the same multi-batch transaction now share the same tx_gas and tx_hash values, making it clear which batches were grouped together. The single-batch fallback path also records the metric.

…Proposer contract, matching the interface declaration and the based contract variant.

commit hash doesn't match and no batches are available. Previously the response was flattened into a generic empty BatchResponse, losing the signal that tells the prover its binary is outdated.

showing batch_number on X axis and verification gas on Y with tx_hash visible in the tooltip. Existing panels in the row are narrowed from w=8 to w=6 to fit the new panel alongside the Batch Verification Gas table.

since the Verification Gas by Batch xychart now shows the same information with batch_number on X, gas on Y, and tx_hash in the tooltip. Remaining panels in the row are restored to w=8 each.

…just invalid proof reverts. Previously an out-of-gas or calldata-too-large failure would propagate without fallback, causing an infinite retry loop since the same unbounded batch count would be collected on the next tick.

storage, checkpoint cleanup, latest_sent_batch_proof update) happens exactly once per batch inside send_batches_proof_to_contract. Extract finalize_batch_proof helper, rename verify_and_send_proof to verify_and_send_proofs, and widen the multi-batch fallback to trigger on any error so the sequencer never gets stuck retrying an unbounded batch count that exceeds gas or calldata limits.

…mplemented!), make send_batches_proof_to_contract private, add natspec to based contract's verifyBatches noting it has no access control, update all L2 docs replacing verifyBatch references with verifyBatches, move distributed_proving.md from docs/prover/ to docs/l2/fundamentals/ since it describes the interaction between proof coordinator, proof sender, and provers rather than prover internals, and restructure the doc to be explanation-first with the testing guide at the end.

github-actions · 2026-02-11T15:57:41Z

🤖 Kimi Code Review

Review Summary

This PR introduces significant improvements to the L2 proving system, focusing on distributed proving and multi-batch verification. The changes are well-structured and follow good practices, but there are a few areas that need attention.

Critical Issues

Potential Reentrancy in _verifyBatchInternal (contracts)
- In both OnChainProposer.sol files, _verifyBatchInternal is marked as internal but is called within a loop in verifyBatches. While not a traditional reentrancy issue, the function modifies state (lastVerifiedBatch, batchCommitments) and emits events. Consider adding a reentrancy guard or ensuring atomicity.
Integer Overflow in Batch Assignment (proof_coordinator.rs:152-175)
- The calculation max_batch = base_batch.saturating_add(u64::try_from(assignments.len()).unwrap_or(u64::MAX)) could overflow if assignments.len() is very large. Use saturating_add consistently.
Race Condition in Batch Assignment (proof_coordinator.rs)
- The assignment tracking uses Arc<Mutex<HashMap>> but the lock is held briefly. However, between checking availability and inserting the assignment, another prover could claim the same batch. Consider using a more robust synchronization mechanism.

Security Concerns

No Access Control in Based OnChainProposer (contracts)
- The based OnChainProposer.sol removes access control from verifyBatches(). While this is intentional for decentralization, ensure this aligns with the security model as anyone can now verify batches.
Calldata Size Limits (l1_proof_sender.rs:427-500)
- The multi-batch verification could hit Ethereum's calldata size limits. Consider adding a maximum batch count check to prevent DoS.

Performance & Best Practices

Memory Usage in send_verify_batches_tx (l1_proof_sender.rs:427-500)
- The function creates multiple temporary vectors (risc0_array, sp1_array, tdx_array) which could be memory-intensive for large batch counts. Consider streaming or chunking.
Error Handling in ProverType Detection (l1_proof_sender.rs:457-470)
- The invalid_proof_type function uses string matching which could be brittle. Consider using error codes or structured error types.
Metrics Label Cardinality (metrics.rs:79-89)
- Adding tx_hash to the batch_verification_gas metric increases label cardinality significantly. Ensure this won't cause performance issues with Prometheus.

Minor Issues

Typo in Grafana Dashboard (l2_overview.json:504)
- "unit": "sishort" appears to be a typo - should likely be "short".
Unimplemented ProverType (backend/openvm.rs:67, backend/zisk.rs:116)
- Both OpenVM and Zisk backends return unimplemented!() for prover_type(). Consider returning a proper error or implementing these.
Documentation Consistency
- The new distributed_proving.md is excellent, but ensure all references to verifyBatch are updated to verifyBatches throughout the codebase.

Recommendations

Add a maximum batch count parameter to prevent calldata overflow
Implement proper error handling for the unimplemented prover types
Consider adding a configuration parameter for maximum concurrent assignments per prover type
Add unit tests for the new batch assignment logic
Consider adding a metric for assignment timeout occurrences

The PR is well-architected and the changes are necessary for distributed proving. The issues identified are manageable and don't block merging, but should be addressed in follow-up PRs.

Automated review by Kimi (Moonshot AI) · custom prompt

github-actions · 2026-02-11T15:58:53Z

🤖 Codex Code Review

Main concerns

High‑cardinality Prometheus label: tx_hash on batch_verification_gas will create a new time series per tx hash and can explode memory/CPU in Prometheus/Grafana (especially for long‑running L2s). Consider keeping batch_number only, or moving the hash to logs/exemplars instead. Location: crates/blockchain/metrics/l2/metrics.rs in record_batch_verification_gas and the IntGaugeVec definition.
Gas overhead from memory arrays: the new verifyBatches functions take bytes[] memory, which copies calldata into memory and scales poorly with multi‑batch payloads. Use calldata for external functions and in interfaces. Locations: crates/l2/contracts/src/l1/OnChainProposer.sol verifyBatches, crates/l2/contracts/src/l1/based/OnChainProposer.sol verifyBatches, and both IOnChainProposer.sol and ITimelock.sol interfaces.

Other notes

Protocol change: adding prover_type to ProofData::BatchRequest is a wire‑format change. If any mixed‑version deployment is possible, ensure the coordinator/prover handshake rejects old clients early (commit hash seems to handle this). Location: crates/l2/common/src/prover.rs ProofData::BatchRequest.
Grafana panel rename/label change looks consistent with the metric label change, but be aware any existing dashboards or alerts relying on the old label set will break. Location: metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json.

If you want, I can scan the repo for any remaining verifyBatch call sites or interface mismatches.

Automated review by OpenAI Codex · custom prompt

github-actions · 2026-02-11T15:59:17Z

🤖 Claude Code Review

Now let me create a comprehensive review. Based on my analysis, I'll provide detailed feedback on the PR:

PR #6158 Code Review: Enable Distributed Proving and Multi-Batch Verification

Executive Summary

This PR implements two major features: distributed proving (allowing multiple provers to work concurrently) and multi-batch verification (reducing L1 gas costs). The changes are substantial but well-structured. I've identified several critical issues that need addressing, along with recommendations for improvements.

🔴 Critical Issues

1. Mutex Poisoning Recovery is Insufficient

Location: proof_coordinator.rs:153-155, 189-191, 297-299

The code handles poisoned mutex with simple error returns, but doesn't attempt recovery. In a production blockchain system, this could halt all proving operations:

let mut assignments = self.assignments.lock().map_err(|_| {
    ProofCoordinatorError::Custom("Assignment lock poisoned".to_string())
})?;

Risk: If any thread panics while holding the lock, all future proof assignments fail permanently.

Recommendation: Use PoisonError::into_inner() to recover the mutex guard and log the poisoning event, or use an alternative like parking_lot::Mutex which doesn't poison.

2. Race Condition in Proof Deletion

Location: l1_proof_sender.rs:508-520

The try_delete_invalid_proof function has a race condition with concurrent proof submissions:

async fn try_delete_invalid_proof(&self, message: &str, batch_number: u64) -> Result<(), ProofSenderError> {
    if let Some(proof_type) = Self::invalid_proof_type(message) {
        warn!("Deleting invalid {proof_type:?} proof for batch {batch_number}");
        self.rollup_store
            .delete_proof_by_batch_and_type(batch_number, proof_type)
            .await?;
    }
    Ok(())
}

Risk: A prover could submit a new proof between the error detection and deletion, causing a valid proof to be incorrectly deleted.

Recommendation: Add batch version tracking or check the proof timestamp before deletion.

3. Based Contract Sequential Check Missing Pre-Verification

Location: based/OnChainProposer.sol:343

The based contract enforces sequential verification but only inside _verifyBatchInternal:

require(
    batchNumber == lastVerifiedBatch + 1,
    "OnChainProposer: batch already verified"
);

Risk: If verifyBatches() is called with non-sequential batches (e.g., [5, 6, 8]), it will fail on the third batch after processing the first two, wasting gas.

Recommendation: Add an upfront check in verifyBatches() at line 452 before the loop:

require(
    firstBatchNumber == lastVerifiedBatch + 1,
    "OnChainProposer: first batch must be next to verify"
);

⚠️ High-Priority Issues

4. Unbounded Batch Scan in Proof Coordinator

Location: proof_coordinator.rs:163-164

let max_batch = base_batch.saturating_add(u64::try_from(assignments.len()).unwrap_or(u64::MAX));

Issue: The scan upper bound is based on the number of assignments, which could be large if there are many stale assignments. This could cause performance degradation.

Recommendation: Add a configurable maximum scan range (e.g., 100 batches) to prevent unbounded searches.

5. Invalid Proof Detection is Brittle

Location: l1_proof_sender.rs:493-504

Error detection relies on string matching:

fn invalid_proof_type(message: &str) -> Option<ProverType> {
    if message.contains("Invalid TDX proof") || message.contains("00g") {
        Some(ProverType::TDX)
    } else if message.contains("Invalid RISC0 proof") || message.contains("00c") {
        Some(ProverType::RISC0)
    } else if message.contains("Invalid SP1 proof") || message.contains("00e") {
        Some(ProverType::SP1)
    } else {
        None
    }
}

Issue: String matching on error messages is fragile. If contract error messages change, this breaks silently.

Recommendation:

Parse the actual revert data/error selector instead of string matching
Add integration tests that verify error detection works correctly

6. Proof Verification Gas Attribution May Be Incorrect

Location: l1_proof_sender.rs:617-628

In multi-batch verification, all batches get the same total gas cost:

let tx_gas: i64 = verify_tx_receipt.tx_info.gas_used.try_into()?;
for (batch_number, _) in batches {
    METRICS.set_batch_verification_gas(*batch_number, tx_gas, &tx_hash_str)?;
}

Issue: This makes gas metrics misleading—each batch appears to cost the full transaction gas instead of its share.

Recommendation: Either:

Divide gas equally among batches
Keep the current approach but document it clearly in metrics/dashboards
Add a separate metric for multi-batch transactions

🟡 Medium-Priority Issues

7. Assignment Cleanup Only on Proof Submission

Location: proof_coordinator.rs:296-299

Assignments are removed only when proofs are submitted, not when batches are verified on-chain:

if let Ok(mut assignments) = self.assignments.lock() {
    assignments.remove(&(batch_number, prover_type));
}

Issue: If a prover is assigned a batch but never submits a proof, and the batch is verified via another path (e.g., from store), the assignment persists until timeout.

Recommendation: Also clean up assignments when batches are verified on-chain (in the L1 proof sender or via a periodic cleanup task).

8. Silent Metric Failures

Location: proof_coordinator.rs:279-289

Metrics recording happens inside a complex conditional with silent failures:

metrics!(if let Ok(assignments) = self.assignments.lock()
    && let Some(&assigned_at) = assignments.get(&(batch_number, prover_type))
{
    // ... metric recording
});

Issue: If the mutex is poisoned or the assignment is missing, no metric is recorded and no warning is logged.

Recommendation: Log warnings when metrics fail to record.

9. Potential for Checkpoint Deletion Race

Location: l1_proof_sender.rs:524-538

let checkpoint_path = self
    .checkpoints_dir
    .join(batch_checkpoint_name(batch_number - 1));
if checkpoint_path.exists() {
    let _ = remove_dir_all(&checkpoint_path).inspect_err(|e| {
        error!("Failed to remove checkpoint directory at path {checkpoint_path:?}...")
    });
}

Issue: If multiple batches are verified in parallel (through the fallback path), they could both try to delete batch_number - 1's checkpoint.

Recommendation: Add proper synchronization or use remove_dir_all only if not already deleted.

🟢 Minor Issues & Improvements

10. Inconsistent Access Control Between Contracts

Location: Standard vs Based OnChainProposer.sol

Standard contract: verifyBatches() has onlyOwner whenNotPaused (line 477)
Based contract: verifyBatches() has no access control (line 436 comment)

Issue: This inconsistency is confusing. The based contract's natspec says "no access control" but doesn't explain the security implications.

Recommendation: Add a detailed natspec comment in the based contract explaining why this is safe (presumably because proofs are verified regardless of caller).

11. Missing Configuration Validation

Location: proof_coordinator.rs:98

prover_timeout: Duration::from_millis(config.proof_coordinator.prover_timeout_ms),

Issue: No validation that prover_timeout_ms is reasonable (not 0, not extremely large).

Recommendation: Add validation in config parsing:

if prover_timeout_ms == 0 || prover_timeout_ms > 3_600_000 {
    return Err("prover_timeout_ms must be between 1ms and 1 hour");
}

12. Duplicate Code in Contract Verification

Location: Both OnChainProposer.sol contracts

The _verifyBatchInternal function is nearly identical between standard and based contracts. The based version has simpler BatchCommitmentInfo but shares ~90% of the logic.

Recommendation: Consider creating a shared internal library for common verification logic to reduce maintenance burden.

13. Array Length Mismatch Check Could Be More Specific

Location: OnChainProposer.sol:485-487

require(
    sp1ProofsBytes.length == batchCount && tdxSignatures.length == batchCount,
    "OnChainProposer: array length mismatch"
);

Issue: The error message doesn't indicate which array has the wrong length.

Recommendation: Use separate checks with specific error messages for better debugging.

🔵 Code Quality & Style

14. Inconsistent Error Format in Contracts

Standard contract uses error codes ("009", "00a") while based contract uses full messages. This was noted in the PR description (issue #6098) but creates confusion.

15. Complex Boolean Logic in Lock Guards

The pattern if let Ok(assignments) = self.assignments.lock() is used frequently. Consider extracting to a helper method:

fn with_assignments<F, R>(&self, f: F) -> Option<R>
where F: FnOnce(&mut HashMap<(u64, ProverType), Instant>) -> R
{
    self.assignments.lock().ok().map(|mut guard| f(&mut guard))
}

16. Magic Number in Aligned Proof System ID

proof_coordinator.rs:line 52 defines ALIGNED_SP1_PROVING_SYSTEM_ID = 1 but this should be validated against the actual Aligned Layer constant.

✅ Positive Aspects

Excellent Documentation: The PR description is comprehensive with clear motivation and technical details
Proper Error Handling: Generally good error propagation with context
Backwards Compatibility: Fallback logic for single-batch sending prevents breaking existing deployments
Testing Considerations: The PR mentions issues with gas caps (L2: Add a configurable cap to the number of batches per verifyBatches transaction #6173) showing awareness of edge cases
Clean Abstraction: The _verifyBatchInternal extraction is a good refactoring
Metrics Integration: Good addition of tx_hash label for gas tracking

📋 Recommendations Summary

Must Fix Before Merge:

Add mutex poisoning recovery or use a non-poisoning mutex
Add race condition protection for proof deletion
Add sequential batch validation in based contract's verifyBatches()

Should Fix:
4. Add bounded batch scan range
5. Improve error detection reliability (use error selectors, not strings)
6. Clarify gas metric attribution or fix division

Nice to Have:
7. Add assignment cleanup on on-chain verification
8. Log metric recording failures
9. Add checkpoint deletion synchronization
10. Document access control differences
11. Validate configuration parameters
12. Consolidate duplicate Solidity code

Test Coverage Gaps

Based on the code changes, ensure these scenarios are tested:

Multiple provers requesting work simultaneously
Prover timeout and batch reassignment
Multi-batch transaction failure scenarios
Invalid proof detection and deletion
Mutex contention under high load
Checkpoint cleanup with concurrent verifications
Gas limit exceeded on multi-batch (mentioned in L2: Add a configurable cap to the number of batches per verifyBatches transaction #6173)

Security Considerations

✅ Good:

Proof verification is done before state updates
Sequential batch verification enforced
Invalid proofs are deleted to prevent replay

⚠️ Concerns:

Race conditions in proof deletion
Unbounded operations in coordinator
Brittle error detection

Automated review by Claude (Anthropic) · custom prompt

greptile-apps · 2026-02-11T16:09:43Z

Greptile Overview

Greptile Summary

This PR enables (1) distributed proving by having the proof coordinator assign different batches to different provers (tracked in-memory with per-(batch, prover_type) timestamps + timeout), and (2) multi-batch L1 verification by having the proof sender bundle consecutive proven batches into a single verifyBatches() transaction with a single-batch fallback on error.

On the contract side, both standard and based OnChainProposer variants add verifyBatches(uint256, bytes[], bytes[], bytes[]) and factor shared logic into _verifyBatchInternal(), with interfaces/timelock/docs updated accordingly. Metrics and Grafana are updated so batch_verification_gas is labeled by both batch_number and tx_hash to handle multi-batch transactions.

Two correctness issues in the new coordinator/sender logic should be addressed before merge: the coordinator can return “no work” too early when the first chosen candidate batch lacks prover input for a commit hash, and the sender’s multi-batch calldata builder currently takes only the first Value from each proof’s calldata vector, which can truncate proofs if any backend encodes more than one ABI value per proof.

Confidence Score: 2/5

This PR has two must-fix correctness issues in the new coordinator/sender paths.
Distributed proving + multi-batch verification are substantial changes and the contract/interface updates look consistent, but the current ProofCoordinator can incorrectly report no work even when later batches are available, and the L1ProofSender’s verifyBatches calldata construction can truncate proofs by dropping all but the first ABI value per proof. These can lead to stalled proving and invalid verification transactions.
crates/l2/sequencer/proof_coordinator.rs; crates/l2/sequencer/l1_proof_sender.rs

Important Files Changed

Filename	Overview
cmd/ethrex/l2/options.rs	Adds proof coordinator prover timeout CLI/env config and wires it into SequencerConfig.
crates/blockchain/metrics/l2/metrics.rs	Adds tx_hash label to batch_verification_gas metric and updates setter signature accordingly.
crates/l2/common/src/prover.rs	Extends ProofData::BatchRequest to include prover_type and updates builder; BatchProof::calldata remains Vec (relevant to sender encoding bug).
crates/l2/contracts/src/l1/OnChainProposer.sol	Adds verifyBatches and factors verification into _verifyBatchInternal; removes verifyBatch entrypoint.
crates/l2/contracts/src/l1/based/OnChainProposer.sol	Adds verifyBatches and sequential verification requirement; factors logic into _verifyBatchInternal.
crates/l2/sequencer/l1_proof_sender.rs	Implements multi-batch verifyBatches sending with fallback; introduces calldata construction that takes only first Value from proof.calldata() (must-fix).
crates/l2/sequencer/proof_coordinator.rs	Implements distributed assignment with timeouts; new next_batch_to_assign returns None on first missing input which can hide later available work (must-fix).
metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json	Dashboard updated: remove redundant table and add tx_hash tooltip/xychart changes for verification gas.

Sequence Diagram

sequenceDiagram
    participant Prover as Prover Client
    participant PC as ProofCoordinator
    participant Store as Rollup Store
    participant PS as L1ProofSender
    participant TL as Timelock (optional)
    participant OCP as OnChainProposer

    Prover->>PC: BatchRequest(commit_hash, prover_type)
    PC->>Store: get_latest_sent_batch_proof()
    PC->>PC: choose candidate batch (assignments + timeout)
    PC->>Store: get_prover_input_by_batch_and_version(batch, commit_hash)
    alt input exists and proof not already stored
        PC-->>Prover: BatchResponse(batch_number, input, format)
        Prover->>PC: ProofSubmit(batch_number, batch_proof)
        PC->>Store: store_proof_by_batch_and_type(batch_number, prover_type, proof)
        PC-->>Prover: ProofSubmitACK(batch_number)
    else no work / wrong version
        PC-->>Prover: EmptyBatchResponse or NoBatchForVersion
    end

    PS->>OCP: get_last_verified_batch()
    PS->>OCP: get_last_committed_batch()
    PS->>Store: collect consecutive proofs from last_verified+1
    alt multiple consecutive batches ready
        PS->>TL: verifyBatches(firstBatch, proofs[]...) (if timelock configured)
        TL->>OCP: verifyBatches(firstBatch, proofs[]...)
    else single batch ready
        PS->>TL: verifyBatches(batch, [proof], [proof], [proof])
        TL->>OCP: verifyBatches(...)
    end
    alt tx fails for multi-batch
        PS->>PS: fallback: send each batch individually
    end
    PS->>Store: store_verify_tx_by_batch(batch, tx_hash)
    PS->>Store: set_latest_sent_batch_proof(batch)
    PS->>PS: remove checkpoint(batch-1)

greptile-apps

_{32 files reviewed, 2 comments}

_{Edit Code Review Agent Settings | Greptile}

crates/l2/sequencer/proof_coordinator.rs

crates/l2/sequencer/l1_proof_sender.rs

Copilot

Pull request overview

This PR implements distributed proving and multi-batch L1 verification for the ethrex L2. The proof coordinator now assigns different batches to different provers concurrently (keyed by batch number and prover type), with assignment timeout handling. The L1 proof sender collects consecutive proven batches and submits them in a single verifyBatches() transaction, falling back to single-batch sending on any error. Both OnChainProposer contracts (standard and based) have been updated with a new verifyBatches function and refactored internal verification logic. Metrics now include tx_hash labels to track which batches were verified together.

Changes:

Proof coordinator tracks in-memory batch assignments per (batch_number, prover_type) with timeout-based reassignment
L1 proof sender collects consecutive proven batches and sends them in a single verifyBatches() call, with automatic fallback to single-batch sending on errors
Smart contracts add verifyBatches() accepting arrays of proofs, with internal _verifyBatchInternal() helper; based contract now enforces sequential verification

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 10 comments.

Show a summary per file

File	Description
crates/l2/sequencer/proof_coordinator.rs	Implements distributed batch assignment with timeout tracking and per-prover-type assignments
crates/l2/sequencer/l1_proof_sender.rs	Adds multi-batch collection and verification with fallback logic and invalid proof detection
crates/l2/contracts/src/l1/OnChainProposer.sol	Adds `verifyBatches()` and extracts `_verifyBatchInternal()` helper
crates/l2/contracts/src/l1/based/OnChainProposer.sol	Adds `verifyBatches()` with no access control, adds sequential verification check
crates/l2/contracts/src/l1/Timelock.sol	Updates interface to call `verifyBatches()`
crates/l2/contracts/src/l1/interfaces/*.sol	Interface updates for `verifyBatches()`
crates/l2/prover/src/backend/*.rs	Adds `prover_type()` method to trait and implementations
crates/l2/prover/src/prover.rs	Passes prover_type in BatchRequest
crates/l2/common/src/prover.rs	Adds prover_type field to BatchRequest
crates/l2/tee/quote-gen/src/sender.rs	Adds ProverType::TDX to BatchRequest
crates/blockchain/metrics/l2/metrics.rs	Adds tx_hash label to batch_verification_gas metric
cmd/ethrex/l2/options.rs	Adds prover-timeout CLI option
docs/l2/fundamentals/distributed_proving.md	New comprehensive documentation for distributed proving
docs/**	Updates references from verifyBatch to verifyBatches
metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json	Updates dashboard to show tx_hash in tooltip

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

crates/l2/sequencer/proof_coordinator.rs

crates/l2/sequencer/l1_proof_sender.rs

crates/l2/contracts/src/l1/OnChainProposer.sol

crates/l2/sequencer/l1_proof_sender.rs

crates/l2/sequencer/proof_coordinator.rs

crates/l2/sequencer/l1_proof_sender.rs

docs/l2/fundamentals/distributed_proving.md

crates/l2/contracts/src/l1/based/OnChainProposer.sol

in all external functions (OnChainProposer, based OnChainProposer, Timelock, and their interfaces). This avoids unnecessary calldata-to-memory copies for the proof arrays, saving gas proportional to proof data size. Consistent with verifyBatchesAligned which already used calldata. Also add distributed_proving.md to SUMMARY.md so the mdbook linkcheck passes.

github-actions bot assigned avilagaston9 Feb 9, 2026

github-actions bot added the L2 Rollup client label Feb 9, 2026

github-project-automation bot added this to ethrex_l2 Feb 9, 2026

avilagaston9 added 2 commits February 9, 2026 16:49

Move BatchVerified event emission into _verifyBatchInternal so it fires

c693938

for every batch in a multi-batch verifyBatches call, preserving compatibility with systems that listen for per-batch events.

avilagaston9 force-pushed the feat/l2/distributed-proving branch from 4d416be to c693938 Compare February 9, 2026 20:00

avilagaston9 added 19 commits February 9, 2026 17:14

Reword comment about DB-behind-on-chain sync to be self-contained

6df4400

instead of referencing a previous hotfix that won't exist after merge.

Extract invalid_proof_type() and try_delete_invalid_proof() helpers i…

88419bd

…n the proof sender to deduplicate the pattern that detects and deletes invalid proofs from the store across the multi-batch fallback and single-batch paths.

Extract send_single_batch_proof() to encapsulate the logic for sending a

849bec4

single batch via verifyBatches, handling invalid proof deletion and verify tx hash storage. Used in the multi-batch fallback loop.

Consolidate the invalid proof error handling into a single if-block that

76f4179

branches on batch_count > 1 internally, removing the duplicated error pattern matching.

Return prover input directly from next_batch_to_assign to avoid a red…

94a7b92

…undant database lookup in handle_request, since the input was already fetched and validated during the assignment scan.

Align prover_type changes with PR #6157 to avoid merge conflicts: use…

d09103d

… todo!() for ZisK and OpenVM backends (no ProverType variant exists for them yet) and call self.backend.prover_type() directly instead of storing it as a field.

Add missing prover_type field to BatchRequest in the TDX quote-gen se…

9779c05

…nder, which was broken by the addition of prover_type to the protocol message.

Collapse nested if-let into a single chain in proof_coordinator to sa…

6ab37ca

…tisfy clippy::collapsible_if lint, which was failing the Lint L2 CI check.

Add distributed proving documentation covering architecture, local te…

7ae742f

…sting steps, configuration reference, and how batch assignment, prover timeout, and multi-batch verification work.

Add comment explaining the #[allow(dead_code)] on needed_proof_types in

df807d2

ProofCoordinator: the field is used again in PR #6157 and is preserved to simplify merge conflicts.

Add missing override keyword to verifyBatches in the standard OnChain…

ab6e51f

…Proposer contract, matching the interface declaration and the based contract variant.

Restore NoBatchForVersion response in proof_coordinator when a prover's

444285f

commit hash doesn't match and no batches are available. Previously the response was flattened into a generic empty BatchResponse, losing the signal that tells the prover its binary is outdated.

avilagaston9 added 4 commits February 10, 2026 18:31

Remove the Batch Verification Gas table panel from the Grafana dashboard

9ae1be6

since the Verification Gas by Batch xychart now shows the same information with batch_number on X, gas on Y, and tx_hash in the tooltip. Remaining panels in the row are restored to w=8 each.

avilagaston9 marked this pull request as ready for review February 11, 2026 15:57

avilagaston9 requested a review from ilitteri as a code owner February 11, 2026 15:57

Copilot AI review requested due to automatic review settings February 11, 2026 15:57

avilagaston9 requested review from a team, ManuelBilbao and jrchatruc as code owners February 11, 2026 15:57

Copilot started reviewing on behalf of avilagaston9 February 11, 2026 15:57 View session

greptile-apps bot reviewed Feb 11, 2026

View reviewed changes

crates/l2/sequencer/proof_coordinator.rs Show resolved Hide resolved

crates/l2/sequencer/l1_proof_sender.rs Show resolved Hide resolved

Copilot AI reviewed Feb 11, 2026

View reviewed changes

feat(l2): enable distributed proving and multi-batch verification #6158

Are you sure you want to change the base?

feat(l2): enable distributed proving and multi-batch verification #6158

Uh oh!

Conversation

avilagaston9 commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Description

Checklist

Uh oh!

github-actions bot commented Feb 9, 2026

🤖 Kimi Code Review

Security & Correctness Review

Critical Issues

Logic Issues

Performance & Gas Optimization

Code Quality

Minor Issues

Uh oh!

github-actions bot commented Feb 9, 2026

🤖 Codex Code Review

Uh oh!

github-actions bot commented Feb 9, 2026

🤖 Claude Code Review

Positive Observations

Recommendations Summary

Uh oh!

github-actions bot commented Feb 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Lines of code report

Uh oh!

github-actions bot commented Feb 11, 2026

🤖 Kimi Code Review

Review Summary

Critical Issues

Security Concerns

Performance & Best Practices

Minor Issues

Recommendations

Uh oh!

github-actions bot commented Feb 11, 2026

🤖 Codex Code Review

Uh oh!

github-actions bot commented Feb 11, 2026

🤖 Claude Code Review

PR #6158 Code Review: Enable Distributed Proving and Multi-Batch Verification

Executive Summary

🔴 Critical Issues

1. Mutex Poisoning Recovery is Insufficient

2. Race Condition in Proof Deletion

3. Based Contract Sequential Check Missing Pre-Verification

⚠️ High-Priority Issues

4. Unbounded Batch Scan in Proof Coordinator

5. Invalid Proof Detection is Brittle

6. Proof Verification Gas Attribution May Be Incorrect

🟡 Medium-Priority Issues

7. Assignment Cleanup Only on Proof Submission

8. Silent Metric Failures

9. Potential for Checkpoint Deletion Race

🟢 Minor Issues & Improvements

10. Inconsistent Access Control Between Contracts

11. Missing Configuration Validation

12. Duplicate Code in Contract Verification

13. Array Length Mismatch Check Could Be More Specific

🔵 Code Quality & Style

14. Inconsistent Error Format in Contracts

15. Complex Boolean Logic in Lock Guards

16. Magic Number in Aligned Proof System ID

✅ Positive Aspects

📋 Recommendations Summary

Test Coverage Gaps

Security Considerations

Uh oh!

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

Confidence Score: 2/5

Important Files Changed

Sequence Diagram

avilagaston9 commented Feb 9, 2026 •

edited

Loading

github-actions bot commented Feb 9, 2026 •

edited

Loading