Skip to content

Conversation

@avilagaston9
Copy link
Contributor

@avilagaston9 avilagaston9 commented Feb 9, 2026

Motivation

The proof coordinator currently assigns the same batch to every prover that requests work, meaning only one prover can be active at a time. This is a bottleneck when multiple provers are available. Additionally, the proof sender verifies one batch per L1 transaction even when multiple proofs are ready, wasting gas on separate transactions.

Description

Proof Coordinator — distributed batch assignment:

  • Track in-flight batch assignments with timestamps using Arc<std::sync::Mutex<HashMap<(u64, ProverType), Instant>>> (two-phase lock pattern: brief mutex for scan+assign, storage validation outside lock)
  • When a prover requests work, assign the first unassigned or timed-out batch — different provers get different batches
  • Clean up assignments when all proof types arrive for a batch or when batches are verified on-chain
  • New CLI flag --proof-coordinator.prover-timeout (default 600s, env ETHREX_PROOF_COORDINATOR_PROVER_TIMEOUT) controls stale assignment timeout

L1 Proof Sender — multi-batch verification:

  • Collect all consecutive proven batches from last_verified_batch + 1 and send them in a single verifyBatches() transaction
  • Always uses verifyBatches() (with a single-element array when only one batch is ready)
  • On any multi-batch error, fall back to per-batch sending — this prevents the sequencer from getting stuck on gas limit or calldata size issues (see L2: Add a configurable cap to the number of batches per verifyBatches transaction #6173 for adding a proper cap)
  • On invalid proof revert during single-batch fallback, delete the offending proof from the store
  • Invalid proof detection matches both full error messages (based contract) and error codes (standard contract); see L2: normalize error codes between OnChainProposer contracts #6098 for normalizing these across contracts

OnChainProposer contracts (standard + based):

  • Extract shared verification logic into _verifyBatchInternal() to avoid code duplication
  • Add verifyBatches(uint256, bytes[], bytes[], bytes[]) that loops over _verifyBatchInternal()
  • Use calldata instead of memory for proof array parameters in external functions, avoiding unnecessary calldata-to-memory copies (consistent with verifyBatchesAligned)
  • Critical ordering preserved: _getPublicInputsFromCommitment called before lastVerifiedBatch update
  • Based contract now enforces sequential verification (batchNumber == lastVerifiedBatch + 1), fixing a pre-existing gap
  • Timelock and interface updates for verifyBatches

Metrics & Grafana:

  • Add tx_hash label to batch_verification_gas metric so batches verified in the same multi-batch tx share the same gas value and tx hash
  • New "Verification Gas by Batch" xychart panel (batch_number on X, gas on Y, tx_hash in tooltip)
image

Aligned mode is unchanged — it already supports multi-batch via L1ProofVerifier.

Note: The prover_type() method on ProverBackend and the prover_type field in BatchRequest overlap with #6157. OpenVM and ZisK use unimplemented!() since they are not yet enabled as L2 backends. Whichever PR lands first, the other will resolve on rebase.

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

verification in the proof sender and OnChainProposer contracts.

The proof coordinator now tracks batch-to-prover assignments with timestamps
using an in-memory map behind std::sync::Mutex (two-phase lock pattern to
avoid holding locks across .await points). When a prover requests work, the
coordinator assigns the first unassigned or timed-out batch, allowing
multiple provers to work on different batches concurrently. Assignments are
cleaned up when all proof types arrive for a batch.

The L1 proof sender collects all consecutive proven batches starting from
last_verified_batch + 1 and sends them in a single verifyBatches() contract
call. Falls back to single-batch verifyBatch() for one batch (backward
compat) and to per-batch sending on invalid proof reverts to isolate the
failing batch.

Both OnChainProposer contracts (standard and based) extract shared
verification logic into _verifyBatchInternal() and add a verifyBatches()
function that loops over it. The based contract now also enforces sequential
batch verification (batchNumber == lastVerifiedBatch + 1), fixing a
pre-existing gap. Timelock forwarding added for verifyBatches.

New CLI flag: --proof-coordinator.prover-timeout (default 600s) controls
how long before a batch assignment is considered stale and reassigned.
@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🤖 Kimi Code Review

Security & Correctness Review

Critical Issues

  1. Race Condition in Proof Assignment (crates/l2/sequencer/proof_coordinator.rs:147-213)

    • The next_batch_to_assign function has a TOCTOU race between checking if a batch has prover input and assigning it. A prover could receive an assignment for a batch that gets deleted/updated between the check and assignment.
    • Fix: Move the prover input check inside the locked section or use a database transaction.
  2. Missing Re-entrancy Protection (crates/l2/contracts/src/l1/OnChainProposer.sol:404-471)

    • The _verifyBatchInternal function updates state (lastVerifiedBatch) after external calls to proof verifiers. While the current verifiers may not be re-entrant, this pattern is dangerous.
    • Fix: Use re-entrancy guards or follow checks-effects-interactions pattern more strictly.

Logic Issues

  1. Incorrect Batch Assignment Logic (crates/l2/sequencer/proof_coordinator.rs:167-183)

    • The loop logic for finding the next batch to assign is flawed. It increments batch even when a batch is assigned but not timed out, potentially skipping valid batches.
    • Fix: The loop should continue scanning, not break, when a batch is assigned but not timed out.
  2. Array Length Mismatch Handling (crates/l2/contracts/src/l1/OnChainProposer.sol:450-456)

    • The verifyBatches function only checks array lengths match each other, but doesn't validate against the actual number of batches being verified.
    • Fix: Add explicit validation that all arrays have the same length as the batch range.

Performance & Gas Optimization

  1. Inefficient Batch Verification (crates/l2/contracts/src/l1/OnChainProposer.sol:450-471)

    • The loop in verifyBatches could be more gas efficient by caching firstBatchNumber + i in a local variable.
    • Fix: Use uint256 currentBatch = firstBatchNumber + i to avoid repeated addition.
  2. Unnecessary State Updates (crates/l2/sequencer/proof_coordinator.rs:348-355)

    • The proof coordinator removes assignments from the HashMap even when all proofs exist, but this check runs for every proof submission.
    • Fix: Only run the completeness check when a proof is actually submitted, not when checking existence.

Code Quality

  1. Inconsistent Error Handling (crates/l2/sequencer/l1_proof_sender.rs:189-213)

    • The fallback mechanism for invalid proofs in send_batches_proof_to_contract silently continues on individual failures, which could mask systematic issues.
    • Fix: Log which specific batch failed and consider rate limiting the fallback.
  2. Magic Numbers (cmd/ethrex/l2/options.rs:781)

    • The default timeout of 600000ms (10 minutes) is hardcoded in multiple places.
    • Fix: Define as a named constant, e.g., const DEFAULT_PROVER_TIMEOUT_MS: u64 = 600_000;

Minor Issues

  1. Documentation Mismatch (crates/l2/contracts/src/l1/OnChainProposer.sol:355-361)

    • The comment mentions "subsequent block" but should say "subsequent batch".
  2. Redundant Checks (crates/l2/sequencer/l1_proof_sender.rs:240-255)

    • The check for ready_batches.is_empty() is redundant since the subsequent match handles the empty case.

The PR introduces important batch verification functionality but needs attention to race conditions and state consistency.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🤖 Codex Code Review

Found a couple of correctness issues to address; rest looks reasonable.

Findings

  • Potential starvation / skipped batches (correctness): crates/l2/sequencer/proof_coordinator.rs in next_batch_to_assign. The new logic assigns only the base batch (latest_sent + 1) and returns Ok(None) if there’s no prover input for that batch, without scanning forward to the next batch that does have input (previous loop did). This can stall proving indefinitely if the first batch lacks input for the requested commit_hash but later batches are ready. Suggest restoring a forward scan similar to the previous while loop (and likely contains_batch gating) before assigning.
    Path: crates/l2/sequencer/proof_coordinator.rs (function next_batch_to_assign).

  • Behavior regression on version mismatch (correctness / operability): crates/l2/sequencer/proof_coordinator.rs in handle_request. When next_batch_to_assign returns None, you now send empty_batch_response instead of no_batch_for_version. This removes the explicit signal to provers that the requested commit_hash is not available, which can keep them idling on a bad version. If the intention is to keep the old behavior, consider returning no_batch_for_version when the assigned batch exists but has no input for that version.
    Path: crates/l2/sequencer/proof_coordinator.rs (function handle_request).

Notes

  • Multi-batch verification path in crates/l2/sequencer/l1_proof_sender.rs looks consistent. The checkpoint cleanup loop removes batch-1 for each batch in the sent range, which matches the prior single-batch cleanup behavior.
  • The Solidity refactor into _verifyBatchInternal and the new verifyBatches entrypoints appear correct and maintain sequencing/gas logic. Emitting BatchVerified once at the end is fine as long as that’s the desired event granularity.

If you want, I can propose a concrete patch for the next_batch_to_assign scan logic and the response behavior.


Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

🤖 Claude Code Review

Indeed, the based contract is missing the L2 message removal logic. This appears to be a pre-existing issue that wasn't introduced by this PR, but it's worth noting. Actually, this is NOT introduced by this PR - let me verify if it existed before:

The diff shows this is just a refactoring of existing verifyBatch logic into _verifyBatchInternal, so if L2 message removal was missing before, it's still missing now. This is not a new issue introduced by this PR.


Positive Observations

  1. Good mutex discipline: The proof coordinator correctly uses std::sync::Mutex instead of tokio::sync::Mutex and never holds the lock across await points.

  2. Sequential verification preserved: The contracts correctly enforce batchNumber == lastVerifiedBatch + 1 both at the start of verifyBatches and within _verifyBatchInternal, preventing gaps.

  3. Proper state ordering: The contracts maintain the critical ordering of calling _getPublicInputsFromCommitment BEFORE updating lastVerifiedBatch, as documented.

  4. Fallback mechanism: The intelligent fallback from multi-batch to single-batch on invalid proof helps isolate bad proofs.

  5. Backward compatibility: Single-batch path is preserved for compatibility.

  6. Assignment cleanup: The code properly cleans up assignments when all proofs are complete (line 348-350, though needs better error handling).


Recommendations Summary

Must fix before merge:

  1. Fix lock poisoning error handling (docs: add milestones #1, feat: add fmt, test and clippy to CI #11)
  2. Add maximum batch count limit (build: add Github actions boilerplate #3)
  3. Fix assignment cleanup on storage error (feat: add basic RPC api. #5)

Should fix before merge:
4. Improve event emission for verifyBatches (#4)
5. Move checkpoint cleanup after tx confirmation (#7)
6. Clarify fallback behavior in comments (#6)

Nice to have:
7. Consistent error messages (#9)
8. Better documentation on timeout behavior (#10)
9. Validate timeout configuration (#12)


Automated review by Claude (Anthropic) · custom prompt

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

Lines of code report

Total lines added: 182
Total lines removed: 0
Total lines changed: 182

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/options.rs                 | 1081  | +11  |
+-------------------------------------------------+-------+------+
| ethrex/crates/blockchain/metrics/l2/metrics.rs  | 282   | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/common/src/prover.rs           | 196   | +6   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/exec.rs     | 67    | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/mod.rs      | 96    | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/openvm.rs   | 103   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/risc0.rs    | 131   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/sp1.rs      | 182   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/zisk.rs     | 158   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/prover.rs           | 186   | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/configs.rs           | 108   | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/l1_proof_sender.rs   | 593   | +126 |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/proof_coordinator.rs | 445   | +19  |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/tee/quote-gen/src/sender.rs    | 86    | +1   |
+-------------------------------------------------+-------+------+

…Timelock

in favor of using verifyBatches() for both single and multi-batch verification.
The proof sender now always calls verifyBatches (with a single-element array
when only one batch is ready). Extracted send_verify_batches_tx as a helper
to avoid async recursion in the fallback path that isolates invalid proofs.
for every batch in a multi-batch verifyBatches call, preserving compatibility
with systems that listen for per-batch events.
@avilagaston9 avilagaston9 force-pushed the feat/l2/distributed-proving branch from 4d416be to c693938 Compare February 9, 2026 20:00
instead of referencing a previous hotfix that won't exist after merge.
…n the

proof sender to deduplicate the pattern that detects and deletes invalid
proofs from the store across the multi-batch fallback and single-batch paths.
single batch via verifyBatches, handling invalid proof deletion and verify
tx hash storage. Used in the multi-batch fallback loop.
branches on batch_count > 1 internally, removing the duplicated error
pattern matching.
… into

a single assignments map. The assignment Instant is now used for both timeout
detection and proving time metrics, removing the need for a separate
SystemTime-based map and the tokio::sync::Mutex dependency.
…ordinator

skips fully-proved batches when scanning for work to assign. This removes the
redundant check from handle_request since next_batch_to_assign now guarantees
that returned batches are missing at least one proof.
…undant

database lookup in handle_request, since the input was already fetched and
validated during the assignment scan.
…nstead of

just batch_number, so that e.g. a RISC0 prover requesting work doesn't block
an SP1 prover from working on the same batch. Adds prover_type to the
BatchRequest protocol message and a prover_type() method to the ProverBackend
trait so each backend identifies itself when requesting assignments.
… todo!()

for ZisK and OpenVM backends (no ProverType variant exists for them yet) and
call self.backend.prover_type() directly instead of storing it as a field.
…nder,

which was broken by the addition of prover_type to the protocol message.
…tisfy

clippy::collapsible_if lint, which was failing the Lint L2 CI check.
…sting

steps, configuration reference, and how batch assignment, prover timeout,
and multi-batch verification work.
ProofCoordinator: the field is used again in PR #6157 and is preserved
to simplify merge conflicts.
…n gas

metric, and fix invalid_proof_type to match both error codes and full messages.

The require(firstBatchNumber == lastVerifiedBatch + 1) in verifyBatches was
redundant since _verifyBatchInternal already enforces sequential verification
on every iteration. The batch verification gas metric was lost during the
multi-batch refactor and is now restored. The invalid_proof_type function now
matches the standard contract's error codes ("00c", "00e", "00g") in addition
to the based contract's full error messages.
assignment scan in the proof coordinator.

The multi-batch path now records amortized gas (total_gas / batch_count) for
each batch number so the Grafana dashboard (grouped by batch_number) gets a
data point for every batch instead of only the first. The single-batch fallback
path also records gas now. The inner assignment scan in next_batch_to_assign is
bounded by the map size to prevent unbounded iteration under the lock.
…fana

dashboard from an XY chart to a table showing batch_number, tx_gas, and
tx_hash columns.

Batches verified in the same multi-batch transaction now share the same
tx_gas and tx_hash values, making it clear which batches were grouped
together. The single-batch fallback path also records the metric.
…Proposer

contract, matching the interface declaration and the based contract variant.
commit hash doesn't match and no batches are available. Previously the
response was flattened into a generic empty BatchResponse, losing the
signal that tells the prover its binary is outdated.
showing batch_number on X axis and verification gas on Y with tx_hash
visible in the tooltip. Existing panels in the row are narrowed from w=8
to w=6 to fit the new panel alongside the Batch Verification Gas table.
since the Verification Gas by Batch xychart now shows the same information
with batch_number on X, gas on Y, and tx_hash in the tooltip. Remaining
panels in the row are restored to w=8 each.
…just

invalid proof reverts. Previously an out-of-gas or calldata-too-large
failure would propagate without fallback, causing an infinite retry loop
since the same unbounded batch count would be collected on the next tick.
storage, checkpoint cleanup, latest_sent_batch_proof update) happens
exactly once per batch inside send_batches_proof_to_contract. Extract
finalize_batch_proof helper, rename verify_and_send_proof to
verify_and_send_proofs, and widen the multi-batch fallback to trigger
on any error so the sequencer never gets stuck retrying an unbounded
batch count that exceeds gas or calldata limits.
…mplemented!),

make send_batches_proof_to_contract private, add natspec to based contract's
verifyBatches noting it has no access control, update all L2 docs replacing
verifyBatch references with verifyBatches, move distributed_proving.md from
docs/prover/ to docs/l2/fundamentals/ since it describes the interaction between
proof coordinator, proof sender, and provers rather than prover internals, and
restructure the doc to be explanation-first with the testing guide at the end.
@avilagaston9 avilagaston9 marked this pull request as ready for review February 11, 2026 15:57
Copilot AI review requested due to automatic review settings February 11, 2026 15:57
@github-actions
Copy link

🤖 Kimi Code Review

Review Summary

This PR introduces significant improvements to the L2 proving system, focusing on distributed proving and multi-batch verification. The changes are well-structured and follow good practices, but there are a few areas that need attention.

Critical Issues

  1. Potential Reentrancy in _verifyBatchInternal (contracts)

    • In both OnChainProposer.sol files, _verifyBatchInternal is marked as internal but is called within a loop in verifyBatches. While not a traditional reentrancy issue, the function modifies state (lastVerifiedBatch, batchCommitments) and emits events. Consider adding a reentrancy guard or ensuring atomicity.
  2. Integer Overflow in Batch Assignment (proof_coordinator.rs:152-175)

    • The calculation max_batch = base_batch.saturating_add(u64::try_from(assignments.len()).unwrap_or(u64::MAX)) could overflow if assignments.len() is very large. Use saturating_add consistently.
  3. Race Condition in Batch Assignment (proof_coordinator.rs)

    • The assignment tracking uses Arc<Mutex<HashMap>> but the lock is held briefly. However, between checking availability and inserting the assignment, another prover could claim the same batch. Consider using a more robust synchronization mechanism.

Security Concerns

  1. No Access Control in Based OnChainProposer (contracts)

    • The based OnChainProposer.sol removes access control from verifyBatches(). While this is intentional for decentralization, ensure this aligns with the security model as anyone can now verify batches.
  2. Calldata Size Limits (l1_proof_sender.rs:427-500)

    • The multi-batch verification could hit Ethereum's calldata size limits. Consider adding a maximum batch count check to prevent DoS.

Performance & Best Practices

  1. Memory Usage in send_verify_batches_tx (l1_proof_sender.rs:427-500)

    • The function creates multiple temporary vectors (risc0_array, sp1_array, tdx_array) which could be memory-intensive for large batch counts. Consider streaming or chunking.
  2. Error Handling in ProverType Detection (l1_proof_sender.rs:457-470)

    • The invalid_proof_type function uses string matching which could be brittle. Consider using error codes or structured error types.
  3. Metrics Label Cardinality (metrics.rs:79-89)

    • Adding tx_hash to the batch_verification_gas metric increases label cardinality significantly. Ensure this won't cause performance issues with Prometheus.

Minor Issues

  1. Typo in Grafana Dashboard (l2_overview.json:504)

    • "unit": "sishort" appears to be a typo - should likely be "short".
  2. Unimplemented ProverType (backend/openvm.rs:67, backend/zisk.rs:116)

    • Both OpenVM and Zisk backends return unimplemented!() for prover_type(). Consider returning a proper error or implementing these.
  3. Documentation Consistency

    • The new distributed_proving.md is excellent, but ensure all references to verifyBatch are updated to verifyBatches throughout the codebase.

Recommendations

  1. Add a maximum batch count parameter to prevent calldata overflow
  2. Implement proper error handling for the unimplemented prover types
  3. Consider adding a configuration parameter for maximum concurrent assignments per prover type
  4. Add unit tests for the new batch assignment logic
  5. Consider adding a metric for assignment timeout occurrences

The PR is well-architected and the changes are necessary for distributed proving. The issues identified are manageable and don't block merging, but should be addressed in follow-up PRs.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions
Copy link

🤖 Codex Code Review

Main concerns

  • High‑cardinality Prometheus label: tx_hash on batch_verification_gas will create a new time series per tx hash and can explode memory/CPU in Prometheus/Grafana (especially for long‑running L2s). Consider keeping batch_number only, or moving the hash to logs/exemplars instead. Location: crates/blockchain/metrics/l2/metrics.rs in record_batch_verification_gas and the IntGaugeVec definition.
  • Gas overhead from memory arrays: the new verifyBatches functions take bytes[] memory, which copies calldata into memory and scales poorly with multi‑batch payloads. Use calldata for external functions and in interfaces. Locations: crates/l2/contracts/src/l1/OnChainProposer.sol verifyBatches, crates/l2/contracts/src/l1/based/OnChainProposer.sol verifyBatches, and both IOnChainProposer.sol and ITimelock.sol interfaces.

Other notes

  • Protocol change: adding prover_type to ProofData::BatchRequest is a wire‑format change. If any mixed‑version deployment is possible, ensure the coordinator/prover handshake rejects old clients early (commit hash seems to handle this). Location: crates/l2/common/src/prover.rs ProofData::BatchRequest.
  • Grafana panel rename/label change looks consistent with the metric label change, but be aware any existing dashboards or alerts relying on the old label set will break. Location: metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json.

If you want, I can scan the repo for any remaining verifyBatch call sites or interface mismatches.


Automated review by OpenAI Codex · custom prompt

@github-actions
Copy link

🤖 Claude Code Review

Now let me create a comprehensive review. Based on my analysis, I'll provide detailed feedback on the PR:

PR #6158 Code Review: Enable Distributed Proving and Multi-Batch Verification

Executive Summary

This PR implements two major features: distributed proving (allowing multiple provers to work concurrently) and multi-batch verification (reducing L1 gas costs). The changes are substantial but well-structured. I've identified several critical issues that need addressing, along with recommendations for improvements.


🔴 Critical Issues

1. Mutex Poisoning Recovery is Insufficient

Location: proof_coordinator.rs:153-155, 189-191, 297-299

The code handles poisoned mutex with simple error returns, but doesn't attempt recovery. In a production blockchain system, this could halt all proving operations:

let mut assignments = self.assignments.lock().map_err(|_| {
    ProofCoordinatorError::Custom("Assignment lock poisoned".to_string())
})?;

Risk: If any thread panics while holding the lock, all future proof assignments fail permanently.

Recommendation: Use PoisonError::into_inner() to recover the mutex guard and log the poisoning event, or use an alternative like parking_lot::Mutex which doesn't poison.

2. Race Condition in Proof Deletion

Location: l1_proof_sender.rs:508-520

The try_delete_invalid_proof function has a race condition with concurrent proof submissions:

async fn try_delete_invalid_proof(&self, message: &str, batch_number: u64) -> Result<(), ProofSenderError> {
    if let Some(proof_type) = Self::invalid_proof_type(message) {
        warn!("Deleting invalid {proof_type:?} proof for batch {batch_number}");
        self.rollup_store
            .delete_proof_by_batch_and_type(batch_number, proof_type)
            .await?;
    }
    Ok(())
}

Risk: A prover could submit a new proof between the error detection and deletion, causing a valid proof to be incorrectly deleted.

Recommendation: Add batch version tracking or check the proof timestamp before deletion.

3. Based Contract Sequential Check Missing Pre-Verification

Location: based/OnChainProposer.sol:343

The based contract enforces sequential verification but only inside _verifyBatchInternal:

require(
    batchNumber == lastVerifiedBatch + 1,
    "OnChainProposer: batch already verified"
);

Risk: If verifyBatches() is called with non-sequential batches (e.g., [5, 6, 8]), it will fail on the third batch after processing the first two, wasting gas.

Recommendation: Add an upfront check in verifyBatches() at line 452 before the loop:

require(
    firstBatchNumber == lastVerifiedBatch + 1,
    "OnChainProposer: first batch must be next to verify"
);

⚠️ High-Priority Issues

4. Unbounded Batch Scan in Proof Coordinator

Location: proof_coordinator.rs:163-164

let max_batch = base_batch.saturating_add(u64::try_from(assignments.len()).unwrap_or(u64::MAX));

Issue: The scan upper bound is based on the number of assignments, which could be large if there are many stale assignments. This could cause performance degradation.

Recommendation: Add a configurable maximum scan range (e.g., 100 batches) to prevent unbounded searches.

5. Invalid Proof Detection is Brittle

Location: l1_proof_sender.rs:493-504

Error detection relies on string matching:

fn invalid_proof_type(message: &str) -> Option<ProverType> {
    if message.contains("Invalid TDX proof") || message.contains("00g") {
        Some(ProverType::TDX)
    } else if message.contains("Invalid RISC0 proof") || message.contains("00c") {
        Some(ProverType::RISC0)
    } else if message.contains("Invalid SP1 proof") || message.contains("00e") {
        Some(ProverType::SP1)
    } else {
        None
    }
}

Issue: String matching on error messages is fragile. If contract error messages change, this breaks silently.

Recommendation:

  1. Parse the actual revert data/error selector instead of string matching
  2. Add integration tests that verify error detection works correctly

6. Proof Verification Gas Attribution May Be Incorrect

Location: l1_proof_sender.rs:617-628

In multi-batch verification, all batches get the same total gas cost:

let tx_gas: i64 = verify_tx_receipt.tx_info.gas_used.try_into()?;
for (batch_number, _) in batches {
    METRICS.set_batch_verification_gas(*batch_number, tx_gas, &tx_hash_str)?;
}

Issue: This makes gas metrics misleading—each batch appears to cost the full transaction gas instead of its share.

Recommendation: Either:

  • Divide gas equally among batches
  • Keep the current approach but document it clearly in metrics/dashboards
  • Add a separate metric for multi-batch transactions

🟡 Medium-Priority Issues

7. Assignment Cleanup Only on Proof Submission

Location: proof_coordinator.rs:296-299

Assignments are removed only when proofs are submitted, not when batches are verified on-chain:

if let Ok(mut assignments) = self.assignments.lock() {
    assignments.remove(&(batch_number, prover_type));
}

Issue: If a prover is assigned a batch but never submits a proof, and the batch is verified via another path (e.g., from store), the assignment persists until timeout.

Recommendation: Also clean up assignments when batches are verified on-chain (in the L1 proof sender or via a periodic cleanup task).

8. Silent Metric Failures

Location: proof_coordinator.rs:279-289

Metrics recording happens inside a complex conditional with silent failures:

metrics!(if let Ok(assignments) = self.assignments.lock()
    && let Some(&assigned_at) = assignments.get(&(batch_number, prover_type))
{
    // ... metric recording
});

Issue: If the mutex is poisoned or the assignment is missing, no metric is recorded and no warning is logged.

Recommendation: Log warnings when metrics fail to record.

9. Potential for Checkpoint Deletion Race

Location: l1_proof_sender.rs:524-538

let checkpoint_path = self
    .checkpoints_dir
    .join(batch_checkpoint_name(batch_number - 1));
if checkpoint_path.exists() {
    let _ = remove_dir_all(&checkpoint_path).inspect_err(|e| {
        error!("Failed to remove checkpoint directory at path {checkpoint_path:?}...")
    });
}

Issue: If multiple batches are verified in parallel (through the fallback path), they could both try to delete batch_number - 1's checkpoint.

Recommendation: Add proper synchronization or use remove_dir_all only if not already deleted.


🟢 Minor Issues & Improvements

10. Inconsistent Access Control Between Contracts

Location: Standard vs Based OnChainProposer.sol

  • Standard contract: verifyBatches() has onlyOwner whenNotPaused (line 477)
  • Based contract: verifyBatches() has no access control (line 436 comment)

Issue: This inconsistency is confusing. The based contract's natspec says "no access control" but doesn't explain the security implications.

Recommendation: Add a detailed natspec comment in the based contract explaining why this is safe (presumably because proofs are verified regardless of caller).

11. Missing Configuration Validation

Location: proof_coordinator.rs:98

prover_timeout: Duration::from_millis(config.proof_coordinator.prover_timeout_ms),

Issue: No validation that prover_timeout_ms is reasonable (not 0, not extremely large).

Recommendation: Add validation in config parsing:

if prover_timeout_ms == 0 || prover_timeout_ms > 3_600_000 {
    return Err("prover_timeout_ms must be between 1ms and 1 hour");
}

12. Duplicate Code in Contract Verification

Location: Both OnChainProposer.sol contracts

The _verifyBatchInternal function is nearly identical between standard and based contracts. The based version has simpler BatchCommitmentInfo but shares ~90% of the logic.

Recommendation: Consider creating a shared internal library for common verification logic to reduce maintenance burden.

13. Array Length Mismatch Check Could Be More Specific

Location: OnChainProposer.sol:485-487

require(
    sp1ProofsBytes.length == batchCount && tdxSignatures.length == batchCount,
    "OnChainProposer: array length mismatch"
);

Issue: The error message doesn't indicate which array has the wrong length.

Recommendation: Use separate checks with specific error messages for better debugging.


🔵 Code Quality & Style

14. Inconsistent Error Format in Contracts

Standard contract uses error codes ("009", "00a") while based contract uses full messages. This was noted in the PR description (issue #6098) but creates confusion.

15. Complex Boolean Logic in Lock Guards

The pattern if let Ok(assignments) = self.assignments.lock() is used frequently. Consider extracting to a helper method:

fn with_assignments<F, R>(&self, f: F) -> Option<R>
where F: FnOnce(&mut HashMap<(u64, ProverType), Instant>) -> R
{
    self.assignments.lock().ok().map(|mut guard| f(&mut guard))
}

16. Magic Number in Aligned Proof System ID

proof_coordinator.rs:line 52 defines ALIGNED_SP1_PROVING_SYSTEM_ID = 1 but this should be validated against the actual Aligned Layer constant.


✅ Positive Aspects

  1. Excellent Documentation: The PR description is comprehensive with clear motivation and technical details
  2. Proper Error Handling: Generally good error propagation with context
  3. Backwards Compatibility: Fallback logic for single-batch sending prevents breaking existing deployments
  4. Testing Considerations: The PR mentions issues with gas caps (L2: Add a configurable cap to the number of batches per verifyBatches transaction #6173) showing awareness of edge cases
  5. Clean Abstraction: The _verifyBatchInternal extraction is a good refactoring
  6. Metrics Integration: Good addition of tx_hash label for gas tracking

📋 Recommendations Summary

Must Fix Before Merge:

  1. Add mutex poisoning recovery or use a non-poisoning mutex
  2. Add race condition protection for proof deletion
  3. Add sequential batch validation in based contract's verifyBatches()

Should Fix:
4. Add bounded batch scan range
5. Improve error detection reliability (use error selectors, not strings)
6. Clarify gas metric attribution or fix division

Nice to Have:
7. Add assignment cleanup on on-chain verification
8. Log metric recording failures
9. Add checkpoint deletion synchronization
10. Document access control differences
11. Validate configuration parameters
12. Consolidate duplicate Solidity code


Test Coverage Gaps

Based on the code changes, ensure these scenarios are tested:

  1. Multiple provers requesting work simultaneously
  2. Prover timeout and batch reassignment
  3. Multi-batch transaction failure scenarios
  4. Invalid proof detection and deletion
  5. Mutex contention under high load
  6. Checkpoint cleanup with concurrent verifications
  7. Gas limit exceeded on multi-batch (mentioned in L2: Add a configurable cap to the number of batches per verifyBatches transaction #6173)

Security Considerations

Good:

  • Proof verification is done before state updates
  • Sequential batch verification enforced
  • Invalid proofs are deleted to prevent replay

⚠️ Concerns:

  • Race conditions in proof deletion
  • Unbounded operations in coordinator
  • Brittle error detection

Automated review by Claude (Anthropic) · custom prompt

@greptile-apps
Copy link

greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR enables (1) distributed proving by having the proof coordinator assign different batches to different provers (tracked in-memory with per-(batch, prover_type) timestamps + timeout), and (2) multi-batch L1 verification by having the proof sender bundle consecutive proven batches into a single verifyBatches() transaction with a single-batch fallback on error.

On the contract side, both standard and based OnChainProposer variants add verifyBatches(uint256, bytes[], bytes[], bytes[]) and factor shared logic into _verifyBatchInternal(), with interfaces/timelock/docs updated accordingly. Metrics and Grafana are updated so batch_verification_gas is labeled by both batch_number and tx_hash to handle multi-batch transactions.

Two correctness issues in the new coordinator/sender logic should be addressed before merge: the coordinator can return “no work” too early when the first chosen candidate batch lacks prover input for a commit hash, and the sender’s multi-batch calldata builder currently takes only the first Value from each proof’s calldata vector, which can truncate proofs if any backend encodes more than one ABI value per proof.

Confidence Score: 2/5

  • This PR has two must-fix correctness issues in the new coordinator/sender paths.
  • Distributed proving + multi-batch verification are substantial changes and the contract/interface updates look consistent, but the current ProofCoordinator can incorrectly report no work even when later batches are available, and the L1ProofSender’s verifyBatches calldata construction can truncate proofs by dropping all but the first ABI value per proof. These can lead to stalled proving and invalid verification transactions.
  • crates/l2/sequencer/proof_coordinator.rs; crates/l2/sequencer/l1_proof_sender.rs

Important Files Changed

Filename Overview
cmd/ethrex/l2/options.rs Adds proof coordinator prover timeout CLI/env config and wires it into SequencerConfig.
crates/blockchain/metrics/l2/metrics.rs Adds tx_hash label to batch_verification_gas metric and updates setter signature accordingly.
crates/l2/common/src/prover.rs Extends ProofData::BatchRequest to include prover_type and updates builder; BatchProof::calldata remains Vec (relevant to sender encoding bug).
crates/l2/contracts/src/l1/OnChainProposer.sol Adds verifyBatches and factors verification into _verifyBatchInternal; removes verifyBatch entrypoint.
crates/l2/contracts/src/l1/based/OnChainProposer.sol Adds verifyBatches and sequential verification requirement; factors logic into _verifyBatchInternal.
crates/l2/sequencer/l1_proof_sender.rs Implements multi-batch verifyBatches sending with fallback; introduces calldata construction that takes only first Value from proof.calldata() (must-fix).
crates/l2/sequencer/proof_coordinator.rs Implements distributed assignment with timeouts; new next_batch_to_assign returns None on first missing input which can hide later available work (must-fix).
metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json Dashboard updated: remove redundant table and add tx_hash tooltip/xychart changes for verification gas.

Sequence Diagram

sequenceDiagram
    participant Prover as Prover Client
    participant PC as ProofCoordinator
    participant Store as Rollup Store
    participant PS as L1ProofSender
    participant TL as Timelock (optional)
    participant OCP as OnChainProposer

    Prover->>PC: BatchRequest(commit_hash, prover_type)
    PC->>Store: get_latest_sent_batch_proof()
    PC->>PC: choose candidate batch (assignments + timeout)
    PC->>Store: get_prover_input_by_batch_and_version(batch, commit_hash)
    alt input exists and proof not already stored
        PC-->>Prover: BatchResponse(batch_number, input, format)
        Prover->>PC: ProofSubmit(batch_number, batch_proof)
        PC->>Store: store_proof_by_batch_and_type(batch_number, prover_type, proof)
        PC-->>Prover: ProofSubmitACK(batch_number)
    else no work / wrong version
        PC-->>Prover: EmptyBatchResponse or NoBatchForVersion
    end

    PS->>OCP: get_last_verified_batch()
    PS->>OCP: get_last_committed_batch()
    PS->>Store: collect consecutive proofs from last_verified+1
    alt multiple consecutive batches ready
        PS->>TL: verifyBatches(firstBatch, proofs[]...) (if timelock configured)
        TL->>OCP: verifyBatches(firstBatch, proofs[]...)
    else single batch ready
        PS->>TL: verifyBatches(batch, [proof], [proof], [proof])
        TL->>OCP: verifyBatches(...)
    end
    alt tx fails for multi-batch
        PS->>PS: fallback: send each batch individually
    end
    PS->>Store: store_verify_tx_by_batch(batch, tx_hash)
    PS->>Store: set_latest_sent_batch_proof(batch)
    PS->>PS: remove checkpoint(batch-1)
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

32 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements distributed proving and multi-batch L1 verification for the ethrex L2. The proof coordinator now assigns different batches to different provers concurrently (keyed by batch number and prover type), with assignment timeout handling. The L1 proof sender collects consecutive proven batches and submits them in a single verifyBatches() transaction, falling back to single-batch sending on any error. Both OnChainProposer contracts (standard and based) have been updated with a new verifyBatches function and refactored internal verification logic. Metrics now include tx_hash labels to track which batches were verified together.

Changes:

  • Proof coordinator tracks in-memory batch assignments per (batch_number, prover_type) with timeout-based reassignment
  • L1 proof sender collects consecutive proven batches and sends them in a single verifyBatches() call, with automatic fallback to single-batch sending on errors
  • Smart contracts add verifyBatches() accepting arrays of proofs, with internal _verifyBatchInternal() helper; based contract now enforces sequential verification

Reviewed changes

Copilot reviewed 32 out of 32 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
crates/l2/sequencer/proof_coordinator.rs Implements distributed batch assignment with timeout tracking and per-prover-type assignments
crates/l2/sequencer/l1_proof_sender.rs Adds multi-batch collection and verification with fallback logic and invalid proof detection
crates/l2/contracts/src/l1/OnChainProposer.sol Adds verifyBatches() and extracts _verifyBatchInternal() helper
crates/l2/contracts/src/l1/based/OnChainProposer.sol Adds verifyBatches() with no access control, adds sequential verification check
crates/l2/contracts/src/l1/Timelock.sol Updates interface to call verifyBatches()
crates/l2/contracts/src/l1/interfaces/*.sol Interface updates for verifyBatches()
crates/l2/prover/src/backend/*.rs Adds prover_type() method to trait and implementations
crates/l2/prover/src/prover.rs Passes prover_type in BatchRequest
crates/l2/common/src/prover.rs Adds prover_type field to BatchRequest
crates/l2/tee/quote-gen/src/sender.rs Adds ProverType::TDX to BatchRequest
crates/blockchain/metrics/l2/metrics.rs Adds tx_hash label to batch_verification_gas metric
cmd/ethrex/l2/options.rs Adds prover-timeout CLI option
docs/l2/fundamentals/distributed_proving.md New comprehensive documentation for distributed proving
docs/** Updates references from verifyBatch to verifyBatches
metrics/provisioning/grafana/dashboards/l2_dashboards/l2_overview.json Updates dashboard to show tx_hash in tooltip

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

in all external functions (OnChainProposer, based OnChainProposer,
Timelock, and their interfaces). This avoids unnecessary
calldata-to-memory copies for the proof arrays, saving gas proportional
to proof data size. Consistent with verifyBatchesAligned which already
used calldata. Also add distributed_proving.md to SUMMARY.md so the
mdbook linkcheck passes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

L2 Rollup client

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant