Conversation


@avilagaston9 avilagaston9 commented Feb 9, 2026

Motivation

Add tooling to make prover benchmarking workflows reproducible and ergonomic: structured timing logs, a results collection script, load test improvements, and documentation.

Description

Bug fixes:

  • Fix stale VK file paths in deployer after guest program was moved out of prover
  • Fix off-by-one in batch_size metric (single-block batch reported as 0)

Load test improvements:

  • Use pending nonce for consecutive runs without nonce conflicts
  • Add --endless flag for continuous load generation
  • Add env var support: LOAD_TEST_RPC_URL, LOAD_TEST_TX_AMOUNT, LOAD_TEST_ENDLESS

Prover instrumentation:

  • Add proving time instrumentation (prove_timed) and log-parsing benchmark script (scripts/bench_metrics.sh)
  • Enrich benchmark script with Prometheus batch metadata (gas, txs, blocks)
  • Add --timed flag to prover CLI to control whether proving time is measured (default: disabled). Set via PROVER_CLIENT_TIMED env var.

Proof coordinator:

  • Add prover_type field to BatchRequest so the coordinator knows which backend the client runs
  • Coordinator skips assigning batches to provers whose type isn't needed (e.g. an SP1 prover won't be assigned work when only exec proofs are required)
  • Add prover_type() method to the ProverBackend trait

Documentation:

  • Add prover benchmarking guide (docs/l2/prover-benchmarking.md)
  • Add agent workflow for prover benchmarking (docs/workflows/prover_benchmarking.md)

Checklist

  • Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

Switch the prover loop from prove() to prove_timed() so each batch logs
a structured line with batch number and elapsed proving time (seconds and
milliseconds). Add scripts/sp1_bench_metrics.sh that tails the prover log,
collects results into a CSV, and prints a summary table on exit.
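
The shape of such a timing wrapper can be sketched in a few lines. The types below are simplified stand-ins (the real `ProverBackend` API carries batch input and backend-specific output), so treat this as an illustration of the pattern only:

```rust
use std::time::{Duration, Instant};

// Hypothetical stand-in for the real proof output type.
struct Proof;

fn prove() -> Result<Proof, String> {
    // Placeholder for the real proving call.
    Ok(Proof)
}

// Wrap the untimed call and return the elapsed wall-clock time alongside
// the proof, so the caller can emit a structured log line.
fn prove_timed() -> Result<(Proof, Duration), String> {
    let start = Instant::now();
    let proof = prove()?;
    Ok((proof, start.elapsed()))
}

fn main() {
    let (_proof, elapsed) = prove_timed().expect("proving failed");
    // Emit both seconds and milliseconds, as the PR's structured line does;
    // a log-parsing script can then grep these fields per batch.
    println!(
        "Proved batch batch=1 proving_time_s={} proving_time_ms={}",
        elapsed.as_secs(),
        elapsed.as_millis()
    );
}
```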
… of prover

The guest program was moved from crates/l2/prover/src/ethrex_guest_program/
to crates/guest-program/, but the fallback VK paths in the deployer were not
updated. This caused deploy-l1-sp1 to fail with "No such file or directory"
when running outside Docker. CI was unaffected because it passes explicit
VK paths via ETHREX_SP1_VERIFICATION_KEY_PATH in docker-compose.yaml.
…nce conflicts,

and add --endless flag for continuous load generation. The load_test function now
fetches the pending nonce (instead of latest) so re-runs pick up where the previous
round left off. wait_until_all_included now tracks per-account target nonces instead
of a flat tx_amount, which was incorrect for any run where the starting nonce was
not zero.
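
The per-account target-nonce idea can be illustrated with a small standalone sketch (the account keys and helpers here are hypothetical; the real code fetches nonces over RPC):

```rust
use std::collections::HashMap;

// Given each account's starting (pending) nonce, compute the nonce it must
// reach for all of this round's transactions to be included. Tracking a
// per-account target instead of a flat tx_amount stays correct when the
// starting nonce is nonzero (e.g. on a second consecutive run).
fn target_nonces(
    start_nonces: &HashMap<&'static str, u64>,
    tx_amount: u64,
) -> HashMap<&'static str, u64> {
    start_nonces
        .iter()
        .map(|(acct, start)| (*acct, start + tx_amount))
        .collect()
}

// All transactions are included once every account reaches its target.
fn all_included(
    current: &HashMap<&'static str, u64>,
    targets: &HashMap<&'static str, u64>,
) -> bool {
    targets
        .iter()
        .all(|(acct, t)| current.get(acct).copied().unwrap_or(0) >= *t)
}

fn main() {
    let starts = HashMap::from([("alice", 10), ("bob", 0)]);
    let targets = target_nonces(&starts, 5);
    assert_eq!(targets["alice"], 15); // not 5: alice's starting nonce was 10
    let current = HashMap::from([("alice", 15), ("bob", 5)]);
    assert!(all_included(&current, &targets));
}
```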
…ndpoint.

The script now fetches batch_gas_used, batch_tx_count, and batch_size from
the L2 metrics endpoint (localhost:3702/metrics) for each proved batch, joining
them with proving time into a single CSV and summary table.
…ing.

It now parses all proving_time lines from the file, fetches batch metadata
from Prometheus, prints the table, and exits immediately.
…last]

contains last - first + 1 blocks, not last - first. A single-block batch
was reported as size 0.
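
A standalone sketch of the inclusive-range arithmetic, using checked operations to guard against bad inputs instead of underflowing on u64:

```rust
// Number of blocks in the inclusive range [first, last]. Returns None when
// last < first rather than underflowing.
fn inclusive_batch_size(first: u64, last: u64) -> Option<u64> {
    last.checked_sub(first)?.checked_add(1)
}

fn main() {
    assert_eq!(inclusive_batch_size(5, 5), Some(1)); // single-block batch, not 0
    assert_eq!(inclusive_batch_size(5, 7), Some(3));
    assert_eq!(inclusive_batch_size(7, 5), None); // bad input, no underflow
}
```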
…ment.

When enabled (default), the prover wraps each prove() call with timing and logs
structured fields (proving_time_s, proving_time_ms). When disabled with --no-timed,
the prover calls prove() directly without timing overhead. Also fixes a clippy
as_conversions warning in the elapsed.as_millis() cast.
…ess mode.

The clap args now accept LOAD_TEST_RPC_URL, LOAD_TEST_TX_AMOUNT, and
LOAD_TEST_ENDLESS environment variables as alternatives to --node, --tx-amount,
and --endless flags respectively. CLI flags take precedence over env vars.
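
The precedence rule (flag over env var over default) amounts to a simple option chain; this standalone sketch mirrors what clap applies when an option carries an `env` attribute:

```rust
// Resolve a setting with CLI-flag > env-var > default precedence.
fn resolve(cli_flag: Option<&str>, env_var: Option<&str>, default: &str) -> String {
    cli_flag.or(env_var).unwrap_or(default).to_string()
}

fn main() {
    // Only the env var is set: it is used.
    assert_eq!(
        resolve(None, Some("http://localhost:1729"), "http://localhost:8545"),
        "http://localhost:1729"
    );
    // Flag present: it overrides the env var.
    assert_eq!(
        resolve(Some("http://other:8545"), Some("http://localhost:1729"), "http://localhost:8545"),
        "http://other:8545"
    );
    // Neither set: the default applies.
    assert_eq!(resolve(None, None, "http://localhost:8545"), "http://localhost:8545");
}
```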
The guide (docs/l2/prover-benchmarking.md) explains how to use the localnet,
prover timing, load test, and benchmark script together to measure proving
performance. The agent workflow (docs/workflows/prover_benchmarking.md) provides
step-by-step instructions for running a remote benchmark session, following the
same format as the existing execution witness benchmarking workflow.

github-actions bot commented Feb 9, 2026

🤖 Kimi Code Review

Review Summary

Overall, this PR introduces useful benchmarking infrastructure for the L2 prover with good observability improvements. The changes are mostly well-structured, but there are a few issues to address.

Issues Found

1. Potential Overflow in l1_committer.rs (Line 976)

let batch_size = (last_added_block_number - first_block_of_batch + 1).try_into()?;

Issue: The + 1 addition could overflow if last_added_block_number == u64::MAX. While unlikely in practice, this should use checked_add(1) or handle the edge case.

2. Path Construction in deployer.rs (Lines 1018, 1025, 1029)

format!("{}/../../crates/guest-program/bin/...", env!("CARGO_MANIFEST_DIR"))

Issue: Path construction is fragile. Consider using PathBuf and join() for cross-platform compatibility:

let base_path = PathBuf::from(env!("CARGO_MANIFEST_DIR"))
    .parent()
    .unwrap()
    .parent()
    .unwrap()
    .join("crates/guest-program/bin/...");

3. Error Handling in prover.rs (Lines 108-111)

proving_time_ms = u64::try_from(elapsed.as_millis()).unwrap_or(u64::MAX)

Issue: Using u64::MAX as a fallback could lead to confusing metrics. Consider logging an error and using a more reasonable fallback like 0 or the seconds value converted to ms.

4. Missing Validation in sp1_bench_metrics.sh

Issue: The script doesn't validate that the metrics URL is reachable before attempting to fetch data. Add a connectivity check:

if ! curl -s --connect-timeout 5 "$METRICS_URL" > /dev/null 2>&1; then
    echo "Error: Cannot connect to metrics endpoint at $METRICS_URL"
    exit 1
fi

5. Race Condition in Load Test

Issue: In load_test/src/main.rs, the wait_until_all_included function uses BlockTag::Latest which might not include pending transactions. Consider using BlockTag::Pending consistently or adding a small delay between checks.

Positive Aspects

  • Good CLI ergonomics: The new --timed flag and environment variables follow consistent patterns
  • Comprehensive documentation: The benchmarking guide is well-structured and includes troubleshooting
  • Structured logging: The timing information is logged in a parseable format
  • Backward compatibility: Default values maintain existing behavior

Minor Suggestions

  1. Documentation: Add a note in prover-benchmarking.md about the potential performance overhead of --timed
  2. Metrics: Consider adding a metric for failed proof attempts in the prover
  3. Script: The benchmark script could benefit from a --help flag and version information

The changes are generally well-implemented and the benchmarking infrastructure will be valuable for performance analysis.


Automated review by Kimi (Moonshot AI) · custom prompt

@avilagaston9 avilagaston9 changed the title from "Extract prover benchmarking tooling into standalone PR" to "feat(l2): add prover benchmarking tooling and documentation" Feb 9, 2026
@github-actions github-actions bot added the L2 Rollup client label Feb 9, 2026

github-actions bot commented Feb 9, 2026

🤖 Claude Code Review

Based on my review of PR #6157, here's my comprehensive code review:


Code Review: PR #6157 - Extract Prover Benchmarking Tooling

Summary

This PR adds benchmarking infrastructure for proving performance, including timing instrumentation, load test improvements, and documentation. The changes are mostly low-risk tooling enhancements.

Critical Issues

1. Bash Script: Potential Integer Overflow (scripts/sp1_bench_metrics.sh:85-93)

count=0; total=0; min=999999999; max=0; total_gas=0; total_txs=0

Issue: Bash integer arithmetic can overflow for large millisecond values. If proving takes hours, total could exceed bash's integer limits (~2^63-1).

Fix: Add overflow protection or use bc for large numbers:

if [[ $total -gt 9223372036854775807 ]]; then
    echo "Warning: overflow detected in timing calculations"
fi

2. Bash Script: Division by Zero Risk (scripts/sp1_bench_metrics.sh:97)

avg=$((total / count))

Issue: While protected by if [[ $count -gt 0 ]], the logic flow could be clearer.

Status: Actually safe, but consider making it more explicit.


High Priority Issues

3. Load Test: Unwrap Without Error Context (tooling/load_test/src/main.rs:232-234)

let nonce = client
    .get_nonce(account.address(), BlockIdentifier::Tag(BlockTag::Pending))
    .await
    .unwrap();

Issue: .unwrap() will panic if RPC fails. In async task, this terminates only that task, but provides no useful error message.

Fix:

.await
.context("Failed to fetch pending nonce")?;

4. Load Test: Missing Error Propagation Context (tooling/load_test/src/main.rs:285-288)

let nonce = client
    .get_nonce(src, BlockIdentifier::Tag(BlockTag::Latest))
    .await
    .unwrap();

Issue: Same as above in the wait loop.


Medium Priority Issues

5. Prover: Potential U64 Overflow in Milliseconds (crates/l2/prover/src/prover.rs:110)

proving_time_ms = u64::try_from(elapsed.as_millis()).unwrap_or(u64::MAX),

Issue: Using unwrap_or(u64::MAX) masks overflow. For proving times >584 million years this saturates to u64::MAX, which could confuse metrics aggregation.

Recommendation: This is practically fine (proving won't take that long), but consider logging a warning on saturation for correctness:

proving_time_ms = u64::try_from(elapsed.as_millis())
    .inspect_err(|_| warn!("Proving time exceeds u64::MAX ms, clamping"))
    .unwrap_or(u64::MAX),

6. Bash Script: Inefficient Metric Fetching (scripts/sp1_bench_metrics.sh:30-37)

fetch_metric() {
    local metric="$1" batch="$2"
    curl -s "$METRICS_URL" 2>/dev/null \
        | grep "^${metric}{" \
        | grep "batch_number=\"${batch}\"" \
        | awk '{print $2}' \
        | head -1
}

Issue: This fetches the entire metrics endpoint for EACH metric for EACH batch (3 metrics per batch). For 100 batches, this makes 300 HTTP requests.

Fix: Fetch metrics once and cache:

METRICS_CACHE=$(curl -s "$METRICS_URL" 2>/dev/null)
fetch_metric() {
    echo "$METRICS_CACHE" | grep "^${1}{" | grep "batch_number=\"${2}\"" | awk '{print $2}' | head -1
}

7. Load Test: Nonce Race Condition (tooling/load_test/src/main.rs:232)

let nonce = client
    .get_nonce(account.address(), BlockIdentifier::Tag(BlockTag::Pending))
    .await
    .unwrap();

Issue: Using BlockTag::Pending is good, but if multiple load test instances run concurrently for the same account, they could get the same nonce range and collide.

Status: Acceptable for single-instance testing. Document this limitation in the guide or add a warning.


Low Priority / Style Issues

8. Load Test: Clone in Tight Loop (tooling/load_test/src/main.rs:240-260)

for i in 0..tx_amount {
    let (value, calldata, dst) = tx_builder.build_tx();
    // ...
    sleep(Duration::from_micros(800)).await;
}

Issue: tx_builder.build_tx() is called in a loop. If tx_builder involves cloning, this could be optimized.

Status: Likely negligible impact. The sleep dominates timing.

9. Prover: Duplicate Log Messages (crates/l2/prover/src/prover.rs:111-124)

"Proved batch {} in {:.2?}",
prover_data.batch_number,
elapsed
// vs
"Proved batch {}", prover_data.batch_number

Suggestion: The two branches have similar messages. Consider extracting common logging:

let batch_proof = if self.timed {
    let (output, elapsed) = self.backend.prove_timed(...)?;
    info!(batch = ..., proving_time_s = ..., proving_time_ms = ..., "Proved batch {} in {:.2?}", ...);
    output
} else {
    let output = self.backend.prove(...)?;
    info!(batch = ..., "Proved batch {}", ...);
    output
};
self.backend.to_batch_proof(output, prover_data.format)

10. Documentation: Missing Batch Size Explanation (docs/l2/prover-benchmarking.md:127)

| `batch_size{batch_number="N"}` | Number of blocks in batch N |

Clarification: Given the fix in l1_committer.rs:976, confirm this metric now correctly represents block count (inclusive).


Positive Observations

Off-by-one fix is correct (crates/l2/sequencer/l1_committer.rs:976)

  • batch_size = (last_added_block_number - first_block_of_batch + 1) correctly includes both endpoints.

VK path updates are correct (cmd/ethrex/l2/deployer.rs:1018-1028)

  • Paths properly reflect the moved guest program location.

Timed flag design is clean

  • Defaulting to true is sensible for a benchmarking tool.
  • Env var support (PROVER_CLIENT_TIMED) is good for CI/scripts.

Load test improvements are valuable

  • Using BlockTag::Pending for nonces fixes consecutive run issues.
  • --endless flag is useful for long-running benchmarks.
  • Env var support improves scriptability.

Documentation is thorough

  • Both the guide and workflow docs are clear and actionable.

ProverBackend trait usage is correct

  • The prove_timed method has a default implementation in the trait, so all backends automatically support it.

Security Considerations

No critical security issues found. This is primarily tooling/benchmarking code that:

  • Doesn't touch consensus logic
  • Doesn't modify transaction/block validation
  • Doesn't affect state trie or storage
  • Doesn't change cryptographic operations

Performance Considerations

  • The bash script's N×3 HTTP requests could be slow for large batch counts (see #6).
  • Load test nonce fetching is appropriately using Pending to avoid gaps.
  • Timing instrumentation overhead is minimal (single Instant::now() call).

Recommendations

  1. Must fix: Add error context to load test unwraps (#3, #4)
  2. Should fix: Optimize bash script metric fetching (#6)
  3. Consider: Add overflow warning to proving_time_ms (#5)
  4. Consider: Document nonce race condition limitation (#7)
  5. Nice to have: Refactor duplicate logging in prover (#9)

Verdict

Approved with minor suggestions. The code is well-structured, the documentation is excellent, and the changes are low-risk. The main concerns are error handling in the load test (easy fix) and bash script efficiency (minor optimization).


Automated review by Claude (Anthropic) · custom prompt


github-actions bot commented Feb 9, 2026

🤖 Codex Code Review

Findings (ordered by severity)

  1. Benchmark log can record “Proved batch” even when to_batch_proof fails
  • In both timed and non‑timed branches, the log line is emitted before to_batch_proof, so a conversion error will still produce a success log that the new benchmark script parses. This can inflate results and mislead operators. Consider logging only after to_batch_proof succeeds (or include a status flag).
  • File: crates/l2/prover/src/prover.rs:102
  2. Potential underflow in inclusive batch size calculation
  • batch_size is now (last_added_block_number - first_block_of_batch + 1). If last_added_block_number < first_block_of_batch (bad input or off‑by‑one earlier), this underflows on u64 and could emit a huge metric. Suggest checked_sub with a clear error or guard before adding 1.
  • File: crates/l2/sequencer/l1_committer.rs:976

Other notes

  • The --timed option and log enrichment look fine, but consider whether the timing log should be emitted only on successful proof/submit, especially since the new script uses it as ground truth.
  • The LOAD_TEST_* env vars and pending‑nonce target tracking look good; behavior changes are reasonable.

If you want, I can propose concrete patches for the two items above.


Automated review by OpenAI Codex · custom prompt


github-actions bot commented Feb 9, 2026

Lines of code report

Total lines added: 67
Total lines removed: 2
Total lines changed: 69

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/cmd/ethrex/l2/options.rs                 | 1080  | +10  |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/common/src/prover.rs           | 197   | +7   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/exec.rs     | 67    | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/mod.rs      | 96    | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/openvm.rs   | 103   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/risc0.rs    | 131   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/sp1.rs      | 182   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/backend/zisk.rs     | 158   | +3   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/config.rs           | 12    | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/prover/src/prover.rs           | 217   | +32  |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/sequencer/proof_coordinator.rs | 424   | -2   |
+-------------------------------------------------+-------+------+
| ethrex/crates/l2/tee/quote-gen/src/sender.rs    | 86    | +1   |
+-------------------------------------------------+-------+------+

or in a loop depending on --endless, instead of a loop-with-break pattern.
…lient,

tx_builder, and accounts instead of owned values, removing unnecessary clones
at the call sites in run_round.
This removes the need for --no-timed; users pass --timed when they want
proving time measurement, and omit it otherwise.
both the default and gpu-enabled cargo run commands.
--timed can be passed via make (e.g. make init-prover-sp1 PROVER_ARGS=--timed).
Update docs to use the Makefile commands with GPU=true and PROVER_ARGS examples.
PROVER_ARGS, so timing is enabled with TIMED=true (e.g. make init-prover-sp1
GPU=true TIMED=true).
…alias.

Clap already reads the env var, so no Makefile plumbing is needed. Reverts
the TIMED/PROVER_ARGS additions to the Makefile targets.
…ummary

instead of CSV. Add load-test Makefile target with env var configuration.
Update docs to use Makefile commands throughout, add mempool limit warning,
note that L2 must be running when collecting results, and reference
PROVER_CLIENT_TIMED env var instead of --timed flag.
root Makefile target instead. Update docs to run make load-test from
the repo root.
script. The report now includes a Server Specs section with hardware
info detected from /proc/cpuinfo, /proc/meminfo, sysctl, and nvidia-smi.
by default) in the prover benchmarking workflow.
batch assignments by whether the prover's type is needed. Previously
the coordinator assigned a batch whenever ANY required proof was missing,
causing provers to waste time re-proving batches they had already proved
(e.g. an SP1 prover proving a batch that only needed an exec proof).

The ProverBackend trait now has a prover_type() method, implemented for
each backend (todo!() for ZisK and OpenVM which lack ProverType variants).
The prover client stores its type at startup and sends it with every
BatchRequest. The coordinator checks needed_proof_types.contains() before
assigning.
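
A minimal sketch of that flow, with hypothetical simplified types (the real trait and coordinator carry much more context):

```rust
// Hypothetical, simplified types; the actual ProverBackend trait and
// coordinator logic in ethrex look different.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProverType { Exec, SP1 }

trait ProverBackend {
    fn prover_type(&self) -> ProverType;
}

struct Sp1Backend;
impl ProverBackend for Sp1Backend {
    fn prover_type(&self) -> ProverType { ProverType::SP1 }
}

// The coordinator only assigns work when the requesting prover's type is
// among the proof types the batch still needs.
fn should_assign(needed_proof_types: &[ProverType], prover_type: ProverType) -> bool {
    needed_proof_types.contains(&prover_type)
}

fn main() {
    let backend = Sp1Backend;
    // Only an exec proof is required: the SP1 prover gets no work.
    assert!(!should_assign(&[ProverType::Exec], backend.prover_type()));
    // SP1 proof still missing: the SP1 prover is assigned the batch.
    assert!(should_assign(&[ProverType::Exec, ProverType::SP1], backend.prover_type()));
}
```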

Also fixes the deploy-l1-sp1 Makefile target which used the removed
--sp1.deploy-verifier flag — replaced with --sp1 and removed dummy
verifier addresses for other backends. Updated benchmarking docs to
reference deploy-l1-sp1 for SP1 benchmarks.
already exposes it via the prover_type() trait method.
avilagaston9 added a commit that referenced this pull request Feb 9, 2026
… todo!()

for ZisK and OpenVM backends (no ProverType variant exists for them yet) and
call self.backend.prover_type() directly instead of storing it as a field.
…oad_test formatting.

Commit 517e535 added prover_type to BatchRequest but didn't update the TDX
quote-gen sender (outside the workspace), breaking the tdx lint/build CI jobs.
The load_test formatting was also off since it's outside the main workspace.
…arted

immediately after the L2 is up, not just before the prover. The prover proves
batches faster than the sequencer fills them with transactions, so any delay
in starting the load test results in empty batches being proved.
empty batches. The block producer and committer build and commit
empty blocks/batches faster than the load test can start sending
transactions (especially for erc20/fibonacci/io-heavy which have
a contract deployment setup phase). By pre-compiling the binary
in step 1b and running it directly in step 3, the load test starts
immediately when the L2 is up.
…t recompilation.

Previously each step (deploy, L2 start, prover start) triggered its own `cargo run`
which recompiled the binary. Now step 1b builds everything once with all needed features,
and all subsequent steps use `target/release/ethrex` directly. Also made the docs
backend-agnostic (not SP1-specific) with placeholder flags and tables for SP1/RISC0/Exec,
and made the branch/commit prompt mandatory (no default).
avilagaston9 added a commit that referenced this pull request Feb 10, 2026
ProofCoordinator: the field is used again in PR #6157 and is preserved
to simplify merge conflicts.
…name

sp1_bench_metrics.sh to bench_metrics.sh since the script is backend-agnostic,
fix the load test targets table in the workflow doc to show -t flag values
instead of Makefile targets, and add a note clarifying that the hardcoded
keys in the deploy command are default test keys from fixtures/.
fix the load test progress percentage to be relative to the current round
(not the absolute nonce), make run_round return Result so endless mode
survives individual round failures, remove a duplicate debug log in the
proof coordinator, add bash 4+ note to bench_metrics.sh, and clarify that
LOAD_TEST_RPC_URL must point to the L2 port for benchmarking.
backend that is not in the deployment's required proof types, instead of
silently sending an empty batch response. The prover now logs an error
and surfaces the rejection clearly, rather than endlessly polling for
batches it will never be assigned.
…ndant:

we already check whether this specific prover type's proof exists for the
batch, and we early-return if the prover type is not needed at all. If
the current prover's proof exists, the batch is skipped regardless of
what other proof types are missing.
if/else block. Each condition (proof already exists, batch not found, no
input for version) is checked and returned independently.
get_prover_input_by_batch_and_version already returns None when the
batch does not exist. Also send an empty batch response instead of
NoBatchForVersion when no input is found, since the prover just needs
to know there is nothing to prove right now.
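
The flattened control flow described above can be sketched with early returns; the types here are hypothetical simplifications of the coordinator's real response handling:

```rust
// Hypothetical response type: the coordinator either has nothing for this
// prover right now, or assigns it a batch.
#[derive(Debug, PartialEq)]
enum Response {
    Empty,
    Assigned(u64),
}

// Each condition early-returns on its own instead of nesting if/else.
fn handle_request(proof_exists: bool, input: Option<u64>) -> Response {
    if proof_exists {
        // This prover's proof for the batch is already done.
        return Response::Empty;
    }
    // A missing batch and a missing input collapse into the same case:
    // there is simply nothing to prove right now.
    let Some(batch) = input else {
        return Response::Empty;
    };
    Response::Assigned(batch)
}

fn main() {
    assert_eq!(handle_request(true, Some(7)), Response::Empty);
    assert_eq!(handle_request(false, None), Response::Empty);
    assert_eq!(handle_request(false, Some(7)), Response::Assigned(7));
}
```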
that unneeded prover types are rejected immediately without hitting the
database.
@avilagaston9 avilagaston9 marked this pull request as ready for review February 11, 2026 13:43
Copilot AI review requested due to automatic review settings February 11, 2026 13:43
@avilagaston9 avilagaston9 requested review from a team and ManuelBilbao as code owners February 11, 2026 13:43
@github-actions

🤖 Kimi Code Review

Review Summary

This PR introduces prover benchmarking capabilities and fixes several issues in the L2 prover system. Here are the key findings:

Good Changes

  1. Prover benchmarking infrastructure - Well-structured addition with proper documentation
  2. Path fixes - Corrects verification key paths in deployer.rs (lines 1018, 1023, 1027)
  3. Batch size calculation fix - Corrects an off-by-one error in l1_committer.rs:976
  4. Prover type filtering - Smart optimization to skip unnecessary proof types

⚠️ Issues Found

1. Unimplemented Functions (Critical)

  • OpenVmBackend::prover_type() and ZiskBackend::prover_type() return unimplemented!()
  • Location: crates/l2/prover/src/backend/openvm.rs:68, crates/l2/prover/src/backend/zisk.rs:117
  • Fix: Return appropriate ProverType variants instead of panicking

2. Potential Overflow (Medium)

  • u64::try_from(elapsed.as_millis()).unwrap_or(u64::MAX) in prover.rs:110
  • Location: crates/l2/prover/src/prover.rs:110
  • Fix: Use .unwrap_or_else(|_| u64::MAX) for better error handling

3. Missing Error Handling (Medium)

  • tee/quote-gen/src/sender.rs:16 - Missing prover_type parameter in BatchRequest
  • Fix: Add prover_type: ProverType::TDX to match new signature

4. Documentation Inconsistency (Low)

  • docs/workflows/prover_benchmarking.md uses hardcoded private keys
  • Recommendation: Add warning about test-only keys

🔧 Suggested Improvements

  1. Add validation for prover_type parameter in ProofData::batch_request
  2. Consider using Duration::as_secs_f64() for more precise timing in logs
  3. Add metrics for rejected prover types in proof_coordinator.rs

🎯 Security Considerations

  • No critical security issues found
  • Private keys in docs are test-only and properly documented

📊 Performance Impact

  • Minimal overhead from new timing measurements
  • Prover type filtering reduces unnecessary work

The PR is well-structured and the benchmarking additions are valuable. The unimplemented functions need immediate attention before merging.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Findings (ordered by severity)

  • Potential prover starvation/backlog: ProofCoordinator::handle_batch_request now gets the “next batch to prove” without considering the prover’s type, then returns an empty response if that type’s proof already exists. If next_batch_to_prove_for_version returns the earliest batch missing any proof type, a prover whose type is already done for that batch will keep getting empty responses and won’t progress to later batches that still need its proof. This can idle a prover indefinitely while other proof types catch up. Consider selecting the next batch that is missing this prover_type instead of returning empty.
    Path: crates/l2/sequencer/proof_coordinator.rs (around the new “proof already exists” check).

  • Runtime panic for enabled backends: ProverBackend::prover_type() is unimplemented!() for OpenVM and ZisK. If those backends are selectable via config/CLI (or enabled by feature), the prover will panic the first time it requests a batch. Prefer returning the correct ProverType or hard‑disable these backends at configuration/CLI level with a clear error.
    Paths: crates/l2/prover/src/backend/openvm.rs (around line 64), crates/l2/prover/src/backend/zisk.rs (around line 113).

  • Protocol compatibility risk: ProofData::BatchRequest gained a new field (prover_type) and a new variant (ProverTypeNotNeeded). If this enum is serialized over the network (likely), older coordinators/provers will fail to decode and disconnect. Ensure the protocol version is bumped or add backward‑compatible decoding/handshake.
    Path: crates/l2/common/src/prover.rs (around lines 179–210).

Notes / minor observations

  • The change in batch size calculation (+1) looks like an off‑by‑one fix and should be correct if first_block_of_batch and last_added_block_number are inclusive.
    Path: crates/l2/sequencer/l1_committer.rs (around line 973).

  • Logging and timing in the prover look fine; PROVER_CLIENT_TIMED is cleanly plumbed.
    Paths: cmd/ethrex/l2/options.rs, crates/l2/prover/src/prover.rs.

If you want, I can dig into next_batch_to_prove_for_version to confirm the starvation scenario and suggest a concrete fix.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review


PR Review: feat(l2): add prover benchmarking tooling and documentation

Overview

This PR adds comprehensive tooling for reproducible prover benchmarking, including timing instrumentation, metrics collection, load testing improvements, and documentation. The changes are well-structured and the feature additions are valuable for performance measurement.

Critical Issues

1. Proof Coordinator Logic Change - Potential Race Condition Risk ⚠️

Location: crates/l2/sequencer/proof_coordinator.rs:167-176

The coordinator now checks prover_type before calling next_batch_to_prove_for_version(). This is good for early rejection, but the order of operations has changed significantly:

Before: The coordinator checked if ALL needed proof types existed for a batch before assigning work.
After: It only checks if THIS prover type's proof exists.

Issue: While the commit message says "Move the prover type check before the store query," this optimization comes at line 167-176, which is good. However, the removal of the "all proofs exist" check (old lines 385-396) changes the fundamental coordination logic. Multiple provers of different types can now work on the same batch simultaneously, which might be intentional but should be verified.

Recommendation: Confirm this behavior change is intentional. If so, document it in the commit message or comments.


2. Error Handling - Prover Terminates on ProverTypeNotNeeded 🔴

Location: crates/l2/prover/src/prover.rs:329-337

ProofData::ProverTypeNotNeeded { prover_type } => {
    error!(...);
    return Err(format!(...)); // This causes the request to fail
}

When a prover receives ProverTypeNotNeeded, it returns an Err, which propagates up and causes the prover to continue the loop but skip that coordinator endpoint. However, this error is logged at error! level when it's actually an expected operational state (wrong backend type for this deployment).

Issues:

  • The error message is misleading - this isn't a failure, it's a configuration mismatch
  • The prover will keep retrying this coordinator in every loop iteration, generating spam
  • Should either: (a) remove that coordinator from the list permanently, or (b) downgrade to warn! and add exponential backoff

Recommendation: Either remove incompatible coordinators from the list after first rejection, or handle this as a non-error condition with backoff.


3. Off-by-One Fix Verification Needed ⚠️

Location: crates/l2/sequencer/l1_committer.rs:350

-let batch_size = (last_added_block_number - first_block_of_batch).try_into()?;
+let batch_size = (last_added_block_number - first_block_of_batch + 1).try_into()?;

This fixes the metric from reporting "0" for single-block batches, but verify the semantics:

  • If first_block_of_batch = 5 and last_added_block_number = 5, this now reports batch_size = 1
  • If first_block_of_batch = 5 and last_added_block_number = 7, this now reports batch_size = 3

Recommendation: Add a test case to prevent regression, especially since this metric is now used in benchmarking reports.


Major Issues

4. Load Test - Nonce Handling Still Fragile

Location: tooling/load_test/src/main.rs:232-233

The load test now uses BlockTag::Pending for nonce (good improvement!), but there's still a potential issue:

let nonce = client.get_nonce(account.address(), BlockIdentifier::Tag(BlockTag::Pending)).await.unwrap();

Issues:

  • Uses .unwrap() - if the RPC call fails, the entire test panics
  • No retry logic for transient network errors
  • Documentation warns about mempool limits but doesn't implement any safeguards

Recommendation: Add proper error handling with retry logic for the nonce fetch.


5. Bash Script - No Input Validation ⚠️

Location: scripts/bench_metrics.sh:44-56

The script uses basic string parsing with grep -o and cut, but doesn't validate numeric values:

batch=$(echo "$line" | grep -o 'batch=[0-9]*' | head -1 | cut -d= -f2)
secs=$(echo "$line" | grep -o 'proving_time_s=[0-9]*' | cut -d= -f2)

Issues:

  • If the log format changes slightly, parsing silently fails
  • No validation that batch, secs, ms are valid integers
  • The arithmetic operations later ($((total + ms))) will fail with non-numeric values

Recommendation: Add validation after parsing: [[ "$batch" =~ ^[0-9]+$ ]] before using values in arithmetic.
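
A minimal sketch of that validation, using a made-up log line that mirrors the field names implied by the grep patterns above (the exact log format and the `is_uint` helper are assumptions, not the script's actual code):

```shell
# Hypothetical log line matching the grep patterns shown above.
line='INFO prover: batch proved batch=42 proving_time_s=12 proving_time_ms=12345'

batch=$(echo "$line" | grep -o 'batch=[0-9]*' | head -1 | cut -d= -f2)
ms=$(echo "$line" | grep -o 'proving_time_ms=[0-9]*' | cut -d= -f2)

# POSIX-portable equivalent of bash's [[ "$1" =~ ^[0-9]+$ ]]
is_uint() { case "$1" in ''|*[!0-9]*) return 1 ;; *) return 0 ;; esac; }

if is_uint "$batch" && is_uint "$ms"; then
  echo "batch=$batch ms=$ms"
else
  # Malformed lines are skipped instead of feeding garbage into $(( ))
  echo "skipping malformed line" >&2
fi
```

With this guard, a log-format drift produces a skipped line rather than a bash arithmetic error mid-run.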


6. Bash Script - Integer Overflow Risk

Location: scripts/bench_metrics.sh:1054-1065

total=$((total + ms))
total_gas=$((total_gas + ${gas%%.*}))

Bash arithmetic is limited to signed 64-bit integers. For large benchmarks:

  • total_gas could overflow if many batches with high gas usage are processed
  • Unlikely in practice but possible with extended benchmarking sessions

Recommendation: Consider using bc for large arithmetic or document the limitation.


Minor Issues

7. Inconsistent Error Handling in ProverBackend Trait

Location: Multiple backend implementations

The new prover_type() method uses unimplemented!() for OpenVM and ZisK:

fn prover_type(&self) -> ProverType {
    unimplemented!("OpenVM is not yet enabled as a backend for the L2")
}

Issue: If someone accidentally enables OpenVM backend, this will panic at runtime instead of failing at compile time.

Recommendation: Consider returning Option<ProverType> or using a compile-time check with feature flags.


8. Documentation - Missing Cleanup Instructions

Location: docs/workflows/prover_benchmarking.md:650-664

The cleanup section checks for old databases but doesn't actually remove them. The user must manually interpret the output and decide what to delete.

Recommendation: Add explicit cleanup commands or a helper script for one-step cleanup.


9. Load Test - Max Fee Per Gas Set to i64::MAX

Location: tooling/load_test/src/main.rs:252

max_fee_per_gas: Some(i64::MAX as u64),

Issue: Casting i64::MAX (9,223,372,036,854,775,807) to u64 is technically lossless, but it sets an astronomically high gas price cap that could cause problems if the test ever runs against a real network (even a testnet).

Recommendation: Use a reasonable max gas price constant (e.g., 1000 * 10^9 for 1000 gwei) or make it configurable.


10. Trait Method Without Default Implementation

Location: crates/l2/prover/src/backend/mod.rs:160

The new prover_type() method is added to the trait without a default implementation, forcing all implementors to update. This is a breaking change for any external backend implementations.

Recommendation: Acceptable for internal use, but document this as a breaking API change if the trait is public.


Performance Considerations

11. Metrics Fetching in Benchmark Script

Location: scripts/bench_metrics.sh:949-956

The script makes a separate curl call for each metric for each batch:

gas=$(fetch_metric "batch_gas_used" "$batch" 2>/dev/null || true)
txs=$(fetch_metric "batch_tx_count" "$batch" 2>/dev/null || true)
blocks=$(fetch_metric "batch_size" "$batch" 2>/dev/null || true)

Issue: For N batches, this makes 3N HTTP requests. With 100 batches, that's 300 requests.

Recommendation: Fetch all metrics once and parse locally:

metrics=$(curl -s "$METRICS_URL" 2>/dev/null)
gas=$(echo "$metrics" | grep "^batch_gas_used{.*batch_number=\"${batch}\"" | ...)
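
A fuller sketch of the single-fetch approach. The Prometheus text is stubbed inline because no live endpoint is available here (in the real script it would come from `curl -s "$METRICS_URL"`); the metric and label names are taken from the snippets above, and `fetch_metric_local` is a hypothetical helper name:

```shell
# Stub for: metrics=$(curl -s "$METRICS_URL")
metrics='batch_gas_used{batch_number="7"} 1500000
batch_tx_count{batch_number="7"} 42
batch_size{batch_number="7"} 3'

# One fetch, then local lookups: O(1) HTTP requests instead of 3N.
fetch_metric_local() {  # $1 = metric name, $2 = batch number
  printf '%s\n' "$metrics" | grep "^$1{.*batch_number=\"$2\"" | awk '{print $2}'
}

gas=$(fetch_metric_local batch_gas_used 7)
txs=$(fetch_metric_local batch_tx_count 7)
blocks=$(fetch_metric_local batch_size 7)
echo "gas=$gas txs=$txs blocks=$blocks"
```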

Security Considerations

12. SSH Commands in Documentation

Location: docs/workflows/prover_benchmarking.md

The workflow documentation shows direct SSH command execution patterns:

ssh <server> "bash -l -c 'command'"

Issue: This is secure for the documented use case, but the commands include sensitive paths and private keys. If users copy-paste without understanding, they might expose credentials.

Recommendation: Add a security note about protecting private keys and not using production credentials.


Code Quality & Style

13. Inconsistent Logging Levels

The prover logs batch completion at info! level whether timed or not, which is good. However, the coordinator rejection is logged as error! when it should be warn! or info!.

14. Magic Numbers

tooling/load_test/src/main.rs:260: sleep(Duration::from_micros(800)) - This 800µs delay should be a named constant with explanation.

15. Duplicate Code

The prover's prove vs prove_timed branches (prover.rs:102-127) have duplicated logging logic. Consider extracting a helper function.


Positive Notes ✓

  1. Protocol Design: The prover_type field in BatchRequest is well-designed and enables proper coordination
  2. Backward Compatibility: The PR doesn't break existing functionality (besides unimplemented backends)
  3. Documentation: The benchmarking guide and workflow docs are thorough and helpful
  4. Testing Infrastructure: The endless mode and env var support make the load test much more flexible
  5. Metrics Integration: Enriching benchmark results with Prometheus data is excellent for analysis
  6. Path Fixes: The VK path fixes prevent build failures after the guest program move

Summary

Approval Status: ⚠️ Conditional Approval - Fix critical issues before merge

Critical Issues to Address:

  1. Fix error handling for ProverTypeNotNeeded (issue #2)
  2. Clarify/document the proof coordinator logic change (issue #1)
  3. Add nonce error handling in load test (issue #4)

The PR adds valuable infrastructure for performance measurement and the overall design is solid. The main concerns are around error handling and the coordinator logic change.


Automated review by Claude (Anthropic) · custom prompt

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive tooling and documentation for benchmarking prover performance on the ethrex L2. The changes enable reproducible and ergonomic prover benchmarking workflows by adding structured timing logs, a results collection script, load test improvements, and detailed documentation for both manual and agent-assisted workflows.

Changes:

  • Adds prover instrumentation with prove_timed() method and --timed CLI flag for measuring proving time
  • Enhances proof coordinator to filter provers by type, preventing unnecessary work assignments when specific proof types aren't needed
  • Improves load test with pending nonce support, endless mode, and environment variable configuration for easier automation
  • Fixes two bugs: VK file paths after guest program relocation and off-by-one error in batch_size metric
  • Adds comprehensive documentation for prover benchmarking workflows

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Show a summary per file
  • tooling/load_test/src/main.rs: Adds endless mode, uses pending nonces for consecutive runs, adds env var support, refactors into run_round function
  • scripts/bench_metrics.sh: New benchmark results collection script that parses prover logs and enriches with Prometheus batch metadata
  • docs/workflows/prover_benchmarking.md: New agent workflow documentation for remote server benchmarking
  • docs/l2/prover-benchmarking.md: New user guide for prover benchmarking on local/remote setups
  • crates/l2/prover/src/prover.rs: Adds timed proving support controlled by --timed flag
  • crates/l2/prover/src/config.rs: Adds timed field to ProverConfig
  • crates/l2/prover/src/backend/mod.rs: Adds prover_type() method to ProverBackend trait
  • crates/l2/prover/src/backend/*.rs: Implements prover_type() method for all backend implementations
  • crates/l2/sequencer/proof_coordinator.rs: Adds prover type filtering logic to skip unnecessary work assignments
  • crates/l2/common/src/prover.rs: Adds prover_type field to BatchRequest and new ProverTypeNotNeeded response
  • crates/l2/tee/quote-gen/src/sender.rs: Updates get_batch to include prover_type: ProverType::TDX in BatchRequest
  • crates/l2/sequencer/l1_committer.rs: Fixes off-by-one error in batch_size metric calculation
  • cmd/ethrex/l2/deployer.rs: Fixes VK file paths after guest program relocation to crates/guest-program/bin
  • cmd/ethrex/l2/options.rs: Adds --timed CLI option for prover client
  • crates/l2/Makefile: Simplifies deploy-l1-sp1 target to use --sp1 flag
Comments suppressed due to low confidence (1)

crates/l2/tee/quote-gen/src/sender.rs:52

  • The get_batch function does not handle the new ProofData::ProverTypeNotNeeded response variant that was added to the protocol. When a TDX prover connects but TDX proofs are not needed by the coordinator, it will send back ProverTypeNotNeeded, but this function will treat it as an unexpected response and return a generic error "Expecting ProofData::Response". Add a match arm to handle ProofData::ProverTypeNotNeeded and return a more descriptive error message.
pub async fn get_batch(commit_hash: String) -> Result<(u64, ProgramInput), String> {
    let batch = connect_to_prover_server_wr(&ProofData::BatchRequest {
        commit_hash: commit_hash.clone(),
        prover_type: ProverType::TDX,
    })
    .await
    .map_err(|e| format!("Failed to get Response: {e}"))?;
    match batch {
        ProofData::BatchResponse {
            batch_number,
            input,
            ..
        } => match (batch_number, input) {
            (Some(batch_number), Some(input)) => {
                #[cfg(feature = "l2")]
                let input = ProgramInput {
                    blocks: input.blocks,
                    execution_witness: input.execution_witness,
                    elasticity_multiplier: input.elasticity_multiplier,
                    blob_commitment: input.blob_commitment,
                    blob_proof: input.blob_proof,
                    fee_configs: input.fee_configs,
                };
                #[cfg(not(feature = "l2"))]
                let input = ProgramInput {
                    blocks: input.blocks,
                    execution_witness: input.execution_witness,
                };
                Ok((batch_number, input))
            }
            _ => Err("No blocks to prove.".to_owned()),
        },
        ProofData::NoBatchForVersion {
            commit_hash: server_code_version,
        } => Err(format!(
            "Next batch does not match with the current version. Server code: {}, Prover code: {}",
            server_code_version, commit_hash
        )),
        _ => Err("Expecting ProofData::Response".to_owned()),
    }
}


greptile-apps bot commented Feb 11, 2026

Greptile Overview

Greptile Summary

This PR adds reproducible prover benchmarking support across the L2 stack: new optional timed proving instrumentation (controlled by --timed / PROVER_CLIENT_TIMED), a metrics parsing script, load-test improvements (pending nonce, endless mode, env var configuration), and new benchmarking documentation.

On the coordinator side, BatchRequest now includes prover_type, allowing the proof coordinator to reject/skip assigning batches to provers whose backend isn’t required by the current deployment, reducing wasted polling and DB work. The deployer also updates hardcoded verification-key paths after the guest program relocation, and a batch_size metric off-by-one is fixed in the committer.

Confidence Score: 3/5

  • This PR is close to safe to merge but has a runtime panic in certain feature/backends that should be fixed first.
  • Most changes are additive (instrumentation, scripts, docs) and the coordinator/request wiring looks consistent, but prover_type() is left as unimplemented!() in the OpenVM and ZisK backends while those backends remain selectable, which will panic at runtime when enabled/used.
  • crates/l2/prover/src/backend/openvm.rs, crates/l2/prover/src/backend/zisk.rs

Important Files Changed

  • cmd/ethrex/l2/deployer.rs: Updates default VK file paths for RISC0/SP1 to new guest-program location; otherwise unchanged.
  • cmd/ethrex/l2/options.rs: Adds --timed / PROVER_CLIENT_TIMED flag to prover client options and threads it into ProverConfig.
  • crates/l2/Makefile: Adds/adjusts localnet/prover convenience targets and env exports for benchmarking workflow.
  • crates/l2/common/src/prover.rs: Extends BatchRequest with prover_type so coordinator can filter assignments; appears consistent with coordinator usage.
  • crates/l2/prover/src/backend/exec.rs: Implements prover_type() for exec backend and related instrumentation hooks.
  • crates/l2/prover/src/backend/mod.rs: Adds prover_type() to ProverBackend trait and propagates backend parsing; enables coordinator filtering by backend type.
  • crates/l2/prover/src/backend/openvm.rs: Adds ProverBackend::prover_type() but currently unimplemented!(), causing a runtime panic if the openvm backend is selected.
  • crates/l2/prover/src/backend/risc0.rs: Implements prover_type() for Risc0 backend; no issues found.
  • crates/l2/prover/src/backend/sp1.rs: Implements prover_type() for SP1 backend; no issues found.
  • crates/l2/prover/src/backend/zisk.rs: Adds ProverBackend::prover_type() but currently unimplemented!(), causing a runtime panic if the zisk backend is selected.
  • crates/l2/prover/src/config.rs: Adds timed field to ProverConfig for controlling proving-time instrumentation.
  • crates/l2/prover/src/prover.rs: Wires new prover_type into BatchRequest and optional timed proving; backend-specific prover_type() panics remain a risk.
  • crates/l2/sequencer/l1_committer.rs: Fixes batch_size metric off-by-one; no issues found.
  • crates/l2/sequencer/proof_coordinator.rs: Handles prover_type in BatchRequest and skips batches for unneeded prover types; logic appears consistent with new request format.
  • crates/l2/tee/quote-gen/src/sender.rs: Adds prover_type=TDX in batch requests to coordinator; aligns with coordinator filtering behavior.
  • docs/l2/prover-benchmarking.md: Adds benchmarking guide documentation.
  • docs/workflows/prover_benchmarking.md: Adds agent workflow for benchmarking documentation.
  • scripts/bench_metrics.sh: Adds bench log parsing script for prover timing metrics; not executed in build.
  • tooling/load_test/src/main.rs: Adds pending-nonce usage, endless mode, and env var support for load test CLI.

Sequence Diagram

sequenceDiagram
    participant Prover as Prover client
    participant Coord as ProofCoordinator
    participant Store as Store/DB

    Prover->>Coord: BatchRequest{ chain_id, last_verified_batch, prover_type }
    Coord->>Store: fetch needed_proof_types + next batch/input
    alt prover_type not needed
        Coord-->>Prover: ProverTypeNotNeeded
    else proof already exists
        Coord-->>Prover: ProofAlreadyExists
    else no batch/input available
        Coord-->>Prover: EmptyBatch / No work
    else batch assigned
        Coord-->>Prover: BatchResponse{ batch, input, ... }
        Prover->>Prover: prove() / prove_timed() (if --timed)
        Prover->>Coord: submit proof
    end

@greptile-apps greptile-apps bot left a comment

19 files reviewed, 2 comments


ArgAction::Set, not SetTrue), shut down the prover process when the
coordinator rejects its type instead of retrying forever, and cache the
metrics endpoint response in bench_metrics.sh to avoid 3N HTTP requests.
avilagaston9 added a commit that referenced this pull request Feb 11, 2026
…mplemented!),

make send_batches_proof_to_contract private, add natspec to based contract's
verifyBatches noting it has no access control, update all L2 docs replacing
verifyBatch references with verifyBatches, move distributed_proving.md from
docs/prover/ to docs/l2/fundamentals/ since it describes the interaction between
proof coordinator, proof sender, and provers rather than prover internals, and
restructure the doc to be explanation-first with the testing guide at the end.

Labels

L2 Rollup client
