feat(l1_sender): redesign error handling with phase-based struct by Artemka374-claude · Pull Request #1095 · matter-labs/zksync-os-server

Artemka374-claude · 2026-03-26T14:53:25Z

Summary

- Replace monolithic `run_l1_sender` with `L1SenderLoop`, a phase-based struct (receive → send_pending → wait_for_inclusion → forward_downstream) that categorizes all errors instead of crashing the binary
- Add `L1SendError` enum with three variants: `Transient` (RPC issues → exponential backoff), `Recoverable` (gas/blob fee cap exceeded, tx timeout, nonce conflict → 30s wait + retry), `Fatal` (tx revert, data corruption, channel closed → crash as before)
- Gas and blob fee caps now block sending before any tx is submitted (previously: warn and send anyway). Adds dedicated `GasBlocked` and `BlobFeeBlocked` metric states for early alerting
- Fix the known Infura crash: replace `.expect("no pending block")` with fallback to `BlockId::latest()`
- Commands are never lost on errors: partial send progress (in-flight txs) survives transient RPC failures across retry cycles
- Make `report_tx_receipt`, `report_blob_base_fee`, `report_l1_eip_1559_estimation`, `get_balance`, and `get_transaction_count` non-fatal (log warn and continue)
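
To make the error categories above concrete, here is a minimal sketch of how the three variants might map to retry delays, and how a pre-send fee cap check could yield a recoverable error. It is not the PR's code: `RecoverableReason`, `retry_delay`, `check_fee_caps`, the variant payloads, and the 1s base / 60s ceiling for backoff are all assumptions made for illustration. Only the variant names and the 30s recoverable wait come from the summary.

```rust
use std::time::Duration;

/// Hypothetical mirror of the three categories named in the summary.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum L1SendError {
    /// RPC issues: retry with exponential backoff.
    Transient(String),
    /// External condition must change: wait, then retry.
    Recoverable(RecoverableReason),
    /// Tx revert, data corruption, channel closed: stop.
    Fatal(String),
}

/// Assumed breakdown of the recoverable cases listed in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RecoverableReason {
    GasBlocked,
    BlobFeeBlocked,
    TxTimeout,
    NonceTooLow,
}

const RECOVERABLE_WAIT: Duration = Duration::from_secs(30); // from the PR text
const BACKOFF_BASE: Duration = Duration::from_secs(1); // assumption
const BACKOFF_MAX: Duration = Duration::from_secs(60); // assumption

/// Returns how long to sleep before retrying, or `None` if the error is fatal.
pub fn retry_delay(err: &L1SendError, transient_attempt: u32) -> Option<Duration> {
    match err {
        L1SendError::Transient(_) => {
            // Delay doubles per attempt and is clamped to BACKOFF_MAX.
            let factor = 2u32.saturating_pow(transient_attempt);
            Some(BACKOFF_BASE.saturating_mul(factor).min(BACKOFF_MAX))
        }
        L1SendError::Recoverable(_) => Some(RECOVERABLE_WAIT),
        L1SendError::Fatal(_) => None,
    }
}

/// Checks fee caps before anything is submitted (all values in wei).
pub fn check_fee_caps(
    max_fee_per_gas: u128,
    gas_cap: u128,
    blob_base_fee: u128,
    blob_cap: u128,
) -> Result<(), L1SendError> {
    if max_fee_per_gas > gas_cap {
        return Err(L1SendError::Recoverable(RecoverableReason::GasBlocked));
    }
    if blob_base_fee > blob_cap {
        return Err(L1SendError::Recoverable(RecoverableReason::BlobFeeBlocked));
    }
    Ok(())
}
```

In a loop built this way, a `None` from `retry_delay` would propagate the error and terminate the task, while a `Some(delay)` would sleep and re-enter the failed phase without dropping any commands.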

Test Plan

- `cargo fmt --all --check` passes
- `cargo clippy --all-targets --all-features --workspace -- -D warnings` passes
- `cargo nextest run --release --workspace --exclude zksync_os_integration_tests`: 175/175 tests pass
- Integration test failures are pre-existing (anvil `--load-state` JSON parsing issue unrelated to this change)

No new tests added — the L1 sender has no existing unit tests and adding meaningful ones requires mocking the alloy provider, which is a separate effort tracked as a follow-up.

🤖 Generated with Claude Code

Replace the monolithic `run_l1_sender` function with `L1SenderLoop`, a phase-based struct that categorizes all errors and handles them appropriately instead of crashing the binary.

Error categories:
- Transient (RPC timeouts, rate limits) → exponential backoff, retry
- Recoverable (GasBlocked, BlobFeeBlocked, TxTimeout, NonceTooLow) → wait for external condition, retry with 30s sleep
- Fatal (tx revert, data corruption, channel closed) → crash as before

Key changes:
- `L1SenderLoop` tracks three collections: `pending_commands`, `in_flight`, and `completed`. Commands are never lost on errors: partial send progress survives transient RPC failures.
- Gas and blob fee caps are checked before any tx is submitted. Exceeding the cap now returns `GasBlocked`/`BlobFeeBlocked` instead of warning and sending a doomed tx.
- Pending block fallback: replace `.expect("no pending block")` with a fallback to `BlockId::latest()` to fix the known Infura crash.
- Metrics: add `GasBlocked`, `BlobFeeBlocked`, `TransientBackoff` states; add `transient_errors` counter and `recoverable_errors` labeled counter. Make `report_tx_receipt` / `report_blob_base_fee` / `report_l1_eip_1559_estimation` non-fatal (log on parse error).
- Informational calls (`get_balance`, `get_transaction_count`) after inclusion are now non-fatal: log warning and continue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
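
The three-collection design described in this commit message can be illustrated with a small sketch. This is not the PR's code: `Command`, `SendError`, `SenderState`, the `u64` stand-in for a tx hash, and the closure-based `submit` hook are all hypothetical. Only the collection names `pending_commands`, `in_flight`, and `completed` come from the commit message. The point is that a command leaves `pending_commands` only after its submission succeeds, so a transient failure partway through keeps both the already-submitted txs and the unsent commands.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an L1 command.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Command {
    pub id: u64,
}

/// Hypothetical error type; only the transient case is modeled here.
#[derive(Debug, PartialEq, Eq)]
pub enum SendError {
    Transient,
}

#[derive(Default)]
pub struct SenderState {
    /// Received but not yet submitted.
    pub pending_commands: VecDeque<Command>,
    /// Submitted, awaiting inclusion; the u64 stands in for a tx hash.
    pub in_flight: Vec<(Command, u64)>,
    /// Included and ready to forward downstream.
    pub completed: Vec<Command>,
}

impl SenderState {
    /// Submits pending commands in order. `submit` stands in for the RPC call.
    /// On error the failing command stays at the front of the queue, and
    /// everything already moved to `in_flight` is preserved.
    pub fn send_pending<F>(&mut self, mut submit: F) -> Result<(), SendError>
    where
        F: FnMut(&Command) -> Result<u64, SendError>,
    {
        while let Some(cmd) = self.pending_commands.front() {
            let tx = submit(cmd)?;
            let cmd = self
                .pending_commands
                .pop_front()
                .expect("front() returned Some");
            self.in_flight.push((cmd, tx));
        }
        Ok(())
    }

    /// Moves every in-flight command to `completed` (inclusion assumed).
    pub fn mark_included(&mut self) {
        self.completed
            .extend(self.in_flight.drain(..).map(|(cmd, _)| cmd));
    }
}
```

Calling `send_pending` again after a transient failure resumes from the command that failed, which is one way to get the "partial send progress survives" property without resubmitting earlier txs.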

Artemka374 and others added 3 commits March 26, 2026 14:20

some docs

fce0e81

chore: remove l1-sender design docs from branch

305c747

These docs were added for planning purposes and belong in the docs repo, not in the server codebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Artemka374-claude wants to merge 3 commits into matter-labs:main from Artemka374-claude:feat/l1-sender-error-handling
