
feat(l1_sender): redesign error handling with phase-based struct #1095

Open
Artemka374-claude wants to merge 3 commits into matter-labs:main from Artemka374-claude:feat/l1-sender-error-handling

@Artemka374-claude
Contributor

Summary

  • Replace the monolithic `run_l1_sender` with `L1SenderLoop`, a phase-based struct (`receive` → `send_pending` → `wait_for_inclusion` → `forward_downstream`) that categorizes all errors instead of crashing the binary
  • Add the `L1SendError` enum with three variants: `Transient` (RPC issues → exponential backoff), `Recoverable` (gas/blob fee cap exceeded, tx timeout, nonce conflict → 30s wait + retry), and `Fatal` (tx revert, data corruption, channel closed → crash as before)
  • Gas and blob fee caps now block sending before any tx is submitted (previously the sender only logged a warning and sent anyway). Add dedicated `GasBlocked` and `BlobFeeBlocked` metric states for early alerting
  • Fix the known Infura crash: replace `.expect("no pending block")` with a fallback to `BlockId::latest()`
  • Commands are never lost on errors: partial send progress (in-flight txs) survives transient RPC failures across retry cycles
  • Make `report_tx_receipt`, `report_blob_base_fee`, `report_l1_eip_1559_estimation`, `get_balance`, and `get_transaction_count` non-fatal (log a warning and continue)
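
A minimal sketch of how the phase-based loop can avoid losing commands, under a simplified synchronous model: `Command`, the `u64` stand-in for a tx hash, the closure-based `send` / `is_included` callbacks, and the `counts` helper are hypothetical, and the real `L1SenderLoop` talks to an alloy provider asynchronously. What it illustrates is that each command lives in exactly one of `pending_commands`, `in_flight`, or `completed`, and an error only interrupts the current phase; it never drops a collection.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for the commands the real sender receives.
#[derive(Debug, Clone, PartialEq)]
pub struct Command {
    pub id: u64,
}

/// Simplified error type; the real `L1SendError` carries richer payloads.
#[derive(Debug, Clone, PartialEq)]
pub enum L1SendError {
    Transient(String),
    Recoverable(String),
    Fatal(String),
}

#[derive(Default)]
pub struct L1SenderLoop {
    pending_commands: VecDeque<Command>,
    in_flight: VecDeque<(Command, u64)>, // (command, hypothetical tx hash)
    completed: Vec<Command>,
}

impl L1SenderLoop {
    /// `receive` phase: queue new commands for sending.
    pub fn receive(&mut self, cmds: impl IntoIterator<Item = Command>) {
        self.pending_commands.extend(cmds);
    }

    /// `send_pending` phase: a command leaves `pending_commands` only after its
    /// tx was submitted, so a failure part-way keeps the already-sent prefix in
    /// `in_flight` and the unsent suffix in `pending_commands`.
    pub fn send_pending<F>(&mut self, mut send: F) -> Result<(), L1SendError>
    where
        F: FnMut(&Command) -> Result<u64, L1SendError>,
    {
        while let Some(cmd) = self.pending_commands.front() {
            let tx = send(cmd)?;
            let cmd = self.pending_commands.pop_front().expect("front() returned Some");
            self.in_flight.push_back((cmd, tx));
        }
        Ok(())
    }

    /// `wait_for_inclusion` phase: txs from one sender are mined in nonce order,
    /// so stop at the first tx that is not included yet.
    pub fn wait_for_inclusion<F>(&mut self, mut is_included: F) -> Result<(), L1SendError>
    where
        F: FnMut(u64) -> Result<bool, L1SendError>,
    {
        while let Some((_, tx)) = self.in_flight.front() {
            if !is_included(*tx)? {
                break;
            }
            let (cmd, _) = self.in_flight.pop_front().expect("front() returned Some");
            self.completed.push(cmd);
        }
        Ok(())
    }

    /// `forward_downstream` phase: hand completed commands to the next stage.
    pub fn forward_downstream(&mut self) -> Vec<Command> {
        std::mem::take(&mut self.completed)
    }

    /// Hypothetical helper: sizes of (pending, in-flight, completed).
    pub fn counts(&self) -> (usize, usize, usize) {
        (self.pending_commands.len(), self.in_flight.len(), self.completed.len())
    }
}
```

Retrying is then just calling the same phase again: after a transient failure in `send_pending`, the next call resumes with the first unsent command instead of re-sending the ones already in flight.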

Test Plan

  • `cargo fmt --all --check` passes
  • `cargo clippy --all-targets --all-features --workspace -- -D warnings` passes
  • `cargo nextest run --release --workspace --exclude zksync_os_integration_tests`: 175/175 tests pass
  • Integration test failures are pre-existing (an `anvil --load-state` JSON parsing issue unrelated to this change)

No new tests were added: the L1 sender has no existing unit tests, and adding meaningful ones requires mocking the alloy provider, which is a separate effort tracked as a follow-up.

🤖 Generated with Claude Code

Artemka374 and others added 3 commits March 26, 2026 14:20
Replace the monolithic `run_l1_sender` function with `L1SenderLoop`,
a phase-based struct that categorizes all errors and handles them
appropriately instead of crashing the binary.

Error categories:
- Transient (RPC timeouts, rate limits) → exponential backoff, retry
- Recoverable (GasBlocked, BlobFeeBlocked, TxTimeout, NonceTooLow) →
  wait for external condition, retry with 30s sleep
- Fatal (tx revert, data corruption, channel closed) → crash as before
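
The three categories map onto a single retry decision. A minimal sketch, assuming a 1s base delay that doubles per consecutive transient failure and is capped at 60s: those numbers, `RetryAction`, `RecoverableKind`, and `next_action` are hypothetical, since only "exponential backoff" is specified here. The 30s wait for recoverable errors comes from this description.

```rust
use std::time::Duration;

#[derive(Debug, Clone, PartialEq)]
pub enum RecoverableKind {
    GasBlocked,
    BlobFeeBlocked,
    TxTimeout,
    NonceTooLow,
}

/// Simplified shape of the error enum for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum L1SendError {
    Transient(String),
    Recoverable(RecoverableKind),
    Fatal(String),
}

#[derive(Debug, PartialEq)]
pub enum RetryAction {
    Retry(Duration),
    Crash,
}

// Hypothetical tuning values for the transient backoff.
const BACKOFF_BASE: Duration = Duration::from_secs(1);
const BACKOFF_MAX: Duration = Duration::from_secs(60);
// Fixed wait for recoverable errors, as described in the commit message.
const RECOVERABLE_WAIT: Duration = Duration::from_secs(30);

/// Decide what to do after an error. `consecutive_transient` counts transient
/// failures since the last successful phase (0 for the first one).
pub fn next_action(err: &L1SendError, consecutive_transient: u32) -> RetryAction {
    match err {
        L1SendError::Transient(_) => {
            // base * 2^n, saturating instead of overflowing, then capped.
            let factor = 1u32.checked_shl(consecutive_transient).unwrap_or(u32::MAX);
            RetryAction::Retry(BACKOFF_BASE.saturating_mul(factor).min(BACKOFF_MAX))
        }
        L1SendError::Recoverable(_) => RetryAction::Retry(RECOVERABLE_WAIT),
        L1SendError::Fatal(_) => RetryAction::Crash,
    }
}
```

Keeping the decision in one pure function makes the policy testable without mocking the provider; only the caller sleeps or exits.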

Key changes:
- `L1SenderLoop` tracks three collections: `pending_commands`,
  `in_flight`, and `completed`. Commands are never lost on errors:
  partial send progress survives transient RPC failures.
- Gas and blob fee caps are checked before any tx is submitted.
  Exceeding the cap now returns `GasBlocked`/`BlobFeeBlocked` instead
  of warning and sending a doomed tx.
- Pending block fallback: replace `.expect("no pending block")` with
  a fallback to `BlockId::latest()` to fix the known Infura crash.
- Metrics: add `GasBlocked`, `BlobFeeBlocked`, `TransientBackoff`
  states; add `transient_errors` counter and `recoverable_errors`
  labeled counter. Make `report_tx_receipt` / `report_blob_base_fee` /
  `report_l1_eip_1559_estimation` non-fatal (log on parse error).
- Informational calls (`get_balance`, `get_transaction_count`) after
  inclusion are now non-fatal — log warning and continue.
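
A minimal sketch of the two pre-send checks, assuming fees are plain `u128` wei values and that a blob fee estimate exists only for blob-carrying txs. `FeeCaps`, `FeeBlocked`, `check_fee_caps`, `BlockTag`, and `pending_or_latest` are hypothetical names; `BlockTag` only stands in for the two alloy `BlockId` choices involved. The cap check fails before anything is signed, and the block helper falls back to the latest block instead of panicking when the RPC returns no pending block.

```rust
/// Hypothetical fee caps, in wei.
pub struct FeeCaps {
    pub max_fee_per_gas: u128,
    pub max_fee_per_blob_gas: u128,
}

/// Which cap blocked sending, with the values that triggered it.
#[derive(Debug, PartialEq)]
pub enum FeeBlocked {
    Gas { estimated: u128, cap: u128 },
    Blob { estimated: u128, cap: u128 },
}

/// Refuse to send if either estimate exceeds its cap.
/// `est_blob` is `None` for txs that carry no blobs.
pub fn check_fee_caps(est_gas: u128, est_blob: Option<u128>, caps: &FeeCaps) -> Result<(), FeeBlocked> {
    if est_gas > caps.max_fee_per_gas {
        return Err(FeeBlocked::Gas { estimated: est_gas, cap: caps.max_fee_per_gas });
    }
    if let Some(blob) = est_blob {
        if blob > caps.max_fee_per_blob_gas {
            return Err(FeeBlocked::Blob { estimated: blob, cap: caps.max_fee_per_blob_gas });
        }
    }
    Ok(())
}

/// Stand-in for the two block selections involved in the fallback.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BlockTag {
    Pending,
    Latest,
}

/// Prefer the pending block; query the latest one only if pending is missing.
pub fn pending_or_latest<T>(pending: Option<T>, latest: impl FnOnce() -> Option<T>) -> Option<(BlockTag, T)> {
    match pending {
        Some(block) => Some((BlockTag::Pending, block)),
        None => latest().map(|block| (BlockTag::Latest, block)),
    }
}
```

In the loop, a `FeeBlocked` result would be surfaced as the recoverable `GasBlocked` / `BlobFeeBlocked` error and retried later, rather than submitting a tx that exceeds the configured cap.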

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

These docs were added for planning purposes and belong in the docs repo,
not in the server codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>