
Nonce-too-low (-32603) misclassified as a retriable error, causing cascading provider pausing and relayer deadlock #681

@snnbotchway

Description


Describe the bug

When send_raw_transaction receives a JSON-RPC error with code -32603 (Internal Error) and message "Transaction nonce too low", the relayer classifies it as a retriable error. Since this error is caused by a stale nonce (transaction already mined under a different hash), retrying the same signed transaction with the same nonce against different providers always produces the same error. This leads to:

  1. 3 retries per provider (same stale nonce, same error every time; the RPC is correctly rejecting it)
  2. Provider marked as failed after retries exhaust (mark_current_as_failed())
  3. Failover to next provider → same nonce error → that provider also marked as failed
  4. All providers paused within seconds (default failure_threshold=3 is easily exceeded)
  5. Relayer deadlocked: it logs a "No non-paused providers available" warning and falls back to paused providers, which also fail and extend the pause window
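
The per-transaction part of this cascade can be modelled with a small sketch. Everything in it is hypothetical: `submit_with_failover` and `SubmitOutcome` are not the relayer's code, every send is assumed to fail with the same nonce error, and a non-retriable error is assumed to end the submission at once without blaming the provider. With max_retries=3 and max_failovers=3 (the values in the logs), the retriable path costs 12 attempts and marks 4 providers failed, consistent with total_attempts=12 in the logs.

```rust
/// Result of trying to broadcast one signed transaction.
#[derive(Debug)]
struct SubmitOutcome {
    attempts: u32,
    providers_marked_failed: Vec<String>,
}

/// Hypothetical model of per-provider retries plus failover.
/// Every send fails, mimicking an RPC that keeps rejecting a stale nonce
/// with -32603 "Transaction nonce too low".
fn submit_with_failover(
    providers: &[&str],
    max_retries: u32,
    max_failovers: u32,
    is_retriable: bool,
) -> SubmitOutcome {
    let mut attempts = 0;
    let mut marked = Vec::new();
    // The initial provider plus at most `max_failovers` fallbacks.
    let tried = providers.len().min(max_failovers as usize + 1);
    for url in &providers[..tried] {
        for _ in 0..max_retries {
            attempts += 1; // this send is rejected, as every send is here
            if !is_retriable {
                // Assumption: a non-retriable error ends the submission
                // immediately and leaves provider health untouched.
                return SubmitOutcome { attempts, providers_marked_failed: marked };
            }
        }
        // Retries exhausted: the provider is blamed (cf. mark_current_as_failed()).
        marked.push(url.to_string());
    }
    SubmitOutcome { attempts, providers_marked_failed: marked }
}

fn main() {
    let providers = ["rpc-a", "rpc-b", "rpc-c", "rpc-d"];
    for retriable in [true, false] {
        let out = submit_with_failover(&providers, 3, 3, retriable);
        println!(
            "retriable={}: {} attempts, providers marked failed: {:?}",
            retriable, out.attempts, out.providers_marked_failed
        );
    }
}
```

The only difference between the two runs is the classification flag, which is why the fix belongs in is_retriable_error rather than in the retry loop.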

The root cause is in src/services/provider/mod.rs, is_retriable_error():

ProviderError::RpcErrorCode { code, .. } => {
    match code {
        // -32603: Internal error (may be temporary)
        -32603 => true,  // <-- This treats ALL -32603 as retriable
        // ... other error codes elided ...
    }
}

Code -32603 is a generic "Internal error" umbrella. "Transaction nonce too low" is a transaction-level error, not a provider health issue. The RPC endpoint is working correctly — it is properly rejecting a transaction with a stale nonce. Retrying the same tx against other providers and marking them as failed is incorrect behavior.
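
A minimal sketch of that distinction, assuming the classification can key off the error message. `ErrorScope`, `classify_internal_error`, `is_retriable_internal_error`, and the marker list are hypothetical names, and case-insensitive substring matching is a heuristic: node implementations word these messages differently, so the marker list would need to reflect the RPCs actually in use.

```rust
/// Hypothetical split between errors about the endpoint and errors about the transaction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorScope {
    /// Possibly transient endpoint problem: retrying or failing over is reasonable.
    Provider,
    /// The endpoint is healthy and is deliberately rejecting this transaction.
    Transaction,
}

/// Message fragments that mark a -32603 as transaction-level.
/// Assumption: substring matching; real nodes may phrase these differently.
const TX_LEVEL_MARKERS: &[&str] = &["nonce too low"];

/// Classify a JSON-RPC -32603 (Internal error) by its message.
fn classify_internal_error(message: &str) -> ErrorScope {
    let msg = message.to_ascii_lowercase();
    if TX_LEVEL_MARKERS.iter().any(|marker| msg.contains(*marker)) {
        ErrorScope::Transaction
    } else {
        // Other internal errors keep today's retriable treatment.
        ErrorScope::Provider
    }
}

/// Only endpoint-level internal errors are worth retrying.
fn is_retriable_internal_error(message: &str) -> bool {
    classify_internal_error(message) == ErrorScope::Provider
}

fn main() {
    for msg in ["Transaction nonce too low", "internal server error"] {
        println!(
            "{:?}: {:?}, retriable={}",
            msg,
            classify_internal_error(msg),
            is_retriable_internal_error(msg)
        );
    }
}
```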

Additional contributing factors

  1. RpcHealthStore is a global singleton — all relayers share the same health state. With many relayers (we run 40), nonce-too-low errors from just a few relayers quickly cascade to pause all providers for every relayer.
  2. sync_nonce() only runs at startup and on health check failure — there is no periodic nonce re-sync, so once the nonce counter gets out of sync it stays wrong until restart.
  3. Default failure_threshold=3 is very low when shared across many relayers, making cascading pauses easy to trigger.
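
To illustrate factors 1 and 3, the sketch below uses a hypothetical `HealthStore` that simply counts failures per provider URL and treats a provider as paused once the count reaches `threshold`. The real RpcHealthStore's pause windows and expiry are not modelled, and this counting rule is an assumption. Under that model, one stale-nonce transaction from each of three relayers is enough to pause all four providers in a shared store, while per-relayer stores each stay below the threshold.

```rust
use std::collections::HashMap;

/// Hypothetical health store: counts failures per provider URL and treats a
/// provider as paused once its count reaches `threshold`. Pause windows and
/// expiry are deliberately not modelled.
struct HealthStore {
    threshold: u32,
    failures: HashMap<String, u32>,
}

impl HealthStore {
    fn new(threshold: u32) -> Self {
        HealthStore { threshold, failures: HashMap::new() }
    }

    fn mark_failed(&mut self, url: &str) {
        *self.failures.entry(url.to_string()).or_insert(0) += 1;
    }

    fn is_paused(&self, url: &str) -> bool {
        self.failures.get(url).copied().unwrap_or(0) >= self.threshold
    }
}

/// Under the current classification, one stale-nonce transaction marks every
/// provider it fails over to as failed once.
fn fail_stale_nonce_tx(store: &mut HealthStore, providers: &[&str]) {
    for url in providers {
        store.mark_failed(url);
    }
}

fn count_paused(store: &HealthStore, providers: &[&str]) -> usize {
    providers.iter().filter(|url| store.is_paused(url)).count()
}

fn main() {
    let providers = ["rpc-a", "rpc-b", "rpc-c", "rpc-d"];
    let threshold = 3; // the default failure_threshold cited above

    // Shared store: relayers 1, 2 and 3 each hit one stale nonce.
    let mut shared = HealthStore::new(threshold);
    for _relayer in 0..3 {
        fail_stale_nonce_tx(&mut shared, &providers);
    }

    // Per-relayer stores: the same three failures land in separate stores.
    let mut per_relayer: Vec<HealthStore> =
        (0..3).map(|_| HealthStore::new(threshold)).collect();
    for store in per_relayer.iter_mut() {
        fail_stale_nonce_tx(store, &providers);
    }

    println!(
        "shared store: {} of {} providers paused",
        count_paused(&shared, &providers),
        providers.len()
    );
    for (i, store) in per_relayer.iter().enumerate() {
        println!("relayer {} store: {} paused", i + 1, count_paused(store, &providers));
    }
}
```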

Steps to reproduce

  1. Configure a relayer with multiple RPC providers and Redis-based transaction counter
  2. Submit a transaction that gets resubmitted multiple times (e.g., slow chain causing status check timeouts and repricing)
  3. An earlier hash gets mined on-chain, but the relayer tracks a later resubmission hash as "current"
  4. Subsequent resubmission attempts, triggered by status checks, reuse the already-consumed nonce
  5. RPC returns -32603: Transaction nonce too low
  6. Relayer retries 3x per provider, marks each as failed, fails over to next provider
  7. All providers become paused → relayer is unable to process any transactions
  8. Only a container restart (which triggers sync_nonce()) resolves the issue

Application logs

2026-03-02T08:00:22.947Z WARN transaction_submission_handler: rpc call failed operation_name=send_raw_transaction provider_url=https://[REDACTED] attempt=1 max_retries=3 error=JSON-RPC error (code -32603): Transaction nonce too low retriable=true tx_id=43b7bb6c relayer_id=feed_provider_143_1 command=Resubmit

2026-03-02T08:00:25.900Z WARN transaction_submission_handler: all retry attempts failed, marking as failed and switching to next provider max_retries=3 provider_url=https://[REDACTED] error=JSON-RPC error (code -32603): Transaction nonce too low failover_count=1 max_failovers=3 tx_id=43b7bb6c relayer_id=feed_provider_143_1 command=Resubmit

2026-03-02T08:00:31.726Z ERROR transaction_submission_handler: rpc call failed after attempts across providers operation_name=send_raw_transaction total_attempts=12 failover_count=4 error=JSON-RPC error (code -32603): Transaction nonce too low

Suggested fix

  1. In is_retriable_error, make -32603 conditionally retriable based on error message:

    • "nonce too low" → non-retriable
    • Other -32603 messages → retriable (keep current behavior)
  2. should_mark_provider_failed should return false for nonce-related errors

  3. Consider triggering a nonce re-sync when "nonce too low" is received, rather than just failing the transaction

  4. Consider making failure_threshold configurable per-relayer or making RpcHealthStore per-relayer instead of global
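
Suggestions 2 and 3 could be combined along the lines of the sketch below. It is a sketch under stated assumptions: `NonceCounter`, `SendErrorAction`, `handle_internal_error`, and `marks_provider_failed` are hypothetical (the last is only a stand-in for should_mark_provider_failed, not its real signature), and `chain_pending_nonce` stands for a value the caller fetches separately, for example via `eth_getTransactionCount` with the `"pending"` block tag. The counter is only ever moved forward, so a lagging node response cannot roll it back.

```rust
/// Hypothetical local nonce counter, standing in for the Redis-backed
/// transaction counter.
struct NonceCounter {
    next: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum SendErrorAction {
    /// Endpoint-level problem: retry, fail over, and count it against the provider.
    RetryOrFailover,
    /// Transaction-level problem: stop, leave provider health alone, resync.
    ResyncNonce { old: u64, new: u64 },
}

/// Decide what to do with a -32603 from send_raw_transaction.
/// `chain_pending_nonce` is assumed to have been fetched by the caller,
/// e.g. from eth_getTransactionCount(address, "pending").
fn handle_internal_error(
    message: &str,
    counter: &mut NonceCounter,
    chain_pending_nonce: u64,
) -> SendErrorAction {
    if message.to_ascii_lowercase().contains("nonce too low") {
        let old = counter.next;
        // Only move forward, so a lagging node cannot roll the counter back.
        counter.next = counter.next.max(chain_pending_nonce);
        SendErrorAction::ResyncNonce { old, new: counter.next }
    } else {
        SendErrorAction::RetryOrFailover
    }
}

/// Stand-in for the provider-blaming decision (suggestion 2).
fn marks_provider_failed(action: &SendErrorAction) -> bool {
    matches!(action, SendErrorAction::RetryOrFailover)
}

fn main() {
    let mut counter = NonceCounter { next: 7 };
    let action = handle_internal_error("Transaction nonce too low", &mut counter, 10);
    println!("{:?}, mark provider failed: {}", action, marks_provider_failed(&action));
    let other = handle_internal_error("internal error", &mut counter, 10);
    println!("{:?}, mark provider failed: {}", other, marks_provider_failed(&other));
}
```

Whether the resync should also cancel or re-queue the transaction that hit the stale nonce is a separate decision; the sketch only covers the counter and the provider-health side.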

Version Information

openzeppelin-relayer 1.3.0

Network Type

EVM (Monad Mainnet, chain ID 143)

Deployment Type

Docker container (ECS)

Platform

Linux (x86)

Metadata


Assignees

No one assigned


    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
