Nonce-too-low (-32603) misclassified as retriable error, causing cascading provider pausing and relayer deadlock

## Describe the bug

When `send_raw_transaction` receives a JSON-RPC error with code `-32603` (Internal Error) and message "Transaction nonce too low", the relayer classifies it as a retriable error. The error is caused by a stale nonce: the transaction was already mined under a different hash. Retrying the same signed transaction with the same nonce, on the same provider or on a different one, therefore always produces the same error. This leads to:

1. **3 retries per provider** (same stale nonce, same error every time — the RPC is correctly rejecting it)
2. **Provider marked as failed** after retries exhaust (`mark_current_as_failed()`)
3. **Failover to next provider** → same nonce error → that provider also marked as failed
4. **All providers paused** within seconds (default `failure_threshold=3` is easily exceeded)
5. **Relayer deadlocked**: the relayer logs a `"No non-paused providers available"` warning and falls back to paused providers, which fail with the same error and extend the pause window

The root cause is in `src/services/provider/mod.rs`, `is_retriable_error()`:

```rust
// Excerpt from is_retriable_error(); other match arms are omitted here.
ProviderError::RpcErrorCode { code, .. } => {
    match code {
        // -32603: Internal error (may be temporary)
        -32603 => true,  // <-- This treats ALL -32603 as retriable
        // ...
    }
}
```

Code `-32603` is a generic "Internal error" umbrella. "Transaction nonce too low" is a **transaction-level error**, not a provider health issue. The RPC endpoint is working correctly — it is properly rejecting a transaction with a stale nonce. Retrying the same tx against other providers and marking them as failed is incorrect behavior.

### Additional contributing factors

1. **`RpcHealthStore` is a global singleton** — all relayers share the same health state. With many relayers (we run 40), nonce-too-low errors from just a few relayers quickly cascade to pause all providers for every relayer.
2. **`sync_nonce()` only runs at startup and on health check failure** — there is no periodic nonce re-sync, so once the nonce counter gets out of sync it stays wrong until restart.
3. **Default `failure_threshold=3`** is very low when shared across many relayers, making cascading pauses easy to trigger.
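
To illustrate how factors 1 and 3 combine, here is a minimal model of a shared failure counter. It is not the relayer's code: `SharedHealthStore`, `stale_submissions_until_all_paused`, and the provider names are hypothetical. It assumes that a stale-nonce submission fails on every provider and that each failure increments that provider's shared counter once.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a process-wide health store: failure counts
/// are keyed by provider only, so every relayer contributes to them.
struct SharedHealthStore {
    failure_threshold: u32,
    failures: HashMap<String, u32>,
}

impl SharedHealthStore {
    fn new(failure_threshold: u32) -> Self {
        Self { failure_threshold, failures: HashMap::new() }
    }

    fn mark_failed(&mut self, provider: &str) {
        *self.failures.entry(provider.to_string()).or_insert(0) += 1;
    }

    fn is_paused(&self, provider: &str) -> bool {
        self.failures.get(provider).copied().unwrap_or(0) >= self.failure_threshold
    }
}

/// Counts how many stale-nonce submissions (from any relayer) it takes
/// until every provider is paused. Returns None if that does not happen
/// within `max_submissions`.
fn stale_submissions_until_all_paused(
    providers: &[&str],
    failure_threshold: u32,
    max_submissions: u32,
) -> Option<u32> {
    let mut store = SharedHealthStore::new(failure_threshold);
    for submission in 1..=max_submissions {
        // Assumption: the stale nonce is rejected everywhere, so one
        // submission marks each provider as failed once.
        for p in providers {
            store.mark_failed(p);
        }
        if providers.iter().all(|p| store.is_paused(p)) {
            return Some(submission);
        }
    }
    None
}

fn main() {
    let providers = ["rpc-a", "rpc-b", "rpc-c", "rpc-d"];
    // With the default threshold of 3, three stale submissions are enough,
    // whether they come from three relayers or from one relayer resubmitting.
    println!("{:?}", stale_submissions_until_all_paused(&providers, 3, 40));
}
```

Under these assumptions the number of providers does not slow the cascade, because every stale submission walks the whole provider list. Only the threshold and the number of stale submissions matter.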

## Steps to reproduce

1. Configure a relayer with multiple RPC providers and Redis-based transaction counter
2. Submit a transaction that gets resubmitted multiple times (e.g., slow chain causing status check timeouts and repricing)
3. An earlier hash gets mined on-chain, but the relayer tracks a later resubmission hash as "current"
4. The next status-check-triggered resubmission attempts use the already-consumed nonce
5. RPC returns `-32603: Transaction nonce too low`
6. Relayer retries 3x per provider, marks each as failed, fails over to next provider
7. All providers become paused → relayer is unable to process any transactions
8. Only a container restart (which triggers `sync_nonce()`) resolves the issue

## Application logs

```text
2026-03-02T08:00:22.947Z WARN transaction_submission_handler: rpc call failed operation_name=send_raw_transaction provider_url=https://[REDACTED] attempt=1 max_retries=3 error=JSON-RPC error (code -32603): Transaction nonce too low retriable=true tx_id=43b7bb6c relayer_id=feed_provider_143_1 command=Resubmit

2026-03-02T08:00:25.900Z WARN transaction_submission_handler: all retry attempts failed, marking as failed and switching to next provider max_retries=3 provider_url=https://[REDACTED] error=JSON-RPC error (code -32603): Transaction nonce too low failover_count=1 max_failovers=3 tx_id=43b7bb6c relayer_id=feed_provider_143_1 command=Resubmit

2026-03-02T08:00:31.726Z ERROR transaction_submission_handler: rpc call failed after attempts across providers operation_name=send_raw_transaction total_attempts=12 failover_count=4 error=JSON-RPC error (code -32603): Transaction nonce too low
```

## Suggested fix

1. In `is_retriable_error`, make `-32603` conditionally retriable based on error message:
   - "nonce too low" → **non-retriable**
   - Other `-32603` messages → retriable (keep current behavior)

2. `should_mark_provider_failed` should return `false` for nonce-related errors

3. Consider triggering a nonce re-sync when "nonce too low" is received, rather than just failing the transaction

4. Consider making `failure_threshold` configurable per-relayer or making `RpcHealthStore` per-relayer instead of global
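
A rough sketch of what items 1 to 3 could look like follows. It is not a patch against the actual code: the `ProviderError` enum here is a simplified stand-in that assumes the variant carries a `message: String` field, and `is_nonce_too_low` and `needs_nonce_resync` are hypothetical helpers. Codes other than `-32603` are not modeled and would keep whatever handling they have today.

```rust
/// Simplified stand-in for the relayer's provider error type; only the
/// variant quoted in this issue is modeled, and the `message` field is
/// an assumption.
#[derive(Debug)]
enum ProviderError {
    RpcErrorCode { code: i64, message: String },
}

/// Hypothetical helper: true when the message describes a stale nonce.
/// Matching is case-insensitive because node wording may vary.
fn is_nonce_too_low(message: &str) -> bool {
    message.to_ascii_lowercase().contains("nonce too low")
}

/// Item 1: -32603 stays retriable unless the message says the nonce is stale.
fn is_retriable_error(err: &ProviderError) -> bool {
    match err {
        ProviderError::RpcErrorCode { code: -32603, message } => !is_nonce_too_low(message),
        // Other codes are out of scope for this sketch.
        ProviderError::RpcErrorCode { .. } => false,
    }
}

/// Item 2: a stale nonce says nothing about provider health, so it must
/// not count against the provider.
fn should_mark_provider_failed(err: &ProviderError) -> bool {
    match err {
        ProviderError::RpcErrorCode { message, .. } if is_nonce_too_low(message) => false,
        ProviderError::RpcErrorCode { code, .. } => *code == -32603,
    }
}

/// Item 3: signal to the caller that the local nonce counter should be
/// re-synced instead of only failing the transaction.
fn needs_nonce_resync(err: &ProviderError) -> bool {
    match err {
        ProviderError::RpcErrorCode { message, .. } => is_nonce_too_low(message),
    }
}

fn main() {
    let stale = ProviderError::RpcErrorCode {
        code: -32603,
        message: "Transaction nonce too low".to_string(),
    };
    let internal = ProviderError::RpcErrorCode {
        code: -32603,
        message: "Internal error".to_string(),
    };

    // Stale nonce: no retry, no provider penalty, re-sync requested.
    assert!(!is_retriable_error(&stale));
    assert!(!should_mark_provider_failed(&stale));
    assert!(needs_nonce_resync(&stale));

    // Any other -32603 keeps the current behavior.
    assert!(is_retriable_error(&internal));
    assert!(should_mark_provider_failed(&internal));
    assert!(!needs_nonce_resync(&internal));
}
```

Substring matching on the message is brittle, since the wording is not standardized across node implementations, but the log lines above show the message is the only signal available with this error code.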

## Version Information

openzeppelin-relayer 1.3.0

## Network Type

EVM (Monad Mainnet, chain ID 143)

## Deployment Type

Docker container (ECS)

## Platform

Linux (x86)

