Description
Describe the bug
When `send_raw_transaction` receives a JSON-RPC error with code -32603 (Internal Error) and the message "Transaction nonce too low", the relayer classifies it as retriable. This error is caused by a stale nonce (the transaction was already mined under a different hash). Retrying the same signed transaction with the same nonce against different providers therefore always produces the same error. This leads to:
- 3 retries per provider (same stale nonce, same error every time; the RPC is correctly rejecting it)
- Provider marked as failed after retries are exhausted (`mark_current_as_failed()`)
- Failover to the next provider → same nonce error → that provider is also marked as failed
- All providers paused within seconds (the default `failure_threshold=3` is easily exceeded)
- Relayer deadlocked: a "No non-paused providers available" warning, then fallback to paused providers, which also fail and extend the pause window
The root cause is in `is_retriable_error()` in `src/services/provider/mod.rs`:

```rust
ProviderError::RpcErrorCode { code, .. } => {
    match code {
        // -32603: Internal error (may be temporary)
        -32603 => true, // <-- This treats ALL -32603 errors as retriable
        // ... (other codes omitted)
    }
}
```

Code -32603 is a generic "Internal error" umbrella. "Transaction nonce too low" is a transaction-level error, not a provider health issue. The RPC endpoint is working correctly, because it is properly rejecting a transaction with a stale nonce. Retrying the same transaction against other providers and marking them as failed is incorrect behavior.
Additional contributing factors
- `RpcHealthStore` is a global singleton, so all relayers share the same health state. With many relayers (we run 40), nonce-too-low errors from just a few relayers quickly cascade and pause all providers for every relayer.
- `sync_nonce()` only runs at startup and on health check failure. There is no periodic nonce re-sync, so once the nonce counter gets out of sync it stays wrong until restart.
- The default `failure_threshold=3` is very low when shared across many relayers, which makes cascading pauses easy to trigger.
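The difference between a shared and a per-relayer health store can be shown with a small sketch. `HealthStore`, `mark_failed`, `is_paused` and the URL are hypothetical, and the pause window mentioned above is ignored. A provider is paused once its failure count reaches the threshold. With one global store, a single failure from each of three relayers pauses the provider for all 40. With one store per relayer, the same three failures stay isolated.

```rust
use std::collections::HashMap;

/// Hypothetical health store: counts failures per provider URL and treats a
/// provider as paused once its count reaches `failure_threshold`.
/// Time and pause windows are deliberately ignored in this sketch.
struct HealthStore {
    failures: HashMap<String, u32>,
    failure_threshold: u32,
}

impl HealthStore {
    fn new(failure_threshold: u32) -> Self {
        Self { failures: HashMap::new(), failure_threshold }
    }

    fn mark_failed(&mut self, url: &str) {
        *self.failures.entry(url.to_string()).or_insert(0) += 1;
    }

    fn is_paused(&self, url: &str) -> bool {
        self.failures.get(url).copied().unwrap_or(0) >= self.failure_threshold
    }
}

fn main() {
    const RELAYERS: usize = 40;
    let url = "https://rpc.example.com"; // placeholder provider URL

    // Global store: three relayers each hitting "nonce too low" once...
    let mut global = HealthStore::new(3);
    for _relayer in 0..3 {
        global.mark_failed(url);
    }
    // ...pause the provider for every relayer sharing the store.
    println!("global: paused for all relayers = {}", global.is_paused(url));

    // Per-relayer stores: the same three failures stay isolated.
    let mut per_relayer: Vec<HealthStore> = (0..RELAYERS).map(|_| HealthStore::new(3)).collect();
    for store in per_relayer.iter_mut().take(3) {
        store.mark_failed(url);
    }
    let paused = per_relayer.iter().filter(|s| s.is_paused(url)).count();
    println!("per-relayer: paused for {} of {} relayers", paused, RELAYERS);
}
```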
Steps to reproduce
- Configure a relayer with multiple RPC providers and a Redis-based transaction counter
- Submit a transaction that gets resubmitted multiple times (e.g., a slow chain causing status-check timeouts and repricing)
- An earlier hash gets mined on-chain, but the relayer tracks a later resubmission hash as "current"
- Subsequent status-check-triggered resubmissions use the already-consumed nonce
- The RPC returns `-32603: Transaction nonce too low`
- The relayer retries 3x per provider, marks each as failed, and fails over to the next provider
- All providers become paused, and the relayer is unable to process any transactions
- Only a container restart (which triggers `sync_nonce()`) resolves the issue
Application logs
```
2026-03-02T08:00:22.947Z WARN transaction_submission_handler: rpc call failed operation_name=send_raw_transaction provider_url=https://[REDACTED] attempt=1 max_retries=3 error=JSON-RPC error (code -32603): Transaction nonce too low retriable=true tx_id=43b7bb6c relayer_id=feed_provider_143_1 command=Resubmit
2026-03-02T08:00:25.900Z WARN transaction_submission_handler: all retry attempts failed, marking as failed and switching to next provider max_retries=3 provider_url=https://[REDACTED] error=JSON-RPC error (code -32603): Transaction nonce too low failover_count=1 max_failovers=3 tx_id=43b7bb6c relayer_id=feed_provider_143_1 command=Resubmit
2026-03-02T08:00:31.726Z ERROR transaction_submission_handler: rpc call failed after attempts across providers operation_name=send_raw_transaction total_attempts=12 failover_count=4 error=JSON-RPC error (code -32603): Transaction nonce too low
```
Suggested fix
- In `is_retriable_error`, make `-32603` conditionally retriable based on the error message:
  - "nonce too low" → non-retriable
  - Other `-32603` messages → retriable (keep current behavior)
- `should_mark_provider_failed` should return `false` for nonce-related errors
- Consider triggering a nonce re-sync when "nonce too low" is received, rather than just failing the transaction
- Consider making `failure_threshold` configurable per relayer, or making `RpcHealthStore` per-relayer instead of global
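The first three suggestions can be sketched against a trimmed-down, hypothetical mirror of the error type. The issue only shows `ProviderError::RpcErrorCode { code, .. }`, so the `message` field, the `Timeout` variant, the `Action` enum and `next_action` are all assumptions for illustration. `should_mark_provider_failed` returns `true` for everything except nonce rejections, which stands in for whatever the relayer already does. Treating other RPC codes as non-retriable is likewise specific to this sketch.

```rust
/// Hypothetical, trimmed-down mirror of the relayer's error type. The issue only
/// shows `RpcErrorCode { code, .. }`; the `message` field is an assumption.
#[derive(Debug)]
enum ProviderError {
    RpcErrorCode { code: i64, message: String },
    Timeout,
}

/// Hypothetical next step for the submission loop.
#[derive(Debug, PartialEq)]
enum Action {
    Retry,
    ResyncNonce,
    Fail,
}

/// Matches on the message text, so it does not depend on the exact error code.
fn is_nonce_too_low(message: &str) -> bool {
    message.to_ascii_lowercase().contains("nonce too low")
}

fn is_retriable_error(err: &ProviderError) -> bool {
    match err {
        // -32603 stays retriable unless it is a stale-nonce rejection.
        ProviderError::RpcErrorCode { code: -32603, message } => !is_nonce_too_low(message),
        // Other codes: non-retriable in this sketch only.
        ProviderError::RpcErrorCode { .. } => false,
        ProviderError::Timeout => true,
    }
}

fn should_mark_provider_failed(err: &ProviderError) -> bool {
    // A nonce rejection means the provider answered correctly; never blame it.
    !matches!(err, ProviderError::RpcErrorCode { message, .. } if is_nonce_too_low(message))
}

fn next_action(err: &ProviderError) -> Action {
    match err {
        // Re-sync the nonce instead of retrying the same signed transaction.
        ProviderError::RpcErrorCode { message, .. } if is_nonce_too_low(message) => Action::ResyncNonce,
        e if is_retriable_error(e) => Action::Retry,
        _ => Action::Fail,
    }
}

fn main() {
    let errors = [
        ProviderError::RpcErrorCode { code: -32603, message: "Transaction nonce too low".to_string() },
        ProviderError::RpcErrorCode { code: -32603, message: "Internal error".to_string() },
        ProviderError::Timeout,
    ];
    for err in &errors {
        println!(
            "{:?}: retriable={} mark_failed={} action={:?}",
            err,
            is_retriable_error(err),
            should_mark_provider_failed(err),
            next_action(err)
        );
    }
}
```

Matching on message text is brittle across node implementations, but it does catch clients that report the same condition under a different code. How the `ResyncNonce` branch would reach the existing `sync_nonce()` is left open here.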
Version Information
openzeppelin-relayer 1.3.0
Network Type
EVM (Monad Mainnet, chain ID 143)
Deployment Type
Docker container (ECS)
Platform
Linux (x86)