Nonce counter management bugs: RESET_STORAGE_ON_START doesn't clear counters, no periodic re-sync, INCR race condition #683
Description
Describe the bug
The nonce counter management system (RedisTransactionCounter) has several bugs that cause permanent nonce desync, making all new transactions fail with wrong nonces until manual intervention.
Bug 1: RESET_STORAGE_ON_START does not clear transaction counter Redis keys
When RESET_STORAGE_ON_START=true is set, the relayer clears repository data (transaction store, policy store) but does not clear the transaction_counter Redis keys (pattern: {prefix}:transaction_counter:{relayer_id}:{address}).
If a relayer's nonce counter gets inflated (e.g. from stuck/retried transactions), the bad counter persists across restarts even with RESET_STORAGE_ON_START=true. Since sync_nonce() uses max(on_chain_nonce, redis_counter), the inflated Redis counter always wins and the relayer continues using wrong nonces.
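The interaction between the reset and the max() rule can be reproduced with a small in-memory sketch. The Storage struct, the function bodies, and the key name used in the note below are hypothetical stand-ins, not the relayer's actual code; only the max(on_chain_nonce, redis_counter) rule is taken from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the relayer's storage.
struct Storage {
    transactions: Vec<String>,
    counters: HashMap<String, u64>,
}

/// Mirrors the reported behaviour: repository data is cleared,
/// transaction counters are left untouched.
fn reset_storage_on_start(storage: &mut Storage) {
    storage.transactions.clear();
}

/// The reconciliation rule quoted above: max(on_chain_nonce, redis_counter).
/// A missing counter is treated as 0 (an assumption of this sketch).
fn sync_nonce(on_chain_nonce: u64, stored_counter: Option<u64>) -> u64 {
    on_chain_nonce.max(stored_counter.unwrap_or(0))
}

/// Simulates a restart and returns the nonce the relayer would use next.
fn nonce_after_restart(storage: &mut Storage, key: &str, on_chain_nonce: u64) -> u64 {
    reset_storage_on_start(storage);
    sync_nonce(on_chain_nonce, storage.counters.get(key).copied())
}
```

With a stored counter of 42 and an on-chain nonce of 10, nonce_after_restart yields 42: the transaction list is emptied, but the inflated counter survives the reset and wins the max(). Once the counter key is absent, the same call yields 10.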
Bug 2: sync_nonce only runs at startup and on health check failure
sync_nonce() (in src/domain/relayer/evm/evm_relayer.rs) only executes:
- At relayer startup
- When a health check fails
There is no periodic nonce re-synchronization. If the nonce counter drifts out of sync during normal operation (e.g., a transaction gets mined via a different path, or a resubmission succeeds with an earlier nonce), the counter stays wrong indefinitely.
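As a rough illustration of what a periodic trigger could look like, the sketch below tracks when the last sync happened and reports when the next one is due. ResyncSchedule and its methods are hypothetical names, and how sync_nonce() would actually be scheduled inside the relayer is not something this report establishes.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that decides when a nonce re-sync is due.
struct ResyncSchedule {
    interval: Duration,
    last_sync: Instant,
}

impl ResyncSchedule {
    fn new(interval: Duration, now: Instant) -> Self {
        Self { interval, last_sync: now }
    }

    /// True once at least `interval` has elapsed since the last sync.
    fn is_due(&self, now: Instant) -> bool {
        now.duration_since(self.last_sync) >= self.interval
    }

    /// Records that a sync has just completed.
    fn mark_synced(&mut self, now: Instant) {
        self.last_sync = now;
    }
}
```

A worker loop would call is_due on each tick, run the re-sync when it returns true, and then call mark_synced. Passing the current time in as a parameter keeps the logic testable without sleeping.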
Bug 3: Race condition when manually deleting counter key
Even manually deleting the Redis transaction counter key doesn't reliably fix the issue. The RedisTransactionCounter uses INCR to atomically increment the counter for each new transaction. If a new transaction is submitted between the key deletion and the next sync_nonce() call, INCR on a non-existent key creates it at value 1 (essentially nonce 0), which is also wrong. The nonce needs to be re-synced before the next transaction is submitted.
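The race can be modelled without a Redis server. The sketch below mimics the documented INCR behaviour (a missing key is treated as 0 before incrementing) with a HashMap; the function names and the key used in the note are illustrative assumptions only.

```rust
use std::collections::HashMap;

/// Mimics Redis INCR: a missing key is treated as 0, then incremented.
/// Returns the new value, as INCR does.
fn incr(store: &mut HashMap<String, u64>, key: &str) -> u64 {
    let value = store.entry(key.to_string()).or_insert(0);
    *value += 1;
    *value
}

/// Deletes the counter, then lets one submission arrive before any re-sync.
fn counter_after_delete_and_submit(store: &mut HashMap<String, u64>, key: &str) -> u64 {
    store.remove(key); // DEL
    incr(store, key) // INCR on the now-missing key
}
```

Starting from an inflated counter of 57, counter_after_delete_and_submit leaves the key at 1 regardless of the on-chain nonce, so the counter is recreated from scratch rather than from chain state.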
Relationship to cascading provider pausing
These nonce management bugs are the root cause of the cascading provider pausing described in #681. When the nonce counter is out of sync, every send_raw_transaction call returns "Transaction nonce too low" (-32603), which the retry logic misclassifies as a provider health issue.
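Because -32603 is the generic JSON-RPC "Internal error" code, the code alone cannot separate a nonce problem from a provider fault. One possible way to tell them apart is to inspect the error message, as in the sketch below. The RpcFailure enum and classify_rpc_failure are hypothetical names, and the matched substrings are assumptions based on the message quoted above; other nodes may word these errors differently.

```rust
/// Hypothetical classification of a failed send_raw_transaction call.
#[derive(Debug, PartialEq)]
enum RpcFailure {
    /// The local nonce counter disagrees with the chain; re-sync, do not pause the provider.
    NonceDesync,
    /// Anything else; left to the existing provider health logic.
    ProviderFault,
}

/// Classifies by message text, since the error code is shared with unrelated failures.
fn classify_rpc_failure(message: &str) -> RpcFailure {
    let lowered = message.to_ascii_lowercase();
    if lowered.contains("nonce too low") || lowered.contains("nonce too high") {
        RpcFailure::NonceDesync
    } else {
        RpcFailure::ProviderFault
    }
}
```

With this split, "Transaction nonce too low" would map to NonceDesync and could trigger a re-sync, while an unrelated internal error would still count against provider health.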
Steps to reproduce
Bug 1 (RESET_STORAGE_ON_START):
- Run a relayer with Redis-based transaction counter
- Submit transactions — counter increments in Redis
- Some transactions get stuck/retried, inflating the counter above on-chain nonce
- Restart with RESET_STORAGE_ON_START=true
- Observe: transaction store is cleared, but transaction_counter key still has inflated value
- sync_nonce() picks max(on_chain, redis_counter) = inflated value
- New transactions use wrong nonce and fail
Bug 2 (No periodic sync):
- Run a relayer normally
- A transaction with nonce N gets resubmitted multiple times (different gas prices)
- An earlier hash (nonce N) gets mined
- The counter still has N + resubmission_count as next nonce
- All subsequent transactions fail with nonce too high/low
- No recovery without restart or health check failure
Bug 3 (INCR race condition):
- Identify a relayer with inflated nonce counter in Redis
- Delete the key: DEL {prefix}:transaction_counter:{relayer_id}:{address}
- Before sync_nonce() runs, a new transaction submission calls INCR on the (now missing) key
- Redis creates key with value 1 (nonce 0), which is also wrong
- Nonce is still desynced
Suggested fix
- RESET_STORAGE_ON_START should also clear transaction counter keys: include the transaction_counter:* pattern in the reset logic
- Add periodic nonce re-sync: run sync_nonce() on a configurable interval (e.g., every 60 seconds), not just at startup
- Trigger nonce re-sync on "nonce too low" errors: when send_raw_transaction returns a nonce error, immediately re-sync from chain before the next transaction attempt
- Use SET instead of INCR for nonce management, or add a check-and-sync mechanism that detects drift
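A minimal in-memory sketch of the SET-based and check-and-sync ideas follows. NonceCounter, its methods, and the drift rule (the next nonce should lie between on_chain_nonce and on_chain_nonce + pending) are assumptions made for illustration; a Redis-backed version would additionally need these steps to be atomic.

```rust
use std::collections::HashMap;

/// Hypothetical counter that never invents a nonce for an unknown address.
#[derive(Default)]
struct NonceCounter {
    next: HashMap<String, u64>,
}

impl NonceCounter {
    /// SET semantics: overwrite the stored value with the on-chain nonce.
    fn resync(&mut self, address: &str, on_chain_nonce: u64) {
        self.next.insert(address.to_string(), on_chain_nonce);
    }

    /// Returns the nonce to use and advances the counter.
    /// Returns None when no counter exists, so the caller must resync first
    /// instead of silently starting from a fresh value.
    fn take_next(&mut self, address: &str) -> Option<u64> {
        let slot = self.next.get_mut(address)?;
        let nonce = *slot;
        *slot += 1;
        Some(nonce)
    }

    /// Assumed drift rule: the next nonce should be at least the on-chain nonce
    /// and at most on-chain nonce plus the number of pending transactions.
    fn has_drifted(&self, address: &str, on_chain_nonce: u64, pending: u64) -> bool {
        match self.next.get(address) {
            Some(&next) => next < on_chain_nonce || next > on_chain_nonce.saturating_add(pending),
            None => true,
        }
    }
}
```

In this sketch a deleted or missing counter yields None rather than a fresh value, which avoids the Bug 3 race, and has_drifted gives a periodic or error-triggered check a condition on which to call resync.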
Version Information
openzeppelin-relayer 1.3.0
Network Type
EVM (Monad Mainnet, chain ID 143)
Deployment Type
Docker container (ECS)
Platform
Linux (x86)