Nonce counter management bugs: RESET_STORAGE_ON_START doesn't clear counters, no periodic re-sync, INCR race condition #683

@snnbotchway

Describe the bug

The nonce counter management system (RedisTransactionCounter) has several bugs that cause permanent nonce desync, making all new transactions fail with wrong nonces until manual intervention.

Bug 1: RESET_STORAGE_ON_START does not clear transaction counter Redis keys

When RESET_STORAGE_ON_START=true is set, the relayer clears repository data (transaction store, policy store) but does not clear the transaction_counter Redis keys (pattern: {prefix}:transaction_counter:{relayer_id}:{address}).

If a relayer's nonce counter gets inflated (e.g. from stuck/retried transactions), the bad counter persists across restarts even with RESET_STORAGE_ON_START=true. Since sync_nonce() uses max(on_chain_nonce, redis_counter), the inflated Redis counter always wins and the relayer continues using wrong nonces.
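The max() behavior described above can be sketched as follows. This is an illustrative reconstruction of the selection logic, not the relayer's actual `sync_nonce()` code:

```rust
// Sketch of why an inflated Redis counter survives sync_nonce():
// the sync picks the maximum of the on-chain nonce and the stored
// counter, so a too-high counter is never corrected downward.
fn sync_nonce(on_chain_nonce: u64, redis_counter: Option<u64>) -> u64 {
    match redis_counter {
        Some(counter) => on_chain_nonce.max(counter),
        None => on_chain_nonce,
    }
}

fn main() {
    // Healthy case: counter matches the chain.
    assert_eq!(sync_nonce(42, Some(42)), 42);
    // Counter inflated by stuck/retried transactions: the bad value
    // wins, and RESET_STORAGE_ON_START never deletes the counter key.
    assert_eq!(sync_nonce(42, Some(57)), 57);
    // Only when the key is absent does the on-chain nonce take over.
    assert_eq!(sync_nonce(42, None), 42);
}
```

Because the counter only ever moves up under this logic, the sole recovery path is deleting the key, which Bug 3 below shows is itself racy.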

Bug 2: sync_nonce only runs at startup and on health check failure

sync_nonce() (in src/domain/relayer/evm/evm_relayer.rs) only executes:

  1. At relayer startup
  2. When a health check fails

There is no periodic nonce re-synchronization. If the nonce counter drifts out of sync during normal operation (e.g., a transaction gets mined via a different path, or a resubmission succeeds with an earlier nonce), the counter stays wrong indefinitely.

Bug 3: Race condition when manually deleting counter key

Even manually deleting the Redis transaction counter key doesn't reliably fix the issue. The RedisTransactionCounter uses INCR to atomically increment the counter for each new transaction. If a new transaction is submitted between the key deletion and the next sync_nonce() call, INCR on a non-existent key creates it at value 1 (essentially nonce 0), which is also wrong. The nonce needs to be re-synced before the next transaction is submitted.
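The race can be demonstrated with an in-memory stand-in for Redis INCR semantics (INCR on a missing key creates it at 0 and increments to 1). The key string below is illustrative:

```rust
use std::collections::HashMap;

// In-memory stand-in for Redis INCR: a missing key is created at 0
// and incremented, so the first INCR after DEL returns 1.
fn incr(store: &mut HashMap<String, i64>, key: &str) -> i64 {
    let value = store.entry(key.to_string()).or_insert(0);
    *value += 1;
    *value
}

fn main() {
    let mut store = HashMap::new();
    let key = "prefix:transaction_counter:relayer-1:0xabc"; // illustrative
    store.insert(key.to_string(), 57); // inflated counter

    store.remove(key); // operator runs DEL to "fix" the counter

    // A transaction slips in before sync_nonce() runs:
    assert_eq!(incr(&mut store, key), 1); // nonce 0 again, still wrong
}
```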

Relationship to cascading provider pausing

These nonce management bugs are the root cause that leads to the cascading provider pausing described in #681. When the nonce counter is out of sync, every send_raw_transaction returns "Transaction nonce too low" (-32603), which the retry logic misclassifies as a provider health issue.

Steps to reproduce

Bug 1 (RESET_STORAGE_ON_START):

  1. Run a relayer with Redis-based transaction counter
  2. Submit transactions — counter increments in Redis
  3. Some transactions get stuck/retried, inflating the counter above on-chain nonce
  4. Restart with RESET_STORAGE_ON_START=true
  5. Observe: transaction store is cleared, but transaction_counter key still has inflated value
  6. sync_nonce() picks max(on_chain, redis_counter) = inflated value
  7. New transactions use wrong nonce and fail

Bug 2 (No periodic sync):

  1. Run a relayer normally
  2. A transaction with nonce N gets resubmitted multiple times (different gas prices)
  3. An earlier hash (nonce N) gets mined
  4. The counter still has N + resubmission_count as next nonce
  5. All subsequent transactions fail with nonce too high/low
  6. No recovery without restart or health check failure
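The arithmetic behind this drift, with illustrative numbers (N = 10, three resubmissions):

```rust
// After the original submit plus gas-bumped resubmissions, the counter's
// next nonce is N + resubmission_count (per the scenario above), while
// the correct next nonce is N + 1 once any of the nonce-N hashes mines.
fn counter_drift(resubmission_count: u64) -> u64 {
    let n: u64 = 10; // nonce shared by the original tx and all resubmissions
    let counter_next = n + resubmission_count;
    let chain_next = n + 1;
    counter_next - chain_next
}

fn main() {
    // Three resubmissions leave the counter two nonces ahead of the
    // chain, and nothing corrects it short of a restart or a failed
    // health check.
    assert_eq!(counter_drift(3), 2);
}
```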

Bug 3 (INCR race condition):

  1. Identify a relayer with inflated nonce counter in Redis
  2. Delete the key: DEL {prefix}:transaction_counter:{relayer_id}:{address}
  3. Before sync_nonce() runs, a new transaction submission calls INCR on the (now missing) key
  4. Redis creates the key with value 1 (i.e., nonce 0), which is also wrong
  5. Nonce is still desynced

Suggested fix

  1. RESET_STORAGE_ON_START should also clear transaction counter keys — include transaction_counter:* pattern in the reset logic
  2. Add periodic nonce re-sync — run sync_nonce() on a configurable interval (e.g., every 60 seconds), not just at startup
  3. Trigger nonce re-sync on "nonce too low" errors — when send_raw_transaction returns a nonce error, immediately re-sync from chain before the next transaction attempt
  4. Use SET instead of INCR for nonce management — or add a check-and-sync mechanism that detects drift
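Fixes 2 through 4 could be combined into a re-sync path that overwrites the counter from the chain (SET semantics) instead of taking the maximum, so an inflated counter can heal downward. The `Chain` trait and all names below are hypothetical, not the relayer's actual API:

```rust
// Hedged sketch of the suggested re-sync, under assumed names.
trait Chain {
    fn transaction_count(&self) -> u64; // pending nonce from the node
}

struct NonceManager {
    counter: u64, // next nonce to hand out
}

impl NonceManager {
    // Overwrite from the chain rather than max(on_chain, counter),
    // so this can run periodically AND on "nonce too low" errors and
    // actually correct an inflated counter.
    fn resync(&mut self, chain: &impl Chain) {
        self.counter = chain.transaction_count();
    }

    fn next_nonce(&mut self) -> u64 {
        let nonce = self.counter;
        self.counter += 1;
        nonce
    }
}

struct FakeChain(u64);
impl Chain for FakeChain {
    fn transaction_count(&self) -> u64 {
        self.0
    }
}

fn main() {
    let chain = FakeChain(42); // chain says next nonce is 42
    let mut manager = NonceManager { counter: 57 }; // inflated
    manager.resync(&chain); // periodic or error-triggered re-sync
    assert_eq!(manager.next_nonce(), 42); // back in step with the chain
    assert_eq!(manager.next_nonce(), 43);
}
```

In the real system the overwrite would need to be atomic with respect to concurrent submitters (e.g., guarded by the same lock or Lua script that allocates nonces), otherwise it reintroduces the Bug 3 race.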

Version Information

openzeppelin-relayer 1.3.0

Network Type

EVM (Monad Mainnet, chain ID 143)

Deployment Type

Docker container (ECS)

Platform

Linux (x86)
