Skip to content

Conversation

@m-Peter
Copy link
Collaborator

@m-Peter m-Peter commented Nov 24, 2025

Closes: #922

Description

To avoid the noise from all the invalid transactions from trading bots, we keep track of the last 15 submitted transaction hashes, on a 10 second interval, with the aim to reject incoming transactions that match any of the tracked hashes.


For contributor use:

  • Targeted PR against master branch
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the standards mentioned here.
  • Updated relevant documentation
  • Re-reviewed Files changed in the Github PR explorer
  • Added appropriate labels

Summary by CodeRabbit

  • Bug Fixes

    • Switched duplicate detection to nonce-based per-account activity metadata for more reliable batching and deduplication.
    • Removed a previously exported duplicate-transaction error from the public surface.
    • Suppressed noisy logging when resolving block tags.
  • Refactor

    • Simplified batching and single-submit flows, removed per-instance locking, and standardized log field names.
    • Adjusted cached activity and transaction tracking to support nonce history and batching semantics.
  • Tests

    • Rewrote batching tests to submit multiple nonces in parallel, verify final balances, and updated timing.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 24, 2025

Walkthrough

Replaces time-only EOA activity tracking with per-EOA metadata (last submission time + recent nonces) for nonce-based duplicate detection; removes exported ErrDuplicateTransaction; drops SingleTxPool mutex in favor of keystore channel usage; standardizes log fields and updates tests for the new batching behavior.

Changes

Cohort / File(s) Summary
Error constant removed
models/errors/errors.go
Removed exported ErrDuplicateTransaction.
Batching & EOA activity
services/requester/batch_tx_pool.go
Introduced eoaActivityMetadata (lastSubmission, txNonces); changed eoaActivity cache to store metadata; removed txHash from pooledEvmTx; added nonce-based duplicate detection and updated Add/process flows, logging, constants eoaActivityCacheSize / maxTrackedTxNoncesPerEOA, and public assertions/initialization.
SingleTxPool concurrency & signing
services/requester/single_tx_pool.go
Removed mux sync.Mutex; removed fetchSigningAccountKey(); use keystore.Take() for signing; derive sender (DeriveTxSender) for logs/errors; updated interface assertion style.
Requester logging
services/requester/requester.go
Renamed log field "evm-id""evm_tx" in SendRawTransaction.
API error logging
api/utils.go
Removed logging side-effects from resolveBlockTag error paths (now returns errors without logging).
Tests updated
tests/tx_batching_test.go
Renamed test; reworked to submit nonces [0,1,2,3,2,3,4,5] with parallel submissions; switched to final balance-based assertion; changed TxStateValidation to LocalIndexValidation; increased TxBatchInterval to 2.5s and added indexing delay in setup.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant BatchTxPool
    participant EOAActivityCache
    participant Requester
    participant FlowNode
    Note over BatchTxPool,EOAActivityCache: Add path with nonce-based dedupe
    Client->>BatchTxPool: Submit tx (includes nonce)
    BatchTxPool->>EOAActivityCache: Load metadata for EOA
    alt metadata not found
        BatchTxPool->>Requester: submitSingleTransaction(tx)
    else metadata exists
        alt time since lastSubmission > TxBatchInterval
            BatchTxPool->>BatchTxPool: decide submitSingle or append to batch
            BatchTxPool->>Requester: submitBatch or submitSingleTransaction
        else within interval
            BatchTxPool->>BatchTxPool: append tx to pooled batch (track nonce)
        end
    end
    Requester->>FlowNode: Send transaction(s)
    FlowNode-->>Requester: response / error
    Requester-->>BatchTxPool: result
    BatchTxPool->>EOAActivityCache: update lastSubmission and txNonces (prune)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25–40 minutes

  • Focus review areas:
    • services/requester/batch_tx_pool.go — metadata lifecycle, nonce-tracking/pruning, cache semantics, batching decisions, and error/retry semantics.
    • services/requester/single_tx_pool.go — ensure keystore.Take() usage preserves concurrency/safety and removing the mutex doesn't regress signing correctness.
    • tests/tx_batching_test.go — timing/config changes and determinism of balance-based assertions and indexing delay.

Possibly related issues

  • #857 — Implements in-memory per-EOA nonce tracking which aligns with this PR’s eoaActivityMetadata.txNonces and duplicate-detection changes.
  • #922 — Requests rejecting invalid/duplicate EVM txs at gateway level; nonce-based dedupe here advances that objective.

Possibly related PRs

  • #916 — Previously added txHash-based duplicate checks and ErrDuplicateTransaction; this PR removes that approach and replaces it with nonce-based metadata (directly related/conflicting intent).
  • #835 — Earlier BatchTxPool changes to EOA activity and submitSingle flow; this PR evolves those areas into metadata-based tracking.
  • #852 — Modifies BatchTxPool single-tx submission and EOA activity handling; overlaps with this PR's refactor.

Suggested reviewers

  • zhangchiqing
  • janezpodhostnik
  • peterargue

Poem

🐰 I stored my nonces snug and neat,
No duplicate hops on every street.
The mutex sleeps while keystore sings,
Batches gather on nimble wings.
Hooray — the gateway hums with tidy beats!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Changes across files focus on implementing duplicate transaction detection and rejection, with refactoring to support nonce tracking and metadata management. One unrelated change is present: removing error logging in api/utils.go's resolveBlockTag function. The logging removal in api/utils.go appears unrelated to duplicate tx rejection. Either move this change to a separate PR or clarify its connection to the primary objective.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change in this PR: implementing duplicate transaction rejection to prevent previously submitted transactions from being resubmitted to the tx pool.
Linked Issues check ✅ Passed The PR implements nonce-based duplicate detection and tracking of submitted transactions to reject invalid submissions, directly addressing issue #922's goal of reducing invalid transaction submissions to the Flow network.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mpeter/submitted-tx-validations

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
services/requester/batch_tx_pool.go (2)

26-29: EOA activity metadata and cache TTL align with duplicate-detection goals

Using eoaActivityMetadata with a capped txHashes slice and a 10s eoaActivityCacheTTL matches the “last 15 tx hashes over ~10s” requirement and keeps lookup cost bounded. If you expect operators to tune the duplicate window over time, consider deriving eoaActivityCacheTTL from config instead of hardcoding 10 * time.Second, but the current choice is reasonable for now.

Also applies to: 36-39, 59-60, 89-93


135-149: Consider recording EOA activity only when submission succeeds

The duplicate check via slices.Contains(lastActivity.txHashes, txHash) is straightforward and efficient with the 15-hash cap. One nuance is that you always update lastActivity and add the hash to eoaActivity, even if submitSingleTransaction (or upstream build/send) fails. That means a client retrying after a transient internal error can be rejected as “already submitted” for up to the TTL, even though the transaction never made it to Flow.

If you want duplicate protection to apply only to successfully accepted submissions (single or future batched), you can gate the metadata update on err == nil:

-	// Update metadata for the last EOA activity
-	lastActivity.submittedAt = time.Now()
-	lastActivity.txHashes = append(lastActivity.txHashes, txHash)
-	// To avoid the slice of hashes from growing indefinitely,
-	// maintain only a handful of the last tx hashes.
-	if len(lastActivity.txHashes) > 15 {
-		lastActivity.txHashes = lastActivity.txHashes[1:]
-	}
-
-	t.eoaActivity.Add(from, lastActivity)
-
-	return err
+	// Update metadata for the last EOA activity only on successful add/submit.
+	if err == nil {
+		lastActivity.submittedAt = time.Now()
+		lastActivity.txHashes = append(lastActivity.txHashes, txHash)
+		// To avoid the slice of hashes from growing indefinitely,
+		// maintain only a handful of the last tx hashes.
+		if len(lastActivity.txHashes) > 15 {
+			lastActivity.txHashes = lastActivity.txHashes[1:]
+		}
+
+		t.eoaActivity.Add(from, lastActivity)
+	}
+
+	return err

This keeps the rejection behavior focused on truly duplicated submissions rather than transient failures.

Also applies to: 170-170, 175-175, 179-189

tests/tx_batching_test.go (2)

517-568: Duplicate-submission test accurately exercises hash-based rejection

The nonce pattern and assertions (exactly two errors, each containing the expected “already submitted” hash) give good coverage of the new hash-based duplicate detection while still verifying that all non-duplicate transactions execute successfully. The only minor trade-off is that the test is now coupled to specific precomputed hashes; if chain ID or signing details change, you’ll need to update those literals. If you want to reduce that coupling, you could derive the expected duplicate hashes inline from the first submissions before asserting on the error messages.


334-335: Update comments to reflect the new 2.5s TxBatchInterval in tests

setupGatewayNode now sets TxBatchInterval to 2.5 seconds, but the explanatory comments in the “recent” and “non-recent” interval tests still mention 2 seconds / 3 seconds, which can be misleading when someone tweaks the interval again.

Consider updating the comments to reference the actual configured value instead of hardcoded numbers, for example:

-	// For the E2E tests the `cfg.TxBatchInterval` is equal
-	// to 2 seconds.
+	// For the E2E tests, `cfg.TxBatchInterval` is currently
+	// set in setupGatewayNode (2.5 seconds at the time of writing).

and similarly for the “3 seconds” wording in the non-recent test.

Also applies to: 448-449, 609-610

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbeb971 and f7a3610.

📒 Files selected for processing (3)
  • models/errors/errors.go (1 hunks)
  • services/requester/batch_tx_pool.go (6 hunks)
  • tests/tx_batching_test.go (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
  • tests/tx_batching_test.go
📚 Learning: 2025-11-12T13:21:59.080Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 890
File: tests/integration_test.go:692-707
Timestamp: 2025-11-12T13:21:59.080Z
Learning: In the flow-evm-gateway integration tests (tests/integration_test.go), the gateway's RPC server continues to operate independently even after the context passed to bootstrap.Run() is cancelled. Context cancellation affects upstream dependencies (like the Emulator acting as an Access Node) but does not immediately shut down the RPC server, allowing tests to verify backup failover scenarios.

Applied to files:

  • tests/tx_batching_test.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (3)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-41)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
models/errors/errors.go (1)
  • ErrInvalid (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (1)
models/errors/errors.go (1)

31-32: Transaction error constants remain coherent after duplicate-error removal

ErrFailedTransaction and ErrInvalidTransaction are composed correctly, and removing a dedicated duplicate error in favor of generic ErrInvalid for pre-submission checks is consistent with the new batch pool behavior, assuming all references to the old duplicate error were cleaned up.

@m-Peter m-Peter force-pushed the mpeter/submitted-tx-validations branch from f7a3610 to 582d698 Compare November 24, 2025 16:30
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
services/requester/batch_tx_pool.go (2)

26-29: Confirm EOA activity TTL vs configurable batch interval

You’re hard-coding eoaActivityCacheTTL to 10 seconds while batching behavior still depends on the configurable TxBatchInterval. That’s probably fine (issue description mentions a 10-second window), but it means:

  • Duplicate detection will never look back further than 10 seconds, even if TxBatchInterval is larger.
  • Memory usage is capped by 10k EOAs × 15 hashes for that 10s window.

If operators ever run with a much larger TxBatchInterval, you may want to either:

  • Tie eoaActivityCacheTTL to TxBatchInterval (e.g., max(TxBatchInterval, 10s)), or
  • Document explicitly that duplicate protection is limited to a fixed 10-second horizon.

170-170: Consider extracting 15 into a named constant for tracked hashes

The pruning logic is clear, but 15 as a literal is a bit magic:

if err == nil {
    lastActivity.submittedAt = time.Now()
    lastActivity.txHashes = append(lastActivity.txHashes, txHash)
    if len(lastActivity.txHashes) > 15 {
        lastActivity.txHashes = lastActivity.txHashes[1:]
    }
    t.eoaActivity.Add(from, lastActivity)
}

To make the intent self‑documenting and adjustable, consider:

-const (
-	eoaActivityCacheSize = 10_000
-	eoaActivityCacheTTL  = time.Second * 10
-)
+const (
+	eoaActivityCacheSize       = 10_000
+	eoaActivityCacheTTL        = 10 * time.Second
+	maxTrackedTxHashesPerEOA   = 15
+)
-	if err == nil {
-		lastActivity.submittedAt = time.Now()
-		lastActivity.txHashes = append(lastActivity.txHashes, txHash)
-		if len(lastActivity.txHashes) > 15 {
-			lastActivity.txHashes = lastActivity.txHashes[1:]
-		}
-
-		t.eoaActivity.Add(from, lastActivity)
-	}
+	if err == nil {
+		lastActivity.submittedAt = time.Now()
+		lastActivity.txHashes = append(lastActivity.txHashes, txHash)
+		if len(lastActivity.txHashes) > maxTrackedTxHashesPerEOA {
+			lastActivity.txHashes = lastActivity.txHashes[1:]
+		}
+
+		t.eoaActivity.Add(from, lastActivity)
+	}

Pure readability/maintainability improvement; behavior stays the same.

Also applies to: 179-190

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7a3610 and 582d698.

📒 Files selected for processing (3)
  • models/errors/errors.go (1 hunks)
  • services/requester/batch_tx_pool.go (6 hunks)
  • tests/tx_batching_test.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • models/errors/errors.go
  • tests/tx_batching_test.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (3)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-41)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
models/errors/errors.go (1)
  • ErrInvalid (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (2)
services/requester/batch_tx_pool.go (2)

36-39: EOA activity metadata + LRU wiring looks consistent

Using eoaActivityMetadata{submittedAt, txHashes} in an expirable.LRU keyed by EOA, plus the var _ TxPool = (*BatchTxPool)(nil) assertion, cleanly encapsulates:

  • Batching recency (submittedAt vs TxBatchInterval), and
  • Short-horizon duplicate detection (txHashes bounded by cache TTL and array size).

The wiring in NewBatchTxPool (shared SingleTxPool, map for pooledTxs, mutex, and the expirable LRU) looks coherent with existing concurrency patterns and should behave deterministically under load.

Also applies to: 57-60, 62-62, 89-99


135-148: I'll verify the review comment by examining the codebase and checking the Hash type behavior.

Now let me verify the code file exists and check the actual implementation:

Incorrect review comment — %s formatting is correct for common.Hash

The review comment mischaracterizes how Go's fmt package formats common.Hash. The Hash type implements a custom Format method that handles the 's' verb by writing the hex representation with a "0x" prefix. Since Hash implements fmt.Formatter, the %s verb will produce the proper hex string output (e.g., "0x1234..."), not the artifact mentioned in the comment.

The String method implements the stringer interface and produces identical output. The current code is already correct; no changes are needed.

Likely an incorrect or invalid review comment.

@m-Peter m-Peter force-pushed the mpeter/submitted-tx-validations branch from 582d698 to 86c6bda Compare November 24, 2025 16:44
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
services/requester/batch_tx_pool.go (1)

136-149: Duplicate‑submission check and metadata update align with design; consider Publish ordering as a small follow‑up

The new flow in Add:

  • Computes txHash and rejects when found && slices.Contains(lastActivity.txHashes, txHash), so only byte‑identical re‑submissions from the same EOA are blocked.
  • Uses lastActivity.submittedAt with TxBatchInterval to decide whether to batch or send individually, and
  • On success, records submittedAt = time.Now(), appends txHash, and prunes to the last maxTrackedTxHashesPerEOA before writing back to eoaActivity.

This cleanly achieves the “reject already‑submitted txs within a short, bounded window” goal and keeps batching behavior intact, including the first‑time !found case (where the zero‑value metadata is safely initialized on success).

One minor consideration: t.txPublisher.Publish(tx) is still called before the dedup check, so duplicate submissions that are rejected at this point will still generate “pending transaction” events downstream. If you don’t want observers to see events for txs that are immediately rejected, you could move the Publish call to just after this validation block.

Also applies to: 171-177, 180-191

tests/tx_batching_test.go (2)

335-336: TxBatchInterval test configuration and comments mostly consistent; one stale reference

Setting TxBatchInterval to time.Millisecond * 2500 in setupGatewayNode and documenting 2.5s in the “recent interval” test comment keeps the test environment aligned with mainnet behavior and with the batching logic in BatchTxPool. The only thing to watch for is that the comment in Test_MultipleTransactionSubmissionsWithinNonRecentInterval still mentions a 2‑second interval; if you want the docs fully accurate, updating that comment to 2.5 seconds would avoid future confusion.

Also applies to: 609-610


517-568: Duplicate‑submission test correctly exercises the new behavior; assertions are quite strict

The new Test_TransactionSubmissionWithPreviouslySubmittedTransactions:

  • Uses a nonce pattern [0,1,2,3,2,3,4,5] to create two exact‑hash duplicates and verifies that:
    • Exactly two submissions fail, and
    • Both failures contain the expected “a tx with hash 0x… has already been submitted” messages.
  • Then ensures all successfully submitted txs receive successful receipts.

This is a solid end‑to‑end validation of the new dedup logic and confirms that non‑duplicate txs still execute. The only potential nit is test brittleness: it assumes both duplicate submissions fail and that their errors appear in errors[0] and errors[1] with those exact hash strings. If you ever change signing parameters or error formatting, the test will need updates. As a small robustness improvement, you could make the assertions order‑insensitive (e.g., check that the two expected substrings appear somewhere in the errors slice) rather than bound to specific indices.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 582d698 and 86c6bda.

📒 Files selected for processing (3)
  • models/errors/errors.go (1 hunks)
  • services/requester/batch_tx_pool.go (6 hunks)
  • tests/tx_batching_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-12T13:21:59.080Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 890
File: tests/integration_test.go:692-707
Timestamp: 2025-11-12T13:21:59.080Z
Learning: In the flow-evm-gateway integration tests (tests/integration_test.go), the gateway's RPC server continues to operate independently even after the context passed to bootstrap.Run() is cancelled. Context cancellation affects upstream dependencies (like the Emulator acting as an Access Node) but does not immediately shut down the RPC server, allowing tests to verify backup failover scenarios.

Applied to files:

  • tests/tx_batching_test.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • tests/tx_batching_test.go
  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (4)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-41)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
models/errors/errors.go (1)
  • ErrInvalid (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (2)
models/errors/errors.go (1)

31-32: Composed ErrInvalidTransaction sentinel looks good

Defining ErrInvalidTransaction as %w: %w over ErrInvalid and ErrFailedTransaction gives you a single sentinel while preserving errors.Is matches for both underlying categories. This keeps NewInvalidTransactionError’s wrapping behavior consistent without introducing new error types.

services/requester/batch_tx_pool.go (1)

26-30: Per‑EOA activity metadata and bounded dedup window look appropriate

Storing eoaActivityMetadata{submittedAt, txHashes} in a 10k‑entry expirable LRU, with a 10s TTL and at most 15 hashes per EOA, gives you a clear, bounded dedup window (~150k hashes max in memory) and keeps the batching heuristics (based on submittedAt vs TxBatchInterval) independent from the cache’s eviction policy. This matches the PR intent of “last 15 hashes over ~10 seconds” without impacting the existing batching semantics.

Also applies to: 37-40, 60-61, 90-94

@m-Peter m-Peter force-pushed the mpeter/submitted-tx-validations branch from 86c6bda to 62deb8b Compare November 25, 2025 16:59
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/tx_batching_test.go (1)

546-556: Hardcoded transaction hashes create fragile test assertions.

These assertions rely on deterministic tx hashes derived from fixed inputs (private key, nonces, amounts). While this works, any change to the signing parameters or transaction structure could silently break the test without clear indication of why.

Consider extracting the expected hashes programmatically by signing the duplicate transactions upfront and storing their hashes, then comparing against those:

// Before the loop, pre-compute expected duplicate hashes
expectedDupHashes := make(map[common.Hash]bool)
for _, nonce := range []uint64{2, 3} {
    signed, _, _ := evmSign(big.NewInt(1_000_000_000), 23_500, eoaKey, nonce, &testAddr, nil)
    expectedDupHashes[signed.Hash()] = true
}
// Then in assertions, verify errors reference hashes in expectedDupHashes
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86c6bda and 62deb8b.

📒 Files selected for processing (3)
  • models/errors/errors.go (1 hunks)
  • services/requester/batch_tx_pool.go (5 hunks)
  • tests/tx_batching_test.go (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-12T13:21:59.080Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 890
File: tests/integration_test.go:692-707
Timestamp: 2025-11-12T13:21:59.080Z
Learning: In the flow-evm-gateway integration tests (tests/integration_test.go), the gateway's RPC server continues to operate independently even after the context passed to bootstrap.Run() is cancelled. Context cancellation affects upstream dependencies (like the Emulator acting as an Access Node) but does not immediately shut down the RPC server, allowing tests to verify backup failover scenarios.

Applied to files:

  • tests/tx_batching_test.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • tests/tx_batching_test.go
  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (2)
tests/tx_batching_test.go (1)
config/config.go (2)
  • TxStateValidation (37-37)
  • LocalIndexValidation (40-40)
services/requester/batch_tx_pool.go (4)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-41)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
models/errors/errors.go (1)
  • ErrInvalid (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (9)
models/errors/errors.go (1)

31-32: LGTM!

The removal of ErrDuplicateTransaction is appropriate since duplicate detection is now handled via per-EOA metadata in BatchTxPool, with the error constructed inline using errs.ErrInvalid.

tests/tx_batching_test.go (2)

620-622: Sleep-based synchronization for indexing.

The 2-second sleep runs concurrently with bootstrap.Run to allow indexing to catch up. While this works, it adds to test execution time. If flakiness occurs, consider increasing the timeout or using a more deterministic readiness check.


607-609: LGTM!

Configuration changes align with the new duplicate detection approach: LocalIndexValidation enables local index-based validation, and the 2.5-second batch interval matches mainnet behavior.

services/requester/batch_tx_pool.go (6)

26-30: LGTM!

The constants provide sensible defaults: 10-second TTL for activity tracking, 15 tracked hashes per EOA, and 10,000 cache size. These values align with the PR objective to reduce noise from bot resubmissions.


37-40: LGTM!

The eoaActivityMetadata struct cleanly encapsulates per-EOA tracking data. Using a value type with a small slice (max 15 hashes) ensures efficient cache operations.


138-148: Duplicate detection logic is correct.

The check using slices.Contains on a max-15 element slice is efficient. The error wraps errs.ErrInvalid with an informative message including the transaction hash, which helps with debugging.


170-180: Good correctness fix for transaction ordering.

The added check len(t.pooledTxs[from]) > 0 ensures that if there are pending pooled transactions for this EOA, subsequent transactions are added to the same batch rather than submitted individually. This prevents potential ordering issues where a "stale" transaction could race ahead of earlier pooled transactions.


191-200: Metadata update logic is sound.

The pattern correctly:

  1. Only updates on successful submission (after error check)
  2. Uses FIFO pruning (drops oldest hash first)
  3. Stores the updated value back to the cache

Note: When !found, eoaActivity is the zero-value of eoaActivityMetadata (empty txHashes slice, zero lastSubmission), which is correctly populated before Add.


217-220: Map swap pattern is thread-safe.

The pattern of assigning txsGroupedByAddress := t.pooledTxs and then creating a new map t.pooledTxs = make(...) under lock effectively transfers ownership of the old map (and its slice values) to the processing goroutine. This avoids the need for deep copying since Add() will only write to the new map.

Based on learnings about deep copying maps with slices, this swap approach is a valid alternative that achieves the same safety guarantees.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
services/requester/batch_tx_pool.go (3)

26-29: Consider documenting / de‑magic‑numbering cache sizing constants

eoaActivityCacheSize = 10_000 and maxTrackedTxHashesPerEOA = 15 look reasonable, but they’re opaque to a reader. A brief comment tying these to expected traffic patterns and the 10s window (or making them configurable) would make future tuning safer.


41-52: Top‑level comment could mention duplicate‑tx rejection

The BatchTxPool comment still describes batching/ordering behavior well, but it doesn’t mention the new “recent hash tracking + duplicate rejection” behavior. Adding a short sentence about rejecting recently submitted tx hashes would help future readers understand why eoaActivityCache exists.


129-142: Duplicate tx hash rejection logic is sound; consider observability

The slices.Contains(eoaActivity.txHashes, txHash) check correctly rejects re‑submitted txs for the same EOA and wraps them as errs.ErrInvalid. One thing you might want is a metric/log counter (e.g., a dedicated “duplicate txs rejected” metric, or treating them as dropped) so operators can see how noisy certain EOAs/bots are; right now they’re silent from a metrics perspective.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62deb8b and c6b18b1.

📒 Files selected for processing (1)
  • services/requester/batch_tx_pool.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (7)
services/requester/batch_tx_pool.go (7)

6-6: fmt import usage looks correct

fmt is used only for the wrapped invalid‑tx error; no issues here.


36-39: Per‑EOA activity metadata struct is well‑factored

Bundling lastSubmission and txHashes into eoaActivityMetadata keeps the Add() logic clear, and the zero value is safe on cache misses. No changes needed.


56-59: BatchTxPool fields align with intended behavior

pooledTxs + txMux for batching and eoaActivityCache for per‑EOA activity/duplicate tracking is a clean separation of concerns. No structural or concurrency issues spotted here.


61-61: Nice use of interface assertion idiom

var _ TxPool = (*BatchTxPool)(nil) is the preferred compile‑time check; good improvement.


88-98: Verify EOAActivityCache TTL is compatible with batching interval

eoaActivityCache uses config.EOAActivityCacheTTL while batching decisions use config.TxBatchInterval. To avoid surprising behavior (e.g., metadata expiring while we still expect batching / duplicate protection), it would be good to ensure EOAActivityCacheTTL >= TxBatchInterval and document that contract in config or a comment.


161-188: Add() batching vs single‑submission control flow looks correct

The three scenarios based on found and time.Since(eoaActivity.lastSubmission) are consistent, and the hasBatch pattern correctly appends to any existing batch while falling back to a single submission when none exists. err handling is safe in all branches (only set when calling submitSingleTransaction, and checked once afterward).


193-203: Metadata update and hash list trimming behave as intended

Updating eoaActivity only after a successful add/submit and then appending txHash (while trimming to the last maxTrackedTxHashesPerEOA) matches the “track last N recent hashes” requirement and keeps memory bounded. The logic to drop the oldest entry via txHashes[1:] is simple and correct given the small fixed N.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
services/requester/batch_tx_pool.go (1)

162-178: Minor TOCTOU window in batch checking.

There's a small window between releasing the lock (line 172) and calling submitSingleTransaction (line 177) where another goroutine could add transactions to pooledTxs[from]. However, the impact is minimal—worst case is the transaction gets submitted individually when it could have been batched, which doesn't affect correctness.

If you want to tighten this, you could hold the lock during submitSingleTransaction for Case 2, or restructure to check-and-submit atomically. Given the low impact, this is optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 856da05 and 33d6624.

📒 Files selected for processing (2)
  • services/requester/batch_tx_pool.go (5 hunks)
  • tests/tx_batching_test.go (6 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
  • tests/tx_batching_test.go
📚 Learning: 2025-11-12T13:21:59.080Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 890
File: tests/integration_test.go:692-707
Timestamp: 2025-11-12T13:21:59.080Z
Learning: In the flow-evm-gateway integration tests (tests/integration_test.go), the gateway's RPC server continues to operate independently even after the context passed to bootstrap.Run() is cancelled. Context cancellation affects upstream dependencies (like the Emulator acting as an Access Node) but does not immediately shut down the RPC server, allowing tests to verify backup failover scenarios.

Applied to files:

  • tests/tx_batching_test.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (4)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-40)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
models/errors/errors.go (1)
  • ErrInvalid (23-23)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (12)
services/requester/batch_tx_pool.go (8)

26-39: LGTM!

The constants and eoaActivityMetadata type are well-structured for tracking per-EOA activity and recent transaction nonces.


53-61: LGTM!

The struct fields and interface assertion are appropriate. Using (*BatchTxPool)(nil) for interface assertion is idiomatic Go.


88-98: LGTM!

The cache initialization properly uses eoaActivityMetadata and the configured TTL.


129-140: LGTM!

The duplicate nonce detection is straightforward and correct. Using slices.Contains is appropriate given the small max size of 15.


191-200: Verify metadata update happens only on success.

The metadata update correctly happens after confirming err == nil (line 187). The nonce trimming at lines 196-198 keeps only the most recent 15 nonces by removing from the front—this is intentional behavior to bound memory while still catching recent duplicates.


217-220: LGTM!

The map swap pattern is thread-safe here: by replacing t.pooledTxs with a new empty map under the lock, subsequent Add() calls operate on the new map while the batch processor iterates over the old one. Based on learnings, this avoids the deep-copy concern because no concurrent modifications occur on the same slices.


245-249: LGTM!

Sorting by nonce before batch submission ensures correct execution order regardless of arrival order or any prior reordering.


283-312: LGTM!

Clean implementation for single transaction submission with appropriate error handling and metrics recording.

tests/tx_batching_test.go (4)

335-335: LGTM!

Comment correctly updated to reflect the new 2.5-second batch interval.


517-568: LGTM!

The test correctly validates duplicate nonce rejection:

  • Submits nonces [0, 1, 2, 3, 2, 3, 4, 5] where 2 and 3 are intentional duplicates
  • Verifies exactly 2 errors with the expected messages
  • Confirms the 6 successful transactions are executed

This effectively covers the new duplicate submission protection.


607-622: Configuration changes align with production settings.

  • LocalIndexValidation is appropriate for integration tests
  • The 2.5-second batch interval matches mainnet
  • The 2-second sleep allows indexing to catch up

One minor concern: the fixed 2-second sleep could cause flakiness if indexing occasionally takes longer. Consider using Eventually or a readiness check if tests become flaky.


449-449: LGTM!

Comment correctly updated to match the 2.5-second batch interval.

// Reject transactions that have already been submitted,
// as they are *likely* to fail.
if found && slices.Contains(eoaActivity.txNonces, nonce) {
return fmt.Errorf(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we return an error here, or just silently drop the tx? What happens on a geth node?

IIRC, bots will commonly resubmit tx at a specific nonce to increase the gas price in an attempt to get it to be executed earlier in the block, or decrease the gas price to effectively cancel the tx. I'm not totally sure, but I suspect these bots would expect submitting the tx to succeed (even if it was never executed)

Copy link
Collaborator Author

@m-Peter m-Peter Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, matching the functionality from Geth node, a silent drop seems more reasonable here. If we return an error, that might make bots try harder with more requests 😅

Updated in 85a2560 . Also added an info log line, so that we can track all such cases.

Comment on lines 192 to 193
eoaActivity.lastSubmission = time.Now()
eoaActivity.txNonces = append(eoaActivity.txNonces, nonce)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to protect these writes? Is it possible another request is concurrently updating/reading them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, good point 👍 Added some lock-protection in 0b94bc1 .

const eoaActivityCacheSize = 10_000
const (
eoaActivityCacheSize = 10_000
maxTrackedTxNoncesPerEOA = 15
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is 15 enough? I think the data we'd need to cache is small enough we could use a larger value here without tying up too much memory.

Copy link
Collaborator Author

@m-Peter m-Peter Nov 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've updated this value to 30 in 70b0bba .

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/requester/batch_tx_pool.go (1)

129-201: Race condition in read-modify-write of eoaActivityMetadata.

The metadata is read at line 129, modified at lines 192-193, and written back at line 201. If two goroutines process transactions from the same EOA concurrently, one's nonce update can be lost:

  1. Goroutine A: reads txNonces = [1, 2]
  2. Goroutine B: reads txNonces = [1, 2]
  3. Goroutine A: appends 3 → [1, 2, 3], writes to cache
  4. Goroutine B: appends 4 → [1, 2, 4], writes to cache (nonce 3 lost)

This also allows duplicates to slip through the check at line 134.

Consider protecting the entire read-modify-write sequence. One approach:

+var eoaActivityMux sync.Mutex // or use a per-EOA lock map for better concurrency

 func (t *BatchTxPool) Add(
 	ctx context.Context,
 	tx *gethTypes.Transaction,
 ) error {
 	// ... early code ...

+	eoaActivityMux.Lock()
+	defer eoaActivityMux.Unlock()
+
 	eoaActivity, found := t.eoaActivityCache.Get(from)
 	nonce := tx.Nonce()
 	// ... rest of the method ...

Alternatively, use a sync.Map keyed by EOA for per-address locking to maintain concurrency across different EOAs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33d6624 and 3833d6e.

📒 Files selected for processing (1)
  • services/requester/batch_tx_pool.go (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (4)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-40)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
models/errors/errors.go (1)
  • ErrInvalid (23-23)
🔇 Additional comments (7)
services/requester/batch_tx_pool.go (7)

26-29: Constants look reasonable.

The cache size of 10,000 EOAs and 15 tracked nonces per EOA provide a good balance between memory usage and duplicate detection coverage for typical trading bot patterns.


36-39: Clean metadata struct design.

The eoaActivityMetadata struct effectively encapsulates both timing and nonce history tracking for duplicate detection.


88-98: Initialization looks correct.

The eoaActivityCache is properly initialized with configurable TTL, and the pool map is correctly allocated.


132-140: Verify intended behavior for replacement transactions.

Standard Ethereum behavior allows replacing a pending transaction by resubmitting the same nonce with a higher gas price. The current implementation rejects any transaction with a previously-seen nonce, which blocks legitimate replacement transactions.

If this is intentional to reduce bot noise, consider documenting this deviation from standard behavior. Otherwise, you could compare gas prices before rejecting.


159-185: Branching logic is sound.

The three cases are handled correctly, and the mutex protects pooledTxs modifications appropriately. The case 2 handling (checking for existing batch before deciding to submit individually) prevents transaction ordering issues.


218-221: Safe map swap pattern.

The swap-and-replace pattern correctly avoids the need for deep copying - after line 220, the old slices in txsGroupedByAddress are exclusively owned by this goroutine while Add() works with the fresh empty map. Based on learnings about map copy safety, this approach is correct.


246-250: Nonce-based sorting ensures correct execution order.

Sorting pooled transactions by nonce before batch submission is essential for proper EVM execution semantics.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
services/requester/batch_tx_pool.go (1)

190-192: Nonce marked as submitted before batch actually executes.

For batched transactions, the nonce is recorded at line 192, but the actual Flow transaction is sent later in processPooledTransactions (line 223). If batch submission fails (e.g., at line 228), the nonce remains tracked, causing legitimate retry attempts to be rejected until the cache entry expires (15s).

This was raised in past reviews. Consider updating the error message to be more accurate:

 	if found && slices.Contains(eoaActivity.txNonces, nonce) {
+		t.logger.Info().
+			Str("evm_tx", tx.Hash().Hex()).
+			Str("from", from.Hex()).
+			Uint64("nonce", nonce).
-			Msg("tx with same nonce has been already submitted")
+			Msg("tx with same nonce was recently submitted or is pending batch execution")
 		return nil
 	}

Based on learnings, past review feedback on this concern.

🧹 Nitpick comments (1)
services/requester/batch_tx_pool.go (1)

34-37: Consider documenting the concurrency model.

The eoaActivityMetadata struct is read and written without explicit locking. While the LRU cache itself is thread-safe, the pattern of Get-modify-Add creates a race window where concurrent requests for the same EOA could overwrite each other's nonce updates.

Add a comment clarifying the concurrency expectations:

+// eoaActivityMetadata tracks recent transaction activity for an EOA.
+// Note: Updates are not atomic across Get/Add operations. Concurrent
+// submissions from the same EOA may result in lost nonce tracking, but
+// this is acceptable as it only affects the duplicate detection heuristic.
 type eoaActivityMetadata struct {
 	lastSubmission time.Time
 	txNonces       []uint64
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3833d6e and df23310.

📒 Files selected for processing (5)
  • cmd/run/cmd.go (1 hunks)
  • services/requester/batch_tx_pool.go (4 hunks)
  • services/requester/requester.go (1 hunks)
  • services/requester/single_tx_pool.go (3 hunks)
  • tests/tx_batching_test.go (6 hunks)
✅ Files skipped from review due to trivial changes (1)
  • services/requester/requester.go
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/single_tx_pool.go
  • tests/tx_batching_test.go
📚 Learning: 2025-11-12T13:21:59.080Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 890
File: tests/integration_test.go:692-707
Timestamp: 2025-11-12T13:21:59.080Z
Learning: In the flow-evm-gateway integration tests (tests/integration_test.go), the gateway's RPC server continues to operate independently even after the context passed to bootstrap.Run() is cancelled. Context cancellation affects upstream dependencies (like the Emulator acting as an Access Node) but does not immediately shut down the RPC server, allowing tests to verify backup failover scenarios.

Applied to files:

  • tests/tx_batching_test.go
🧬 Code graph analysis (2)
services/requester/single_tx_pool.go (2)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
tests/tx_batching_test.go (1)
config/config.go (2)
  • TxStateValidation (37-37)
  • LocalIndexValidation (40-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (13)
cmd/run/cmd.go (2)

293-293: TTL increase extends the duplicate detection window.

The default increased from 10s to 15s, meaning transactions from the same EOA with the same nonce will be rejected as duplicates for a longer period. This aligns with the batching interval changes and helps reduce noise from trading bots, but ensure this duration is appropriate for your use case.


296-299: LGTM! Defensive error handling added.

The panic ensures any misconfiguration in flag deprecation is caught early during initialization.

tests/tx_batching_test.go (4)

335-335: LGTM! Comments updated to match configuration.

The interval references now correctly reflect the 2.5-second TxBatchInterval used in tests.

Also applies to: 449-449


517-517: LGTM! Clearer test name.

The new name better conveys that the test verifies rejection of previously submitted transactions.


592-592: LGTM! Validation mode appropriate for batch testing.

Using LocalIndexValidation is correct for batch mode, as it validates against the local state index without waiting for transaction sealing.


594-594: Test timing setup looks reasonable.

The 2.5-second batch interval matches mainnet, and the 2-second sleep allows the local index to catch up. If this becomes flaky on slower systems, consider using assert.Eventually instead of a fixed sleep.

Also applies to: 605-607

services/requester/batch_tx_pool.go (4)

24-27: LGTM! Reasonable cache sizing.

The cache can track 10,000 EOAs with up to 30 recent nonces each, providing good coverage without excessive memory usage.


161-184: LGTM! Batching logic handles pending batches correctly.

The check at line 166 ensures that if an EOA has pending transactions in the batch pool (due to congestion), subsequent transactions are added to that batch rather than submitted individually, maintaining proper ordering.


190-202: EOA metadata update happens after successful submission.

The metadata is only updated after successful add/submit (line 186 error check), which is correct. The pruning logic properly maintains the most recent maxTrackedTxNoncesPerEOA entries.

However, as noted in past reviews, concurrent updates to the same EOA's metadata are not atomic. The Get-modify-Add pattern (lines 127, 191-192, 200) can lead to lost updates if two requests for the same EOA interleave. This is acceptable for a best-effort duplicate detection heuristic but should be documented.


127-139: Duplicate transaction rejection is intentional—clarify whether this aligns with your API expectations.

The code silently drops transactions with duplicate nonces. The test Test_TransactionSubmissionWithPreviouslySubmittedTransactions explicitly validates this by sending nonces [0, 1, 2, 3, 2, 3, 4, 5] and expecting only 6 unique transactions to succeed.

This behavior differs from standard Ethereum nodes, which allow replacement transactions (same nonce, higher gas price). However, the code comment ("as they are likely to fail") indicates this is intentional design for the Flow EVM gateway—not a bug. The implementation provides no mechanism for gas price comparison, as pooledEvmTx stores only the nonce, not gas price.

Before treating this as a bug, confirm:

  • Is transaction replacement a required feature for your use case?
  • If yes, this would require structural changes (storing gas price, comparing replacements).
  • If no, add clarification to the code comment that replacement transactions are intentionally not supported.
services/requester/single_tx_pool.go (3)

42-42: LGTM! Cleaner interface assertion style.

Using (*SingleTxPool)(nil) is preferred as it avoids allocation compared to &SingleTxPool{}.


152-153: LGTM! Log field naming consistency improved.

The field names now use underscores and clearer suffixes (flow_tx, evm_tx), consistent with logging elsewhere in the codebase.


190-194: LGTM! Helpful concurrency comment and code simplification.

The comment clarifies that keystore.Take() uses channel-based access, eliminating the need for additional locking. The direct call simplifies the code by removing the wrapper function.

Based on learnings, the keystore pattern intentionally holds keys until transaction sealing.

@m-Peter m-Peter force-pushed the mpeter/submitted-tx-validations branch from bb0e2a5 to 0b94bc1 Compare November 30, 2025 14:23
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/tx_batching_test.go (1)

517-553: Test relies on implicit validation via balance check.

The test correctly expects no errors from sendRawTx since duplicate nonce transactions are now silently dropped (per the updated BatchTxPool.Add behavior). The expectedBalance = 6 * transferAmount assertion implicitly validates that only 6 unique nonce transactions were executed.

This approach is valid given the silent-drop behavior. For additional observability in test failures, consider adding a log or comment clarifying that indices 4 and 5 (nonces 2 and 3) are expected to be silently dropped.

services/requester/batch_tx_pool.go (1)

127-139: Potential race in duplicate detection.

The duplicate check at line 127 occurs without holding txMux, while the metadata update at lines 198-208 happens under the lock. Two concurrent Add() calls for the same EOA and nonce could both pass the duplicate check before either updates the cache, resulting in duplicate submissions.

Given this is a best-effort optimization to reduce (not guarantee elimination of) duplicate transactions, the race window is small, and adding a lock around the entire Add() method could impact throughput, this may be acceptable. However, if stricter deduplication is desired:

 func (t *BatchTxPool) Add(
 	ctx context.Context,
 	tx *gethTypes.Transaction,
 ) error {
 	t.txPublisher.Publish(tx) // publish pending transaction event
+	t.txMux.Lock()
+	defer t.txMux.Unlock()

 	from, err := models.DeriveTxSender(tx)

This would serialize all Add() calls but guarantee no duplicate submissions slip through.

Also applies to: 194-210

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb0e2a5 and 0b94bc1.

📒 Files selected for processing (5)
  • api/utils.go (0 hunks)
  • services/requester/batch_tx_pool.go (7 hunks)
  • services/requester/requester.go (1 hunks)
  • services/requester/single_tx_pool.go (5 hunks)
  • tests/tx_batching_test.go (6 hunks)
💤 Files with no reviewable changes (1)
  • api/utils.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • services/requester/requester.go
  • services/requester/single_tx_pool.go
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-11-12T13:21:59.080Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 890
File: tests/integration_test.go:692-707
Timestamp: 2025-11-12T13:21:59.080Z
Learning: In the flow-evm-gateway integration tests (tests/integration_test.go), the gateway's RPC server continues to operate independently even after the context passed to bootstrap.Run() is cancelled. Context cancellation affects upstream dependencies (like the Emulator acting as an Access Node) but does not immediately shut down the RPC server, allowing tests to verify backup failover scenarios.

Applied to files:

  • tests/tx_batching_test.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • tests/tx_batching_test.go
📚 Learning: 2025-03-07T01:35:09.751Z
Learnt from: peterargue
Repo: onflow/flow-evm-gateway PR: 772
File: services/requester/keystore/key_store.go:50-62
Timestamp: 2025-03-07T01:35:09.751Z
Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (2)
tests/tx_batching_test.go (1)
config/config.go (2)
  • TxStateValidation (37-37)
  • LocalIndexValidation (40-40)
services/requester/batch_tx_pool.go (3)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-40)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (10)
tests/tx_batching_test.go (3)

335-335: LGTM!

Comment correctly updated to reflect the 2.5 seconds TxBatchInterval configuration used in E2E tests.


449-449: LGTM!

Consistent documentation update for the TxBatchInterval value.


592-594: LGTM!

Configuration updates align with the PR objectives:

  • LocalIndexValidation enables local index-based transaction validation.
  • TxBatchInterval of 2.5 seconds matches mainnet behavior as noted in the comment.
services/requester/batch_tx_pool.go (7)

24-37: LGTM!

Well-structured constants and metadata type:

  • eoaActivityCacheSize of 10,000 provides reasonable capacity.
  • maxTrackedTxNoncesPerEOA of 30 (updated from 15 per reviewer feedback) balances memory usage with duplicate detection window.
  • eoaActivityMetadata cleanly encapsulates the tracking data.

51-100: LGTM!

The BatchTxPool struct and constructor are well-organized:

  • Clean embedding of SingleTxPool for shared functionality.
  • Proper initialization of the expirable LRU cache with configurable TTL.
  • Background goroutine for batch processing is correctly started.

127-139: LGTM!

Duplicate nonce detection with silent drop:

  • Matches geth behavior for transaction replacement scenarios (per PR discussion).
  • Info-level logging provides observability for tracking such cases.
  • Linear search on 30 nonces is negligible overhead.

158-184: LGTM!

The three-case handling logic is well-structured:

  • Case 1 & 2 (no batch): Submit individually for immediate execution.
  • Case 2 (with pending batch): Append to existing batch to maintain ordering.
  • Case 3 (recent activity): Pool for batching.

The mutex correctly protects pooledTxs access in each case.


237-241: LGTM!

Error logging correctly identifies the EOA address for debugging batch submission failures. The continue behavior appropriately processes remaining addresses despite individual failures.


284-289: LGTM!

Consistent error handling with appropriate metric recording for dropped transactions on send failure.


318-323: LGTM!

Consistent error handling pattern with the batch submission path. Single transaction drop is correctly recorded on send failure.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
services/requester/batch_tx_pool.go (3)

127-139: Nonce-only deduplication changes replacement semantics; confirm this is acceptable

The dedup check drops any subsequent tx from an EOA that reuses a tracked nonce, regardless of payload (gas price, data, etc.). This is great for suppressing noisy “nonce too low” bot spam, but it also disables geth-style replacement / gas-bump semantics at the gateway level.

If that trade-off is intentional, this is fine; if you want something closer to geth behavior later, you may need to:

  • Include tx hash (or at least a hash-of-core-fields) in the metadata, and/or
  • Allow higher-fee replacements while still blocking exact duplicates.

For now, I’d suggest documenting this behavior clearly in operator/user-facing docs so the deviation from geth expectations is explicit.


186-193: Consider including structured fields for failed single submissions

The error log only embeds the EOA in the message. For debugging noisy or failing clients it may help to add structured fields (e.g. evm_tx, from, nonce) instead of relying on formatted strings, to match the structured logging style you use elsewhere.


238-245: TransactionsDropped metrics semantics with retries

In the batched path, batchSubmitTransactionsForSameAddress increments TransactionsDropped(len(hexEncodedTxs)) on both build and send errors, but processPooledTransactions then requeues the same pooledTxs slice for retry. That means txs which eventually succeed will still be counted as “dropped”, potentially inflating this metric.

In the single-submit path, by contrast, failures are final drops, so the metric is accurate there.

If TransactionsDropped is meant to represent definitive loss, consider:

  • Only incrementing it when you decide not to requeue, or
  • Introducing a separate metric (e.g. “send failures” or “batch retries”) for transient errors while leaving TransactionsDropped for permanent ones.

This is observability-only; behavior is otherwise correct.

Also applies to: 288-292, 322-326

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b94bc1 and e103d57.

📒 Files selected for processing (1)
  • services/requester/batch_tx_pool.go (7 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-03-07T01:35:09.751Z
Learnt from: peterargue
Repo: onflow/flow-evm-gateway PR: 772
File: services/requester/keystore/key_store.go:50-62
Timestamp: 2025-03-07T01:35:09.751Z
Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (3)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-40)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (3)
services/requester/batch_tx_pool.go (3)

24-27: Per-EOA activity cache and metadata struct look sound

Bounded cache size and per-EOA nonce history give a predictable memory footprint while supporting recent-activity checks; the eoaActivityMetadata shape aligns with how you use it in Add. No changes requested here.

Also applies to: 34-37


161-177: Batch vs single submission decision flow reads correctly

The split between:

  • “no activity” → single submit,
  • “activity older than TxBatchInterval” → reuse existing batch if present, otherwise single submit,
  • “activity within TxBatchInterval” → pool for batching,

matches the comments and seems like a good balance between latency and batching efficiency, especially with the extra check to attach to an in-flight batch when it still exists.

Also applies to: 180-184


54-57: Batch pool struct wiring and cache construction look correct

Embedding SingleTxPool, wiring pooledTxs with txMux, and constructing eoaActivityCache via expirable.NewLRU all line up with how the type is later used. The var _ TxPool = (*BatchTxPool)(nil) assertion is also the idiomatic pointer-style check.

Also applies to: 59-59, 86-96

Comment on lines 322 to 323
t.txMux.Lock()
defer t.txMux.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the eoaActivity object is still read outside of the lock in Add(). I think we need to add the mutex to eoaActivityMetadata and aquire it each time we need to interact with the object.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this approach, but got a rather serious warning:

call of t.eoaActivityCache.Add copies lock value:
github.com/onflow/flow-evm-gateway/services/requester.eoaActivityMetadata contains sync.Mutex

In 41c7839, I changed the lock acquisition to what we had before.

This ensures that only one requests will:

  • read the EOA activity from the cache
  • submit the incoming transaction
  • update the EOA activity with the submitted transaction nonce

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
services/requester/batch_tx_pool.go (1)

127-142: Verify thread safety and consider the tracking timing.

The txMux lock protects this critical section, but there's a subtle concern: eoaActivityCache.Get(from) returns the struct by value, meaning the txNonces slice header in the returned copy points to the same backing array as other copies. Later, at line 324 in updateEOAActivityMetadata, you append to this slice, potentially modifying or reallocating that shared backing array.

While the lock held throughout Add() prevents concurrent modification for the same EOA, a defensive approach is to treat txNonces as immutable and always copy before mutating. This aligns with the pattern for deep-copying maps with slices to avoid shared references.

Additionally, note that for batched transactions, the nonce is marked as "submitted" at line 193 before the batch is actually sent in processPooledTransactions. If batch submission fails (lines 226-230), the nonce remains tracked until TTL expiration, causing retries to be rejected. The re-queueing logic at lines 229-230 mitigates this, but retries from the client would still be dropped during the TTL window.

Based on learnings, deep copying slices in map values prevents race conditions.

Consider applying the defensive slice copy pattern in updateEOAActivityMetadata:

 func (t *BatchTxPool) updateEOAActivityMetadata(
 	from gethCommon.Address,
 	nonce uint64,
 ) {
 	// Update metadata for the last EOA activity only on successful add/submit.
 	eoaActivity, _ := t.eoaActivityCache.Get(from)
 	eoaActivity.lastSubmission = time.Now()
-	eoaActivity.txNonces = append(eoaActivity.txNonces, nonce)
+
+	// Copy existing slice before appending to avoid shared backing arrays.
+	newTxNonces := make([]uint64, len(eoaActivity.txNonces), len(eoaActivity.txNonces)+1)
+	copy(newTxNonces, eoaActivity.txNonces)
+	newTxNonces = append(newTxNonces, nonce)
+
 	// To avoid the slice of nonces from growing indefinitely,
 	// keep only the last `maxTrackedTxNoncesPerEOA` nonces.
-	if len(eoaActivity.txNonces) > maxTrackedTxNoncesPerEOA {
-		firstKeep := len(eoaActivity.txNonces) - maxTrackedTxNoncesPerEOA
-		eoaActivity.txNonces = eoaActivity.txNonces[firstKeep:]
+	if len(newTxNonces) > maxTrackedTxNoncesPerEOA {
+		firstKeep := len(newTxNonces) - maxTrackedTxNoncesPerEOA
+		newTxNonces = newTxNonces[firstKeep:]
 	}
 
+	eoaActivity.txNonces = newTxNonces
 	t.eoaActivityCache.Add(from, eoaActivity)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc3bad3 and 41c7839.

📒 Files selected for processing (1)
  • services/requester/batch_tx_pool.go (7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-03-07T01:35:09.751Z
Learnt from: peterargue
Repo: onflow/flow-evm-gateway PR: 772
File: services/requester/keystore/key_store.go:50-62
Timestamp: 2025-03-07T01:35:09.751Z
Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (10)
services/requester/batch_tx_pool.go (10)

24-27: LGTM!

The cache size and nonce tracking limits are well-balanced for production use.


34-37: Struct design is clean.

The metadata struct appropriately tracks per-EOA submission state. Since this is stored in a cache that returns by value, slice mutations must be handled carefully to avoid shared backing arrays.


51-57: LGTM!

The updated struct fields and type changes properly support the new nonce-based duplicate detection with per-EOA metadata tracking.


59-59: LGTM!

The pointer-type assertion is correct for the interface implementation.


86-96: LGTM!

The cache initialization with TTL-based expiration provides proper automatic cleanup of stale EOA metadata.


164-178: LGTM!

The logic correctly handles the edge case where pooled transactions haven't been processed yet due to congestion, ensuring the current transaction joins that batch to maintain ordering guarantees.


185-195: Error handling structure is correct.

The metadata update at line 193 correctly happens only after successful single transaction submission (error check at line 185). For batched transactions, the metadata is updated immediately upon adding to the batch, which is before actual submission—this timing is acceptable given the re-queueing logic for failed batches.


222-230: Re-queueing logic is sound.

The error handling correctly re-queues failed batches for retry while protecting the pooledTxs map with txMux. Note that the nonces from these transactions are already tracked in eoaActivityCache (line 193), so duplicate submissions from clients during the TTL window will be rejected even while the batch is being retried.


273-276: LGTM!

The updated comments clearly document that dropped transaction metrics are recorded for both batch and single submission failures.

Also applies to: 308-310


317-333: Function logic is correct, but see earlier comment about defensive copying.

The metadata update logic correctly maintains the nonce history and enforces the size limit. The slice mutation concern was addressed in the earlier comment about lines 127-142.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
services/requester/batch_tx_pool.go (1)

283-337: Async pattern doesn't reduce lock hold time.

The comment at line 290 states the lock shouldn't be held long, but the goroutine + select pattern at lines 296-334 still blocks the caller (holding txMux) for up to 3 seconds. This doesn't reduce lock contention; it just adds complexity. The async pattern would only help if the lock were released before waiting.

Consider either:

  1. Release txMux before calling submitSingleTransaction (requires careful ordering of metadata updates)
  2. Make this truly non-blocking by removing the wait (but then error handling becomes more complex)

The current implementation works but defeats the purpose of the async pattern stated in the comments.

🧹 Nitpick comments (1)
services/requester/batch_tx_pool.go (1)

339-355: Consider defensive copying of txNonces slice to prevent potential races.

While the current txMux locking makes this safe, the in-place mutation at lines 346 and 351 shares the underlying array across cache copies. Based on learnings about slice handling in concurrent map access, consider using immutable slice updates:

// Copy existing slice before appending to avoid shared references
newTxNonces := make([]uint64, len(eoaActivity.txNonces), len(eoaActivity.txNonces)+1)
copy(newTxNonces, eoaActivity.txNonces)
newTxNonces = append(newTxNonces, nonce)

// Prune if needed
if len(newTxNonces) > maxTrackedTxNoncesPerEOA {
	firstKeep := len(newTxNonces) - maxTrackedTxNoncesPerEOA
	prunedNonces := make([]uint64, len(newTxNonces)-firstKeep)
	copy(prunedNonces, newTxNonces[firstKeep:])
	newTxNonces = prunedNonces
}

eoaActivity.txNonces = newTxNonces

This prevents future refactoring from inadvertently introducing races if the locking strategy changes.

Based on learnings, deep copying slices in concurrent map access is a best practice.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ccf8db and 3582554.

📒 Files selected for processing (1)
  • services/requester/batch_tx_pool.go (7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-03-07T01:35:09.751Z
Learnt from: peterargue
Repo: onflow/flow-evm-gateway PR: 772
File: services/requester/keystore/key_store.go:50-62
Timestamp: 2025-03-07T01:35:09.751Z
Learning: In the flow-evm-gateway codebase, panics are acceptable in scenarios where immediate detection of critical bugs is desired during development and testing, particularly for invariant violations that should never occur in a correctly functioning system (e.g., when a key is available but locked in the keystore implementation).

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-19T11:36:25.478Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/batch_tx_pool.go:0-0
Timestamp: 2025-06-19T11:36:25.478Z
Learning: In Go, when copying maps that contain slices (like `map[gethCommon.Address][]pooledEvmTx`), perform deep copies by iterating over the map and copying each slice individually using `make()` and `copy()` to avoid shared references that could lead to race conditions and data corruption.

Applied to files:

  • services/requester/batch_tx_pool.go
📚 Learning: 2025-06-17T10:29:35.941Z
Learnt from: m-Peter
Repo: onflow/flow-evm-gateway PR: 831
File: services/requester/pool.go:202-214
Timestamp: 2025-06-17T10:29:35.941Z
Learning: In Flow EVM Gateway's transaction pool (services/requester/pool.go), signing keys acquired with keystore.Take() are intentionally held until the transaction is sealed, not released immediately after sending. The key release happens asynchronously when sealed transactions are ingested via event_subscriber.go NotifyTransaction() -> keystore unsafeUnlockKey() -> key.Done(). This pattern ensures transaction integrity by preventing key reuse before transaction finalization.

Applied to files:

  • services/requester/batch_tx_pool.go
🧬 Code graph analysis (1)
services/requester/batch_tx_pool.go (3)
services/requester/single_tx_pool.go (1)
  • SingleTxPool (28-40)
services/requester/tx_pool.go (1)
  • TxPool (16-18)
api/pool.go (1)
  • TxPool (8-8)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test
🔇 Additional comments (7)
services/requester/batch_tx_pool.go (7)

24-27: LGTM!

The cache size and nonce tracking limits are well-chosen for handling high-frequency bot traffic while maintaining reasonable memory usage.


34-37: LGTM!

The metadata structure appropriately captures both temporal and nonce-based submission history for duplicate detection.


51-57: LGTM!

The struct composition appropriately extends SingleTxPool with batching capabilities and per-EOA activity tracking.


61-101: LGTM!

The initialization properly sets up the batch pool with the LRU cache and background processing goroutine.


130-142: LGTM! Duplicate detection correctly implemented.

The nonce-based duplicate detection appropriately rejects transactions that have been recently submitted from the same EOA, reducing redundant work. The silent drop with info logging matches typical node behavior for duplicate submissions.


161-183: LGTM! Batching logic correctly handles multiple submission scenarios.

The three-case logic appropriately decides between individual submission and batching based on EOA activity timing. The addition at lines 165-172 correctly handles the edge case where an EOA has pending batched transactions even though the batch interval has elapsed.


221-231: Verify retry semantics don't cause persistent duplicate rejections.

The retry mechanism re-queues failed batches, but nonces were already marked "submitted" at line 193. If batch submission repeatedly fails, new incoming transactions with the same nonce will be rejected until cache entries expire (per TTL). Consider logging the re-queue event to track retry frequency.

Based on past review discussions, this trade-off may be acceptable given the TTL-based expiration, but verify that retry failures are monitored.

Comment on lines +127 to +128
t.txMux.Lock()
defer t.txMux.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a global lock? can we just put the mutex in eoaActivityMetadata and lock per entry? that would significantly reduce contention in the API.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just put the mutex in eoaActivityMetadata and lock per entry?

Right now, the cache is of type *expirable.LRU[gethCommon.Address, eoaActivityMetadata], so it complains when I add the mutex to eoaActivityMetadata, with:

call of t.eoaActivityCache.Add copies lock value: requester.eoaActivityMetadata contains sync.Mutex

That can be solved by changing the cache type to *eoaActivityMetadata instead of simply eoaActivityMetadata.

that would significantly reduce contention in the API.

Looking at the response times of eth_sendRawTransaction on Grafana, it certainly does now. Average response time is about 130ms, which is pretty fast.

do we need a global lock?

It seems to be necessary, in order to tackle the case of concurrent requests from the same EOA, with the same nonce. This is a case that I still see happening on MN & TN.
But I think it's worth investigating your suggestion of having the mutex in eoaActivityMetadata and lock per entry. This will only slow down requests from the same sender, and it will not affect normal users. The global lock that I have now seems that it will affect all users regardless.

defer cancel()

// build & submit transaction
go func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to run this in a separate gorountine? It looks like it's blocking anyway

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's definitely blocking. The problem lies in the fact that it will run until it succeeds, or until it fails with:

client: rpc error: code = Canceled desc = context canceled

and in some cases this may take any number of seconds, e.g. 10 seconds.

For some reason, it does not respect the context.WithTimeout(ctx, time.Second*3), for any duration value.
It runs for as long as it needs, and simply errors out with:

client: rpc error: code = DeadlineExceeded desc = context deadline exceeded

I tested this out locally, by adding:

time.Sleep(time.Second * 10)

on the SendTransaction method of flow-go-sdk.

As a work-around to this, I let it run in a separate goroutine, and if <-ctx.Done() happens first, it means that the configured timeout for the ctx has already passed, and we can abort the execution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I see. in that case, please add some comments explaining why it's needed. it wasn't obvious to me.

return err
// This method is called while holding the `t.txMux` lock,
// don't let it run for a long time, to avoid lock-contention
ctx, cancel := context.WithTimeout(ctx, time.Second*3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a timeout is a good idea, and I think we could reduce contention by changing the mutex's scope

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should use a lower timeout duration here? It makes use of SendTransaction & GetAccountKeyAtLatestBlock AN gRPC endpoints.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the AN uses 3s for its request to LNs, so you want it to be at least 3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add additional TX validations to reject invalid txs

3 participants