Skip to content

feat(sync): decode ERC-20/ERC-721 transfers + drop orphan logs#46

Merged
satyakwok merged 1 commit into
mainfrom
fix/erc20-transfer-handler-clean
May 20, 2026
Merged

feat(sync): decode ERC-20/ERC-721 transfers + drop orphan logs#46
satyakwok merged 1 commit into
mainfrom
fix/erc20-transfer-handler-clean

Conversation

@satyakwok
Copy link
Copy Markdown
Member

@satyakwok satyakwok commented May 20, 2026

Replaces #45 (which was based on feat/multi-endpoint-batch-writes and conflicted with main). This branch carries the same single commit, cherry-picked onto current main.

See #45 for the full PR description; nothing else changed.

Test plan

  • cargo test -p indexer-sync token_decode (4/4 passing)
  • Deployed on testnet host; backfilled 22k blocks past the previously-stuck point
  • Bridge transfers correctly populate token_transfers (verified via Sentrix testnet sUSDC route, 5 mints + 1 burn)
  • Steady-state tail loop processes new transfers within safe-lag

Summary by CodeRabbit

  • New Features

    • Token transfers (ERC‑20 and ERC‑721) are now decoded from logs and persisted during block synchronization.
    • Transfers are included in both backfill and live sync flows and written atomically with blocks/txs/logs.
    • Batch insertion for token transfers to improve sync performance.
  • Tests

    • Unit tests added to validate ERC‑20/ERC‑721 decoding and malformed-log handling.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR adds decoding of ERC-20/ERC-721 Transfer logs into TokenTransfer records, a DB batch-insert function for TokenTransfer rows, and integrates token_transfers into BlockBundle. Backfill and ingest paths filter logs to those whose tx_hash exists in decoded transactions, derive token_transfers from filtered logs, and both single-block and batched writers persist transfers atomically with blocks/transactions/logs.

Sequence Diagram

sequenceDiagram
  participant Source as backfill.rs / ingest_one
  participant Filter as Orphan Filter
  participant Decoder as token_decode::decode_transfer
  participant Writer as block_writer
  participant DB as Database
  Source->>Source: Fetch block logs and decoded transactions
  Source->>Filter: Build tx_hash set from transactions
  Source->>Filter: Filter logs by tx_hash presence
  Filter->>Decoder: Filtered logs
  loop Each log
    Decoder->>Decoder: Decode to TokenTransfer (validate topic0, parse addresses/amount/token_id)
  end
  Decoder->>Source: token_transfers: Vec<TokenTransfer>
  Source->>Writer: BlockBundle { blocks, txs, logs, token_transfers }
  Writer->>DB: BEGIN TRANSACTION
  Writer->>DB: INSERT blocks
  Writer->>DB: INSERT transactions
  Writer->>DB: INSERT logs
  Writer->>DB: INSERT token_transfers (batch via token_transfers::insert_batch)
  Writer->>DB: COMMIT
  DB-->>Writer: Success
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Sentriscloud/indexer-rs#33: Restructures backfill/fetch_one which overlaps bundle construction and log handling modified here.
  • Sentriscloud/indexer-rs#35: Refactors block_writer batch write paths (COPY/atomic batch), intersecting writer integration points updated in this PR.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is largely incomplete; it defers to PR #45 instead of providing comprehensive details about the changeset, and lacks required sections like Scope, Checks, and Deploy impact checkboxes. Expand the description to include all required template sections: Scope checkboxes, Checks (forge build/test/fmt/slither), Storage layout review, and Deploy impact assessment with clear selections.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: decoding ERC-20/ERC-721 transfers and filtering orphan logs from the sync module.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/erc20-transfer-handler-clean

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/db/src/token_transfers.rs`:
- Around line 33-34: The doc comment above the batch insert claiming "Retry-safe
via `(tx_hash, log_index)` ON CONFLICT DO NOTHING" is incorrect; either
implement the promised dedup behavior by adding a unique constraint on (tx_hash,
log_index) and using ON CONFLICT DO NOTHING in the batch insert SQL, or update
the doc comment to remove the ON CONFLICT claim and clearly state that the table
has no unique constraint (matching the note at line 57). Locate the comment and
the batch-insert logic in token_transfers.rs (search for the doc string
mentioning tx_hash, log_index and the batch insert function) and make the
change: if you choose the DB change, add the unique constraint and adjust SQL to
use ON CONFLICT; otherwise, revise the comment to accurately reflect no
conflict-handling.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 5b44eca0-8206-4f1a-80d2-5f9129bd4c74

📥 Commits

Reviewing files that changed from the base of the PR and between e88f529 and 74a8eb9.

📒 Files selected for processing (5)
  • crates/db/src/token_transfers.rs
  • crates/sync/src/backfill.rs
  • crates/sync/src/block_writer.rs
  • crates/sync/src/lib.rs
  • crates/sync/src/token_decode.rs

Comment on lines +33 to +34
/// Batch insert. Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO NOTHING
/// — matches the dedup contract on the raw `logs` table.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Doc comment claims ON CONFLICT handling that doesn't exist.

The doc comment states "Retry-safe via (tx_hash, log_index) ON CONFLICT DO NOTHING" but line 57 explicitly notes "No ON CONFLICT — table has no unique constraint." This inconsistency could mislead future maintainers into assuming duplicate protection exists when it doesn't.

📝 Suggested fix
-/// Batch insert. Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO NOTHING
-/// — matches the dedup contract on the raw `logs` table.
+/// Batch insert. Note: table lacks a unique constraint on `(tx_hash, log_index)`,
+/// so retry safety relies on the writer's atomic cursor advance preventing
+/// re-processing. Adding the unique index is a follow-up migration.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Batch insert. Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO NOTHING
/// — matches the dedup contract on the raw `logs` table.
/// Batch insert. Note: table lacks a unique constraint on `(tx_hash, log_index)`,
/// so retry safety relies on the writer's atomic cursor advance preventing
/// re-processing. Adding the unique index is a follow-up migration.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/db/src/token_transfers.rs` around lines 33 - 34, The doc comment above
the batch insert claiming "Retry-safe via `(tx_hash, log_index)` ON CONFLICT DO
NOTHING" is incorrect; either implement the promised dedup behavior by adding a
unique constraint on (tx_hash, log_index) and using ON CONFLICT DO NOTHING in
the batch insert SQL, or update the doc comment to remove the ON CONFLICT claim
and clearly state that the table has no unique constraint (matching the note at
line 57). Locate the comment and the batch-insert logic in token_transfers.rs
(search for the doc string mentioning tx_hash, log_index and the batch insert
function) and make the change: if you choose the DB change, add the unique
constraint and adjust SQL to use ON CONFLICT; otherwise, revise the comment to
accurately reflect no conflict-handling.

Two related fixes that surfaced together during testnet recovery on
2026-05-20.

## ERC-20 / ERC-721 Transfer decoder

The `token_transfers` table existed but was never populated — the
`indexer-handlers` crate is still a Phase-0 placeholder, so raw logs
landed in `logs` and went nowhere. Scan UIs that asked for an
address's token balances saw an empty list even though the underlying
Transfer events were present.

Inline decoder in `sync::token_decode`:

- topic0 `0xddf252ad…` matches the canonical Transfer signature.
- ERC-20: topics = [sig, from, to], 32-byte data carries `amount`.
- ERC-721: topics = [sig, from, to, token_id], no data, amount = 1.
- ERC-1155 is out of scope here (different topic0, richer encoding).

Decoded transfers flow through `BlockBundle.token_transfers` and the
existing block-writer transaction — they commit atomically with the
block, txs, and logs the chain already wrote.

False-positive risk: any contract emitting a Transfer-shaped event
with three indexed args will decode as ERC-721 here. Acceptable for
the visibility goal; precise registry-driven classification is the
declarative-handler workstream's job.

## Orphan-log filter

Sentrix's native `/chain/blocks/<n>` and `eth_getLogs` can disagree —
a tx whose effects reverted gets stripped from the block tx vec but
its log envelopes still come back from `eth_getLogs`. The
`logs.tx_hash → transactions.hash` FK then blew the whole batch
transaction and the backfill loop stalled at the first such block.

Drop logs whose tx_hash isn't backed by a tx row in the same bundle
(both fetch_one + ingest_one paths). Such logs are orphans on this
chain by definition; preserving them would only buy us repeated
FK-violation rollbacks.

## Schema note

`token_transfers` has no unique constraint on (tx_hash, log_index),
so the batch insert skips ON CONFLICT. The writer's atomic cursor
advance prevents re-processing the same block in the steady state;
reorg recovery deletes downstream rows before re-insert. Adding the
unique index is a follow-up migration.
@satyakwok satyakwok force-pushed the fix/erc20-transfer-handler-clean branch from 74a8eb9 to 60eaacd Compare May 20, 2026 23:53
@satyakwok satyakwok merged commit 2214b7a into main May 20, 2026
7 of 8 checks passed
satyakwok added a commit that referenced this pull request May 21, 2026
…arn (#47)

Two audit follow-ups against the ERC-20 handler PR (#46):

## M-3: token_transfers unique constraint

`token_transfers` had only `PRIMARY KEY (id)` when shipped. On reorg
recovery + replay, the same `(tx_hash, log_index)` rows would re-insert
without a uniqueness gate, producing duplicate transfer rows that
double-count balances downstream. The atomic cursor advance in the
steady-state writer prevented this in practice, but reorg / forensic-
rerun paths bypass that protection.

- New migration `0003_token_transfers_unique.sql` adds
  `transfers_tx_log_unique_idx` after deduping any existing duplicates.
- `insert_batch` re-enabled `ON CONFLICT (tx_hash, log_index) DO NOTHING`
  so replay is now idempotent at the DB layer in addition to the
  cursor-based writer guard.

The DELETE in the migration is a no-op for chains that have only run
the indexer post-#46 (no duplicates accumulated); kept in place to
keep the migration replay-safe on chains that did.

## M-5: raise orphan-log filter to warn

`fetch_one` / `ingest_one` drop logs whose tx_hash isn't backed by a
tx row in the same bundle (added in #46 to work around a Sentrix-side
data inconsistency where reverted-tx log envelopes still come back
from `eth_getLogs`). Original commit logged at `debug!`, which is
silent at default log level. Sustained high-volume drops here signal
a real chain bug worth investigating, so promote to `warn!` to make
the gap visible without changing logging config.

Verified locally: no compile changes, existing tests pass.

Co-authored-by: satyakwok <satyakwok@users.noreply.github.com>
satyakwok added a commit that referenced this pull request May 21, 2026
Audit M-4 follow-up against #46. The Transfer selector
`0xddf252ad…` is shared by:

- ERC-20 `Transfer(address indexed, address indexed, uint256)` —
  three topics + 32-byte unindexed data carrying amount.
- ERC-721 `Transfer(address indexed, address indexed, uint256 indexed)`
  — four topics, no unindexed data.

Custom events with three indexed args + unindexed data (eg.
`Transfer(address indexed, address indexed, address indexed, uint256)`)
hit the same selector. The original decoder distinguished only on
topic count, so any such custom event got recorded as an ERC-721 with
a fabricated `amount = 1` and a token_id taken from whatever the third
indexed argument happened to be. That polluted `token_transfers` with
false NFT rows and broke balance aggregation on the indexer's
`/accounts/{addr}/tokens` endpoint.

Per ERC-721 spec the Transfer event has NO unindexed data. Add a
data-must-be-empty gate to the ERC-721 path; return `None` if data is
present so the row drops out of the index entirely. Real NFT contracts
emit empty data and continue to decode correctly. Real custom events
with three indexed args fall through silently — the cost is missing
them from `token_transfers`, but they weren't meaningfully decoded
before either.

Adds a regression test (`skips_three_indexed_args_with_nonempty_data`).

Co-authored-by: satyakwok <satyakwok@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant