Skip to content

Conversation

@jakeloo
Copy link
Member

@jakeloo jakeloo commented Sep 12, 2025

Summary by CodeRabbit

  • New Features

    • Clear separation of incoming and outgoing token balance updates across ERC20, ERC721, ERC1155, and ERC6909 (separate from/to materialized views).
  • Refactor

    • Reorganized balance processing to handle "from" and "to" sides independently, with standardized direction indicators and positive/negative balance deltas.
    • Replaced stateful projection aggregates with non-stateful aggregates for token and owner projections.
  • Performance

    • Improved reliability and throughput of token balance updates by reducing contention between update paths.
  • Chores

    • Database migration to apply the new view and projection structure.

@zeet-co
Copy link

zeet-co bot commented Sep 12, 2025

We're building your pull request over on Zeet.
Click me for more info about your build and deployment.
Once built, this branch can be tested at: https://insight-8453-base-backfill-migrat-c33079.insight.zeet.app before merging 😉

@coderabbitai
Copy link

coderabbitai bot commented Sep 12, 2025

Walkthrough

Splits each token_balances materialized view into direction-specific _from_mv and _to_mv for ERC20/721/1155/6909, and replaces stateful aggregate functions with standard aggregates in token_balances/owner_balances PROJECTIONs. All edits are SQL changes across two ClickHouse migration files.

Changes

Cohort / File(s) Summary
ClickHouse materialized views split per direction
internal/tools/clickhouse/0009_clickhouse_create_token_balances_mv.sql
Replaces single per-token-type MVs with pairs: *_from_mv (uses from_address, negative balance_delta, direction='from', WHERE token_type='ercXX') and *_to_mv (uses to_address, positive balance_delta, direction='to', WHERE token_type='ercXX') for ERC20, ERC721, ERC1155, ERC6909. Renames previous MVs to *_from_mv and adds corresponding *_to_mv. All target the same token_balances.
ClickHouse PROJECTION aggregates simplified
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql
Replaces stateful aggregate functions with plain aggregates in projections: sumState(...)sum(...), minState(...)min(...), maxState(...)max(...) for owner_balances_projection and token_balances_projection; grouping keys and aliases unchanged.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant ERC20 as token_transfers_erc20
  participant ERC721 as token_transfers_erc721
  participant ERC1155 as token_transfers_erc1155
  participant ERC6909 as token_transfers_erc6909
  rect rgb(240,248,255)
    note over ERC20,ERC6909: Materialized Views (split by direction)
    participant MV20F as erc20_from_mv
    participant MV20T as erc20_to_mv
    participant MV721F as erc721_from_mv
    participant MV721T as erc721_to_mv
    participant MV1155F as erc1155_from_mv
    participant MV1155T as erc1155_to_mv
    participant MV6909F as erc6909_from_mv
    participant MV6909T as erc6909_to_mv
  end
  participant TB as token_balances

  ERC20-->>MV20F: insert transfer rows (token_type='erc20')
  ERC20-->>MV20T: insert transfer rows (token_type='erc20')
  MV20F->>TB: upsert {address=from_address, balance_delta=-amount, direction='from'}
  MV20T->>TB: upsert {address=to_address, balance_delta=+amount, direction='to'}

  ERC721-->>MV721F: insert transfer rows (token_type='erc721')
  ERC721-->>MV721T: insert transfer rows (token_type='erc721')
  MV721F->>TB: upsert {address=from_address, balance_delta=-1, direction='from'}
  MV721T->>TB: upsert {address=to_address, balance_delta=+1, direction='to'}

  ERC1155-->>MV1155F: insert transfer rows (token_type='erc1155')
  ERC1155-->>MV1155T: insert transfer rows (token_type='erc1155')
  MV1155F->>TB: upsert {address=from_address, balance_delta=-amount, direction='from'}
  MV1155T->>TB: upsert {address=to_address, balance_delta=+amount, direction='to'}

  ERC6909-->>MV6909F: insert transfer rows (token_type='erc6909')
  ERC6909-->>MV6909T: insert transfer rows (token_type='erc6909')
  MV6909F->>TB: upsert {address=from_address, balance_delta=-amount, direction='from'}
  MV6909T->>TB: upsert {address=to_address, balance_delta=+amount, direction='to'}
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (2 passed, 1 inconclusive)

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title Check ❓ Inconclusive The current title "Update Schema" is too vague to convey the primary changes in this PR, which split existing token_balances materialized views into separate from/to views and also replaced stateful projection aggregates with non-state functions. It does not indicate the main functional change or affected files, so a reviewer scanning history would not understand the intent. For those reasons the title check is inconclusive. Please rename the PR to a concise, specific title such as "Split token_balances MVs into from/to variants and update projections to use non-state aggregates" or, if you want to emphasize only the MV change, "Split token_balances materialized views into from/to variants". Also update the PR description to briefly call out both the MV renames/additions and the projection aggregate changes so the title clearly matches the primary intent.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch jl/schema

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
internal/tools/clickhouse/0009_clickhouse_create_token_balances_mv.sql (2)

46-66: ERC721 deltas and columns are sane; align zero-address handling with ERC20 decision.

Using ±1 is correct for 721. Apply the same mint/burn policy consistently.

Optional filters:

-WHERE token_type = 'erc721';
+WHERE token_type = 'erc721' AND from_address != '0x0000000000000000000000000000000000000000';
-WHERE token_type = 'erc721';
+WHERE token_type = 'erc721' AND to_address != '0x0000000000000000000000000000000000000000';

Also applies to: 67-87


5-19: Projection/index alignment with query paths.

Given our prior preference to optimize address lookups by (token_type, chain_id, address, token_id), confirm token_balances projections/primary key are still optimal with owner_address and new direction. If queries filter by direction, consider adding it to a projection key.

Also applies to: 49-63, 93-107, 136-150

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between efbdc81 and ef8063b.

📒 Files selected for processing (1)
  • internal/tools/clickhouse/0009_clickhouse_create_token_balances_mv.sql (8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-22T18:30:59.800Z
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.

Applied to files:

  • internal/tools/clickhouse/0009_clickhouse_create_token_balances_mv.sql
🔇 Additional comments (4)
internal/tools/clickhouse/0009_clickhouse_create_token_balances_mv.sql (4)

2-22: ERC20 from/to views look correct; confirm type/cast and zero-address policy.

  • Casting with toInt256(amount) assumes source fits signed 256; verify source type and overflow behavior.
  • Decide whether to exclude mint/burn rows (zero address) from owner balances.

Optional zero-address filter:

-FROM token_transfers 
-WHERE token_type = 'erc20';
+FROM token_transfers 
+WHERE token_type = 'erc20' AND from_address != '0x0000000000000000000000000000000000000000';
-FROM token_transfers 
-WHERE token_type = 'erc20';
+FROM token_transfers 
+WHERE token_type = 'erc20' AND to_address != '0x0000000000000000000000000000000000000000';

Also applies to: 23-44


89-109: ERC1155 assumes per-token exploded rows; verify upstream flattening.

If token_transfers holds batched arrays for ERC1155, this SELECT will mis-shape data. Confirm it’s already exploded to (token_id, amount) rows; if not, unnest before inserting.

Also applies to: 110-130


132-152: ERC6909 cast and semantics: confirm amount units and event coverage.

  • Ensure amount carries the correct unit for 6909 and fits Int256.
  • Confirm this view is fed only by 6909 transfer-like events you intend to count.

Also applies to: 153-172


2-4: Intentional no-POPULATE?

If you need historical backfill on creation, add POPULATE; otherwise confirm a separate backfill job exists.

Comment on lines +2 to 4
CREATE MATERIALIZED VIEW IF NOT EXISTS token_balances_erc20_from_mv
TO token_balances
AS
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Drop legacy single MVs to avoid double-inserts.

If token_balances_ercXX_mv still exists, it will continue inserting alongside the new _from_mv/_to_mv. Add defensive drops in this migration.

+-- Safety: remove legacy single-sided MVs if present
+DROP MATERIALIZED VIEW IF EXISTS token_balances_erc20_mv;
+DROP MATERIALIZED VIEW IF EXISTS token_balances_erc721_mv;
+DROP MATERIALIZED VIEW IF EXISTS token_balances_erc1155_mv;
+DROP MATERIALIZED VIEW IF EXISTS token_balances_erc6909_mv;

Also applies to: 23-26, 46-48, 67-69, 89-91, 110-112, 132-134, 153-155

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2)

34-38: Switching from State-aggregates to plain aggregates in a projection is OK; check downstream usage and consider alias cleanup.

  • Aggregate projections can use plain sum/min/max; the optimizer will read from the projection when GROUP BY matches. Verify no queries still use -Merge/-Finalization over these aliases. (clickhouse.com)
  • The aliases still end with _state but now hold finalized values; consider renaming for clarity.
  • Minor perf/readability: replace if(is_deleted = 0, 1, -1) with a branchless expression to avoid UInt underflow:
-      sum(balance_delta * if(is_deleted = 0, 1, -1)) AS balance_state,
+      sum(balance_delta * (1 - 2*toInt8(is_deleted))) AS balance_state,

49-53: Same note here; also consider including token_type in this projection to align with common query filters.

  • Same aggregate/projection concerns as above. (clickhouse.com)
  • Given past preference to optimize by (token_type, chain_id, address, token_id), consider adding token_type to SELECT/GROUP BY so this projection is chosen for queries filtering by token_type without falling back to the base table. Example:
-- sketch
SELECT
  chain_id,
  token_type,
  token_address,
  token_id,
  owner_address,
  sum(balance_delta * (1 - 2*toInt8(is_deleted))) AS balance_state,
  min(block_number) AS min_block_number_state,
  min(block_timestamp) AS min_block_timestamp_state,
  max(block_number) AS max_block_number_state,
  max(block_timestamp) AS max_block_timestamp_state
GROUP BY chain_id, token_type, token_address, token_id, owner_address
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ef8063b and 8cc72f3.

📒 Files selected for processing (1)
  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.
📚 Learning: 2025-07-22T18:30:59.800Z
Learnt from: iuwqyir
PR: thirdweb-dev/insight#240
File: internal/tools/clickhouse_create_token_balances_mv.sql:8-17
Timestamp: 2025-07-22T18:30:59.800Z
Learning: In the token_balances table projection design, the user prefers to have the address_projection optimized for queries by (token_type, chain_id, address, token_id) without including owner, as the main table is already optimized for owner-based query paths.

Applied to files:

  • internal/tools/clickhouse/0008_clickhouse_create_token_balances.sql

@jakeloo jakeloo merged commit f711248 into main Sep 13, 2025
5 checks passed
@jakeloo jakeloo deleted the jl/schema branch September 13, 2025 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants