
Conversation


@nischitpra nischitpra commented Aug 29, 2025

Summary by CodeRabbit

  • New Features

    • Blocks now include a token_transfers array and a new materialized view populates token transfer rows into the token_transfers table.
  • Chores

    • Removed several legacy materialized views that previously populated token transfer data.
    • Made certain materialized-view creations idempotent (won’t fail if already present).


coderabbitai bot commented Aug 29, 2025

Walkthrough

Adds a new token_transfers Array(Tuple(...)) column to default.insert_null_block_data, removes five materialized views that previously populated token_transfers from logs, and adds a new materialized view default.insert_token_transfers_mv that expands token_transfers from default.insert_null_block_data into default.token_transfers.
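
A minimal sketch of what the new view might look like, using the corrected tuple positions discussed in the review threads below (the tuple layout, the Nullable cast, and column order are assumptions; the migration file is authoritative):

CREATE MATERIALIZED VIEW IF NOT EXISTS default.insert_token_transfers_mv
TO default.token_transfers
AS
SELECT
    tt.1  AS chain_id,        -- positions assume chain_id is the first tuple element
    tt.2  AS token_type,
    tt.3  AS token_address,
    tt.4  AS token_id,
    tt.5  AS from_address,
    tt.6  AS to_address,
    tt.7  AS block_number,
    tt.8  AS block_timestamp,
    tt.9  AS transaction_hash,
    tt.10 AS transaction_index,
    tt.11 AS amount,
    tt.12 AS log_index,
    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,  -- cast per the review suggestion below
    insert_timestamp,
    is_deleted
FROM default.insert_null_block_data
ARRAY JOIN token_transfers AS tt;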

Changes

Cohort / File(s) | Summary

ClickHouse table DDL update
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql
Added a token_transfers Array(Tuple(...)) column (inserted after refund_address), plus minor formatting, parenthesis-placement, and end-of-file adjustments; the table definition is otherwise unchanged.

Token transfer materialized views removal
internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql
Deleted the file's contents, removing five MVs that projected token transfers from logs: token_transfers_erc20_mv, token_transfers_erc721_mv, token_transfers_erc1155_single_mv, token_transfers_erc1155_batch_mv, token_transfers_erc6909_mv.

New materialized view to project token_transfers
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
Added insert_token_transfers_mv (TO token_transfers), which selects ARRAY JOIN token_transfers AS tt from insert_null_block_data and inserts the expanded columns: chain_id, token_type, token_address, token_id, from_address, to_address, block_number, block_timestamp, transaction_hash, transaction_index, amount, log_index, batch_index, insert_timestamp, is_deleted.

Idempotent MV creation flags added
internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql, internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql
Each CREATE MATERIALIZED VIEW changed to CREATE MATERIALIZED VIEW IF NOT EXISTS ... (no other query changes).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Producer as Logs Producer
  participant CH as ClickHouse
  participant Table as default.insert_null_block_data
  participant MV as default.insert_token_transfers_mv
  participant TT as default.token_transfers

  rect rgba(200,230,255,0.20)
    note right of CH: Previous flow (removed)
    Producer->>CH: Insert logs
    CH-->>CH: Materialized Views (ERC20/721/1155/6909)  %% removed
    CH->>TT: Direct inserts into token_transfers (via removed MVs)
  end

  rect rgba(220,255,220,0.20)
    note right of CH: New flow
    Producer->>Table: Insert block row (includes token_transfers Array)
    Table->>MV: ARRAY JOIN token_transfers AS tt (MV triggers)
    MV->>TT: Insert expanded token transfer rows (includes insert_timestamp,is_deleted)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (3)

92-105: Schema design impact: denormalizing token_transfers into this sink table.

This increases row width and duplication; check ingestion payloads, backfill strategy, and downstream queries that previously read token_transfers table/MVs.

  • Confirm writers now populate this nested column.
  • Plan migration/backfill and deprecation of consumers of the old MVs/table.
  • Consider compression/codecs for large Array(Tuple) payloads (see the sketch below).
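
If codecs end up mattering, they belong on the MergeTree destination rather than on this Null-engine sink (a Null table stores nothing); a hedged example of the syntax, with the codec choice purely illustrative:

-- Illustrative only: apply an explicit codec on a wide column of the destination table
ALTER TABLE default.token_transfers
    MODIFY COLUMN token_address FixedString(42) CODEC(ZSTD(3));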

108-109: End-of-statement delimiter.

Add a semicolon (and final newline) for compatibility with some migration runners.

-ENGINE = Null
+ENGINE = Null;

91-91: Traces tuple close is syntactically fine; keep this style consistent across columns.

If you prefer symmetry with other columns, reduce excess closing parens to the minimal required.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c4b86db and 6ab912b.

📒 Files selected for processing (2)
  • internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5 hunks)
  • internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql (0 hunks)
💤 Files with no reviewable changes (1)
  • internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql
🔇 Additional comments (5)
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5)

92-105: Verify LowCardinality inside Array(Tuple) on your ClickHouse version.

Nested LowCardinality(String) had version-dependent limitations; ensure your cluster version fully supports it to avoid unexpected conversions or insert errors.

Would you like me to draft a quick compatibility checklist for your CH version?
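
One cheap way to check is a throwaway temporary table with the same nested shape on the target cluster (a sketch; the tuple fields are illustrative, not the full column):

-- Fails fast on versions that reject LowCardinality(String) inside Array(Tuple(...))
CREATE TEMPORARY TABLE lc_in_tuple_check
(
    payload Array(Tuple(token_type LowCardinality(String), amount UInt256))
) ENGINE = Memory;

INSERT INTO lc_in_tuple_check VALUES ([('erc20', 1)]);

DROP TEMPORARY TABLE lc_in_tuple_check;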


25-25: Paren/comma placement looks fine.


57-57: Transactions tuple close/comma is consistent with prior style.


70-70: Logs tuple close/comma is correct.


92-106: Syntax error: missing comma after token_transfers column.

Without a trailing comma, insert_timestamp is parsed as part of the previous column definition.

Apply:

-        batch_index Nullable(UInt16)))
+        batch_index Nullable(UInt16))),

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (1)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)

10-10: Consider normalizing timestamps to UTC (and precision if needed).

If the destination stores UTC, wrap block_timestamp with toTimeZone(..., 'UTC'). If millisecond precision is required elsewhere, consider DateTime64(3).

Apply if UTC normalization is desired:

-    tt.8 AS block_timestamp,
+    toTimeZone(tt.8, 'UTC') AS block_timestamp,

Also applies to: 27-27

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 6ab912b and dc75352.

📒 Files selected for processing (1)
  • internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1 hunks)
🔇 Additional comments (4)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (4)

1-1: Add ON CLUSTER clause and backfill plan

  • If this MV is deployed on a ClickHouse cluster, include ON CLUSTER <cluster_name> in the CREATE MATERIALIZED VIEW statement to propagate it across all nodes (placement sketched below).
  • Define and document a backfill strategy: use POPULATE for a one-time load (noting it may miss concurrent writes) or implement a controlled manual backfill job, then outline cutover steps once legacy MVs are retired.

No existing ON CLUSTER usage found across SQL files—confirm cluster deployment requirements with the team.
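
If clustering is required, the clause slots in right after the view name. A placement sketch, where '{cluster}' is a placeholder macro and the SELECT body is unchanged from the migration:

CREATE MATERIALIZED VIEW IF NOT EXISTS default.insert_token_transfers_mv
    ON CLUSTER '{cluster}'    -- placeholder macro; substitute the real cluster name
    TO default.token_transfers
AS
SELECT
    ...                       -- same projection as in 0014_clickhouse_create_insert_token_transfers_mv.sql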


35-36: Confirm ARRAY JOIN behavior & deduplication

  • ARRAY JOIN drops rows when token_transfers is empty; switch to LEFT ARRAY JOIN if you need to retain those source rows (see the sketch after this list).
  • The token_transfers table uses a ReplacingMergeTree ordered by (chain_id, token_address, block_number, transaction_index, log_index) with insert_timestamp as the version column, so duplicate entries collapse at merge time. Ensure this aligns with your dedup requirements.
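
The ARRAY JOIN vs LEFT ARRAY JOIN difference, illustrated on a literal (generic ClickHouse semantics, independent of the project's tables):

-- ARRAY JOIN: the empty-array row (id = 2) disappears from the result
SELECT id, x
FROM (SELECT 1 AS id, [10, 20] AS arr UNION ALL SELECT 2, [])
ARRAY JOIN arr AS x;

-- LEFT ARRAY JOIN: id = 2 is kept, with x set to the element type's default value (0)
SELECT id, x
FROM (SELECT 1 AS id, [10, 20] AS arr UNION ALL SELECT 2, [])
LEFT ARRAY JOIN arr AS x;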

21-32: Remove off-by-one warning – the token_transfers tuple defines chain_id as its first element, so using tt.2 through tt.13 correctly maps the remaining 12 fields.


2-18: No schema mismatch: insert_token_transfers_mv columns match default.token_transfers exactly.

Comment on lines 5 to 8
token_address FixedString(42),
token_id UInt256,
from_address FixedString(42),
to_address FixedString(42),


💡 Verification agent

🧩 Analysis chain

Validate FixedString lengths for addresses and tx hash.

FixedString(42) assumes 0x + 40 hex; FixedString(66) assumes 0x + 64 hex. Ensure upstream always includes the 0x prefix and canonical length; otherwise inserts will fail on over-long values or silently zero-pad short ones.

Also applies to: 11-11


🏁 Script executed:

#!/bin/bash
# Grep writers/serializers to confirm addresses & hashes include '0x' and exact lengths
rg -nC2 -P '\b(0x)?[0-9a-fA-F]{40}\b|\b(0x)?[0-9a-fA-F]{64}\b' -g '!**/node_modules/**'

Length of output: 12564


🏁 Script executed:

#!/bin/bash
# Locate the materialized view file and inspect its initial lines and FixedString declarations
FILE=$(rg -l "0014_clickhouse_create_insert_token_transfers_mv.sql")
echo "File: $FILE"
sed -n '1,200p' "$FILE"
rg -n "FixedString" "$FILE"

Length of output: 219


🏁 Script executed:

#!/bin/bash
# Locate and inspect the materialized view file for FixedString declarations
FILE=$(find . -type f -path '*internal/tools/clickhouse/*0014_clickhouse_create_insert_token_transfers_mv.sql' | head -n1)
echo "Located MV file at: $FILE"
head -n 50 "$FILE"
rg -n "FixedString" "$FILE"

Length of output: 1726


Enforce 0x-prefixed 40/64-hex strings before inserting into FixedString(42/66)
FixedString columns silently zero-pad short values and reject over-long ones at insert time. Ensure the ingestion pipeline always normalizes Ethereum addresses (0x + 40 hex) and transaction hashes (0x + 64 hex): add an upstream validation or conversion step to guarantee the 0x prefix and full canonical length.
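
A hedged sketch of such a normalization/validation step at projection time (the regex, casing policy, and literal input are illustrative, not the project's current pipeline):

-- Lowercase and validate before narrowing to FixedString(42); non-canonical values surface as NULL
SELECT
    if(match(addr, '^0x[0-9a-fA-F]{40}$'), toFixedString(lower(addr), 42), NULL) AS token_address_checked
FROM (SELECT concat('0x', repeat('ab', 20)) AS addr);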

transaction_index UInt64,
amount UInt256,
log_index UInt64,
batch_index Nullable(UInt16),


🛠️ Refactor suggestion

Nullable batch_index: add explicit cast if the tuple element is non-nullable.

If tt.13 is UInt16 (not Nullable(UInt16)), inserts will fail when nulls appear. Cast proactively.

-    tt.13 AS batch_index,
+    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,

Also applies to: 32-32

🤖 Prompt for AI Agents
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around line 15 (also applies to line 32): the materialized view inserts assume
tt.13 may be nullable but if tt.13 is UInt16 (non-nullable) inserts will fail
when NULLs occur; update the INSERT/SELECT expressions that populate batch_index
to cast the tuple element explicitly to a nullable type (e.g., use CAST(tt.13 AS
Nullable(UInt16)) or toNullable(tt.13)) so batch_index is always inserted as
Nullable(UInt16).

Comment on lines +33 to +34
insert_timestamp,
is_deleted


💡 Verification agent

🧩 Analysis chain

Soft-delete semantics won’t propagate for existing rows.

MVs are append-only. If is_deleted changes in insert_null_block_data after initial insert, the MV won’t update existing token_transfers. Ensure the destination table uses an engine/pattern (e.g., ReplacingMergeTree with a version) that can accept compensating rows, or handle deletes via a separate stream.


🏁 Script executed:

#!/bin/bash
# Inspect token_transfers engine and primary key for upsert/dedup strategy
rg -nC3 -P 'CREATE\s+TABLE\s+(default\.)?token_transfers\b.*\n.*ENGINE\s*=\s*[A-Za-z]+' --type=sql

Length of output: 100


🏁 Script executed:

#!/bin/bash
# Locate the CREATE TABLE definition for token_transfers
rg -nC3 'CREATE\s+TABLE.*token_transfers' -g '*.sql'
# Fallback: list all usages of token_transfers in SQL files for context
rg -nC3 'token_transfers' -g '*.sql'

Length of output: 13207


🏁 Script executed:

#!/bin/bash
# Extract the engine and ordering key for token_transfers
sed -n '1,50p' internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql

Length of output: 1560


🏁 Script executed:

#!/bin/bash
sed -n '1,200p' internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql

Length of output: 3784


🏁 Script executed:

#!/bin/bash
# Print the CREATE TABLE definition for insert_null_block_data
sed -n '1,200p' internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql

Length of output: 3607


Soft-delete flags won’t propagate through this MV. Materialized views only append on inserts to insert_null_block_data; updates to is_deleted there won’t re-emit rows into token_transfers. To support soft-deletes, switch token_transfers to an engine that collapses on is_deleted (e.g. CollapsingMergeTree(is_deleted)) or use a proper version column with ReplacingMergeTree—and emit compensating delete rows via a dedicated MV/stream.
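
For reference, the version-aware variant described above might look roughly like this (a hedged sketch with a reduced column set; the table name, the chain_id/block_number types, and the defaults are assumptions, and the two-argument ReplacingMergeTree form needs a reasonably recent ClickHouse release):

CREATE TABLE default.token_transfers_soft_delete_example
(
    chain_id          UInt256,                 -- type assumed
    token_address     FixedString(42),
    block_number      UInt256,                 -- type assumed
    transaction_index UInt64,
    log_index         UInt64,
    amount            UInt256,
    insert_timestamp  DateTime DEFAULT now(),  -- version column: newest row wins
    is_deleted        UInt8 DEFAULT 0          -- 1 marks a tombstone row
)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
ORDER BY (chain_id, token_address, block_number, transaction_index, log_index);

-- A compensating delete is then just a newer row with is_deleted = 1 for the same key;
-- duplicates and tombstones collapse at merge time / with FINAL on supported versions.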

🤖 Prompt for AI Agents
In
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around lines 33-34, the materialized view appends rows from
insert_null_block_data so updates to is_deleted do not propagate and
soft-deletes are lost; either change the token_transfers table engine to a
collapsing engine (e.g. CollapsingMergeTree(is_deleted)) so tombstone markers
collapse rows on read, or implement a proper versioning approach: add a version
column and use ReplacingMergeTree(version) (or maintain explicit tombstone rows)
and create/update a secondary MV or stream that emits compensating
delete/tombstone rows when is_deleted is set so deletes are represented in
token_transfers.

@jakeloo jakeloo merged commit 101165b into main Aug 29, 2025
4 of 5 checks passed
@jakeloo jakeloo deleted the np/null_table_schema branch August 29, 2025 20:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (3)

6-12: Normalize 0x-prefixed FixedString fields upstream.

token_address (FixedString(42)) and transaction_hash (FixedString(66)) require 0x + 40/64 hex. Enforce canonical length/prefix before insert to avoid silent zero-padding or insert failures.


17-18: Soft-delete flags won’t propagate via this MV.

MVs are append-only; updates to is_deleted in insert_null_block_data won’t update existing rows in token_transfers. Use a collapsing/replacing engine with versioning or emit compensating tombstones.


16-16: Explicit nullable cast for batch_index.

As called out earlier, ensure batch_index is inserted as Nullable(UInt16) to avoid failures when NULLs appear.

-    tt.12 AS batch_index,
+    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
🧹 Nitpick comments (3)
internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql (1)

1-1: Idempotent creation is fine; add a plan for definition drift.

IF NOT EXISTS won’t update an existing MV if its query changes. Consider a follow-up migration that drops/recreates when the SELECT changes, or add a checksum comment to detect drift during deploys.

internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql (1)

1-1: Same note on MV drift with IF NOT EXISTS.

If the MV definition ever changes, IF NOT EXISTS will mask it. Plan a DROP/CREATE on change or add a guard to verify the current definition matches.
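
A lightweight guard for that drift check, usable for either MV during deploys (shown here with the view added in this PR):

-- Compare the deployed definition against the migration file in CI/deploy tooling
SELECT create_table_query
FROM system.tables
WHERE database = 'default'
  AND name = 'insert_token_transfers_mv';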

internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)

1-3: Backfill strategy for historical rows.

IF NOT EXISTS + no POPULATE means only future inserts flow. Because insert_null_block_data is a Null-engine sink that retains no rows, a one-time backfill cannot simply select from it; historical transfers need to be replayed through the sink, or projected from a source that still holds the data, using the same corrected projection as this MV (sketched below).
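
A hedged shape for that backfill, assuming a hypothetical staging table (block_backfill_staging) that holds the historical block payloads with the same token_transfers column:

INSERT INTO default.token_transfers
SELECT
    tt.1  AS chain_id,
    tt.2  AS token_type,
    tt.3  AS token_address,
    tt.4  AS token_id,
    tt.5  AS from_address,
    tt.6  AS to_address,
    tt.7  AS block_number,
    tt.8  AS block_timestamp,
    tt.9  AS transaction_hash,
    tt.10 AS transaction_index,
    tt.11 AS amount,
    tt.12 AS log_index,
    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
    insert_timestamp,
    is_deleted
FROM default.block_backfill_staging   -- hypothetical; insert_null_block_data itself retains no rows
ARRAY JOIN token_transfers AS tt;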

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between dc75352 and eea1630.

📒 Files selected for processing (4)
  • internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5 hunks)
  • internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql (1 hunks)
  • internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql (1 hunks)
  • internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql
🔇 Additional comments (1)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)

4-19: All selected fields’ types align with the token_transfers table; no mismatches detected.

Comment on lines +5 to +16
tt.1 AS token_type,
tt.2 AS token_address,
tt.3 AS token_id,
tt.4 AS from_address,
tt.5 AS to_address,
tt.6 AS block_number,
tt.7 AS block_timestamp,
tt.8 AS transaction_hash,
tt.9 AS transaction_index,
tt.10 AS amount,
tt.11 AS log_index,
tt.12 AS batch_index,


⚠️ Potential issue

Tuple index misalignment: fields shifted by one → data corruption risk.

Per the new schema, the tuple layout starts with chain_id, so token_type should be tt.2, token_address tt.3, …, and batch_index tt.13. Current mapping is off-by-one. Fix as below; also cast batch_index to nullable.

-    tt.1 AS token_type,
-    tt.2 AS token_address,
-    tt.3 AS token_id,
-    tt.4 AS from_address,
-    tt.5 AS to_address,
-    tt.6 AS block_number,
-    tt.7 AS block_timestamp,
-    tt.8 AS transaction_hash,
-    tt.9 AS transaction_index,
-    tt.10 AS amount,
-    tt.11 AS log_index,
-    tt.12 AS batch_index,
+    tt.2 AS token_type,
+    tt.3 AS token_address,
+    tt.4 AS token_id,
+    tt.5 AS from_address,
+    tt.6 AS to_address,
+    tt.7 AS block_number,
+    tt.8 AS block_timestamp,
+    tt.9 AS transaction_hash,
+    tt.10 AS transaction_index,
+    tt.11 AS amount,
+    tt.12 AS log_index,
+    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
🤖 Prompt for AI Agents
In
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around lines 5 to 16, the tuple indexes are off by one because the new schema
prepends chain_id; update every tt.N by incrementing the index by 1 (so
token_type becomes tt.2, token_address tt.3, token_id tt.4, from_address tt.5,
to_address tt.6, block_number tt.7, block_timestamp tt.8, transaction_hash tt.9,
transaction_index tt.10, amount tt.11, log_index tt.12, batch_index tt.13) and
ensure batch_index is cast/defined as nullable per schema (replace non-null type
with its nullable equivalent).

