null table schema update #277
Conversation
Walkthrough
Adds a new nested token_transfers array column to the default.insert_null_block_data sink table and a default.insert_token_transfers_mv materialized view that expands it into default.token_transfers; the previous per-standard (ERC20/721/1155/6909) token transfer materialized views are removed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Producer as Logs Producer
    participant CH as ClickHouse
    participant Table as default.insert_null_block_data
    participant MV as default.insert_token_transfers_mv
    participant TT as default.token_transfers

    rect rgba(200,230,255,0.20)
        note right of CH: Previous flow (removed)
        Producer->>CH: Insert logs
        CH-->>CH: Materialized Views (ERC20/721/1155/6909) %% removed
        CH->>TT: Direct inserts into token_transfers (via removed MVs)
    end

    rect rgba(220,255,220,0.20)
        note right of CH: New flow
        Producer->>Table: Insert block row (includes token_transfers Array)
        Table->>MV: ARRAY JOIN token_transfers AS tt (MV triggers)
        MV->>TT: Insert expanded token transfer rows (includes insert_timestamp, is_deleted)
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (3)
92-105: Schema design impact: denormalizing token_transfers into this sink table. This increases row width and duplication; check ingestion payloads, backfill strategy, and downstream queries that previously read the token_transfers table/MVs.
- Confirm writers now populate this nested column.
- Plan migration/backfill and deprecation of consumers of the old MVs/table.
- Consider compression/codecs for large Array(Tuple) payloads.
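For the codec point, a minimal hypothetical sketch (table and column names invented for illustration, not the repo's DDL) of what a compressed nested payload column could look like. On the Null-engine sink itself a codec is a no-op since nothing is stored, so this only matters for a MergeTree table that materializes the same payload:

```sql
-- Hypothetical sketch: ZSTD codec on a wide Array(Tuple) payload column.
CREATE TABLE IF NOT EXISTS default.payload_codec_example
(
    block_number UInt64,
    token_transfers Array(Tuple(
        token_address FixedString(42),
        amount UInt256
    )) CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY block_number;
```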
108-109: End-of-statement delimiter. Add a semicolon (and final newline) for compatibility with some migration runners.

```diff
-ENGINE = Null
+ENGINE = Null;
```
91-91: Traces tuple close is syntactically fine; keep this style consistent across columns. If you prefer symmetry with other columns, reduce excess closing parens to the minimal required.
📒 Files selected for processing (2)
- internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5 hunks)
- internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql (0 hunks)
💤 Files with no reviewable changes (1)
- internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql
🔇 Additional comments (5)
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5)
92-105: Verify LowCardinality inside Array(Tuple) on your ClickHouse version. Nested LowCardinality(String) had version-dependent limitations; ensure your cluster version fully supports it to avoid unexpected conversions or insert errors.
Would you like me to draft a quick compatibility checklist for your CH version?
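A quick hedged probe, assuming access to the target cluster, that exercises LowCardinality(String) inside Array(Tuple(...)) end-to-end; the table and field names are illustrative only:

```sql
-- Illustrative compatibility probe: create, insert, read, then drop.
CREATE TABLE IF NOT EXISTS default.lc_in_array_probe
(
    payload Array(Tuple(kind LowCardinality(String), addr FixedString(42)))
)
ENGINE = Memory;

INSERT INTO default.lc_in_array_probe
VALUES ([('erc20', '0x0000000000000000000000000000000000000000')]);

SELECT payload FROM default.lc_in_array_probe;

DROP TABLE default.lc_in_array_probe;
```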
25-25: Paren/comma placement looks fine.
57-57: Transactions tuple close/comma is consistent with prior style.
70-70: Logs tuple close/comma is correct.
92-106: Syntax error: missing comma after token_transfers column. Without a trailing comma, insert_timestamp is parsed as part of the previous column definition.
Apply:
```diff
- batch_index Nullable(UInt16)))
+ batch_index Nullable(UInt16))),
```

Likely an incorrect or invalid review comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)
10-10: Consider normalizing timestamps to UTC (and precision if needed). If the destination uses UTC, wrap `block_timestamp` with `toUTC(...)`. If millisecond precision is required elsewhere, consider `DateTime64(3)`. Apply if UTC normalization is desired:

```diff
- tt.8 AS block_timestamp,
+ toUTC(tt.8) AS block_timestamp,
```

Also applies to: 27-27
📒 Files selected for processing (1)
- internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1 hunk)
🔇 Additional comments (4)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (4)
1-1: Add `ON CLUSTER` clause and backfill plan
- If this MV is deployed on a ClickHouse cluster, include `ON CLUSTER <cluster_name>` in the `CREATE MATERIALIZED VIEW` statement to propagate it across all nodes.
- Define and document a backfill strategy: use `POPULATE` for a one-time load (noting it may miss concurrent writes) or implement a controlled manual backfill job, then outline cutover steps once legacy MVs are retired.

No existing `ON CLUSTER` usage found across SQL files; confirm cluster deployment requirements with the team.
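A hedged sketch of how the clustered definition could look. It assumes a placeholder cluster name (my_cluster), uses the corrected tt.2–tt.13 mapping discussed later in this review, and assumes chain_id comes from the tuple's first element while insert_timestamp and is_deleted come from the parent row; adjust to the file's actual projection:

```sql
-- Sketch only, not the file's actual contents.
CREATE MATERIALIZED VIEW IF NOT EXISTS default.insert_token_transfers_mv
    ON CLUSTER my_cluster                        -- placeholder cluster name
TO default.token_transfers
AS
SELECT
    tt.1  AS chain_id,                           -- assumption: first tuple element
    tt.2  AS token_type,
    tt.3  AS token_address,
    tt.4  AS token_id,
    tt.5  AS from_address,
    tt.6  AS to_address,
    tt.7  AS block_number,
    tt.8  AS block_timestamp,
    tt.9  AS transaction_hash,
    tt.10 AS transaction_index,
    tt.11 AS amount,
    tt.12 AS log_index,
    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
    insert_timestamp,                            -- assumption: parent-row column
    is_deleted
FROM default.insert_null_block_data
ARRAY JOIN token_transfers AS tt;
```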
35-36: Confirm ARRAY JOIN behavior & deduplication
- `ARRAY JOIN` drops rows when `token_transfers` is empty; switch to `LEFT ARRAY JOIN` if you need to retain those source rows.
- The `token_transfers` table uses a ReplacingMergeTree ordered by `(chain_id, token_address, block_number, transaction_index, log_index)` with `insert_timestamp` as the version column, so duplicate entries collapse at merge time. Ensure this aligns with your dedup requirements.
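A small self-contained illustration of the first bullet, using in-line data only (no repo tables involved):

```sql
-- ARRAY JOIN drops the id=2 row because its array is empty.
SELECT id, t
FROM values('id UInt8, transfers Array(String)', (1, ['a', 'b']), (2, []))
ARRAY JOIN transfers AS t;

-- LEFT ARRAY JOIN keeps the id=2 row, emitting a default value for t.
SELECT id, t
FROM values('id UInt8, transfers Array(String)', (1, ['a', 'b']), (2, []))
LEFT ARRAY JOIN transfers AS t;
```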
21-32: Remove off-by-one warning – the `token_transfers` tuple defines `chain_id` as its first element, so using `tt.2` through `tt.13` correctly maps the remaining 12 fields.
2-18: No schema mismatch: `insert_token_transfers_mv` columns match `default.token_transfers` exactly.
```sql
token_address FixedString(42),
token_id UInt256,
from_address FixedString(42),
to_address FixedString(42),
```
💡 Verification agent
🧩 Analysis chain
Validate FixedString lengths for addresses and tx hash.
FixedString(42) assumes 0x + 40 hex; FixedString(66) assumes 0x + 64 hex. Ensure upstream always includes the 0x prefix and canonical length; otherwise inserts will fail or silently truncate.
Also applies to: 11-11
🏁 Script executed:

```bash
#!/bin/bash
# Grep writers/serializers to confirm addresses & hashes include '0x' and exact lengths
rg -nC2 -P '\b(0x)?[0-9a-fA-F]{40}\b|\b(0x)?[0-9a-fA-F]{64}\b' -g '!**/node_modules/**'
```

Length of output: 12564
🏁 Script executed:

```bash
#!/bin/bash
# Locate the materialized view file and inspect its initial lines and FixedString declarations
FILE=$(rg -l "0014_clickhouse_create_insert_token_transfers_mv.sql")
echo "File: $FILE"
sed -n '1,200p' "$FILE"
rg -n "FixedString" "$FILE"
```

Length of output: 219
🏁 Script executed:

```bash
#!/bin/bash
# Locate and inspect the materialized view file for FixedString declarations
FILE=$(find . -type f -path '*internal/tools/clickhouse/*0014_clickhouse_create_insert_token_transfers_mv.sql' | head -n1)
echo "Located MV file at: $FILE"
head -n 50 "$FILE"
rg -n "FixedString" "$FILE"
```

Length of output: 1726
Enforce 0x-prefixed 40/64-hex strings before inserting into FixedString(42/66)
FixedString columns truncate or pad silently. Ensure the ingestion pipeline always normalizes Ethereum addresses (0x+40-hex) and transaction hashes (0x+64-hex)—add an upstream validation or conversion step to guarantee the 0x prefix and full canonical length.
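A hedged spot-check (ClickHouse SQL, against the destination table described in this review) that counts rows whose values are not canonical 0x-prefixed hex; padded or truncated values fail the anchored patterns:

```sql
-- Rows whose address or hash is not exactly 0x + 40/64 hex characters.
SELECT count() AS non_canonical_rows
FROM default.token_transfers
WHERE NOT match(token_address, '^0x[0-9a-fA-F]{40}$')
   OR NOT match(transaction_hash, '^0x[0-9a-fA-F]{64}$');
```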
```sql
transaction_index UInt64,
amount UInt256,
log_index UInt64,
batch_index Nullable(UInt16),
```
🛠️ Refactor suggestion
Nullable batch_index: add explicit cast if the tuple element is non-nullable.
If tt.13 is UInt16 (not Nullable(UInt16)), inserts will fail when nulls appear. Cast proactively.
```diff
- tt.13 AS batch_index,
+ CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
```

Also applies to: 32-32
🤖 Prompt for AI Agents
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around line 15 (also applies to line 32): the materialized view inserts assume
tt.13 may be nullable but if tt.13 is UInt16 (non-nullable) inserts will fail
when NULLs occur; update the INSERT/SELECT expressions that populate batch_index
to cast the tuple element explicitly to a nullable type (e.g., use CAST(tt.13 AS
Nullable(UInt16)) or toNullable(tt.13)) so batch_index is always inserted as
Nullable(UInt16).
```sql
insert_timestamp,
is_deleted
```
💡 Verification agent
🧩 Analysis chain
Soft-delete semantics won’t propagate for existing rows.
MVs are append-only. If is_deleted changes in insert_null_block_data after initial insert, the MV won’t update existing token_transfers. Ensure the destination table uses an engine/pattern (e.g., ReplacingMergeTree with a version) that can accept compensating rows, or handle deletes via a separate stream.
🏁 Script executed:

```bash
#!/bin/bash
# Inspect token_transfers engine and primary key for upsert/dedup strategy
rg -nC3 -P 'CREATE\s+TABLE\s+(default\.)?token_transfers\b.*\n.*ENGINE\s*=\s*[A-Za-z]+' --type=sql
```

Length of output: 100
🏁 Script executed:

```bash
#!/bin/bash
# Locate the CREATE TABLE definition for token_transfers
rg -nC3 'CREATE\s+TABLE.*token_transfers' -g '*.sql'
# Fallback: list all usages of token_transfers in SQL files for context
rg -nC3 'token_transfers' -g '*.sql'
```

Length of output: 13207
🏁 Script executed:

```bash
#!/bin/bash
# Extract the engine and ordering key for token_transfers
sed -n '1,50p' internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql
```

Length of output: 1560
🏁 Script executed:

```bash
#!/bin/bash
sed -n '1,200p' internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql
```

Length of output: 3784
🏁 Script executed:

```bash
#!/bin/bash
# Print the CREATE TABLE definition for insert_null_block_data
sed -n '1,200p' internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql
```

Length of output: 3607
Soft-delete flags won’t propagate through this MV. Materialized views only append on inserts to insert_null_block_data; updates to is_deleted there won’t re-emit rows into token_transfers. To support soft-deletes, switch token_transfers to an engine that collapses on is_deleted (e.g. CollapsingMergeTree(is_deleted)) or use a proper version column with ReplacingMergeTree—and emit compensating delete rows via a dedicated MV/stream.
🤖 Prompt for AI Agents
In
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around lines 33-34, the materialized view appends rows from
insert_null_block_data so updates to is_deleted do not propagate and
soft-deletes are lost; either change the token_transfers table engine to a
collapsing engine (e.g. CollapsingMergeTree(is_deleted)) so tombstone markers
collapse rows on read, or implement a proper versioning approach: add a version
column and use ReplacingMergeTree(version) (or maintain explicit tombstone rows)
and create/update a secondary MV or stream that emits compensating
delete/tombstone rows when is_deleted is set so deletes are represented in
token_transfers.
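One hedged way to realize the versioned-tombstone option from this comment: recent ClickHouse releases support the two-argument form ReplacingMergeTree(version, is_deleted), where a later row with is_deleted = 1 supersedes the original at merge time. The sketch below trims the column set for illustration and is not the repo's actual DDL:

```sql
-- Sketch: version column + is_deleted flag so tombstone rows collapse away.
CREATE TABLE IF NOT EXISTS default.token_transfers_tombstone_example
(
    chain_id          UInt256,
    token_address     FixedString(42),
    block_number      UInt256,
    transaction_index UInt64,
    log_index         UInt64,
    amount            UInt256,
    insert_timestamp  DateTime DEFAULT now(),
    is_deleted        UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
ORDER BY (chain_id, token_address, block_number, transaction_index, log_index);
```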
Actionable comments posted: 1
♻️ Duplicate comments (3)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (3)
6-12: Normalize 0x-prefixed FixedString fields upstream. `token_address` (FixedString(42)) and `transaction_hash` (FixedString(66)) require `0x` + 40/64 hex. Enforce canonical length/prefix before insert to avoid silent pad/truncate.
17-18: Soft-delete flags won’t propagate via this MV. MVs are append-only; updates to `is_deleted` in `insert_null_block_data` won’t update existing rows in `token_transfers`. Use a collapsing/replacing engine with versioning or emit compensating tombstones.
16-16: Explicit nullable cast for batch_index. As called out earlier, ensure `batch_index` is inserted as `Nullable(UInt16)` to avoid failures when NULLs appear.

```diff
- tt.12 AS batch_index,
+ CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
```
🧹 Nitpick comments (3)
internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql (1)
1-1: Idempotent creation is fine; add a plan for definition drift. `IF NOT EXISTS` won’t update an existing MV if its query changes. Consider a follow-up migration that drops/recreates when the SELECT changes, or add a checksum comment to detect drift during deploys.

internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql (1)
1-1: Same note on MV drift with `IF NOT EXISTS`. If the MV definition ever changes, `IF NOT EXISTS` will mask it. Plan a DROP/CREATE on change or add a guard to verify the current definition matches.

internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)
1-3: Backfill strategy for historical rows. `IF NOT EXISTS` + no `POPULATE` means only future inserts flow. Add a one-time backfill:

```sql
INSERT INTO token_transfers SELECT ... FROM insert_null_block_data ARRAY JOIN token_transfers AS tt
```

using the same corrected projection as this MV.
📒 Files selected for processing (4)
- internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5 hunks)
- internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql (1 hunk)
- internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql (1 hunk)
- internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql
🔇 Additional comments (1)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)
4-19: All selected fields’ types align with the `token_transfers` table; no mismatches detected.
```sql
tt.1 AS token_type,
tt.2 AS token_address,
tt.3 AS token_id,
tt.4 AS from_address,
tt.5 AS to_address,
tt.6 AS block_number,
tt.7 AS block_timestamp,
tt.8 AS transaction_hash,
tt.9 AS transaction_index,
tt.10 AS amount,
tt.11 AS log_index,
tt.12 AS batch_index,
```
Tuple index misalignment: fields shifted by one → data corruption risk.
Per the new schema, the tuple layout starts with chain_id, so token_type should be tt.2, token_address tt.3, …, and batch_index tt.13. Current mapping is off-by-one. Fix as below; also cast batch_index to nullable.
```diff
- tt.1 AS token_type,
- tt.2 AS token_address,
- tt.3 AS token_id,
- tt.4 AS from_address,
- tt.5 AS to_address,
- tt.6 AS block_number,
- tt.7 AS block_timestamp,
- tt.8 AS transaction_hash,
- tt.9 AS transaction_index,
- tt.10 AS amount,
- tt.11 AS log_index,
- tt.12 AS batch_index,
+ tt.2 AS token_type,
+ tt.3 AS token_address,
+ tt.4 AS token_id,
+ tt.5 AS from_address,
+ tt.6 AS to_address,
+ tt.7 AS block_number,
+ tt.8 AS block_timestamp,
+ tt.9 AS transaction_hash,
+ tt.10 AS transaction_index,
+ tt.11 AS amount,
+ tt.12 AS log_index,
+ CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
```

🤖 Prompt for AI Agents
In
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around lines 5 to 16, the tuple indexes are off by one because the new schema
prepends chain_id; update every tt.N by incrementing the index by 1 (so
token_type becomes tt.2, token_address tt.3, token_id tt.4, from_address tt.5,
to_address tt.6, block_number tt.7, block_timestamp tt.8, transaction_hash tt.9,
transaction_index tt.10, amount tt.11, log_index tt.12, batch_index tt.13) and
ensure batch_index is cast/defined as nullable per schema (replace non-null type
with its nullable equivalent).
Summary by CodeRabbit
New Features
Chores