null table schema update #277
Conversation
Walkthrough
Adds a new nested token_transfers array column to the default.insert_null_block_data sink table and a default.insert_token_transfers_mv materialized view that expands it into default.token_transfers; the previous per-standard (ERC20/721/1155/6909) token transfer materialized views are removed.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor Producer as Logs Producer
    participant CH as ClickHouse
    participant Table as default.insert_null_block_data
    participant MV as default.insert_token_transfers_mv
    participant TT as default.token_transfers

    rect rgba(200,230,255,0.20)
        note right of CH: Previous flow (removed)
        Producer->>CH: Insert logs
        CH-->>CH: Materialized Views (ERC20/721/1155/6909) %% removed
        CH->>TT: Direct inserts into token_transfers (via removed MVs)
    end

    rect rgba(220,255,220,0.20)
        note right of CH: New flow
        Producer->>Table: Insert block row (includes token_transfers Array)
        Table->>MV: ARRAY JOIN token_transfers AS tt (MV triggers)
        MV->>TT: Insert expanded token transfer rows (includes insert_timestamp, is_deleted)
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (3)
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (3)
92-105: Schema design impact: denormalizing token_transfers into this sink table. This increases row width and duplication; check ingestion payloads, backfill strategy, and downstream queries that previously read the token_transfers table/MVs.
- Confirm writers now populate this nested column.
- Plan migration/backfill and deprecation of consumers of the old MVs/table.
- Consider compression/codecs for large Array(Tuple) payloads.
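For the codec point, a minimal hypothetical sketch (table and column names invented for illustration, not the repo's DDL) of what a compressed nested payload column could look like. On the Null-engine sink itself a codec is a no-op since nothing is stored, so this only matters for a MergeTree table that materializes the same payload:

```sql
-- Hypothetical sketch: ZSTD codec on a wide Array(Tuple) payload column.
CREATE TABLE IF NOT EXISTS default.payload_codec_example
(
    block_number UInt64,
    token_transfers Array(Tuple(
        token_address FixedString(42),
        amount UInt256
    )) CODEC(ZSTD(3))
)
ENGINE = MergeTree
ORDER BY block_number;
```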
108-109: End-of-statement delimiter. Add a semicolon (and final newline) for compatibility with some migration runners.

```diff
-ENGINE = Null
+ENGINE = Null;
```
91-91: Traces tuple close is syntactically fine; keep this style consistent across columns. If you prefer symmetry with other columns, reduce excess closing parens to the minimal required.
📒 Files selected for processing (2)
- internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5 hunks)
- internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql (0 hunks)
💤 Files with no reviewable changes (1)
- internal/tools/clickhouse/0007_clickhouse_create_token_transfers_mv.sql
🔇 Additional comments (5)
internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5)
92-105: Verify LowCardinality inside Array(Tuple) on your ClickHouse version. Nested LowCardinality(String) had version-dependent limitations; ensure your cluster version fully supports it to avoid unexpected conversions or insert errors.
Would you like me to draft a quick compatibility checklist for your CH version?
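A quick hedged probe, assuming access to the target cluster, that exercises LowCardinality(String) inside Array(Tuple(...)) end-to-end; the table and field names are illustrative only:

```sql
-- Illustrative compatibility probe: create, insert, read, then drop.
CREATE TABLE IF NOT EXISTS default.lc_in_array_probe
(
    payload Array(Tuple(kind LowCardinality(String), addr FixedString(42)))
)
ENGINE = Memory;

INSERT INTO default.lc_in_array_probe
VALUES ([('erc20', '0x0000000000000000000000000000000000000000')]);

SELECT payload FROM default.lc_in_array_probe;

DROP TABLE default.lc_in_array_probe;
```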
25-25: Paren/comma placement looks fine.
57-57: Transactions tuple close/comma is consistent with prior style.
70-70: Logs tuple close/comma is correct.
92-106: Syntax error: missing comma after token_transfers column. Without a trailing comma, insert_timestamp is parsed as part of the previous column definition.
Apply:
```diff
- batch_index Nullable(UInt16)))
+ batch_index Nullable(UInt16))),
```

Likely an incorrect or invalid review comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)
10-10: Consider normalizing timestamps to UTC (and precision if needed). If the destination uses UTC, wrap `block_timestamp` with `toUTC(...)`. If millisecond precision is required elsewhere, consider `DateTime64(3)`. Apply if UTC normalization is desired:

```diff
- tt.8 AS block_timestamp,
+ toUTC(tt.8) AS block_timestamp,
```

Also applies to: 27-27
📒 Files selected for processing (1)
- internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1 hunk)
🔇 Additional comments (4)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (4)
1-1: Add `ON CLUSTER` clause and backfill plan
- If this MV is deployed on a ClickHouse cluster, include `ON CLUSTER <cluster_name>` in the `CREATE MATERIALIZED VIEW` statement to propagate it across all nodes.
- Define and document a backfill strategy: use `POPULATE` for a one-time load (noting it may miss concurrent writes) or implement a controlled manual backfill job, then outline cutover steps once legacy MVs are retired.

No existing `ON CLUSTER` usage found across SQL files; confirm cluster deployment requirements with the team.
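A hedged sketch of how the clustered definition could look. It assumes a placeholder cluster name (my_cluster), uses the corrected tt.2–tt.13 mapping discussed later in this review, and assumes chain_id comes from the tuple's first element while insert_timestamp and is_deleted come from the parent row; adjust to the file's actual projection:

```sql
-- Sketch only, not the file's actual contents.
CREATE MATERIALIZED VIEW IF NOT EXISTS default.insert_token_transfers_mv
    ON CLUSTER my_cluster                        -- placeholder cluster name
TO default.token_transfers
AS
SELECT
    tt.1  AS chain_id,                           -- assumption: first tuple element
    tt.2  AS token_type,
    tt.3  AS token_address,
    tt.4  AS token_id,
    tt.5  AS from_address,
    tt.6  AS to_address,
    tt.7  AS block_number,
    tt.8  AS block_timestamp,
    tt.9  AS transaction_hash,
    tt.10 AS transaction_index,
    tt.11 AS amount,
    tt.12 AS log_index,
    CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
    insert_timestamp,                            -- assumption: parent-row column
    is_deleted
FROM default.insert_null_block_data
ARRAY JOIN token_transfers AS tt;
```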
35-36: Confirm ARRAY JOIN behavior & deduplication
- `ARRAY JOIN` drops rows when `token_transfers` is empty; switch to `LEFT ARRAY JOIN` if you need to retain those source rows.
- The `token_transfers` table uses a ReplacingMergeTree ordered by `(chain_id, token_address, block_number, transaction_index, log_index)` with `insert_timestamp` as the version column, so duplicate entries collapse at merge time. Ensure this aligns with your dedup requirements.
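A small self-contained illustration of the first bullet, using in-line data only (no repo tables involved):

```sql
-- ARRAY JOIN drops the id=2 row because its array is empty.
SELECT id, t
FROM values('id UInt8, transfers Array(String)', (1, ['a', 'b']), (2, []))
ARRAY JOIN transfers AS t;

-- LEFT ARRAY JOIN keeps the id=2 row, emitting a default value for t.
SELECT id, t
FROM values('id UInt8, transfers Array(String)', (1, ['a', 'b']), (2, []))
LEFT ARRAY JOIN transfers AS t;
```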
21-32: Remove off-by-one warning – the `token_transfers` tuple defines `chain_id` as its first element, so using `tt.2` through `tt.13` correctly maps the remaining 12 fields.
2-18: No schema mismatch: `insert_token_transfers_mv` columns match `default.token_transfers` exactly.
```sql
token_address FixedString(42),
token_id UInt256,
from_address FixedString(42),
to_address FixedString(42),
```
💡 Verification agent
🧩 Analysis chain
Validate FixedString lengths for addresses and tx hash.
FixedString(42) assumes 0x + 40 hex; FixedString(66) assumes 0x + 64 hex. Ensure upstream always includes the 0x prefix and canonical length; otherwise inserts will fail or silently truncate.
Also applies to: 11-11
🏁 Script executed:

```bash
#!/bin/bash
# Grep writers/serializers to confirm addresses & hashes include '0x' and exact lengths
rg -nC2 -P '\b(0x)?[0-9a-fA-F]{40}\b|\b(0x)?[0-9a-fA-F]{64}\b' -g '!**/node_modules/**'
```

Length of output: 12564
🏁 Script executed:

```bash
#!/bin/bash
# Locate the materialized view file and inspect its initial lines and FixedString declarations
FILE=$(rg -l "0014_clickhouse_create_insert_token_transfers_mv.sql")
echo "File: $FILE"
sed -n '1,200p' "$FILE"
rg -n "FixedString" "$FILE"
```

Length of output: 219
🏁 Script executed:

```bash
#!/bin/bash
# Locate and inspect the materialized view file for FixedString declarations
FILE=$(find . -type f -path '*internal/tools/clickhouse/*0014_clickhouse_create_insert_token_transfers_mv.sql' | head -n1)
echo "Located MV file at: $FILE"
head -n 50 "$FILE"
rg -n "FixedString" "$FILE"
```

Length of output: 1726
Enforce 0x-prefixed 40/64-hex strings before inserting into FixedString(42/66)
FixedString columns truncate or pad silently. Ensure the ingestion pipeline always normalizes Ethereum addresses (0x+40-hex) and transaction hashes (0x+64-hex)—add an upstream validation or conversion step to guarantee the 0x prefix and full canonical length.
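A hedged spot-check (ClickHouse SQL, against the destination table described in this review) that counts rows whose values are not canonical 0x-prefixed hex; padded or truncated values fail the anchored patterns:

```sql
-- Rows whose address or hash is not exactly 0x + 40/64 hex characters.
SELECT count() AS non_canonical_rows
FROM default.token_transfers
WHERE NOT match(token_address, '^0x[0-9a-fA-F]{40}$')
   OR NOT match(transaction_hash, '^0x[0-9a-fA-F]{64}$');
```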
```sql
transaction_index UInt64,
amount UInt256,
log_index UInt64,
batch_index Nullable(UInt16),
```
🛠️ Refactor suggestion
Nullable batch_index: add explicit cast if the tuple element is non-nullable.
If tt.13 is UInt16 (not Nullable(UInt16)), inserts will fail when nulls appear. Cast proactively.
```diff
- tt.13 AS batch_index,
+ CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
```

Also applies to: 32-32
🤖 Prompt for AI Agents
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around line 15 (also applies to line 32): the materialized view inserts assume
tt.13 may be nullable but if tt.13 is UInt16 (non-nullable) inserts will fail
when NULLs occur; update the INSERT/SELECT expressions that populate batch_index
to cast the tuple element explicitly to a nullable type (e.g., use CAST(tt.13 AS
Nullable(UInt16)) or toNullable(tt.13)) so batch_index is always inserted as
Nullable(UInt16).
```sql
insert_timestamp,
is_deleted
```
💡 Verification agent
🧩 Analysis chain
Soft-delete semantics won’t propagate for existing rows.
MVs are append-only. If is_deleted changes in insert_null_block_data after initial insert, the MV won’t update existing token_transfers. Ensure the destination table uses an engine/pattern (e.g., ReplacingMergeTree with a version) that can accept compensating rows, or handle deletes via a separate stream.
🏁 Script executed:

```bash
#!/bin/bash
# Inspect token_transfers engine and primary key for upsert/dedup strategy
rg -nC3 -P 'CREATE\s+TABLE\s+(default\.)?token_transfers\b.*\n.*ENGINE\s*=\s*[A-Za-z]+' --type=sql
```

Length of output: 100
🏁 Script executed:

```bash
#!/bin/bash
# Locate the CREATE TABLE definition for token_transfers
rg -nC3 'CREATE\s+TABLE.*token_transfers' -g '*.sql'
# Fallback: list all usages of token_transfers in SQL files for context
rg -nC3 'token_transfers' -g '*.sql'
```

Length of output: 13207
🏁 Script executed:

```bash
#!/bin/bash
# Extract the engine and ordering key for token_transfers
sed -n '1,50p' internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql
```

Length of output: 1560
🏁 Script executed:

```bash
#!/bin/bash
sed -n '1,200p' internal/tools/clickhouse/0006_clickhouse_create_token_transfers.sql
```

Length of output: 3784
🏁 Script executed:

```bash
#!/bin/bash
# Print the CREATE TABLE definition for insert_null_block_data
sed -n '1,200p' internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql
```

Length of output: 3607
Soft-delete flags won’t propagate through this MV. Materialized views only append on inserts to insert_null_block_data; updates to is_deleted there won’t re-emit rows into token_transfers. To support soft-deletes, switch token_transfers to an engine that collapses on is_deleted (e.g. CollapsingMergeTree(is_deleted)) or use a proper version column with ReplacingMergeTree—and emit compensating delete rows via a dedicated MV/stream.
🤖 Prompt for AI Agents
In
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around lines 33-34, the materialized view appends rows from
insert_null_block_data so updates to is_deleted do not propagate and
soft-deletes are lost; either change the token_transfers table engine to a
collapsing engine (e.g. CollapsingMergeTree(is_deleted)) so tombstone markers
collapse rows on read, or implement a proper versioning approach: add a version
column and use ReplacingMergeTree(version) (or maintain explicit tombstone rows)
and create/update a secondary MV or stream that emits compensating
delete/tombstone rows when is_deleted is set so deletes are represented in
token_transfers.
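One hedged way to realize the versioned-tombstone option from this comment: recent ClickHouse releases support the two-argument form ReplacingMergeTree(version, is_deleted), where a later row with is_deleted = 1 supersedes the original at merge time. The sketch below trims the column set for illustration and is not the repo's actual DDL:

```sql
-- Sketch: version column + is_deleted flag so tombstone rows collapse away.
CREATE TABLE IF NOT EXISTS default.token_transfers_tombstone_example
(
    chain_id          UInt256,
    token_address     FixedString(42),
    block_number      UInt256,
    transaction_index UInt64,
    log_index         UInt64,
    amount            UInt256,
    insert_timestamp  DateTime DEFAULT now(),
    is_deleted        UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree(insert_timestamp, is_deleted)
ORDER BY (chain_id, token_address, block_number, transaction_index, log_index);
```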
Actionable comments posted: 1
♻️ Duplicate comments (3)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (3)
6-12: Normalize 0x-prefixed FixedString fields upstream. `token_address` (FixedString(42)) and `transaction_hash` (FixedString(66)) require `0x` + 40/64 hex. Enforce canonical length/prefix before insert to avoid silent pad/truncate.
17-18: Soft-delete flags won’t propagate via this MV. MVs are append-only; updates to `is_deleted` in `insert_null_block_data` won’t update existing rows in `token_transfers`. Use a collapsing/replacing engine with versioning or emit compensating tombstones.
16-16: Explicit nullable cast for batch_index. As called out earlier, ensure `batch_index` is inserted as `Nullable(UInt16)` to avoid failures when NULLs appear.

```diff
- tt.12 AS batch_index,
+ CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
```
🧹 Nitpick comments (3)
internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql (1)
1-1: Idempotent creation is fine; add a plan for definition drift. `IF NOT EXISTS` won’t update an existing MV if its query changes. Consider a follow-up migration that drops/recreates when the SELECT changes, or add a checksum comment to detect drift during deploys.

internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql (1)
1-1: Same note on MV drift with `IF NOT EXISTS`. If the MV definition ever changes, `IF NOT EXISTS` will mask it. Plan a DROP/CREATE on change or add a guard to verify the current definition matches.

internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)
1-3: Backfill strategy for historical rows. `IF NOT EXISTS` + no `POPULATE` means only future inserts flow. Add a one-time backfill:

```sql
INSERT INTO token_transfers SELECT ... FROM insert_null_block_data ARRAY JOIN token_transfers AS tt
```

using the same corrected projection as this MV.
📒 Files selected for processing (4)
- internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql (5 hunks)
- internal/tools/clickhouse/0011_clickhouse_create_address_transactions_mv.sql (1 hunk)
- internal/tools/clickhouse/0013_clickhouse_create_address_transfers_mv.sql (1 hunk)
- internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1 hunk)
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/tools/clickhouse/0004_clickhouse_create_insert_null_table.sql
🔇 Additional comments (1)
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql (1)
4-19: All selected fields’ types align with the `token_transfers` table; no mismatches detected.
```sql
tt.1 AS token_type,
tt.2 AS token_address,
tt.3 AS token_id,
tt.4 AS from_address,
tt.5 AS to_address,
tt.6 AS block_number,
tt.7 AS block_timestamp,
tt.8 AS transaction_hash,
tt.9 AS transaction_index,
tt.10 AS amount,
tt.11 AS log_index,
tt.12 AS batch_index,
```
Tuple index misalignment: fields shifted by one → data corruption risk.
Per the new schema, the tuple layout starts with chain_id, so token_type should be tt.2, token_address tt.3, …, and batch_index tt.13. Current mapping is off-by-one. Fix as below; also cast batch_index to nullable.
```diff
- tt.1 AS token_type,
- tt.2 AS token_address,
- tt.3 AS token_id,
- tt.4 AS from_address,
- tt.5 AS to_address,
- tt.6 AS block_number,
- tt.7 AS block_timestamp,
- tt.8 AS transaction_hash,
- tt.9 AS transaction_index,
- tt.10 AS amount,
- tt.11 AS log_index,
- tt.12 AS batch_index,
+ tt.2 AS token_type,
+ tt.3 AS token_address,
+ tt.4 AS token_id,
+ tt.5 AS from_address,
+ tt.6 AS to_address,
+ tt.7 AS block_number,
+ tt.8 AS block_timestamp,
+ tt.9 AS transaction_hash,
+ tt.10 AS transaction_index,
+ tt.11 AS amount,
+ tt.12 AS log_index,
+ CAST(tt.13 AS Nullable(UInt16)) AS batch_index,
```

🤖 Prompt for AI Agents
In
internal/tools/clickhouse/0014_clickhouse_create_insert_token_transfers_mv.sql
around lines 5 to 16, the tuple indexes are off by one because the new schema
prepends chain_id; update every tt.N by incrementing the index by 1 (so
token_type becomes tt.2, token_address tt.3, token_id tt.4, from_address tt.5,
to_address tt.6, block_number tt.7, block_timestamp tt.8, transaction_hash tt.9,
transaction_index tt.10, amount tt.11, log_index tt.12, batch_index tt.13) and
ensure batch_index is cast/defined as nullable per schema (replace non-null type
with its nullable equivalent).
Summary by CodeRabbit
New Features
Chores