1. Problem Statement
The current CKB database schema relies heavily on Block Hash as the primary key for storing block-related data (Headers, Bodies, Uncles, etc.). While Block Hash is unique and essential for verifying data integrity, using it as a key in RocksDB (an LSM-tree based storage) presents significant performance challenges:
- Random Writes: Block hashes are effectively random. Inserting blocks causes random write patterns, which are inefficient for LSM-trees that favor sequential writes.
- Write Amplification: Random insertions trigger frequent and expensive compaction cycles in RocksDB to sort and merge SSTables.
- Read Amplification: Scattering related data across many SSTables increases the overhead of point lookups and range scans.
2. Proposed Solution
The core proposal is to refactor the database schema to use Composite Keys based on Block Number (Big Endian) + Block Hash.
Why Block Number?
Block numbers increase monotonically as the chain grows. By using the block number as the prefix of the key:
- Sequential Writes: New blocks are appended at the end of the key space. Because RocksDB orders keys bytewise, newly flushed SSTables cover key ranges that rarely overlap with older ones.
- Reduced Compaction: Sequential writes significantly reduce the need for rewrites during compaction, lowering Write Amplification.
- Data Locality: Blocks with similar heights are stored close together, improving cache efficiency and range scan performance.
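The ordering argument above can be sketched in a few lines. This is only an illustration under stated assumptions: `composite_key` and `split_composite_key` are hypothetical helper names, not existing CKB functions. It relies on the fact that RocksDB's default comparator orders keys bytewise, so a big-endian number prefix sorts in numeric order, whereas a little-endian prefix would not.

```rust
/// Hypothetical helper: 8-byte big-endian block number followed by the
/// 32-byte block hash, giving a 40-byte composite key.
pub fn composite_key(number: u64, hash: &[u8; 32]) -> [u8; 40] {
    let mut key = [0u8; 40];
    key[..8].copy_from_slice(&number.to_be_bytes());
    key[8..].copy_from_slice(hash);
    key
}

/// Hypothetical inverse: split a composite key back into (number, hash).
pub fn split_composite_key(key: &[u8; 40]) -> (u64, [u8; 32]) {
    let mut number = [0u8; 8];
    number.copy_from_slice(&key[..8]);
    let mut hash = [0u8; 32];
    hash.copy_from_slice(&key[8..]);
    (u64::from_be_bytes(number), hash)
}
```

With this layout, the key for block 255 sorts before the key for block 256 regardless of the hash bytes, so inserts for new heights always land at the tail of the key space.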
3. Detailed Schema Changes
The refactoring introduces a new key structure for block-related Column Families.
3.1 New COLUMN_INDEX (Col 0)
This acts as the primary "index" to map random hashes to sequential numbers.
- Key: Block Hash (32 bytes)
- Value: Block Number (8 bytes, Big Endian) + Main Chain Flag (1 byte): 0x01 if on main chain, 0x00 otherwise.
- Benefit:
  - Allows looking up the Block Number when only the hash is known.
  - Optimizes is_main_chain(hash) checks to be O(1) in the same lookup.
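As a rough sketch of the 9-byte value layout described for this column (8-byte big-endian number, then the 1-byte main chain flag), encoding and decoding could look like the following. The function names are hypothetical, and rejecting flag bytes other than 0x00 and 0x01 is an assumption of this sketch rather than something the proposal specifies.

```rust
/// Hypothetical encoder for the index value: block number (8 bytes, big
/// endian) followed by the main chain flag (0x01 or 0x00).
pub fn encode_index_value(number: u64, on_main_chain: bool) -> [u8; 9] {
    let mut value = [0u8; 9];
    value[..8].copy_from_slice(&number.to_be_bytes());
    value[8] = if on_main_chain { 0x01 } else { 0x00 };
    value
}

/// Hypothetical decoder. Returns None for a wrong length or an unknown
/// flag byte (strict validation is an assumption of this sketch).
pub fn decode_index_value(value: &[u8]) -> Option<(u64, bool)> {
    if value.len() != 9 {
        return None;
    }
    let mut number = [0u8; 8];
    number.copy_from_slice(&value[..8]);
    let on_main_chain = match value[8] {
        0x00 => false,
        0x01 => true,
        _ => return None,
    };
    Some((u64::from_be_bytes(number), on_main_chain))
}
```

A single read of this value yields both the block number and the main chain status, which is what makes the is_main_chain(hash) check a one-lookup operation.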
3.2 Block Data Columns (Cols 1, 2, 3, 6, 7, 8, 15, 17, 18)
These columns store the actual block content. They now use a composite key.
- Key Format: Block Number (BE) + Block Hash
- Affected Columns:
  - COLUMN_BLOCK_HEADER (1): Header + Hash
  - COLUMN_BLOCK_BODY (2): Transactions
  - COLUMN_BLOCK_UNCLE (3): Uncle Blocks
  - COLUMN_BLOCK_EXT (6): Block Extension (verified, total difficulty)
  - COLUMN_BLOCK_PROPOSAL_IDS (7)
  - COLUMN_BLOCK_EPOCH (8)
  - COLUMN_BLOCK_EXTENSION (15)
  - COLUMN_BLOCK_FILTER (17)
  - COLUMN_BLOCK_FILTER_HASH (18)
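One consequence of the composite key is that a lookup by hash alone becomes a two-step read: first the index column, then the data column. The sketch below models that flow with in-memory maps standing in for the RocksDB column families; `MemStore` and its methods are hypothetical names used only for illustration, and headers are treated as opaque bytes.

```rust
use std::collections::{BTreeMap, HashMap};

/// In-memory stand-in for two column families (an assumption of this
/// sketch): the hash index and the header column keyed by number + hash.
#[derive(Default)]
pub struct MemStore {
    index: HashMap<[u8; 32], (u64, bool)>,
    headers: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MemStore {
    /// Builds the 40-byte composite key: big-endian number, then hash.
    fn key(number: u64, hash: &[u8; 32]) -> Vec<u8> {
        let mut key = Vec::with_capacity(40);
        key.extend_from_slice(&number.to_be_bytes());
        key.extend_from_slice(hash);
        key
    }

    /// Writes the index entry and the header under the composite key.
    pub fn insert_header(&mut self, number: u64, hash: [u8; 32], main: bool, header: Vec<u8>) {
        self.index.insert(hash, (number, main));
        self.headers.insert(Self::key(number, &hash), header);
    }

    /// Step 1: resolve hash to number via the index.
    /// Step 2: read the header with the composite key.
    pub fn header_by_hash(&self, hash: &[u8; 32]) -> Option<&Vec<u8>> {
        let (number, _) = self.index.get(hash)?;
        self.headers.get(&Self::key(*number, hash))
    }

    /// Answers the main chain check from the index entry alone.
    pub fn is_main_chain(&self, hash: &[u8; 32]) -> bool {
        self.index.get(hash).map_or(false, |(_, main)| *main)
    }
}
```

The extra index read is the cost paid for sequential writes in the data columns; callers that already know the block number can skip it and build the composite key directly.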
3.3 Other Changes
- COLUMN_NUMBER_HASH (13): Deprecated. The composite keys now naturally provide the number->hash mapping (and more, since it handles forks by storing all hashes for a number).
- Unchanged Columns: Columns that don't key off blocks (e.g., COLUMN_META, COLUMN_CELL) remain largely unchanged or have minor adjustments.
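To illustrate how the composite keys can stand in for the number->hash mapping, the sketch below scans all keys sharing one block-number prefix. A `BTreeMap` is used as a stand-in for a RocksDB column family (both iterate keys in bytewise order); `composite_key` and `hashes_at_number` are hypothetical names, not existing CKB functions.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: big-endian block number followed by the block hash.
pub fn composite_key(number: u64, hash: &[u8; 32]) -> [u8; 40] {
    let mut key = [0u8; 40];
    key[..8].copy_from_slice(&number.to_be_bytes());
    key[8..].copy_from_slice(hash);
    key
}

/// Returns every block hash stored at `number`, including fork blocks,
/// by scanning the key range that shares the 8-byte number prefix.
pub fn hashes_at_number(col: &BTreeMap<[u8; 40], Vec<u8>>, number: u64) -> Vec<[u8; 32]> {
    // Smallest and largest possible keys for this block number.
    let start = composite_key(number, &[0x00; 32]);
    let end = composite_key(number, &[0xff; 32]);
    col.range(start..=end)
        .map(|(key, _)| {
            let mut hash = [0u8; 32];
            hash.copy_from_slice(&key[8..]);
            hash
        })
        .collect()
}
```

The scan returns every candidate at a height; picking the main chain block among them would still need the main chain flag from the index column.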
4. Migration Strategy
(Undecided; still under consideration.)
5. Benefits Summary
- Performance: Expected higher write throughput and lower latency during block synchronization.
- Resource Usage: Lower CPU and I/O usage due to reduced compaction overhead.