Design Draft: Refactor RocksDB Schema for Reduced Read/Write Amplification #5087


## 1. Problem Statement

The current CKB database [schema](https://github.com/nervosnetwork/ckb/blob/develop/db-schema/src/lib.rs) relies heavily on `Block Hash` as the primary key for block-related data (Headers, Bodies, Uncles, etc.). `Block Hash` is unique and essential for verifying data integrity, but using it as a key in RocksDB (an LSM-tree-based storage engine) causes significant performance problems:

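*   **Random Writes**: Block hashes are effectively random, so consecutive blocks land at unrelated positions in the key space. Each flushed SSTable then spans almost the whole key range, which is the worst case for an LSM-tree that works best when keys arrive in sorted order.
*   **Write Amplification**: Because new SSTables overlap with most existing ones, RocksDB has to run frequent, expensive compactions that rewrite data in order to sort and merge SSTables.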
*   **Read Amplification**: Scattering related data across many SSTables increases the overhead of point lookups and range scans.

## 2. Proposed Solution

The core proposal is to refactor the database schema to use **Composite Keys** based on `Block Number` (Big Endian) + `Block Hash`.

### Why Block Number?
Block numbers increase monotonically along the chain (competing forks can share a number, and the hash suffix tells them apart). By using the block number as the prefix of the key:
1.  **Sequential Writes**: New blocks are written at the end of the key space, so each flushed SSTable covers a narrow key range that rarely overlaps older files.
2.  **Reduced Compaction**: Sequential writes significantly reduce the need for rewrites during compaction, lowering Write Amplification.
3.  **Data Locality**: Blocks with similar heights are stored close together, improving cache efficiency and range scan performance.
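
The choice of Big Endian matters because RocksDB's default comparator orders keys bytewise. A minimal sketch of that property follows; the function names are hypothetical and not taken from the CKB codebase, and the little-endian variant is included only for contrast.

```rust
/// Encode a block number as 8 big-endian bytes (the encoding proposed above).
fn number_key_be(number: u64) -> [u8; 8] {
    number.to_be_bytes()
}

/// Little-endian encoding, shown only to illustrate what goes wrong.
fn number_key_le(number: u64) -> [u8; 8] {
    number.to_le_bytes()
}

/// Returns true if the bytewise (lexicographic) order of the two encoded
/// keys matches the numeric order of `a` and `b`.
fn order_preserved(encode: fn(u64) -> [u8; 8], a: u64, b: u64) -> bool {
    a.cmp(&b) == encode(a).cmp(&encode(b))
}
```

With these helpers, `order_preserved(number_key_be, 255, 256)` holds, while the little-endian encoding fails the same check because `255` encodes with a leading `0xff` byte and `256` with a leading `0x00` byte.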

## 3. Detailed Schema Changes

The refactoring introduces a new key structure for block-related Column Families.

### 3.1 New `COLUMN_INDEX` (Col 0)
This acts as the primary "index" to map random hashes to sequential numbers.

*   **Key**: `Block Hash` (32 bytes)
*   **Value**:
    *   `Block Number` (8 bytes, Big Endian)
    *   `Main Chain Flag` (1 byte): `0x01` if on main chain, `0x00` otherwise.
*   **Benefit**:
    *   Allows looking up the `Block Number` when only the hash is known.
    *   Answers `is_main_chain(hash)` with the same single point lookup, with no second read required.
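
The 9-byte value layout described above could be encoded and decoded roughly as follows. This is a sketch under the stated layout only; `IndexValue`, `INDEX_VALUE_LEN`, and the method names are hypothetical and do not come from the CKB codebase.

```rust
/// Length of the COLUMN_INDEX value: 8-byte block number + 1-byte flag.
const INDEX_VALUE_LEN: usize = 9;

/// Hypothetical in-memory form of a COLUMN_INDEX value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct IndexValue {
    number: u64,
    on_main_chain: bool,
}

impl IndexValue {
    /// Serialize as `Block Number (BE)` followed by the main chain flag.
    fn to_bytes(self) -> [u8; INDEX_VALUE_LEN] {
        let mut out = [0u8; INDEX_VALUE_LEN];
        out[..8].copy_from_slice(&self.number.to_be_bytes());
        out[8] = if self.on_main_chain { 0x01 } else { 0x00 };
        out
    }

    /// Parse a stored value; returns `None` on a wrong length or unknown flag byte.
    fn from_bytes(raw: &[u8]) -> Option<Self> {
        if raw.len() != INDEX_VALUE_LEN {
            return None;
        }
        let mut number = [0u8; 8];
        number.copy_from_slice(&raw[..8]);
        let on_main_chain = match raw[8] {
            0x00 => false,
            0x01 => true,
            _ => return None,
        };
        Some(IndexValue {
            number: u64::from_be_bytes(number),
            on_main_chain,
        })
    }
}
```

A caller holding only a hash would read this value once and get both the block number (needed to build the composite key) and the main chain status.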

### 3.2 Block Data Columns (Cols 1, 2, 3, 6, 7, 8, 15, 17, 18)
These columns store the actual block content. They now use a composite key.

*   **Key Format**: `Block Number (BE)` + `Block Hash`
*   **Affected Columns**:
    *   `COLUMN_BLOCK_HEADER` (1): Header + Hash
    *   `COLUMN_BLOCK_BODY` (2): Transactions
    *   `COLUMN_BLOCK_UNCLE` (3): Uncle Blocks
    *   `COLUMN_BLOCK_EXT` (6): Block Extension (verified, total difficulty)
    *   `COLUMN_BLOCK_PROPOSAL_IDS` (7)
    *   `COLUMN_BLOCK_EPOCH` (8)
    *   `COLUMN_BLOCK_EXTENSION` (15)
    *   `COLUMN_BLOCK_FILTER` (17)
    *   `COLUMN_BLOCK_FILTER_HASH` (18)
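
For illustration, the 40-byte composite key could be built and split as sketched below. The function and constant names are hypothetical; only the layout (`Block Number (BE)` followed by the 32-byte `Block Hash`) comes from this draft.

```rust
const NUMBER_LEN: usize = 8;
const HASH_LEN: usize = 32;
const KEY_LEN: usize = NUMBER_LEN + HASH_LEN;

/// Build `Block Number (BE)` + `Block Hash` as a fixed-size key.
fn composite_key(number: u64, hash: &[u8; HASH_LEN]) -> [u8; KEY_LEN] {
    let mut key = [0u8; KEY_LEN];
    key[..NUMBER_LEN].copy_from_slice(&number.to_be_bytes());
    key[NUMBER_LEN..].copy_from_slice(hash);
    key
}

/// Split a stored key back into its number and hash parts.
/// Returns `None` if the key does not have the expected length.
fn split_composite_key(key: &[u8]) -> Option<(u64, [u8; HASH_LEN])> {
    if key.len() != KEY_LEN {
        return None;
    }
    let mut number = [0u8; NUMBER_LEN];
    number.copy_from_slice(&key[..NUMBER_LEN]);
    let mut hash = [0u8; HASH_LEN];
    hash.copy_from_slice(&key[NUMBER_LEN..]);
    Some((u64::from_be_bytes(number), hash))
}
```

Because the number comes first, a key for block 1 sorts before any key for block 2 regardless of the hash bytes, which is what keeps neighbouring heights adjacent on disk.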

### 3.3 Other Changes
*   **`COLUMN_NUMBER_HASH` (13)**: Deprecated. The composite keys now naturally provide the number->hash mapping (and more, since it handles forks by storing all hashes for a number).
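
To show how the composite keys can stand in for a number->hash mapping, the sketch below scans an 8-byte number prefix and collects every hash stored at that height. It uses a `BTreeMap` as a stand-in for one bytewise-ordered column family rather than a real RocksDB handle, and all names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Stand-in for one column family: keys are kept in bytewise order,
/// which is also how RocksDB's default comparator orders them.
type Column = BTreeMap<Vec<u8>, Vec<u8>>;

/// Build the composite key `Block Number (BE)` + `Block Hash`.
fn number_hash_key(number: u64, hash: &[u8; 32]) -> Vec<u8> {
    let mut key = Vec::with_capacity(40);
    key.extend_from_slice(&number.to_be_bytes());
    key.extend_from_slice(hash);
    key
}

/// Collect every block hash stored at `number`, forks included,
/// by scanning all keys that start with the 8-byte number prefix.
fn hashes_at_number(column: &Column, number: u64) -> Vec<[u8; 32]> {
    let prefix = number.to_be_bytes();
    column
        .range(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(&prefix))
        .filter_map(|(key, _)| {
            if key.len() != 40 {
                return None; // skip keys that do not match the layout
            }
            let mut hash = [0u8; 32];
            hash.copy_from_slice(&key[8..]);
            Some(hash)
        })
        .collect()
}
```

A scan like this returns all candidates at a height; picking the main chain block among them would presumably still rely on the `Main Chain Flag` in `COLUMN_INDEX`, which this draft does not spell out.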
*   **Unchanged Columns**: Columns that are not keyed by block (e.g., `COLUMN_META`, `COLUMN_CELL`) remain unchanged or need only minor adjustments.

## 4. Migration Strategy
To be determined; still under consideration.

## 5. Benefits Summary
*   **Performance**: Expected higher write throughput and lower latency during block synchronization, since new keys arrive in nearly sorted order.
*   **Resource Usage**: Expected lower CPU and I/O usage due to reduced compaction overhead.

