---
title: How Fuse Engine Works
---

## Fuse Engine

Fuse Engine is Databend's core storage engine, optimized for managing **petabyte-scale** data efficiently on **cloud object storage**. Tables created in Databend use this engine by default (`ENGINE=FUSE`). Its snapshot-based design, inspired by Git, enables powerful data versioning (such as Time Travel) and delivers **high query performance** through advanced pruning and indexing.

This document explains its core concepts and how it works.

## Core Concepts

Fuse Engine organizes data into three core structures that mirror Git's object model:

* **Snapshots (Like Git Commits):** Immutable records of the table's state at a point in time. Each one points to a specific set of Segments and to the previous Snapshot, which is what enables Time Travel.
* **Segments (Like Git Trees):** Groups of Blocks plus summary statistics used for fast data skipping (pruning). A Segment can be shared by multiple Snapshots.
* **Blocks (Like Git Blobs):** Immutable data files (Parquet format) that hold the actual rows, with detailed column-level statistics for fine-grained pruning.

```
                              Table HEAD
                                  │
                                  ▼
┌───────────────┐         ┌───────────────┐         ┌───────────────┐
│   SEGMENT A   │◄────────│  SNAPSHOT 2   │────────►│   SEGMENT B   │
│               │         │  Previous:    │         │               │
└───────┬───────┘         │  SNAPSHOT 1   │         └───────┬───────┘
        │                 └───────┬───────┘                 │
        │                         │                         │
        │                         ▼                         │
        │                 ┌───────────────┐                 │
        │                 │  SNAPSHOT 1   │                 │
        │                 │               │                 │
        │                 └───────────────┘                 │
        │                                                   │
        ▼                                                   ▼
┌───────────────┐                                   ┌───────────────┐
│    BLOCK 1    │                                   │    BLOCK 2    │
│  (cloud.txt)  │                                   │(warehouse.txt)│
└───────────────┘                                   └───────────────┘
```
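
The relationships in the diagram can be modeled in a few lines of Python. This is a conceptual sketch, not Databend's implementation. The class names `Block`, `Segment`, and `Snapshot` mirror the concepts above, but their fields (`rows`, `columns`, `stats`, `previous`) are hypothetical, and real blocks are Parquet files with richer statistics. The sketch illustrates two properties: every object is immutable (frozen dataclasses), and a Segment's min/max statistics summarize those of its Blocks.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-column statistics: column name -> (min, max).
Stats = dict[str, tuple[str, str]]


@dataclass(frozen=True)
class Block:
    """An immutable data file (Parquet in Fuse) plus its column statistics."""
    rows: tuple[tuple[str, ...], ...]
    columns: tuple[str, ...]

    @property
    def stats(self) -> Stats:
        # Min/max of every column across this block's rows.
        return {
            col: (min(r[i] for r in self.rows), max(r[i] for r in self.rows))
            for i, col in enumerate(self.columns)
        }


@dataclass(frozen=True)
class Segment:
    """A group of blocks; its statistics summarize those of its blocks."""
    blocks: tuple[Block, ...]

    @property
    def stats(self) -> Stats:
        merged: Stats = {}
        for block in self.blocks:
            for col, (lo, hi) in block.stats.items():
                if col in merged:
                    old_lo, old_hi = merged[col]
                    merged[col] = (min(lo, old_lo), max(hi, old_hi))
                else:
                    merged[col] = (lo, hi)
        return merged


@dataclass(frozen=True)
class Snapshot:
    """The table state at one point in time: its segments and its parent."""
    segments: tuple[Segment, ...]
    previous: Optional["Snapshot"] = None


cols = ("file", "content")
block1 = Block(rows=(("cloud.txt", "2022/05/06, Databend, Cloud"),), columns=cols)
block2 = Block(rows=(("warehouse.txt", "2022/05/07, Databend, Warehouse"),), columns=cols)
seg_a, seg_b = Segment((block1,)), Segment((block2,))

snap1 = Snapshot(segments=(seg_a,))
snap2 = Snapshot(segments=(seg_a, seg_b), previous=snap1)

print(snap2.segments[0] is snap1.segments[0])        # True: SEGMENT A is shared
print(seg_b.stats["file"])                           # ('warehouse.txt', 'warehouse.txt')
print(Segment((block1, block2)).stats["file"])       # ('cloud.txt', 'warehouse.txt')
```

Sharing SEGMENT A between both snapshots is safe precisely because nothing is ever modified in place.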

## How Writing Works

When you add data to a table, Fuse Engine creates a chain of objects. Let's walk through this process step by step:

### Step 1: Create a table

```sql
CREATE TABLE git(file VARCHAR, content VARCHAR);
```

At this point, the table exists but contains no data:

```
(Empty table with no data)
```

### Step 2: Insert the first row

```sql
INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud');
```

After the first insert, Fuse Engine creates the initial snapshot, segment, and block:

```
   Table HEAD
        │
        ▼
┌───────────────┐
│  SNAPSHOT 1   │
│               │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│   SEGMENT A   │
│               │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│    BLOCK 1    │
│  (cloud.txt)  │
└───────────────┘
```

### Step 3: Insert more data

```sql
INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse');
```

When you insert more data, Fuse Engine creates a new snapshot that references both the original segment and a new segment, and records SNAPSHOT 1 as its previous snapshot. SEGMENT A and BLOCK 1 are reused, not copied:

```
                              Table HEAD
                                  │
                                  ▼
┌───────────────┐         ┌───────────────┐         ┌───────────────┐
│   SEGMENT A   │◄────────│  SNAPSHOT 2   │────────►│   SEGMENT B   │
│               │         │  Previous:    │         │               │
└───────┬───────┘         │  SNAPSHOT 1   │         └───────┬───────┘
        │                 └───────┬───────┘                 │
        │                         │                         │
        │                         ▼                         │
        │                 ┌───────────────┐                 │
        │                 │  SNAPSHOT 1   │                 │
        │                 │               │                 │
        │                 └───────────────┘                 │
        │                                                   │
        ▼                                                   ▼
┌───────────────┐                                   ┌───────────────┐
│    BLOCK 1    │                                   │    BLOCK 2    │
│  (cloud.txt)  │                                   │(warehouse.txt)│
└───────────────┘                                   └───────────────┘
```
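
The write path in Steps 1 to 3 can be sketched as a copy-on-write `append` function. This is a simplified model under stated assumptions: the `Table` class, the `append` function, and the in-memory objects are hypothetical stand-ins for the files Fuse Engine writes to object storage, and each insert here produces exactly one block and one segment. The point it illustrates is that an insert never modifies existing objects. It builds new ones and then moves the table HEAD.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    rows: tuple[tuple[str, str], ...]  # stands in for a Parquet file


@dataclass(frozen=True)
class Segment:
    blocks: tuple[Block, ...]


@dataclass(frozen=True)
class Snapshot:
    segments: tuple[Segment, ...]
    previous: Optional["Snapshot"] = None


@dataclass
class Table:
    head: Optional[Snapshot] = None  # Step 1: a new table has no snapshot yet


def append(table: Table, rows: list[tuple[str, str]]) -> Snapshot:
    """Write rows as new immutable objects, then commit a new snapshot."""
    segment = Segment(blocks=(Block(rows=tuple(rows)),))
    # Reuse every segment of the current snapshot by reference (no copying).
    existing = table.head.segments if table.head is not None else ()
    snapshot = Snapshot(segments=existing + (segment,), previous=table.head)
    table.head = snapshot  # moving HEAD is what makes the new data visible
    return snapshot


git = Table()
snap1 = append(git, [("cloud.txt", "2022/05/06, Databend, Cloud")])          # Step 2
snap2 = append(git, [("warehouse.txt", "2022/05/07, Databend, Warehouse")])  # Step 3

print(len(snap1.segments), len(snap2.segments))  # 1 2
print(snap2.previous is snap1)                   # True
print(snap2.segments[0] is snap1.segments[0])    # True: SEGMENT A is shared
```

Because `snap1` is never mutated, it still describes the table exactly as it was after Step 2, which is the property Time Travel relies on.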

## How Reading Works

When you query a table, Fuse Engine starts from the current snapshot and uses the statistics stored for segments and blocks to skip data that cannot match the filter, reading only the blocks that might:

```
Query: SELECT * FROM git WHERE file = 'cloud.txt';

                              Table HEAD
                                  │
                                  ▼
┌───────────────┐         ┌───────────────┐         ┌───────────────┐
│   SEGMENT A   │◄────────│  SNAPSHOT 2   │────────►│   SEGMENT B   │
│     CHECK     │         │               │         │     CHECK     │
└───────┬───────┘         └───────────────┘         └───────────────┘
        │                                                   ✗
        │                                       (Skip: doesn't contain
        │                                            'cloud.txt')
        ▼
┌───────────────┐
│    BLOCK 1    │
│     CHECK     │
└───────┬───────┘
        │
        │ ✓ (Contains 'cloud.txt')
        ▼
 Read this block
```
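
The top-down walk in the diagram can be sketched in Python as follows. The names (`Block`, `Segment`, `may_contain`, `prune`) are hypothetical, the sketch tracks statistics for the `file` column only, and it assumes each segment's range is derived from its blocks' ranges. A segment whose range excludes the value is skipped together with all of its blocks. Otherwise, each block's own range is checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    name: str
    file_min: str  # min of the `file` column in this block
    file_max: str  # max of the `file` column in this block


@dataclass(frozen=True)
class Segment:
    name: str
    blocks: tuple[Block, ...]

    @property
    def file_min(self) -> str:
        return min(b.file_min for b in self.blocks)

    @property
    def file_max(self) -> str:
        return max(b.file_max for b in self.blocks)


def may_contain(lo: str, hi: str, value: str) -> bool:
    # `file = value` can only match if value lies within [lo, hi].
    return lo <= value <= hi


def prune(segments: list[Segment], value: str) -> list[str]:
    """Return the names of blocks that must be read for `file = value`."""
    to_read = []
    for seg in segments:
        if not may_contain(seg.file_min, seg.file_max, value):
            continue  # skip the whole segment, including all of its blocks
        for block in seg.blocks:
            if may_contain(block.file_min, block.file_max, value):
                to_read.append(block.name)
    return to_read


# The segments referenced by SNAPSHOT 2 in the example.
segment_a = Segment("SEGMENT A", (Block("BLOCK 1", "cloud.txt", "cloud.txt"),))
segment_b = Segment("SEGMENT B", (Block("BLOCK 2", "warehouse.txt", "warehouse.txt"),))

print(prune([segment_a, segment_b], "cloud.txt"))  # ['BLOCK 1']
```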
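
### Smart Pruning Process

Segments and blocks record per-column minimum and maximum values. If the value in the filter falls outside a segment's or block's [min, max] range, no row in it can match, so Fuse Engine skips it without reading it. The checks run top-down, so skipping a segment also skips all of its blocks: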

```
┌─────────────────────────────────────────┐
│ Query: WHERE file = 'cloud.txt'         │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Check SEGMENT A                         │
│ Min file value: 'cloud.txt'             │
│ Max file value: 'cloud.txt'             │
│                                         │
│ Result: ✓ Might contain 'cloud.txt'     │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Check SEGMENT B                         │
│ Min file value: 'warehouse.txt'         │
│ Max file value: 'warehouse.txt'         │
│                                         │
│ Result: ✗ Cannot contain 'cloud.txt'    │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Check BLOCK 1 in SEGMENT A              │
│ Min file value: 'cloud.txt'             │
│ Max file value: 'cloud.txt'             │
│                                         │
│ Result: ✓ Contains 'cloud.txt'          │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Read only BLOCK 1                       │
└─────────────────────────────────────────┘
```
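
The min/max test behind each check can be written as a few predicate functions. This is an illustrative sketch with hypothetical function names. It shows why the diagram says a segment *might* contain a value: a range test can rule data out, but it can never prove a match. The range and `IN` variants are included only to show how the same statistics could serve other filters; they are an extension of this sketch, not something the diagram covers.

```python
def may_contain_eq(lo: str, hi: str, value: str) -> bool:
    """`col = value`: possible only if lo <= value <= hi."""
    return lo <= value <= hi


def may_contain_lt(lo: str, hi: str, value: str) -> bool:
    """`col < value`: possible only if the smallest stored value is below it."""
    return lo < value


def may_contain_gt(lo: str, hi: str, value: str) -> bool:
    """`col > value`: possible only if the largest stored value is above it."""
    return hi > value


def may_contain_in(lo: str, hi: str, values: list[str]) -> bool:
    """`col IN (...)`: possible if any listed value lies within [lo, hi]."""
    return any(lo <= v <= hi for v in values)


# SEGMENT B from the example: min = max = 'warehouse.txt', so it is skipped.
print(may_contain_eq("warehouse.txt", "warehouse.txt", "cloud.txt"))  # False

# A hypothetical block holding only 'a.txt' and 'zebra.txt' has the range
# ['a.txt', 'zebra.txt']. 'cloud.txt' falls inside it, so the block must
# still be read even though it has no 'cloud.txt' row (a false positive).
print(may_contain_eq("a.txt", "zebra.txt", "cloud.txt"))  # True
```

Statistics only narrow down which blocks to read. The rows inside a block that survives pruning are still filtered by the query itself.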

## Snapshot-Based Features

Fuse Engine's snapshot architecture enables powerful data management capabilities:

### Time Travel

Query data as it existed at an earlier point in time, within the configured data retention period. Because older snapshots remain intact, the same mechanism supports data branching, tagging, and governance, with a complete audit trail and a way to recover from errors.
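
Because each snapshot points to its predecessor, resolving "the table as of time T" amounts to walking that chain back from HEAD, much like `git log` walks parent commits. The sketch below is a hypothetical model: the `Snapshot` fields and the `snapshot_at` function are illustrative, not Databend's API. It ignores retention and assumes snapshot timestamps increase along the chain.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    timestamp: datetime
    segments: tuple[str, ...]  # segment names, for brevity
    previous: Optional["Snapshot"] = None


def snapshot_at(head: Snapshot, ts: datetime) -> Optional[Snapshot]:
    """Return the newest snapshot committed at or before `ts`."""
    snap: Optional[Snapshot] = head
    while snap is not None and snap.timestamp > ts:
        snap = snap.previous  # step back one commit
    return snap  # None if `ts` predates the table's first snapshot


s1 = Snapshot("s1", datetime(2022, 5, 6, 12, 0), ("SEGMENT A",))
s2 = Snapshot("s2", datetime(2022, 5, 7, 12, 0), ("SEGMENT A", "SEGMENT B"), previous=s1)

print(snapshot_at(s2, datetime(2022, 5, 6, 18, 0)).snapshot_id)  # s1
print(snapshot_at(s2, datetime(2022, 5, 8)).snapshot_id)         # s2
```

Reading the old state then means pruning and scanning from `s1` instead of HEAD. No data has to be restored, because every object `s1` references still exists.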

### Zero-Copy Schema Evolution

Modify your table's structure (add columns, drop columns, rename, change types) **without rewriting any underlying data files**.

- Schema changes are metadata-only operations recorded in new Snapshots.
- They take effect instantly, require no downtime, and avoid costly data migrations. Older data remains accessible with its original schema.
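
One common way to make schema changes metadata-only is to identify columns by a stable id and resolve names through the current schema at read time. The sketch below assumes that approach. The `Column`, `Block`, and `read` names are hypothetical, and the sketch covers adding, dropping, and renaming columns but not type changes. Old blocks are never rewritten: a renamed column is still found by its id, a newly added column reads as NULL, and a dropped column is simply not projected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    column_id: int  # stable identifier, assumed never reused after a drop
    name: str


@dataclass(frozen=True)
class Block:
    # Column values keyed by column id rather than by name, so a rename
    # never requires touching the data file.
    data: dict[int, list]
    num_rows: int


def read(block: Block, schema: list[Column]) -> dict[str, list]:
    """Project a block written under any older schema onto `schema`."""
    result = {}
    for col in schema:
        # A column added after this block was written has no stored values,
        # so every row reads as NULL (None).
        result[col.name] = block.data.get(col.column_id, [None] * block.num_rows)
    # Columns dropped from `schema` are never looked up, so they disappear
    # from results while their bytes stay untouched in the block.
    return result


# BLOCK 1 was written under the original schema (file, content).
block1 = Block(data={1: ["cloud.txt"], 2: ["2022/05/06, Databend, Cloud"]}, num_rows=1)
schema_v1 = [Column(1, "file"), Column(2, "content")]

# Metadata-only changes: rename `file` to `path`, drop `content`, add `size`.
schema_v2 = [Column(1, "path"), Column(3, "size")]

print(read(block1, schema_v1)["file"])  # ['cloud.txt']
print(read(block1, schema_v2))          # {'path': ['cloud.txt'], 'size': [None]}
```

Keeping `schema_v1` alongside the older snapshot is what lets older data remain readable with its original schema.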

## Advanced Indexing for Query Acceleration

Beyond basic segment and block pruning with statistics, Fuse Engine offers specialized secondary indexes that further accelerate specific query patterns:

| Index Type             | Brief Description                                       | Accelerates Queries Like...                     | Example Query Snippet                |
| :--------------------- | :------------------------------------------------------ | :---------------------------------------------- | :----------------------------------- |
| **Aggregate Index**    | Pre-computes aggregate results for specified groups     | `COUNT`, `SUM`, `AVG`, etc. with `GROUP BY`     | `SELECT COUNT(*) ... GROUP BY city`  |
| **Full-Text Index**    | Inverted index for fast keyword search within text      | Text search using `MATCH` (e.g., logs)          | `WHERE MATCH(log_entry, 'error')`    |
| **JSON Index**         | Indexes specific paths/keys within JSON documents       | Filtering on specific JSON paths/values         | `WHERE event_data:user.id = 123`     |
| **Bloom Filter Index** | Probabilistic check to quickly skip non-matching blocks | Fast point lookups (`=`) and `IN` list filtering | `WHERE user_id = 'xyz'`              |
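
To see why a Bloom filter helps point lookups, consider blocks whose min/max ranges overlap. Range checks cannot skip any of them, but a per-block Bloom filter can answer "definitely absent" for most. The sketch below is a textbook Bloom filter for illustration only. The class, its parameters (`num_bits`, `num_hashes`), and the sample `user_id` values are hypothetical, and Databend's actual filter structure and sizing may differ.

```python
import hashlib
from collections.abc import Iterator


class BloomFilter:
    """A textbook Bloom filter: `num_hashes` bit positions per value."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value: str) -> Iterator[int]:
        # Derive independent hash positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


# Hypothetical blocks: both user_id ranges include 'xyz', so a min/max
# check (BLOCK 1: 'abc'..'xyz', BLOCK 2: 'abd'..'zzz') cannot skip either.
blocks = {
    "BLOCK 1": ["abc", "mno", "xyz"],
    "BLOCK 2": ["abd", "pqr", "zzz"],
}

filters = {}
for name, user_ids in blocks.items():
    bloom = BloomFilter()
    for user_id in user_ids:
        bloom.add(user_id)
    filters[name] = bloom

# WHERE user_id = 'xyz': read only blocks whose filter might contain it.
to_read = [name for name, bloom in filters.items() if bloom.might_contain("xyz")]
print(to_read)
```

BLOCK 1 is always kept, because a Bloom filter has no false negatives. BLOCK 2 is skipped unless its filter returns one of the rare false positives, in which case the block is read and its rows are filtered as usual.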

## Comparison: Databend Fuse Engine vs. Apache Iceberg

_**Note:** This comparison focuses specifically on **table format features**. Fuse is Databend's native table format and continues to evolve, with a focus on **usability and performance**. The features shown reflect the current state and may change._

| Feature                | Apache Iceberg                               | Databend Fuse Engine                                         |
| :--------------------- | :------------------------------------------- | :----------------------------------------------------------- |
| **Metadata Structure** | Manifest Lists → Manifest Files → Data Files | **Snapshot** → Segments → Blocks                             |
| **Statistics Levels**  | File-level (+ Partition)                     | **Multi-level** (Snapshot, Segment, Block) for finer pruning |
| **Pruning Power**      | Good (File/Partition stats)                  | **Excellent** (Multi-level stats + Secondary indexes)        |
| **Schema Evolution**   | Supported (Metadata change)                  | **Zero-Copy** (Metadata-only, Instant)                       |
| **Data Clustering**    | Sorting (On write)                           | **Automatic** Optimization (Background)                      |
| **Streaming Support**  | Basic streaming ingestion                    | **Advanced Incremental** (Insert/Update tracking)            |