Commit 985dc4a

How Fuse Engine Works (#1940)
1 parent b864b72 commit 985dc4a

3 files changed: 243 additions, 0 deletions
Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
---
title: How Fuse Engine Works
---

## Fuse Engine

Fuse Engine is Databend's core storage engine, optimized for managing **petabyte-scale** data efficiently on **cloud object storage**. Tables created in Databend use this engine by default (`ENGINE=FUSE`). Its snapshot-based design, inspired by Git, enables data versioning (such as Time Travel) and delivers **high query performance** through pruning and indexing.

This document explains its core concepts and how it works.

## Core Concepts

Fuse Engine organizes data using three core structures, mirroring Git:

* **Snapshots (Like Git Commits):** Immutable records of the table's state at a point in time, each pointing to a specific set of Segments. Snapshots are what make Time Travel possible.
* **Segments (Like Git Trees):** Collections of Blocks with summary statistics used for fast data skipping (pruning). A Segment can be shared across Snapshots.
* **Blocks (Like Git Blobs):** Immutable data files (Parquet format) holding the actual rows, along with detailed column-level statistics for fine-grained pruning.

```
                         Table HEAD
                              │
                              ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│   SEGMENT A   │◄────│  SNAPSHOT 2   │────►│   SEGMENT B   │
│               │     │  Previous:    │     │               │
└───────┬───────┘     │  SNAPSHOT 1   │     └───────┬───────┘
        │             └───────────────┘             │
        │                     │                     │
        │                     ▼                     │
        │             ┌───────────────┐             │
        │             │  SNAPSHOT 1   │             │
        │             │               │             │
        │             └───────────────┘             │
        │                                           │
        ▼                                           ▼
┌───────────────┐                           ┌───────────────┐
│    BLOCK 1    │                           │    BLOCK 2    │
│  (cloud.txt)  │                           │(warehouse.txt)│
└───────────────┘                           └───────────────┘
```
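
As a rough mental model, the three structures can be sketched as immutable Python records. This is an illustration only: the class names, the fields, and the `(min, max)` layout of the statistics are assumptions made for this sketch, not Databend's actual metadata format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical statistics layout: column name -> (min value, max value).
Stats = Dict[str, Tuple[str, str]]


@dataclass(frozen=True)
class Block:
    """Like a Git blob: an immutable data file plus column statistics."""
    name: str
    rows: Tuple[Tuple[str, str], ...]
    stats: Stats


@dataclass(frozen=True)
class Segment:
    """Like a Git tree: a group of blocks plus summary statistics."""
    name: str
    blocks: Tuple[Block, ...]
    stats: Stats


@dataclass(frozen=True)
class Snapshot:
    """Like a Git commit: the table state, with a link to its parent."""
    name: str
    segments: Tuple[Segment, ...]
    previous: Optional["Snapshot"] = None


block_1 = Block("BLOCK 1", (("cloud.txt", "2022/05/06, Databend, Cloud"),),
                {"file": ("cloud.txt", "cloud.txt")})
block_2 = Block("BLOCK 2", (("warehouse.txt", "2022/05/07, Databend, Warehouse"),),
                {"file": ("warehouse.txt", "warehouse.txt")})
segment_a = Segment("SEGMENT A", (block_1,), block_1.stats)
segment_b = Segment("SEGMENT B", (block_2,), block_2.stats)
snapshot_1 = Snapshot("SNAPSHOT 1", (segment_a,))
snapshot_2 = Snapshot("SNAPSHOT 2", (segment_a, segment_b), previous=snapshot_1)

# SEGMENT A is referenced by both snapshots rather than copied.
shared = snapshot_2.segments[0] is snapshot_1.segments[0]
```

Because every record is immutable, a new snapshot can reuse existing segments by reference, which is the property the Git analogy relies on.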

## How Writing Works

When you add data to a table, Fuse Engine creates a chain of objects. Let's walk through this process step by step:

### Step 1: Create a table

```sql
CREATE TABLE git(file VARCHAR, content VARCHAR);
```

At this point, the table exists but contains no data:

```
(Empty table with no data)
```

### Step 2: Insert first data

```sql
INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud');
```

After the first insert, Fuse Engine creates the initial snapshot, segment, and block:

```
   Table HEAD
        │
        ▼
┌───────────────┐
│  SNAPSHOT 1   │
│               │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│   SEGMENT A   │
│               │
└───────┬───────┘
        │
        ▼
┌───────────────┐
│    BLOCK 1    │
│  (cloud.txt)  │
└───────────────┘
```

### Step 3: Insert more data

```sql
INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse');
```

When you insert more data, Fuse Engine creates a new snapshot that references both the original segment and a new segment, and records the earlier snapshot as its predecessor:

```
                         Table HEAD
                              │
                              ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│   SEGMENT A   │◄────│  SNAPSHOT 2   │────►│   SEGMENT B   │
│               │     │  Previous:    │     │               │
└───────┬───────┘     │  SNAPSHOT 1   │     └───────┬───────┘
        │             └───────────────┘             │
        │                     │                     │
        │                     ▼                     │
        │             ┌───────────────┐             │
        │             │  SNAPSHOT 1   │             │
        │             │               │             │
        │             └───────────────┘             │
        │                                           │
        ▼                                           ▼
┌───────────────┐                           ┌───────────────┐
│    BLOCK 1    │                           │    BLOCK 2    │
│  (cloud.txt)  │                           │(warehouse.txt)│
└───────────────┘                           └───────────────┘
```
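
The write path in Steps 1 to 3 can be approximated with a small append-only model. This is a minimal sketch under stated assumptions: the `insert` function and the tuple-based records are hypothetical, and each insert is assumed to produce exactly one new block in one new segment, as in the walkthrough.

```python
from typing import NamedTuple, Optional, Tuple


class Block(NamedTuple):
    rows: Tuple[Tuple[str, str], ...]


class Segment(NamedTuple):
    blocks: Tuple[Block, ...]


class Snapshot(NamedTuple):
    segments: Tuple[Segment, ...]
    previous: Optional["Snapshot"]


def insert(head: Optional[Snapshot], rows) -> Snapshot:
    """Return a new snapshot; existing segments are reused, never modified."""
    new_segment = Segment(blocks=(Block(rows=tuple(rows)),))
    kept = head.segments if head is not None else ()
    return Snapshot(segments=kept + (new_segment,), previous=head)


head = None                                       # Step 1: empty table
head = insert(head, [("cloud.txt", "2022/05/06, Databend, Cloud")])
snapshot_1 = head                                 # Step 2: first snapshot
head = insert(head, [("warehouse.txt", "2022/05/07, Databend, Warehouse")])
snapshot_2 = head                                 # Step 3: second snapshot
```

After Step 3, `snapshot_2` holds two segments, the first of which is the same object that `snapshot_1` points to, and `snapshot_1` itself is unchanged.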

## How Reading Works

When you query data, Fuse Engine uses the statistics stored in Segments and Blocks to skip data that cannot match the query:

```
Query: SELECT * FROM git WHERE file = 'cloud.txt';

                         Table HEAD
                              │
                              ▼
┌───────────────┐     ┌───────────────┐     ┌───────────────┐
│   SEGMENT A   │◄────│  SNAPSHOT 2   │────►│   SEGMENT B   │
│     CHECK     │     │               │     │     CHECK     │
└───────┬───────┘     └───────────────┘     └───────────────┘
        │                                           ✗
        │                                   (Skip - doesn't contain
        │                                        'cloud.txt')
        ▼
┌───────────────┐
│    BLOCK 1    │
│     CHECK     │
└───────┬───────┘
        │
        │ ✓ (Contains 'cloud.txt')
        ▼
 Read this block
```

### Smart Pruning Process

```
┌─────────────────────────────────────────┐
│ Query: WHERE file = 'cloud.txt'         │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Check SEGMENT A                         │
│ Min file value: 'cloud.txt'             │
│ Max file value: 'cloud.txt'             │
│                                         │
│ Result: ✓ Might contain 'cloud.txt'     │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Check SEGMENT B                         │
│ Min file value: 'warehouse.txt'         │
│ Max file value: 'warehouse.txt'         │
│                                         │
│ Result: ✗ Cannot contain 'cloud.txt'    │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Check BLOCK 1 in SEGMENT A              │
│ Min file value: 'cloud.txt'             │
│ Max file value: 'cloud.txt'             │
│                                         │
│ Result: ✓ Contains 'cloud.txt'          │
└─────────────────┬───────────────────────┘
                  │
                  ▼
┌─────────────────────────────────────────┐
│ Read only BLOCK 1                       │
└─────────────────────────────────────────┘
```
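
The pruning flow above amounts to a two-level min/max range check. The sketch below is illustrative: the dictionary layout and the function names `may_contain` and `blocks_to_read` are assumptions for this example, not Databend internals.

```python
def may_contain(stats, column, value):
    """Min/max check: False means the value is definitely absent."""
    low, high = stats[column]
    return low <= value <= high


def blocks_to_read(segments, column, value):
    """Check segment statistics first, then the blocks of surviving segments."""
    selected = []
    for segment in segments:
        if not may_contain(segment["stats"], column, value):
            continue  # skip the whole segment without looking at its blocks
        for block in segment["blocks"]:
            if may_contain(block["stats"], column, value):
                selected.append(block["name"])
    return selected


segments = [
    {"name": "SEGMENT A",
     "stats": {"file": ("cloud.txt", "cloud.txt")},
     "blocks": [{"name": "BLOCK 1",
                 "stats": {"file": ("cloud.txt", "cloud.txt")}}]},
    {"name": "SEGMENT B",
     "stats": {"file": ("warehouse.txt", "warehouse.txt")},
     "blocks": [{"name": "BLOCK 2",
                 "stats": {"file": ("warehouse.txt", "warehouse.txt")}}]},
]

result = blocks_to_read(segments, "file", "cloud.txt")
print(result)  # ['BLOCK 1']
```

A range check can only rule data out. When the minimum and maximum differ, a match means the block might contain the value, so the block still has to be read and filtered.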

## Snapshot-Based Features

Fuse Engine's snapshot architecture enables powerful data management capabilities:

### Time Travel

Query data as it existed at an earlier point in time by reading a previous Snapshot. This supports data branching, tagging, and governance, with complete audit trails and error recovery.
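
Conceptually, Time Travel means resolving an older snapshot by following the chain of `Previous` links and reading the table through it. The sketch below is a simplified model: the `Snapshot` record, its integer `snapshot_id`, and the `snapshot_at` helper are hypothetical, and rows are stored directly on the snapshot to keep the example short.

```python
from typing import NamedTuple, Optional, Tuple


class Snapshot(NamedTuple):
    snapshot_id: int                      # hypothetical identifier
    rows: Tuple[Tuple[str, str], ...]     # simplified: rows visible in this state
    previous: Optional["Snapshot"]


def snapshot_at(head: Optional[Snapshot], snapshot_id: int) -> Optional[Snapshot]:
    """Walk back from HEAD until the requested snapshot is found."""
    current = head
    while current is not None:
        if current.snapshot_id == snapshot_id:
            return current
        current = current.previous
    return None


row_cloud = ("cloud.txt", "2022/05/06, Databend, Cloud")
row_warehouse = ("warehouse.txt", "2022/05/07, Databend, Warehouse")

snapshot_1 = Snapshot(1, (row_cloud,), None)
snapshot_2 = Snapshot(2, (row_cloud, row_warehouse), snapshot_1)

# Reading "as of" snapshot 1 ignores the later insert.
old_state = snapshot_at(snapshot_2, 1)
files_then = [row[0] for row in old_state.rows]
print(files_then)  # ['cloud.txt']
```

Since snapshots are never modified after they are written, reading an old one yields exactly the state the table had when that snapshot was HEAD.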

### Zero-Copy Schema Evolution

Modify your table's structure (add columns, drop columns, rename, change types) **without rewriting any underlying data files**.

- Changes are metadata-only operations recorded in new Snapshots.
- This is instantaneous, requires no downtime, and avoids costly data migration tasks. Older data remains accessible with its original schema.
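
One way to picture a metadata-only schema change is that old blocks stay untouched and are projected onto the current schema at read time. The sketch below rests on loudly stated assumptions: columns are matched by name, a column missing from an old block reads as `None`, and the `read_block` helper is hypothetical. How Databend actually maps old blocks to a new schema is not described in this document.

```python
def read_block(rows, written_schema, current_schema):
    """Project rows written under an old schema onto the current schema.

    Assumption for this sketch: columns are matched by name, and a column
    that did not exist when the block was written reads as None.
    """
    projected = []
    for row in rows:
        values = dict(zip(written_schema, row))
        projected.append(tuple(values.get(column) for column in current_schema))
    return projected


# A block written before any schema change; it is never rewritten.
old_rows = [("cloud.txt", "2022/05/06, Databend, Cloud")]
written_schema = ("file", "content")

# Adding a column: only the schema in the new snapshot changes.
after_add = read_block(old_rows, written_schema, ("file", "content", "author"))

# Dropping a column: the old data is simply no longer projected.
after_drop = read_block(old_rows, written_schema, ("file",))
```

In both cases `old_rows` is left as written, which is the sense in which the change is zero-copy.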

## Advanced Indexing for Query Acceleration (Fuse Engine)

Beyond basic block/segment pruning using statistics, Fuse Engine offers specialized secondary indexes that further accelerate specific query patterns:

| Index Type             | Brief Description                                        | Accelerates Queries Like...                      | Example Query Snippet               |
| :--------------------- | :------------------------------------------------------- | :----------------------------------------------- | :---------------------------------- |
| **Aggregate Index**    | Pre-computes aggregate results for specified groups      | `COUNT`, `SUM`, `AVG`, etc. with `GROUP BY`      | `SELECT COUNT(*)... GROUP BY city`  |
| **Full-Text Index**    | Inverted index for fast keyword search within text       | Text search using `MATCH` (e.g., logs)           | `WHERE MATCH(log_entry, 'error')`   |
| **JSON Index**         | Indexes specific paths/keys within JSON documents        | Filtering on specific JSON paths/values          | `WHERE event_data:user.id = 123`    |
| **Bloom Filter Index** | Probabilistic check to quickly skip non-matching blocks  | Fast point lookups (`=`) and `IN` list filtering | `WHERE user_id = 'xyz'`             |
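
To illustrate the Bloom Filter row, the following is a textbook Bloom filter used to decide which blocks a point lookup must read. It is a generic sketch, not Databend's implementation: the `BloomFilter` class, its bit-array size, the number of hash functions, and the use of SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting the value with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for position in self._positions(value):
            self.bits |= 1 << position

    def might_contain(self, value):
        return all((self.bits >> position) & 1
                   for position in self._positions(value))


# One filter per block, built from the user_id values stored in that block.
filters = {"BLOCK 1": BloomFilter(), "BLOCK 2": BloomFilter()}
filters["BLOCK 1"].add("xyz")
filters["BLOCK 2"].add("abc")

# WHERE user_id = 'xyz': only blocks whose filter might contain the value are read.
candidates = [name for name, bloom in filters.items()
              if bloom.might_contain("xyz")]
```

A block that really holds `'xyz'` is always among the candidates. A block that does not hold it is usually skipped, but can occasionally be included as a false positive, which costs an extra read and never a wrong result.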

## Comparison: Databend Fuse Engine vs. Apache Iceberg

_**Note:** This comparison covers **table format features** only. Fuse is Databend's native table format and continues to evolve, with a focus on **usability and performance**. The features listed reflect the current state and may change._

| Feature                | Apache Iceberg                                 | Databend Fuse Engine                                       |
| :--------------------- | :--------------------------------------------- | :--------------------------------------------------------- |
| **Metadata Structure** | Manifest Lists -> Manifest Files -> Data Files | **Snapshot** -> Segments -> Blocks                         |
| **Statistics Levels**  | File-level (+Partition)                        | **Multi-level** (Snapshot, Segment, Block) → Finer pruning |
| **Pruning Power**      | Good (File/Partition stats)                    | **Excellent** (Multi-level stats + Secondary indexes)      |
| **Schema Evolution**   | Supported (Metadata change)                    | **Zero-Copy** (Metadata-only, Instant)                     |
| **Data Clustering**    | Sorting (On write)                             | **Automatic** Optimization (Background)                    |
| **Streaming Support**  | Basic streaming ingestion                      | **Advanced Incremental** (Insert/Update tracking)          |

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
{
  "label": "How Databend Works"
}

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
---
title: How Databend Works
---

Technical deep-dive into Databend's architecture, storage engine, and query execution.

import IndexOverviewList from '@site/src/components/IndexOverviewList';

<IndexOverviewList />
