Commit 07632a3

Merge pull request #1147 from Altinity/export_part_docs
Export merge tree part docs
2 parents 7ba4cce + d88fc4e commit 07632a3

3 files changed

+217
-1
lines changed

Lines changed: 160 additions & 0 deletions
@@ -0,0 +1,160 @@
# ALTER TABLE EXPORT PART

## Overview

The `ALTER TABLE EXPORT PART` command exports individual MergeTree data parts to object storage (S3, Azure Blob Storage, etc.), typically in Parquet format.

**Key Characteristics:**

- **Experimental feature** - must be enabled via the `allow_experimental_export_merge_tree_part` setting
- **Asynchronous** - executes in the background and returns immediately
- **Ephemeral** - no automatic retry mechanism; manual retry is required on failure
- **Idempotent** - safe to re-export the same part (skips by default if file exists)
- **Preserves sort order** from the source table

## Syntax

```sql
ALTER TABLE [database.]table_name
EXPORT PART 'part_name'
TO TABLE [destination_database.]destination_table
SETTINGS allow_experimental_export_merge_tree_part = 1
    [, setting_name = value, ...]
```

### Parameters

- **`table_name`**: The source MergeTree table containing the part to export
- **`part_name`**: The exact name of the data part to export (e.g., `'2020_1_1_0'`, `'all_1_1_0'`)
- **`destination_table`**: The target table for the export (typically an S3, Azure, or other object storage table)

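Because `part_name` must match an existing part exactly, it can help to look the names up first. The query below is a minimal sketch that reads the standard `system.parts` table; `mt_table` is a placeholder for the source table name, and the table is assumed to live in the current database.

```sql
-- List the active parts of the source table.
-- 'mt_table' is a placeholder; replace it with the real source table.
SELECT name, partition, rows
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'mt_table'
  AND active
ORDER BY name;
```

Each value in the `name` column can be passed as `'part_name'` to `EXPORT PART`.
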
## Requirements

Source and destination tables must be fully compatible:

1. **Identical schemas** - same columns, types, and order
2. **Matching partition keys** - partition expressions must be identical

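One way to check the schema requirement before exporting is to compare the two tables in `system.columns`. This is only a sketch: `mt_table` and `s3_table` are placeholder names, both tables are assumed to be in the current database, and the query covers column names, types, and positions only. Partition expressions still need to be compared separately, for example by inspecting the output of `SHOW CREATE TABLE` for each table.

```sql
-- Report columns that do not match between the two tables.
-- A column is reported when it is missing from one table, sits at a
-- different position, or has a different type.
SELECT
    name,
    position,
    groupArray(table) AS present_in,
    groupUniqArray(type) AS types
FROM system.columns
WHERE database = currentDatabase()
  AND table IN ('mt_table', 's3_table')
GROUP BY name, position
HAVING count() != 2 OR length(types) != 1
ORDER BY position;
```

An empty result suggests that the column lists agree.
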
## Settings

### `allow_experimental_export_merge_tree_part` (Required)

- **Type**: `Bool`
- **Default**: `false`
- **Description**: Must be set to `true` to enable the experimental feature.

### `export_merge_tree_part_overwrite_file_if_exists` (Optional)

- **Type**: `Bool`
- **Default**: `false`
- **Description**: If set to `true`, an existing destination file is overwritten. Otherwise, the export fails with an exception.

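As an illustration of combining both settings, the statement below re-exports a part and overwrites a previously exported file. The table and part names are placeholders taken from the style of the examples in this document.

```sql
-- Re-export a part, replacing the destination file if it already exists.
ALTER TABLE mt_table EXPORT PART '2020_1_1_0' TO TABLE s3_table
SETTINGS allow_experimental_export_merge_tree_part = 1,
         export_merge_tree_part_overwrite_file_if_exists = 1;
```
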
## Examples

### Basic Export to S3

```sql
-- Create source and destination tables
CREATE TABLE mt_table (id UInt64, year UInt16)
ENGINE = MergeTree() PARTITION BY year ORDER BY tuple();

CREATE TABLE s3_table (id UInt64, year UInt16)
ENGINE = S3(s3_conn, filename='data', format=Parquet, partition_strategy='hive')
PARTITION BY year;

-- Insert and export
INSERT INTO mt_table VALUES (1, 2020), (2, 2020), (3, 2021);

ALTER TABLE mt_table EXPORT PART '2020_1_1_0' TO TABLE s3_table
SETTINGS allow_experimental_export_merge_tree_part = 1;

ALTER TABLE mt_table EXPORT PART '2021_2_2_0' TO TABLE s3_table
SETTINGS allow_experimental_export_merge_tree_part = 1;
```

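When a table has many parts, writing one statement per part by hand becomes tedious. The following sketch generates the statements from `system.parts` instead. It assumes the `mt_table` and `s3_table` names from the example above and does not execute anything itself; the resulting strings would still have to be run by a client or script.

```sql
-- Build one EXPORT PART statement per active part of mt_table.
SELECT concat(
    'ALTER TABLE ', database, '.', table,
    ' EXPORT PART ''', name, ''' TO TABLE s3_table',
    ' SETTINGS allow_experimental_export_merge_tree_part = 1;'
) AS export_statement
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'mt_table'
  AND active
ORDER BY name;
```
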
## Monitoring

### Active Exports

Active exports are listed in the `system.exports` table. Currently, it shows only exports that are executing; pending and finished exports do not appear.

```sql
arthur :) select * from system.exports;

SELECT *
FROM system.exports

Query id: 2026718c-d249-4208-891b-a271f1f93407

Row 1:
──────
source_database: default
source_table: source_mt_table
destination_database: default
destination_table: destination_table
create_time: 2025-11-19 09:09:11
part_name: 20251016-365_1_1_0
destination_file_path: table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet
elapsed: 2.04845441
rows_read: 1138688 -- 1.14 million
total_rows_to_read: 550961374 -- 550.96 million
total_size_bytes_compressed: 37619147120 -- 37.62 billion
total_size_bytes_uncompressed: 138166213721 -- 138.17 billion
bytes_read_uncompressed: 316892925 -- 316.89 million
memory_usage: 596006095 -- 596.01 million
peak_memory_usage: 601239033 -- 601.24 million
```

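The raw counters can be turned into a rough progress figure. The query below is a sketch that uses only the columns shown in the output above; the column aliases are illustrative.

```sql
-- Approximate progress of each running export, based on rows read so far.
SELECT
    source_table,
    part_name,
    round(100 * rows_read / total_rows_to_read, 2) AS percent_rows_read,
    formatReadableSize(memory_usage) AS memory,
    elapsed
FROM system.exports
ORDER BY elapsed DESC;
```

For the sample row above, 1138688 of 550961374 rows corresponds to roughly 0.2% of the part.
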
### Export History

Completed exports, whether they succeeded or failed, are recorded in `system.part_log`. For now, only completion events (success or failure) are tracked.

```sql
arthur :) select * from system.part_log where event_type='ExportPart' and table = 'replicated_source' order by event_time desc limit 1;

SELECT *
FROM system.part_log
WHERE (event_type = 'ExportPart') AND (`table` = 'replicated_source')
ORDER BY event_time DESC
LIMIT 1

Query id: ae1c1cd3-c20e-4f20-8b82-ed1f6af0237f

Row 1:
──────
hostname: arthur
query_id:
event_type: ExportPart
merge_reason: NotAMerge
merge_algorithm: Undecided
event_date: 2025-11-19
event_time: 2025-11-19 09:08:31
event_time_microseconds: 2025-11-19 09:08:31.974701
duration_ms: 4
database: default
table: replicated_source
table_uuid: 78471c67-24f4-4398-9df5-ad0a6c3daf41
part_name: 2021_0_0_0
partition_id: 2021
partition: 2021
part_type: Compact
disk_name: default
path_on_disk: year=2021/2021_0_0_0_78C704B133D41CB0EF64DD2A9ED3B6BA.parquet
rows: 1
size_in_bytes: 272
merged_from: ['2021_0_0_0']
bytes_uncompressed: 86
read_rows: 1
read_bytes: 6
peak_memory_usage: 22
error: 0
exception:
ProfileEvents: {}
```

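Since exports are not retried automatically, a query that isolates failures can be useful for deciding what to re-run. This sketch relies on the `error` and `exception` columns visible in the output above and assumes a non-zero `error` code marks a failed export.

```sql
-- Most recent failed exports, newest first.
SELECT event_time, `table`, part_name, duration_ms, error, exception
FROM system.part_log
WHERE event_type = 'ExportPart'
  AND error != 0
ORDER BY event_time DESC
LIMIT 10;
```
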
### Profile Events

- `PartsExports` - Number of successful part exports
- `PartsExportFailures` - Number of failed part exports
- `PartsExportDuplicated` - Number of part exports that failed because the target already exists
- `PartsExportTotalMilliseconds` - Total time spent exporting parts, in milliseconds

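These counters can be read from the standard `system.events` table. The query below is a sketch that matches the event names listed above by prefix; note that `system.events` typically lists only events that have occurred at least once since the server started.

```sql
-- Current values of the export-related profile events.
SELECT event, value
FROM system.events
WHERE event LIKE 'PartsExport%'
ORDER BY event;
```
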
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
---
description: 'System table containing information about in-progress merge tree part exports'
keywords: ['system table', 'exports', 'merge tree', 'part']
slug: /operations/system-tables/exports
title: 'system.exports'
---

Contains information about in-progress merge tree part exports.

Columns:

- `source_database` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the source database.
- `source_table` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the source table.
- `destination_database` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the destination database.
- `destination_table` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the destination table.
- `create_time` ([DateTime](/docs/en/sql-reference/data-types/datetime.md)) — Date and time when the export command was received in the server.
- `part_name` ([String](/docs/en/sql-reference/data-types/string.md)) — Name of the part.
- `destination_file_path` ([String](/docs/en/sql-reference/data-types/string.md)) — File path relative to where the part is being exported to.
- `elapsed` ([Float64](/docs/en/sql-reference/data-types/float.md)) — The time elapsed (in seconds) since the export started.
- `rows_read` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The number of rows read from the exported part.
- `total_rows_to_read` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The total number of rows to read from the exported part.
- `total_size_bytes_compressed` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The total size of the compressed data in the exported part.
- `total_size_bytes_uncompressed` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The total size of the uncompressed data in the exported part.
- `bytes_read_uncompressed` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — The number of uncompressed bytes read from the exported part.
- `memory_usage` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — Current memory usage in bytes for the export operation.
- `peak_memory_usage` ([UInt64](/docs/en/sql-reference/data-types/int-uint.md)) — Peak memory usage in bytes during the export operation.

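The columns above also lend themselves to aggregation. As a sketch, the following query summarizes running exports per destination table; the aliases are illustrative and not part of the table.

```sql
-- Summarize running exports per destination table.
SELECT
    destination_database,
    destination_table,
    count() AS running_exports,
    sum(rows_read) AS rows_read_so_far,
    sum(total_rows_to_read) AS rows_to_read_total,
    formatReadableSize(sum(memory_usage)) AS memory
FROM system.exports
GROUP BY destination_database, destination_table;
```
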
**Example**

```sql
arthur :) select * from system.exports;

SELECT *
FROM system.exports

Query id: 2026718c-d249-4208-891b-a271f1f93407

Row 1:
──────
source_database: default
source_table: source_mt_table
destination_database: default
destination_table: destination_table
create_time: 2025-11-19 09:09:11
part_name: 20251016-365_1_1_0
destination_file_path: table_root/eventDate=2025-10-16/retention=365/20251016-365_1_1_0_17B2F6CD5D3C18E787C07AE3DAF16EB1.parquet
elapsed: 2.04845441
rows_read: 1138688 -- 1.14 million
total_rows_to_read: 550961374 -- 550.96 million
total_size_bytes_compressed: 37619147120 -- 37.62 billion
total_size_bytes_uncompressed: 138166213721 -- 138.17 billion
bytes_read_uncompressed: 316892925 -- 316.89 million
memory_usage: 596006095 -- 596.01 million
peak_memory_usage: 601239033 -- 601.24 million
```


src/Storages/System/StorageSystemExports.cpp

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ ColumnsDescription StorageSystemExports::getColumnsDescription()
  {"source_table", std::make_shared<DataTypeString>(), "Name of the source table."},
  {"destination_database", std::make_shared<DataTypeString>(), "Name of the destination database."},
  {"destination_table", std::make_shared<DataTypeString>(), "Name of the destination table."},
- {"create_time", std::make_shared<DataTypeDateTime>(), "Date and time when the export command was submitted for execution."},
+ {"create_time", std::make_shared<DataTypeDateTime>(), "Date and time when the export command was received in the server."},
  {"part_name", std::make_shared<DataTypeString>(), "Name of the part"},
  {"destination_file_path", std::make_shared<DataTypeString>(), "File path where the part is being exported."},
  {"elapsed", std::make_shared<DataTypeFloat64>(), "The time elapsed (in seconds) since the export started."},
