Commit 8be22e4

[feat] add iceberg publish_changes and order by (#3172)
1 parent ab5bb33 commit 8be22e4

6 files changed: +501 -270 lines changed

docs/lakehouse/catalogs/iceberg-catalog.mdx

Lines changed: 64 additions & 0 deletions
@@ -1794,6 +1794,25 @@ For an Iceberg Database, you must first drop all tables under the database befor
);
```

Starting from version 4.1.0, Doris supports specifying sort columns when creating an Iceberg table. On write, Doris sorts the data by the specified columns, which can improve query performance.

```sql
CREATE TABLE ordered_table (
  `id` int NULL,
  `name` text NULL,
  `score` double NULL,
  `create_time` datetimev2(6) NULL
)
ORDER BY (`id` ASC NULLS FIRST, `score` DESC NULLS LAST)
PROPERTIES (
  "write-format" = "parquet",
  "write.parquet.compression-codec" = "zstd"
);
```

- If no sort columns are specified, data is not sorted on write.
- The default sort order is `ASC NULLS FIRST`.

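
The default sort order noted above suggests that the direction and null ordering can be omitted. A minimal sketch, assuming that shorthand is accepted; the table name `ordered_by_default` and its columns are hypothetical and chosen only for illustration:

```sql
-- Hypothetical table; names are illustrative only.
-- Assumption: omitting the direction and null ordering falls back to
-- the documented default, i.e. ORDER BY (`event_date` ASC NULLS FIRST).
CREATE TABLE ordered_by_default (
  `id` int NULL,
  `event_date` date NULL
)
ORDER BY (`event_date`);
```
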
After creation, you can use the `SHOW CREATE TABLE` command to view the Iceberg table creation statement. For details about partition functions, see the [Partitioning](#) section.

* **Dropping Tables**
@@ -2542,6 +2561,51 @@ EXECUTE set_current_snapshot ("ref" = "v1.0");
3. The operation will fail if the specified snapshot ID or reference does not exist
4. If the current snapshot is already the target snapshot, the operation returns directly without creating a new snapshot

### publish_changes

The `publish_changes` operation is used in the Write-Audit-Publish (WAP) pattern to publish the snapshot with the specified `wap.id` as the current table state.
It locates the snapshot whose `wap.id` matches the given `wap_id` and cherry-picks it onto the current table state, making the staged data visible to all read operations.

**Syntax:**

```sql
ALTER TABLE [catalog.][database.]table_name
EXECUTE publish_changes("wap_id" = "<wap_id>")
```

**Parameters:**

| Parameter Name | Type | Required | Description |
| -------------- | ---- | -------- | ----------- |
| `wap_id` | STRING | Yes | The `wap.id` of the staged snapshot to publish |

**Return Value:**

Executing `publish_changes` returns a result set with the following two columns:

| Column Name | Type | Description |
| ----------- | ---- | ----------- |
| `previous_snapshot_id` | STRING | The ID of the snapshot that was current before the publish operation (`NULL` if there was none) |
| `current_snapshot_id` | STRING | The ID of the new snapshot created by publishing and set as current |

**Examples:**

```sql
-- Publish the snapshot whose WAP ID is test_wap_001
ALTER TABLE iceberg_db.iceberg_table
EXECUTE publish_changes("wap_id" = "test_wap_001");
```

2601+
**Notes:**
2602+
2603+
1. This operation does not support a WHERE clause, nor PARTITION/PARTITIONS clauses
2604+
2. It is only meaningful for Iceberg tables with write.wap.enabled = true and WAP snapshots generated via wap.id
2605+
3. If no snapshot is found for the specified wap_id, the operation fails and throws an error
2606+
4. After publishing, the new snapshot becomes the current snapshot
2607+
5. If there is no snapshot before publishing, previous_snapshot_id may be NULL
2608+
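
One way to check the effect of a publish is to list the table's snapshots before and after it. The sketch below is an illustration under stated assumptions, not a prescribed workflow: the catalog name `iceberg_catalog` is hypothetical, and it assumes the `iceberg_meta` table function with `"query_type" = "snapshots"` is available in your Doris version. The exact columns it returns may differ between versions.

```sql
-- Assumption: the Iceberg catalog is named iceberg_catalog (hypothetical).

-- 1. List snapshots before publishing. A staged WAP snapshot is expected
--    to carry its wap.id in the snapshot summary.
SELECT * FROM iceberg_meta(
  "table" = "iceberg_catalog.iceberg_db.iceberg_table",
  "query_type" = "snapshots"
);

-- 2. Publish the staged snapshot, using the fully qualified table name
--    allowed by the [catalog.][database.]table_name syntax.
ALTER TABLE iceberg_catalog.iceberg_db.iceberg_table
EXECUTE publish_changes("wap_id" = "test_wap_001");

-- 3. List snapshots again and compare the newest entry with the
--    current_snapshot_id value returned by step 2.
SELECT * FROM iceberg_meta(
  "table" = "iceberg_catalog.iceberg_db.iceberg_table",
  "query_type" = "snapshots"
);
```

If step 2 fails because no snapshot matches the given `wap_id`, the listing from step 1 is the place to confirm which staged snapshots actually exist.
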
## Iceberg Table Optimization

### View Data File Distribution
