
Commit 7649388

Merge branch 'antalya-26.1' into feature/antalya-26.1/json_part2

2 parents e15af07 + b806ac5

134 files changed, +4520 −740 lines

docs/en/antalya/swarm.md

Lines changed: 73 additions & 0 deletions

# Antalya branch

## Swarm

### Differences from the upstream version

#### `storage_type` argument in object storage functions

Upstream ClickHouse provides several table functions for reading Iceberg tables from different storage backends, such as `icebergLocal`, `icebergS3`, `icebergAzure`, and `icebergHDFS`, their cluster variants, the `iceberg` function as a synonym for `icebergS3`, and table engines such as `IcebergLocal`, `IcebergS3`, `IcebergAzure`, and `IcebergHDFS`.

In the Antalya branch, the `iceberg` table function and the `Iceberg` table engine unify all of these variants into one by using a new named argument, `storage_type`, which can be one of `local`, `s3`, `azure`, or `hdfs`.

Old syntax examples:

```sql
SELECT * FROM icebergS3('http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
CREATE TABLE mytable ENGINE=IcebergHDFS('/table_data', 'Parquet');
```

New syntax examples:

```sql
SELECT * FROM iceberg(storage_type='s3', 'http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
SELECT * FROM icebergCluster('mycluster', storage_type='azure', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
CREATE TABLE mytable ENGINE=Iceberg('/table_data', 'Parquet', storage_type='hdfs');
```

If a named collection is used to store access parameters, the `storage_type` field can be included in the same named collection:

```xml
<named_collections>
    <s3>
        <url>http://minio1:9001/root/</url>
        <access_key_id>minio</access_key_id>
        <secret_access_key>minio123</secret_access_key>
        <storage_type>s3</storage_type>
    </s3>
</named_collections>
```

```sql
SELECT * FROM iceberg(s3, filename='table_data');
```

By default, `storage_type` is `'s3'` to maintain backward compatibility.

#### `object_storage_cluster` setting

The new setting `object_storage_cluster` controls whether the single-node or the cluster variant of a table function reading from object storage (e.g., `s3`, `azure`, `iceberg`, and their cluster variants `s3Cluster`, `azureCluster`, `icebergCluster`) is used.

Old syntax examples:

```sql
SELECT * FROM s3Cluster('myCluster', 'http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))');
SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
```

New syntax examples:

```sql
SELECT * FROM s3('http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))')
SETTINGS object_storage_cluster='myCluster';
SELECT * FROM icebergAzure('http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet')
SETTINGS object_storage_cluster='myCluster';
```

This setting also applies to table engines and can be used with tables managed by Iceberg Catalog.
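For table engines, usage might look like the following sketch (the table name and cluster name here are hypothetical, chosen for illustration):

```sql
-- Hypothetical table; storage_type defaults to 's3'.
CREATE TABLE iceberg_events
    ENGINE = Iceberg('http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');

-- Run the read distributed across the cluster named 'myCluster'.
SELECT count(*) FROM iceberg_events SETTINGS object_storage_cluster='myCluster';
```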
Note: upstream ClickHouse has introduced analogous settings, such as `parallel_replicas_for_cluster_engines` and `cluster_for_parallel_replicas`. Since version 25.10, these settings also work with table engines. The `object_storage_cluster` setting may be deprecated in the future.

docs/en/engines/table-engines/integrations/iceberg.md

Lines changed: 56 additions & 0 deletions

`Iceberg` table engine and table function support a metadata cache storing the information of manifest files, manifest lists, and metadata JSON. The cache is stored in memory. This feature is controlled by the setting `use_iceberg_metadata_files_cache`, which is enabled by default.

## Altinity Antalya branch

### Specify storage type in arguments

Only in the Altinity Antalya branch does the `Iceberg` table engine support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.

```sql
CREATE TABLE iceberg_table_s3
    ENGINE = Iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key [, session_token]], format [, compression])

CREATE TABLE iceberg_table_azure
    ENGINE = Iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath [, account_name, account_key, format, compression])

CREATE TABLE iceberg_table_hdfs
    ENGINE = Iceberg(storage_type='hdfs', path_to_table [, format] [, compression_method])

CREATE TABLE iceberg_table_local
    ENGINE = Iceberg(storage_type='local', path_to_table [, format] [, compression_method])
```

### Specify storage type in named collection

Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.

```xml
<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
            <format>auto</format>
            <structure>auto</structure>
            <storage_type>s3</storage_type>
        </iceberg_conf>
    </named_collections>
</clickhouse>
```

```sql
CREATE TABLE iceberg_table ENGINE=Iceberg(iceberg_conf, filename = 'test_table')
```

The default value for `storage_type` is `s3`.

### The `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `Iceberg` table engine available. This syntax allows execution on a cluster when the `object_storage_cluster` setting is non-empty and contains a cluster name.

```sql
CREATE TABLE iceberg_table_s3
    ENGINE = Iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key [, session_token]], format [, compression]);

SELECT * FROM iceberg_table_s3 SETTINGS object_storage_cluster='cluster_simple';
```

## See also {#see-also}

- [iceberg table function](/sql-reference/table-functions/iceberg.md)

docs/en/engines/table-engines/mergetree-family/part_export.md

Lines changed: 7 additions & 0 deletions

- **Default**: `true`
- **Description**: If set to true, throws if pending patch parts exist for a given part. Note that by default mutations are applied to all parts, which means that if a mutation would in practice only affect part/partition x, all the other parts/partitions will throw upon export. The exception is when the `IN PARTITION` clause was used in the mutation command. Note that the `IN PARTITION` clause is not properly implemented for plain MergeTree tables.

### export_merge_tree_part_filename_pattern

- **Type**: `String`
- **Default**: `{part_name}_{checksum}`
- **Description**: Pattern for the filename of the exported MergeTree part. The `part_name` and `checksum` placeholders are calculated and replaced on the fly. Additional macros are supported.

## Examples

### Basic Export to S3

docs/en/engines/table-engines/mergetree-family/partition_export.md

Lines changed: 6 additions & 0 deletions

- **Default**: `true`
- **Description**: If set to true, throws if pending patch parts exist for a given part. Note that by default mutations are applied to all parts, which means that if a mutation would in practice only affect part/partition x, all the other parts/partitions will throw upon export. The exception is when the `IN PARTITION` clause was used in the mutation command. Note that the `IN PARTITION` clause is not properly implemented for plain MergeTree tables.

### export_merge_tree_part_filename_pattern

- **Type**: `String`
- **Default**: `{part_name}_{checksum}`
- **Description**: Pattern for the filename of the exported MergeTree part. The `part_name` and `checksum` placeholders are calculated and replaced on the fly. Additional macros are supported.

## Examples

### Basic Export to S3
Lines changed: 23 additions & 0 deletions

# Task distribution in *Cluster family functions

## Task distribution algorithm

Table functions such as `s3Cluster`, `azureBlobStorageCluster`, `hdfsCluster`, and `icebergCluster`, and table engines such as `S3`, `Azure`, `HDFS`, and `Iceberg` with the `object_storage_cluster` setting, distribute tasks across all cluster nodes or across a subset limited by the `object_storage_max_nodes` setting. This setting limits the number of nodes involved in processing a distributed query; nodes are selected randomly for each query.

A single task corresponds to processing one source file.

For each file, one cluster node is selected as the primary node using the consistent Rendezvous Hashing algorithm. This algorithm guarantees that:

* The same node is consistently selected as primary for each file, as long as the cluster remains unchanged.
* When the cluster changes (nodes are added or removed), only files assigned to the affected nodes change their primary node assignment.

This improves cache efficiency by minimizing data movement among nodes.
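The assignment rule above can be sketched as follows. This is an illustrative sketch of rendezvous hashing, not the actual ClickHouse implementation; the node names, file names, and hash choice are made up:

```python
import hashlib

def rendezvous_pick(file_name, nodes):
    """Pick the primary node for a file: the node whose hash of the
    (file, node) pair is highest. Deterministic for a fixed cluster."""
    def score(node):
        key = f"{file_name}|{node}".encode()
        return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return max(nodes, key=score)

nodes = ["node1", "node2", "node3"]
files = [f"part_{i}.parquet" for i in range(100)]
before = {f: rendezvous_pick(f, nodes) for f in files}

# Adding a node only reassigns the files whose new highest score
# belongs to the added node; every other file keeps its primary.
after = {f: rendezvous_pick(f, nodes + ["node4"]) for f in files}
moved = [f for f in files if before[f] != after[f]]
assert all(after[f] == "node4" for f in moved)
```

The stability property is what keeps per-node file caches warm: removing or adding one node disturbs only the assignments that touch that node.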
## `lock_object_storage_task_distribution_ms` setting

Each node begins by processing the files for which it is the primary node. After completing its assigned files, a node may take tasks from other nodes, either immediately or after waiting for `lock_object_storage_task_distribution_ms` milliseconds if the primary node has not requested new files during that interval. The default value of `lock_object_storage_task_distribution_ms` is 500 milliseconds. This setting balances caching efficiency against workload redistribution when nodes are imbalanced.
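The setting can be adjusted per query, for example to favor cache locality over aggressive work stealing. In this sketch the endpoint, credentials, and cluster name are hypothetical:

```sql
SELECT count(*)
FROM s3('http://minio1:9000/root/data/*.parquet', 'minio', 'minio123', 'Parquet')
SETTINGS object_storage_cluster='cluster_simple',
         lock_object_storage_task_distribution_ms=2000;
```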
## `SYSTEM STOP SWARM MODE` command

If a node needs to shut down gracefully, the command `SYSTEM STOP SWARM MODE` prevents the node from receiving new tasks for *Cluster-family queries. The node finishes processing its already assigned files, after which it can safely shut down without errors.

Receiving new tasks can be resumed with the command `SYSTEM START SWARM MODE`.
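A typical drain-and-restore sequence, run on the node being taken out for maintenance, might look like:

```sql
-- Stop accepting new swarm tasks; in-flight files finish normally.
SYSTEM STOP SWARM MODE;
-- ... perform maintenance, restart the node ...
SYSTEM START SWARM MODE;
```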

docs/en/sql-reference/table-functions/azureBlobStorageCluster.md

Lines changed: 14 additions & 0 deletions

See [azureBlobStorage](/sql-reference/table-functions/azureBlobStorage#using-shared-access-signatures-sas-sas-tokens) for examples.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `azureBlobStorageCluster` table function available. It allows the `azureBlobStorage` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Azure Blob Storage across a ClickHouse cluster.

```sql
SELECT count(*) FROM azureBlobStorage(
    'http://azurite1:10000/devstoreaccount1', 'testcontainer', 'test_cluster_count.csv', 'devstoreaccount1',
    'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'CSV',
    'auto', 'key UInt64')
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [AzureBlobStorage engine](../../engines/table-engines/integrations/azureBlobStorage.md)

docs/en/sql-reference/table-functions/deltalakeCluster.md

Lines changed: 11 additions & 0 deletions

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `deltaLakeCluster` table function available. It allows the `deltaLake` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Delta Lake storage across a ClickHouse cluster.

```sql
SELECT count(*) FROM deltaLake(url [, aws_access_key_id, aws_secret_access_key] [, format] [, structure] [, compression])
SETTINGS object_storage_cluster='cluster_simple'
```
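A concrete invocation following that signature might look like this (the endpoint, path, and credentials are hypothetical, shown for illustration):

```sql
SELECT count(*) FROM deltaLake('http://minio1:9000/root/delta_table', 'minio', 'minio123')
SETTINGS object_storage_cluster='cluster_simple'
```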
## Related {#related}

- [deltaLake engine](engines/table-engines/integrations/deltalake.md)

docs/en/sql-reference/table-functions/hdfsCluster.md

Lines changed: 12 additions & 0 deletions

If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `hdfsCluster` table function available. It allows the `hdfs` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over HDFS across a ClickHouse cluster.

```sql
SELECT count(*)
FROM hdfs('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV', 'name String, value UInt32')
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [HDFS engine](../../engines/table-engines/integrations/hdfs.md)

docs/en/sql-reference/table-functions/hudiCluster.md

Lines changed: 12 additions & 0 deletions

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `hudiCluster` table function available. It allows the `hudi` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Hudi storage across a ClickHouse cluster.

```sql
SELECT *
FROM hudi(url [, aws_access_key_id, aws_secret_access_key] [, format] [, structure] [, compression])
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [Hudi engine](engines/table-engines/integrations/hudi.md)

docs/en/sql-reference/table-functions/iceberg.md

Lines changed: 42 additions & 0 deletions

x: Ivanov
y: 993
```

## Altinity Antalya branch

### Specify storage type in arguments

Only in the Altinity Antalya branch does the `iceberg` table function support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.

```sql
iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key [, session_token]] [, format] [, compression_method])

iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath [, account_name] [, account_key] [, format] [, compression_method])

iceberg(storage_type='hdfs', path_to_table [, format] [, compression_method])

iceberg(storage_type='local', path_to_table [, format] [, compression_method])
```

### Specify storage type in named collection

Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.

```xml
<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
            <format>auto</format>
            <structure>auto</structure>
            <storage_type>s3</storage_type>
        </iceberg_conf>
    </named_collections>
</clickhouse>
```

```sql
iceberg(named_collection[, option=value [,..]])
```

The default value for `storage_type` is `s3`.

## See Also {#see-also}

* [Iceberg engine](/engines/table-engines/integrations/iceberg.md)
