Commit 1a1d51a

Merge pull request #1165 from Altinity/feature/antalya-25.8/docs
Documentation for Swarm features in Antalya branch
2 parents 1620e0c + 439b7e6

10 files changed: +334 −0 lines changed
docs/en/antalya/swarm.md

Lines changed: 73 additions & 0 deletions
# Antalya branch

## Swarm

### Differences from the upstream version

#### `storage_type` argument in object storage functions

In upstream ClickHouse, there are several table functions for reading Iceberg tables from different storage backends, such as `icebergLocal`, `icebergS3`, `icebergAzure`, and `icebergHDFS`, along with their cluster variants, the `iceberg` function as a synonym for `icebergS3`, and table engines like `IcebergLocal`, `IcebergS3`, `IcebergAzure`, and `IcebergHDFS`.

In the Antalya branch, the `iceberg` table function and the `Iceberg` table engine unify all variants into one by using a new named argument, `storage_type`, which can be one of `local`, `s3`, `azure`, or `hdfs`.

Old syntax examples:

```sql
SELECT * FROM icebergS3('http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
CREATE TABLE mytable ENGINE=IcebergHDFS('/table_data', 'Parquet');
```

New syntax examples:

```sql
SELECT * FROM iceberg(storage_type='s3', 'http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');
SELECT * FROM icebergCluster('mycluster', storage_type='azure', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
CREATE TABLE mytable ENGINE=Iceberg('/table_data', 'Parquet', storage_type='hdfs');
```

Also, if a named collection is used to store access parameters, the field `storage_type` can be included in the same named collection:

```xml
<named_collections>
    <s3>
        <url>http://minio1:9001/root/</url>
        <access_key_id>minio</access_key_id>
        <secret_access_key>minio123</secret_access_key>
        <storage_type>s3</storage_type>
    </s3>
</named_collections>
```

```sql
SELECT * FROM iceberg(s3, filename='table_data');
```

By default `storage_type` is `'s3'` to maintain backward compatibility.

#### `object_storage_cluster` setting

The new setting `object_storage_cluster` controls whether a single-node or cluster variant of table functions reading from object storage (e.g., `s3`, `azure`, `iceberg`, and their cluster variants like `s3Cluster`, `azureCluster`, `icebergCluster`) is used.

Old syntax examples:

```sql
SELECT * FROM s3Cluster('myCluster', 'http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))');
SELECT * FROM icebergAzureCluster('mycluster', 'http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet');
```

New syntax examples:

```sql
SELECT * FROM s3('http://minio1:9001/root/data/{clickhouse,database}/*', 'minio', 'minio123', 'CSV',
    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))')
SETTINGS object_storage_cluster='myCluster';

SELECT * FROM icebergAzure('http://azurite1:30000/devstoreaccount1', 'cont', '/table_data', 'devstoreaccount1', 'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'Parquet')
SETTINGS object_storage_cluster='myCluster';
```

This setting also applies to table engines and can be used with tables managed by Iceberg Catalog.

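For example, a table using the `Iceberg` engine can be queried on a cluster; this is a sketch combining the engine examples elsewhere in this commit (the endpoint, credentials, and cluster name are illustrative):

```sql
CREATE TABLE iceberg_table_s3
ENGINE = Iceberg(storage_type='s3', 'http://minio1:9000/root/table_data', 'minio', 'minio123', 'Parquet');

-- Fan the scan out to the nodes of 'cluster_simple'.
SELECT count(*) FROM iceberg_table_s3
SETTINGS object_storage_cluster='cluster_simple';
```
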
Note: upstream ClickHouse has introduced analogous settings, such as `parallel_replicas_for_cluster_engines` and `cluster_for_parallel_replicas`, which since version 25.10 also work with table engines. The `object_storage_cluster` setting may be deprecated in the future.

docs/en/engines/table-engines/integrations/iceberg.md

Lines changed: 56 additions & 0 deletions
The `Iceberg` table engine and table function support a metadata cache that stores information from manifest files, manifest lists, and metadata JSON. The cache is stored in memory. This feature is controlled by the setting `use_iceberg_metadata_files_cache`, which is enabled by default.

## Altinity Antalya branch

### Specify storage type in arguments

Only in the Altinity Antalya branch does the `Iceberg` table engine support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.

```sql
CREATE TABLE iceberg_table_s3
ENGINE = Iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression])

CREATE TABLE iceberg_table_azure
ENGINE = Iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath, [account_name, account_key, format, compression])

CREATE TABLE iceberg_table_hdfs
ENGINE = Iceberg(storage_type='hdfs', path_to_table [,format] [,compression_method])

CREATE TABLE iceberg_table_local
ENGINE = Iceberg(storage_type='local', path_to_table [,format] [,compression_method])
```
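
For instance, a hypothetical table over data on the server's local filesystem (the path is illustrative):

```sql
-- Assumes an Iceberg table already exists at this local path.
CREATE TABLE iceberg_table_local
ENGINE = Iceberg(storage_type='local', '/var/lib/clickhouse/user_files/iceberg_table', 'Parquet')
```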

### Specify storage type in named collection

Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.

```xml
<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
            <format>auto</format>
            <structure>auto</structure>
            <storage_type>s3</storage_type>
        </iceberg_conf>
    </named_collections>
</clickhouse>
```

```sql
CREATE TABLE iceberg_table ENGINE=Iceberg(iceberg_conf, filename = 'test_table')
```

The default value for `storage_type` is `s3`.

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `Iceberg` table engine available. This syntax allows execution on a cluster when the `object_storage_cluster` setting is non-empty and contains the cluster name.

```sql
CREATE TABLE iceberg_table_s3
ENGINE = Iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression]);

SELECT * FROM iceberg_table_s3 SETTINGS object_storage_cluster='cluster_simple';
```

## See also {#see-also}
- [iceberg table function](/sql-reference/table-functions/iceberg.md)
Lines changed: 23 additions & 0 deletions
# Task distribution in *Cluster-family functions

## Task distribution algorithm

Table functions such as `s3Cluster`, `azureBlobStorageCluster`, `hdfsCluster`, and `icebergCluster`, as well as table engines like `S3`, `Azure`, `HDFS`, and `Iceberg` used with the `object_storage_cluster` setting, distribute tasks across all cluster nodes or a subset limited by the `object_storage_max_nodes` setting. This setting limits the number of nodes involved in processing a distributed query, with nodes selected randomly for each query.

A single task corresponds to processing one source file.

For each file, one cluster node is selected as the primary node using Rendezvous Hashing, a consistent hashing algorithm. This algorithm guarantees that:
* The same node is consistently selected as primary for each file, as long as the cluster remains unchanged.
* When the cluster changes (nodes added or removed), only files assigned to the affected nodes change their primary node assignment.

This improves cache efficiency by minimizing data movement among nodes.

## `lock_object_storage_task_distribution_ms` setting

Each node begins by processing the files for which it is the primary node. After completing its assigned files, a node may take tasks from other nodes, either immediately or after waiting `lock_object_storage_task_distribution_ms` milliseconds if the primary node does not request new files during that interval. The default value of `lock_object_storage_task_distribution_ms` is 500 milliseconds. This setting balances caching efficiency against workload redistribution when nodes are imbalanced.
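As a sketch of how this might be combined with a distributed query (assuming, as with other query-level settings, that `lock_object_storage_task_distribution_ms` can be passed via `SETTINGS`; the endpoint, credentials, and threshold are illustrative):

```sql
SELECT count(*)
FROM s3('http://minio1:9001/root/data/*', 'minio', 'minio123', 'CSV', 'name String, value UInt32')
SETTINGS object_storage_cluster='cluster_simple',
         -- assumption: per-query override of the 500 ms default lock window
         lock_object_storage_task_distribution_ms=2000
```
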
## `SYSTEM STOP SWARM MODE` command
If a node needs to shut down gracefully, the `SYSTEM STOP SWARM MODE` command prevents it from receiving new tasks for *Cluster-family queries. The node finishes processing its already-assigned files and can then shut down safely without errors.
Receiving new tasks can be resumed with the command `SYSTEM START SWARM MODE`.
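
A typical graceful-restart sequence, sketched with the commands above:

```sql
-- Drain: stop receiving new swarm tasks; in-flight files finish normally.
SYSTEM STOP SWARM MODE;

-- ...restart or maintain the node, then rejoin task distribution...
SYSTEM START SWARM MODE;
```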

docs/en/sql-reference/table-functions/azureBlobStorageCluster.md

Lines changed: 14 additions & 0 deletions
See [azureBlobStorage](/sql-reference/table-functions/azureBlobStorage#using-shared-access-signatures-sas-sas-tokens) for examples.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `azureBlobStorageCluster` table function available. It allows the `azureBlobStorage` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name, enabling distributed queries over Azure Blob Storage across a ClickHouse cluster.

```sql
SELECT count(*) FROM azureBlobStorage(
    'http://azurite1:10000/devstoreaccount1', 'testcontainer', 'test_cluster_count.csv', 'devstoreaccount1',
    'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==', 'CSV',
    'auto', 'key UInt64')
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [AzureBlobStorage engine](../../engines/table-engines/integrations/azureBlobStorage.md)

docs/en/sql-reference/table-functions/deltalakeCluster.md

Lines changed: 11 additions & 0 deletions

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch
40+
41+
### `object_storage_cluster` setting.
42+
43+
Only in the Altinity Antalya branch is an alternative syntax for the `deltaLakeCluster` table function available. It allows the `deltaLake` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name, enabling distributed queries over Delta Lake storage across a ClickHouse cluster.

```sql
SELECT count(*) FROM deltaLake(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
SETTINGS object_storage_cluster='cluster_simple'
```
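
A concrete sketch of the same query (the MinIO endpoint and credentials are illustrative, not from the original examples):

```sql
SELECT count(*)
FROM deltaLake('http://minio1:9001/root/delta_table/', 'minio', 'minio123')
SETTINGS object_storage_cluster='cluster_simple'
```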

## Related {#related}

- [deltaLake engine](engines/table-engines/integrations/deltalake.md)

docs/en/sql-reference/table-functions/hdfsCluster.md

Lines changed: 12 additions & 0 deletions

If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `hdfsCluster` table function available. It allows the `hdfs` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name, enabling distributed queries over HDFS across a ClickHouse cluster.

```sql
SELECT count(*)
FROM hdfs('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV', 'name String, value UInt32')
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [HDFS engine](../../engines/table-engines/integrations/hdfs.md)

docs/en/sql-reference/table-functions/hudiCluster.md

Lines changed: 12 additions & 0 deletions

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `hudiCluster` table function available. It allows the `hudi` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name, enabling distributed queries over Hudi storage across a ClickHouse cluster.

```sql
SELECT *
FROM hudi(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [Hudi engine](engines/table-engines/integrations/hudi.md)

docs/en/sql-reference/table-functions/iceberg.md

Lines changed: 41 additions & 0 deletions

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### Specify storage type in arguments

Only in the Altinity Antalya branch does the `iceberg` table function support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.

```sql
iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])

iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath [,account_name] [,account_key] [,format] [,compression_method])

iceberg(storage_type='hdfs', path_to_table [,format] [,compression_method])

iceberg(storage_type='local', path_to_table [,format] [,compression_method])
```
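
For instance, a hypothetical read of a table on the server's local filesystem (the path is illustrative):

```sql
-- Assumes an Iceberg table already exists at this local path.
SELECT * FROM iceberg(storage_type='local', '/var/lib/clickhouse/user_files/iceberg_table', 'Parquet')
```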

### Specify storage type in named collection

Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.


```xml
<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
            <format>auto</format>
            <structure>auto</structure>
            <storage_type>s3</storage_type>
        </iceberg_conf>
    </named_collections>
</clickhouse>
```

```sql
iceberg(named_collection[, option=value [,..]])
```
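
For example, with the `iceberg_conf` collection above (mirroring the engine-level example in the `Iceberg` engine docs):

```sql
SELECT * FROM iceberg(iceberg_conf, filename = 'test_table')
```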
The default value for `storage_type` is `s3`.

## See Also {#see-also}

* [Iceberg engine](/engines/table-engines/integrations/iceberg.md)

docs/en/sql-reference/table-functions/icebergCluster.md

Lines changed: 75 additions & 0 deletions

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch

### `icebergLocalCluster` table function

Only in the Altinity Antalya branch is `icebergLocalCluster` available. It is designed to run distributed cluster queries when Iceberg data is stored on shared network storage mounted at a local path. The path must be identical on all replicas.

```sql
icebergLocalCluster(cluster_name, path_to_table [,format] [,compression_method])
```
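
A concrete sketch (the cluster name and shared mount path are illustrative):

```sql
-- The same path must resolve to the same data on every replica.
SELECT count(*)
FROM icebergLocalCluster('cluster_simple', '/mnt/shared/iceberg_table', 'Parquet')
```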

### Specify storage type in function arguments

Only in the Altinity Antalya branch, the `icebergCluster` table function supports all storage backends. The storage backend can be specified using the named argument `storage_type`. Valid values include `s3`, `azure`, `hdfs`, and `local`.

```sql
icebergCluster(storage_type='s3', cluster_name, url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method])

icebergCluster(storage_type='azure', cluster_name, connection_string|storage_account_url, container_name, blobpath [,account_name] [,account_key] [,format] [,compression_method])

icebergCluster(storage_type='hdfs', cluster_name, path_to_table [,format] [,compression_method])

icebergCluster(storage_type='local', cluster_name, path_to_table [,format] [,compression_method])
```

### Specify storage type in a named collection

Only in the Altinity Antalya branch can `storage_type` be part of a named collection.

```xml
<clickhouse>
    <named_collections>
        <iceberg_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
            <format>auto</format>
            <structure>auto</structure>
            <storage_type>s3</storage_type>
        </iceberg_conf>
    </named_collections>
</clickhouse>
```

```sql
icebergCluster(iceberg_conf[, option=value [,..]])
```
The default value for `storage_type` is `s3`.

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `icebergCluster` table function available. It allows the `iceberg` function and its storage-specific variants to be used with a non-empty `object_storage_cluster` setting specifying a cluster name, enabling distributed queries over an Iceberg table across a ClickHouse cluster.

```sql
icebergS3(url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergAzure(connection_string|storage_account_url, container_name, blobpath [,account_name] [,account_key] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergHDFS(path_to_table [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergLocal(path_to_table [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

icebergS3(option=value [,..]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='s3', url [, NOSIGN | access_key_id, secret_access_key, [session_token]] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='azure', connection_string|storage_account_url, container_name, blobpath [,account_name] [,account_key] [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='hdfs', path_to_table [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(storage_type='local', path_to_table [,format] [,compression_method]) SETTINGS object_storage_cluster='cluster_name'

iceberg(iceberg_conf[, option=value [,..]]) SETTINGS object_storage_cluster='cluster_name'
```

**See Also**

- [Iceberg engine](/engines/table-engines/integrations/iceberg.md)

docs/en/sql-reference/table-functions/s3Cluster.md

Lines changed: 17 additions & 0 deletions

For details on optimizing the performance of the s3 function, see [our detailed guide](/integrations/s3/performance).

## Altinity Antalya branch

### `object_storage_cluster` setting

Only in the Altinity Antalya branch is an alternative syntax for the `s3Cluster` table function available. It allows the `s3` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name, enabling distributed queries over S3 storage across a ClickHouse cluster.

```sql
SELECT * FROM s3(
    'http://minio1:9001/root/data/{clickhouse,database}/*',
    'minio',
    'ClickHouse_Minio_P@ssw0rd',
    'CSV',
    'name String, value UInt32, polygon Array(Array(Tuple(Float64, Float64)))'
) ORDER BY (name, value, polygon)
SETTINGS object_storage_cluster='cluster_simple'
```

## Related {#related}

- [S3 engine](../../engines/table-engines/integrations/s3.md)
