#### `storage_type` argument in object storage functions
In upstream ClickHouse, there are several table functions to read Iceberg tables from different storage backends such as `icebergLocal`, `icebergS3`, `icebergAzure`, `icebergHDFS`, cluster variants, the `iceberg` function as a synonym for `icebergS3`, and table engines like `IcebergLocal`, `IcebergS3`, `IcebergAzure`, `IcebergHDFS`.
In the Antalya branch, the `iceberg` table function and the `Iceberg` table engine unify all variants into one by using a new named argument, `storage_type`, which can be one of `local`, `s3`, `azure`, or `hdfs`.
Also, if a named collection is used to store access parameters, the field `storage_type` can be included in the same named collection:
```xml
<named_collections>
    <s3>
        <url>http://minio1:9001/root/</url>
        <access_key_id>minio</access_key_id>
        <secret_access_key>minio123</secret_access_key>
        <storage_type>s3</storage_type>
    </s3>
</named_collections>
```

```sql
SELECT * FROM iceberg(s3, filename='table_data');
```

By default, `storage_type` is `'s3'` to maintain backward compatibility.

#### `object_storage_cluster` setting
The new setting `object_storage_cluster` controls whether a single-node or cluster variant of table functions reading from object storage (e.g., `s3`, `azure`, `iceberg`, and their cluster variants like `s3Cluster`, `azureCluster`, `icebergCluster`) is used.
This setting also applies to table engines and can be used with tables managed by Iceberg Catalog.
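As an illustrative sketch (the bucket URL and the cluster name `cluster_simple` are placeholders, not values from this document), a plain `s3` function call can be distributed by setting `object_storage_cluster`:

```sql
SELECT count(*)
FROM s3('https://mybucket.s3.amazonaws.com/data/*.parquet', 'Parquet')
SETTINGS object_storage_cluster = 'cluster_simple';
```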
Note: upstream ClickHouse has introduced analogous settings, such as `parallel_replicas_for_cluster_engines` and `cluster_for_parallel_replicas`. Since version 25.10, these settings also work with table engines. The `object_storage_cluster` setting may be deprecated in the future.
The `Iceberg` table engine and table function support a metadata cache that stores information from manifest files, manifest lists, and metadata JSON files. The cache is kept in memory. This feature is controlled by the setting `use_iceberg_metadata_files_cache`, which is enabled by default.
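For example, the cache can be disabled for a single query (the table URL below is a placeholder, and the call assumes the unified `iceberg` function with `storage_type` described earlier):

```sql
SELECT count(*)
FROM iceberg('http://minio1:9001/root/table_data', storage_type = 's3')
SETTINGS use_iceberg_metadata_files_cache = 0;
```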
## Altinity Antalya branch
### Specify storage type in arguments
Only in the Altinity Antalya branch does `Iceberg` table engine support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.
Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.
Only in the Altinity Antalya branch is an alternative syntax for the `Iceberg` table engine available. This syntax allows execution on a cluster when the `object_storage_cluster` setting is non-empty and contains the cluster name.
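A hedged sketch of the unified engine syntax (the table name, endpoint, and credentials are placeholders; the positional arguments are assumed to mirror the `IcebergS3` engine, with `storage_type` added as a named argument):

```sql
CREATE TABLE iceberg_table
ENGINE = Iceberg('http://minio1:9001/root/table_data', 'minio', 'minio123', storage_type = 's3');
```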
Table functions such as `s3Cluster`, `azureBlobStorageCluster`, `hdfsCluster`, `icebergCluster`, and table engines like `S3`, `Azure`, `HDFS`, and `Iceberg` with the setting `object_storage_cluster` distribute tasks across all cluster nodes or a subset limited by the `object_storage_max_nodes` setting. This setting limits the number of nodes involved in processing a distributed query, selecting nodes randomly for each query.
A single task corresponds to processing one source file.
For each file, one cluster node is selected as the primary node using a consistent Rendezvous Hashing algorithm. This algorithm guarantees that:
* The same node is consistently selected as primary for each file, as long as the cluster remains unchanged.
* When the cluster changes (nodes added or removed), only files assigned to those affected nodes change their primary node assignment.
This improves cache efficiency by minimizing data movement among nodes.
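The primary-node selection can be illustrated with a short, self-contained Python sketch (not ClickHouse's actual implementation; node and file names are made up) showing the stability property of Rendezvous, or highest-random-weight, hashing:

```python
import hashlib

def rendezvous_node(filename, nodes):
    """Pick the primary node for a file via Rendezvous (HRW) hashing."""
    def score(node, name):
        # Deterministic per (node, file) pair; the highest score wins.
        digest = hashlib.sha256(f"{node}:{name}".encode()).hexdigest()
        return int(digest, 16)
    return max(nodes, key=lambda node: score(node, filename))

nodes = ["node1", "node2", "node3"]
files = [f"part-{i}.parquet" for i in range(100)]
before = {f: rendezvous_node(f, nodes) for f in files}

# Remove node3 from the cluster: only the files whose primary was node3
# are reassigned; every other file keeps its previous primary node.
after = {f: rendezvous_node(f, [n for n in nodes if n != "node3"]) for f in files}
moved = [f for f in files if before[f] != after[f]]
assert all(before[f] == "node3" for f in moved)
```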
Each node begins by processing the files for which it is the primary node. After completing its assigned files, a node may take tasks from other nodes, either immediately or after waiting `lock_object_storage_task_distribution_ms` milliseconds if the primary node does not request new files during that interval. The default value of `lock_object_storage_task_distribution_ms` is 500 milliseconds. This setting balances caching efficiency against workload redistribution when nodes are imbalanced.
## `SYSTEM STOP SWARM MODE` command
If a node needs to shut down gracefully, the command `SYSTEM STOP SWARM MODE` prevents the node from receiving new tasks for `*Cluster`-family queries. The node finishes processing its already assigned files, after which it can shut down safely without errors.
Receiving new tasks can be resumed with the command `SYSTEM START SWARM MODE`.
---

**File:** `docs/en/sql-reference/table-functions/azureBlobStorageCluster.md`

See [azureBlobStorage](/sql-reference/table-functions/azureBlobStorage#using-shared-access-signatures-sas-sas-tokens) for examples.
## Altinity Antalya branch
### `object_storage_cluster` setting
Only in the Altinity Antalya branch is an alternative syntax for the `azureBlobStorageCluster` table function available. This allows the `azureBlobStorage` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Azure Blob Storage across a ClickHouse cluster.
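For illustration (the storage account, container, account key, and cluster name are placeholders):

```sql
SELECT count(*)
FROM azureBlobStorage('https://myaccount.blob.core.windows.net', 'mycontainer', 'data/*.csv',
                      'myaccount', '<account_key>', 'CSV')
SETTINGS object_storage_cluster = 'cluster_simple';
```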
---

**File:** `docs/en/sql-reference/table-functions/deltalakeCluster.md`

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch
### `object_storage_cluster` setting
Only in the Altinity Antalya branch is an alternative syntax for the `deltaLakeCluster` table function available. This allows the `deltaLake` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Delta Lake storage across a ClickHouse cluster.
```sql
SELECT count(*) FROM deltaLake(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
```
---

**File:** `docs/en/sql-reference/table-functions/hdfsCluster.md`

If your listing of files contains number ranges with leading zeros, use the construction with braces for each digit separately or use `?`.
:::
## Altinity Antalya branch
### `object_storage_cluster` setting
Only in the Altinity Antalya branch is an alternative syntax for the `hdfsCluster` table function available. This allows the `hdfs` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over HDFS storage across a ClickHouse cluster.
```sql
SELECT count(*)
FROM hdfs('hdfs://hdfs1:9000/{some,another}_dir/*', 'TSV', 'name String, value UInt32')
```
---

**File:** `docs/en/sql-reference/table-functions/hudiCluster.md`

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch
### `object_storage_cluster` setting
Only in the Altinity Antalya branch is an alternative syntax for the `hudiCluster` table function available. This allows the `hudi` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over Hudi storage across a ClickHouse cluster.
```sql
SELECT *
FROM hudi(url [,aws_access_key_id, aws_secret_access_key] [,format] [,structure] [,compression])
```
---

**File:** `docs/en/sql-reference/table-functions/iceberg.md`

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch
### Specify storage type in arguments
Only in the Altinity Antalya branch does the `iceberg` table function support all storage types. The storage type can be specified using the named argument `storage_type`. Supported values are `s3`, `azure`, `hdfs`, and `local`.
Only in the Altinity Antalya branch can `storage_type` be included as part of a named collection. This allows for centralized configuration of storage settings.
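For illustration (the local path is a placeholder), assuming the unified function signature described above:

```sql
SELECT * FROM iceberg('/var/lib/clickhouse/user_files/iceberg_table', storage_type = 'local');
```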
---

**File:** `docs/en/sql-reference/table-functions/icebergCluster.md`

- `_time` — Last modified time of the file. Type: `Nullable(DateTime)`. If the time is unknown, the value is `NULL`.
- `_etag` — The etag of the file. Type: `LowCardinality(String)`. If the etag is unknown, the value is `NULL`.

## Altinity Antalya branch
### `icebergLocalCluster` table function
Only in the Altinity Antalya branch, `icebergLocalCluster` is designed to run distributed cluster queries when Iceberg data is stored on shared network storage mounted at a local path. The path must be identical on all replicas.
Only in the Altinity Antalya branch, the `icebergCluster` table function supports all storage backends. The storage backend can be specified using the named argument `storage_type`. Valid values include `s3`, `azure`, `hdfs`, and `local`.
Only in the Altinity Antalya branch, an alternative syntax for the `icebergCluster` table function is available. This allows the `iceberg` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over an Iceberg table across a ClickHouse cluster.
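For illustration (the S3 URL and cluster name are placeholders):

```sql
SELECT count(*)
FROM iceberg('http://test.s3.amazonaws.com/clickhouse-bucket/test_table')
SETTINGS object_storage_cluster = 'cluster_simple';
```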
---

**File:** `docs/en/sql-reference/table-functions/s3Cluster.md`

For details on optimizing the performance of the s3 function see [our detailed guide](/integrations/s3/performance).
## Altinity Antalya branch
### `object_storage_cluster` setting
Only in the Altinity Antalya branch is an alternative syntax for the `s3Cluster` table function available. This allows the `s3` function to be used with a non-empty `object_storage_cluster` setting specifying a cluster name. This enables distributed queries over S3 storage across a ClickHouse cluster.
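For illustration (the bucket URL, schema, and cluster name are placeholders):

```sql
SELECT count(*)
FROM s3('https://s3.us-east-1.amazonaws.com/mybucket/data/*.csv', 'CSV', 'name String, value UInt32')
SETTINGS object_storage_cluster = 'cluster_simple';
```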