docs/en/engines/table-engines/integrations/iceberg.md
+38 -1 (38 additions & 1 deletion)
@@ -85,7 +85,7 @@ To read a table where the schema has changed after its creation with dynamic sch

 ## Partition Pruning {#partition-pruning}

-ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. Now it works with only identity transforms and time-based transforms (hour, day, month, year). To enable partition pruning, set `use_iceberg_partition_pruning = 1`.
+ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. To enable partition pruning, set `use_iceberg_partition_pruning = 1`. For more information about Iceberg partition pruning, see https://iceberg.apache.org/spec/#partitioning


 ## Time Travel {#time-travel}
@@ -247,6 +247,43 @@ The second one is that while doing time travel you can't get state of table befo

 In Clickhouse the behavior is consistent with Spark. You can mentally replace Spark Select queries with Clickhouse Select queries and it will work the same way.
+When using the `Iceberg` table engine in ClickHouse, the system needs to locate the correct `metadata.json` file that describes the Iceberg table structure. Here's how this resolution process works:
+
+### Candidate Search (in Priority Order) {#candidate-search}
+
+1. **Direct Path Specification**:
+   * If you set `iceberg_metadata_file_path`, the system uses exactly this file, resolving the path relative to the Iceberg table directory.
+   * When this setting is provided, all other resolution settings are ignored.
+
+2. **Table UUID Matching**:
+   * If `iceberg_metadata_table_uuid` is specified, the system will:
+     * Look only at `.metadata.json` files in the `metadata` directory
+     * Filter for files containing a `table-uuid` field matching your specified UUID (case-insensitive)
+
+3. **Default Search**:
+   * If neither of the above settings is provided, all `.metadata.json` files in the `metadata` directory become candidates
+
+### Selecting the Most Recent File {#most-recent-file}
+
+After identifying candidate files using the above rules, the system determines which one is the most recent:
+
+* If `iceberg_recent_metadata_file_by_last_updated_ms_field` is enabled:
+  * The file with the largest `last-updated-ms` value is selected
+
+* Otherwise:
+  * The file with the highest version number is selected
+  * (Version appears as `V` in filenames formatted as `V.metadata.json` or `V-uuid.metadata.json`)
+
+**Note**: All mentioned settings are engine-level settings and must be specified during table creation as shown below:
+
+**Note**: While Iceberg Catalogs typically handle metadata resolution, the `Iceberg` table engine in ClickHouse directly interprets files stored in S3 as Iceberg tables, which is why understanding these resolution rules is important.
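
The candidate search described in this hunk can be sketched in a few lines of Python. This is only an illustration of the documented priority order, not ClickHouse's implementation: the function name `find_candidates` and its parameters are hypothetical, the table is assumed to sit in a local directory rather than in S3, and the metadata files are assumed to be uncompressed JSON.

```python
import json
from pathlib import Path
from typing import Optional


def find_candidates(
    table_dir: Path,
    metadata_file_path: Optional[str] = None,
    table_uuid: Optional[str] = None,
) -> list[Path]:
    """Return candidate metadata files in the documented priority order.

    Hypothetical helper: the parameters mirror the settings
    `iceberg_metadata_file_path` and `iceberg_metadata_table_uuid`.
    """
    # 1. Direct path: resolved against the table directory.
    #    Every other resolution setting is ignored in this case.
    if metadata_file_path is not None:
        return [table_dir / metadata_file_path]

    # Rules 2 and 3 both start from all *.metadata.json files in metadata/.
    candidates = sorted((table_dir / "metadata").glob("*.metadata.json"))

    # 2. UUID matching: keep files whose `table-uuid` matches, ignoring case.
    if table_uuid is not None:
        wanted = table_uuid.lower()
        candidates = [
            path
            for path in candidates
            if str(json.loads(path.read_text()).get("table-uuid", "")).lower() == wanted
        ]

    # 3. Default search: no further filtering.
    return candidates
```

Passing both a path and a UUID returns only the path, which mirrors the rule that a direct path overrides the other settings.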
docs/en/sql-reference/table-functions/iceberg.md
+39 -1 (39 additions & 1 deletion)
@@ -78,7 +78,7 @@ Currently, it is not possible to change nested structures or the types of elemen

 ## Partition Pruning {#partition-pruning}

-ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. Now it works with only identity transforms and time-based transforms (hour, day, month, year). To enable partition pruning, set `use_iceberg_partition_pruning = 1`.
+ClickHouse supports partition pruning during SELECT queries for Iceberg tables, which helps optimize query performance by skipping irrelevant data files. To enable partition pruning, set `use_iceberg_partition_pruning = 1`. For more information about Iceberg partition pruning, see https://iceberg.apache.org/spec/#partitioning


 ## Time Travel {#time-travel}
@@ -239,6 +239,44 @@ The second one is that while doing time travel you can't get state of table befo

 In Clickhouse the behavior is consistent with Spark. You can mentally replace Spark Select queries with Clickhouse Select queries and it will work the same way.
+When using the `iceberg` table function in ClickHouse, the system needs to locate the correct `metadata.json` file that describes the Iceberg table structure. Here's how this resolution process works:
+
+### Candidate Search (in Priority Order) {#candidate-search}
+
+1. **Direct Path Specification**:
+   * If you set `iceberg_metadata_file_path`, the system uses exactly this file, resolving the path relative to the Iceberg table directory.
+   * When this setting is provided, all other resolution settings are ignored.
+
+2. **Table UUID Matching**:
+   * If `iceberg_metadata_table_uuid` is specified, the system will:
+     * Look only at `.metadata.json` files in the `metadata` directory
+     * Filter for files containing a `table-uuid` field matching your specified UUID (case-insensitive)
+
+3. **Default Search**:
+   * If neither of the above settings is provided, all `.metadata.json` files in the `metadata` directory become candidates
+
+### Selecting the Most Recent File {#most-recent-file}
+
+After identifying candidate files using the above rules, the system determines which one is the most recent:
+
+* If `iceberg_recent_metadata_file_by_last_updated_ms_field` is enabled:
+  * The file with the largest `last-updated-ms` value is selected
+
+* Otherwise:
+  * The file with the highest version number is selected
+  * (Version appears as `V` in filenames formatted as `V.metadata.json` or `V-uuid.metadata.json`)
+
+**Note**: All mentioned settings are table function settings (not global or query-level settings) and must be specified as shown below:
+
+**Note**: While Iceberg Catalogs typically handle metadata resolution, the `iceberg` table function in ClickHouse directly interprets files stored in S3 as Iceberg tables, which is why understanding these resolution rules is important.
+

 ## Metadata cache {#metadata-cache}

 `Iceberg` table engine and table function support metadata cache storing the information of manifest files, manifest list and metadata json. The cache is stored in memory. This feature is controlled by setting `use_iceberg_metadata_files_cache`, which is enabled by default.
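
For the "Selecting the Most Recent File" rules added in this diff, a minimal Python sketch may help show why the version must be compared as a number. It is not ClickHouse's code: `version_of` and `pick_most_recent` are hypothetical names, the candidates are assumed to be already parsed into dictionaries, and accepting an optional leading `v` in the filename is an assumption of this sketch rather than something the documentation states.

```python
import re

# Matches `V.metadata.json` and `V-uuid.metadata.json`.
# Assumption of this sketch: an optional leading "v" is also accepted.
_VERSION = re.compile(r"^v?(\d+)(?:-[^/]*)?\.metadata\.json$")


def version_of(filename: str) -> int:
    """Extract the numeric version `V` from a metadata file name."""
    match = _VERSION.match(filename)
    if match is None:
        raise ValueError(f"cannot parse a version from {filename!r}")
    return int(match.group(1))


def pick_most_recent(candidates: dict[str, dict], by_last_updated_ms: bool = False) -> str:
    """Pick one file name from `candidates` (file name -> parsed metadata JSON).

    `by_last_updated_ms` stands in for the setting
    `iceberg_recent_metadata_file_by_last_updated_ms_field`.
    """
    if not candidates:
        raise ValueError("no candidate metadata files")
    if by_last_updated_ms:
        # Largest `last-updated-ms` value wins.
        return max(candidates, key=lambda name: candidates[name]["last-updated-ms"])
    # Otherwise the highest version number wins (numeric, not lexicographic).
    return max(candidates, key=version_of)
```

With files named `9-aaa.metadata.json` and `10-bbb.metadata.json`, the numeric comparison picks version 10, whereas a plain string comparison would pick `9-aaa.metadata.json`. How ties are broken is not covered by the text above, so this sketch simply keeps whichever candidate `max` encounters first.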