azat
diff --git a/docs/en/engines/table-engines/integrations/iceberg.md b/docs/en/engines/table-engines/integrations/iceberg.md
Lines changed: 37 additions & 0 deletions
diff --git a/docs/en/sql-reference/table-functions/iceberg.md b/docs/en/sql-reference/table-functions/iceberg.md
Lines changed: 38 additions & 0 deletions
diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
Lines changed: 133 additions & 38 deletions
diff --git a/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h b/src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h
Lines changed: 3 additions & 1 deletion
diff --git a/src/Storages/ObjectStorage/StorageObjectStorageSettings.cpp b/src/Storages/ObjectStorage/StorageObjectStorageSettings.cpp
Lines changed: 6 additions & 0 deletions
diff --git a/tests/queries/0_stateless/03401_several_iceberg_tables_in_one_dir.reference b/tests/queries/0_stateless/03401_several_iceberg_tables_in_one_dir.reference
Lines changed: 11 additions & 0 deletions
diff --git a/tests/queries/0_stateless/03401_several_iceberg_tables_in_one_dir.sql b/tests/queries/0_stateless/03401_several_iceberg_tables_in_one_dir.sql
Lines changed: 13 additions & 0 deletions
diff --git a/tests/queries/0_stateless/data_minio/merged_several_tables_test/data/00000-0-7e41b120-8dbb-4f03-a487-fab888fe1037-0-00001.parquet b/tests/queries/0_stateless/data_minio/merged_several_tables_test/data/00000-0-7e41b120-8dbb-4f03-a487-fab888fe1037-0-00001.parquet
625 Bytes
diff --git a/tests/queries/0_stateless/data_minio/merged_several_tables_test/data/00000-2-3f64c5e4-85a1-4846-bca2-ec61df4bd056-0-00001.parquet b/tests/queries/0_stateless/data_minio/merged_several_tables_test/data/00000-2-3f64c5e4-85a1-4846-bca2-ec61df4bd056-0-00001.parquet
624 Bytes
diff --git a/tests/queries/0_stateless/data_minio/merged_several_tables_test/data/00000-4-21e1734e-704f-43d0-aef8-0d4baf8caaa3-0-00001.parquet b/tests/queries/0_stateless/data_minio/merged_several_tables_test/data/00000-4-21e1734e-704f-43d0-aef8-0d4baf8caaa3-0-00001.parquet
625 Bytes
@@ -247,6 +247,43 @@ The second one is that while doing time travel you can't get state of table befo
 
 In Clickhouse the behavior is consistent with Spark. You can mentally replace Spark Select queries with Clickhouse Select queries and it will work the same way.
 
+## Metadata File Resolution {#metadata-file-resolution}
+
+When using the `Iceberg` table engine in ClickHouse, the system needs to locate the correct `metadata.json` file that describes the Iceberg table structure. Here's how this resolution process works:
+
+### Candidate Search (in Priority Order) {#candidate-search}
+
+1. **Direct Path Specification**:
+   * If you set `iceberg_metadata_file_path`, the system uses exactly this file. The path is interpreted relative to the Iceberg table directory and is joined with it.
+   * When this setting is provided, all other resolution settings are ignored.
+
+2. **Table UUID Matching**:
+   * If `iceberg_metadata_table_uuid` is specified, the system will:
+     * Look only at `.metadata.json` files in the `metadata` directory
+     * Filter for files containing a `table-uuid` field matching your specified UUID (case-insensitive)
+
+3. **Default Search**:
+   * If neither of the above settings is provided, all `.metadata.json` files in the `metadata` directory become candidates.
+
+### Selecting the Most Recent File {#most-recent-file}
+
+After identifying candidate files using the above rules, the system determines which one is the most recent:
+
+* If `iceberg_recent_metadata_file_by_last_updated_ms_field` is enabled:
+  * The file with the largest `last-updated-ms` value is selected
+
+* Otherwise:
+  * The file with the highest version number is selected
+  * (The version is the number `V` in filenames of the form `v<V>.metadata.json` or `<V>-<uuid>.metadata.json`.)
+
+**Note**: All mentioned settings are engine-level settings and must be specified during table creation as shown below:
+
+```sql
+CREATE TABLE example_table ENGINE = Iceberg(
+    's3://bucket/path/to/iceberg_table'
+) SETTINGS iceberg_metadata_table_uuid = '6f6f6407-c6a5-465f-a808-ea8900e35a38';
+```
+
+**Note**: While Iceberg Catalogs typically handle metadata resolution, the `Iceberg` table engine in ClickHouse directly interprets files stored in S3 as Iceberg tables, which is why understanding these resolution rules is important.
 
 ## Data cache {#data-cache}
 
 
@@ -239,6 +239,44 @@ The second one is that while doing time travel you can't get state of table befo
 
 In Clickhouse the behavior is consistent with Spark. You can mentally replace Spark Select queries with Clickhouse Select queries and it will work the same way.
 
+## Metadata File Resolution {#metadata-file-resolution}
+
+When using the `iceberg` table function in ClickHouse, the system needs to locate the correct `metadata.json` file that describes the Iceberg table structure. Here's how this resolution process works:
+
+### Candidate Search (in Priority Order) {#candidate-search}
+
+1. **Direct Path Specification**:
+   * If you set `iceberg_metadata_file_path`, the system uses exactly this file. The path is interpreted relative to the Iceberg table directory and is joined with it.
+   * When this setting is provided, all other resolution settings are ignored.
+
+2. **Table UUID Matching**:
+   * If `iceberg_metadata_table_uuid` is specified, the system will:
+     * Look only at `.metadata.json` files in the `metadata` directory
+     * Filter for files containing a `table-uuid` field matching your specified UUID (case-insensitive)
+
+3. **Default Search**:
+   * If neither of the above settings is provided, all `.metadata.json` files in the `metadata` directory become candidates.
+
+### Selecting the Most Recent File {#most-recent-file}
+
+After identifying candidate files using the above rules, the system determines which one is the most recent:
+
+* If `iceberg_recent_metadata_file_by_last_updated_ms_field` is enabled:
+  * The file with the largest `last-updated-ms` value is selected
+
+* Otherwise:
+  * The file with the highest version number is selected
+  * (The version is the number `V` in filenames of the form `v<V>.metadata.json` or `<V>-<uuid>.metadata.json`.)
+
+**Note**: All mentioned settings are table function settings (not global or query-level settings) and must be specified as shown below:
+
+```sql
+SELECT * FROM iceberg('s3://bucket/path/to/iceberg_table',
+    SETTINGS iceberg_metadata_table_uuid = 'a90eed4c-f74b-4e5b-b630-096fb9d09021');
+```
+
+**Note**: While Iceberg Catalogs typically handle metadata resolution, the `iceberg` table function in ClickHouse directly interprets files stored in S3 as Iceberg tables, which is why understanding these resolution rules is important.
+
 ## Metadata cache {#metadata-cache}
 
 `Iceberg` table engine and table function support metadata cache storing the information of manifest files, manifest list and metadata json. The cache is stored in memory. This feature is controlled by setting `use_iceberg_metadata_files_cache`, which is enabled by default.
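The resolution rules documented in the two docs hunks above (explicit path first, then UUID filtering, then "most recent" by `last-updated-ms` or by version) can be sketched over in-memory data. This is a minimal illustration under stated assumptions, not the code from the patch: `Candidate`, `ResolutionSettings`, `normalize`, `resolveMetadataFile`, and `resolutionExample` are hypothetical names, and the explicit path is returned as given rather than joined with the table directory.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for one parsed `*.metadata.json` file.
struct Candidate
{
    uint32_t version;         // V taken from the file name
    uint64_t last_updated_ms; // value of the `last-updated-ms` field
    std::string table_uuid;   // value of the `table-uuid` field
    std::string path;
};

// Hypothetical mirror of the three settings described in the docs.
struct ResolutionSettings
{
    std::optional<std::string> metadata_file_path; // iceberg_metadata_file_path
    std::optional<std::string> table_uuid;         // iceberg_metadata_table_uuid
    bool by_last_updated_ms = false;               // iceberg_recent_metadata_file_by_last_updated_ms_field
};

// Lower-case the UUID and drop everything except letters and digits.
static std::string normalize(const std::string & uuid)
{
    std::string out;
    for (unsigned char c : uuid)
        if (std::isalnum(c))
            out.push_back(static_cast<char>(std::tolower(c)));
    return out;
}

// Returns the chosen path, or std::nullopt when no candidate is left.
std::optional<std::string> resolveMetadataFile(const ResolutionSettings & settings, const std::vector<Candidate> & files)
{
    // 1. An explicit path wins and makes the other settings irrelevant.
    if (settings.metadata_file_path)
        return settings.metadata_file_path;

    // 2./3. Keep files with a matching UUID, or every file when no UUID is given.
    std::vector<const Candidate *> candidates;
    for (const auto & file : files)
        if (!settings.table_uuid || normalize(*settings.table_uuid) == normalize(file.table_uuid))
            candidates.push_back(&file);
    if (candidates.empty())
        return std::nullopt;

    // Pick the most recent candidate by the configured key.
    const auto newest = std::max_element(
        candidates.begin(), candidates.end(),
        [&](const Candidate * a, const Candidate * b)
        {
            return settings.by_last_updated_ms ? a->last_updated_ms < b->last_updated_ms : a->version < b->version;
        });
    return (*newest)->path;
}

// Usage example with made-up file names and field values.
void resolutionExample()
{
    const std::vector<Candidate> files = {
        {1, 300, "aaaa-1111", "metadata/00001-x.metadata.json"},
        {5, 200, "bbbb-2222", "metadata/00005-z.metadata.json"},
    };
    ResolutionSettings settings;
    assert(*resolveMetadataFile(settings, files) == "metadata/00005-z.metadata.json"); // highest version
    settings.by_last_updated_ms = true;
    assert(*resolveMetadataFile(settings, files) == "metadata/00001-x.metadata.json"); // largest last-updated-ms
}
```

Returning `std::nullopt` for an empty candidate set is a choice made for this sketch; it keeps the "no file matches the UUID" case explicit instead of dereferencing an end iterator.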
 
@@ -35,6 +35,8 @@ namespace DB
 namespace StorageObjectStorageSetting
 {
     extern const StorageObjectStorageSettingsString iceberg_metadata_file_path;
+    extern const StorageObjectStorageSettingsString iceberg_metadata_table_uuid;
+    extern const StorageObjectStorageSettingsBool iceberg_recent_metadata_file_by_last_updated_ms_field;
 }
 
 namespace ErrorCodes
@@ -67,7 +69,10 @@ constexpr const char * SNAPSHOT_LOG_FIELD = "snapshot-log";
 constexpr const char * TIMESTAMP_FIELD_INSIDE_SNAPSHOT = "timestamp-ms";
 constexpr const char * TABLE_LOCATION_FIELD = "location";
 constexpr const char * SNAPSHOTS_FIELD = "snapshots";
+constexpr const char * LAST_UPDATED_MS_FIELD = "last-updated-ms";
 
+namespace
+{
 
 std::pair<Int32, Poco::JSON::Object::Ptr>
 parseTableSchemaFromManifestFile(const AvroForIcebergDeserializer & deserializer, const String & manifest_file_name)
@@ -86,6 +91,38 @@ parseTableSchemaFromManifestFile(const AvroForIcebergDeserializer & deserializer
 }
 
 
+std::string normalizeUuid(const std::string & uuid)
+{
+    std::string result;
+    result.reserve(uuid.size());
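+    for (unsigned char c : uuid) /// <cctype> functions have undefined behavior for negative char values
+    {
+        if (std::isalnum(c))
+        {
+            result.push_back(static_cast<char>(std::tolower(c)));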
+        }
+    }
+    return result;
+}
+
+Poco::JSON::Object::Ptr
+readJSON(const String & metadata_file_path, ObjectStoragePtr object_storage, const ContextPtr & local_context, LoggerPtr log)
+{
+    ObjectInfo object_info(metadata_file_path);
+    auto buf = StorageObjectStorageSource::createReadBuffer(object_info, object_storage, local_context, log);
+
+    String json_str;
+    readJSONObjectPossiblyInvalid(json_str, *buf);
+
+    Poco::JSON::Parser parser; /// For some reason base/base/JSON.h can not parse this json file
+    Poco::Dynamic::Var json = parser.parse(json_str);
+    return json.extract<Poco::JSON::Object::Ptr>();
+}
+
+
+}
+
+
 IcebergMetadata::IcebergMetadata(
     ObjectStoragePtr object_storage_,
     ConfigurationObserverPtr configuration_,
@@ -244,38 +281,114 @@ static std::pair<Int32, String> getMetadataFileAndVersion(const std::string & pa
     return std::make_pair(std::stoi(version_str), path);
 }
 
+enum class MostRecentMetadataFileSelectionWay
+{
+    BY_LAST_UPDATED_MS_FIELD,
+    BY_METADATA_FILE_VERSION
+};
+
+struct ShortMetadataFileInfo
+{
+    UInt32 version;
+    UInt64 last_updated_ms;
+    String path;
+};
+
+
 /**
  * Each version of table metadata is stored in a `metadata` directory and
  * has one of 2 formats:
  *   1) v<V>.metadata.json, where V - metadata version.
  *   2) <V>-<random-uuid>.metadata.json, where V - metadata version
  */
-static std::pair<Int32, String>
-getLatestMetadataFileAndVersion(const ObjectStoragePtr & object_storage, const StorageObjectStorage::Configuration & configuration)
+static std::pair<Int32, String> getLatestMetadataFileAndVersion(
+    const ObjectStoragePtr & object_storage,
+    const StorageObjectStorage::Configuration & configuration,
+    const ContextPtr & local_context,
+    const std::optional<String> & table_uuid)
 {
+    auto log = getLogger("IcebergMetadataFileResolver");
+    MostRecentMetadataFileSelectionWay selection_way
+        = configuration.getSettingsRef()[StorageObjectStorageSetting::iceberg_recent_metadata_file_by_last_updated_ms_field].value
+        ? MostRecentMetadataFileSelectionWay::BY_LAST_UPDATED_MS_FIELD
+        : MostRecentMetadataFileSelectionWay::BY_METADATA_FILE_VERSION;
+    bool need_all_metadata_files_parsing
+        = (selection_way == MostRecentMetadataFileSelectionWay::BY_LAST_UPDATED_MS_FIELD) || table_uuid.has_value();
     const auto metadata_files = listFiles(*object_storage, configuration, "metadata", ".metadata.json");
     if (metadata_files.empty())
     {
         throw Exception(
             ErrorCodes::FILE_DOESNT_EXIST, "The metadata file for Iceberg table with path {} doesn't exist", configuration.getPath());
     }
-    std::vector<std::pair<UInt32, String>> metadata_files_with_versions;
+    std::vector<ShortMetadataFileInfo> metadata_files_with_versions;
     metadata_files_with_versions.reserve(metadata_files.size());
     for (const auto & path : metadata_files)
     {
-        metadata_files_with_versions.emplace_back(getMetadataFileAndVersion(path));
+        auto [version, metadata_file_path] = getMetadataFileAndVersion(path);
+        if (need_all_metadata_files_parsing)
+        {
+            auto metadata_file_object = readJSON(metadata_file_path, object_storage, local_context, log);
+            if (table_uuid.has_value())
+            {
+                if (metadata_file_object->has("table-uuid"))
+                {
+                    auto current_table_uuid = metadata_file_object->getValue<String>("table-uuid");
+                    if (normalizeUuid(table_uuid.value()) == normalizeUuid(current_table_uuid))
+                    {
+                        metadata_files_with_versions.emplace_back(
+                            version, metadata_file_object->getValue<UInt64>(LAST_UPDATED_MS_FIELD), metadata_file_path);
+                    }
+                }
+                else
+                {
+                    Int64 format_version = metadata_file_object->getValue<Int64>(FORMAT_VERSION_FIELD);
+                    throw Exception(
+                        format_version == 1 ? ErrorCodes::BAD_ARGUMENTS : ErrorCodes::ICEBERG_SPECIFICATION_VIOLATION,
+                        "Table UUID is not specified in some metadata files for table by path {}",
+                        metadata_file_path);
+                }
+            }
+            else
+            {
+                metadata_files_with_versions.emplace_back(version, metadata_file_object->getValue<UInt64>(LAST_UPDATED_MS_FIELD), metadata_file_path);
+            }
+        }
+        else
+        {
+            metadata_files_with_versions.emplace_back(version, 0, metadata_file_path);
+        }
     }
 
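+    /// Can only be empty if a table UUID was requested and no metadata file matched it.
+    if (metadata_files_with_versions.empty())
+    {
+        throw Exception(
+            ErrorCodes::FILE_DOESNT_EXIST,
+            "No metadata file for Iceberg table with path {} matches table UUID '{}'",
+            configuration.getPath(),
+            table_uuid.value_or(""));
+    }
+
     /// Get the latest version of metadata file: v<V>.metadata.json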
-    return *std::max_element(metadata_files_with_versions.begin(), metadata_files_with_versions.end());
+    const ShortMetadataFileInfo & latest_metadata_file_info = [&]()
+    {
+        if (selection_way == MostRecentMetadataFileSelectionWay::BY_LAST_UPDATED_MS_FIELD)
+        {
+            return *std::max_element(
+                metadata_files_with_versions.begin(),
+                metadata_files_with_versions.end(),
+                [](const ShortMetadataFileInfo & a, const ShortMetadataFileInfo & b) { return a.last_updated_ms < b.last_updated_ms; });
+        }
+        else
+        {
+            return *std::max_element(
+                metadata_files_with_versions.begin(),
+                metadata_files_with_versions.end(),
+                [](const ShortMetadataFileInfo & a, const ShortMetadataFileInfo & b) { return a.version < b.version; });
+        }
+    }();
+    return {latest_metadata_file_info.version, latest_metadata_file_info.path};
 }
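The comment in this hunk names two file name formats, `v<V>.metadata.json` and `<V>-<random-uuid>.metadata.json`, and the version-based selection depends on extracting `V` from them. The body of `getMetadataFileAndVersion` is not part of the diff, so the following is only a sketch of how such parsing could look; `parseMetadataVersion` and `versionExample` are hypothetical, and the strictness rules (for example, rejecting `v3-abc.metadata.json`) are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper: extract V from `v<V>.metadata.json` or
// `<V>-<uuid>.metadata.json`; returns std::nullopt for anything else.
std::optional<int> parseMetadataVersion(const std::string & path)
{
    static const std::string suffix = ".metadata.json";

    // Work on the file name only (the part after the last '/').
    const std::size_t slash = path.find_last_of('/');
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
    if (name.size() <= suffix.size() || name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0)
        return std::nullopt;
    name.resize(name.size() - suffix.size()); // strip the suffix; name stays non-empty

    // Skip the leading 'v' of the first format, then read the run of digits.
    const bool v_form = (name[0] == 'v');
    const std::size_t begin = v_form ? 1 : 0;
    std::size_t end = begin;
    while (end < name.size() && std::isdigit(static_cast<unsigned char>(name[end])))
        ++end;

    const std::size_t digits = end - begin;
    if (digits == 0 || digits > 9) // at most 9 digits, so std::stoi cannot overflow
        return std::nullopt;
    if (v_form && end != name.size()) // `v<V>` must consist of digits only
        return std::nullopt;
    if (!v_form && (end == name.size() || name[end] != '-')) // `<V>` must be followed by '-'
        return std::nullopt;
    return std::stoi(name.substr(begin, digits));
}

// Usage example; the second path is taken from the SQL test in this PR.
void versionExample()
{
    assert(parseMetadataVersion("metadata/v3.metadata.json") == 3);
    assert(parseMetadataVersion("metadata/00001-aec4e034-3f73-48f7-87ad-51b7b42a8db7.metadata.json") == 1);
    assert(!parseMetadataVersion("data/part.parquet"));
}
```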
 
-static std::pair<Int32, String> getLatestOrExplicitMetadataFileAndVersion(const ObjectStoragePtr & object_storage, const StorageObjectStorage::Configuration & configuration, Poco::Logger * log)
+static std::pair<Int32, String> getLatestOrExplicitMetadataFileAndVersion(
+    const ObjectStoragePtr & object_storage,
+    const StorageObjectStorage::Configuration & configuration,
+    const ContextPtr & local_context,
+    Poco::Logger * log)
 {
-    auto explicit_metadata_path = configuration.getSettingsRef()[StorageObjectStorageSetting::iceberg_metadata_file_path].value;
-    std::pair<Int32, String> result;
-    if (!explicit_metadata_path.empty())
+    if (configuration.getSettingsRef()[StorageObjectStorageSetting::iceberg_metadata_file_path].changed)
     {
+        auto explicit_metadata_path = configuration.getSettingsRef()[StorageObjectStorageSetting::iceberg_metadata_file_path].value;
         try
         {
             LOG_TEST(log, "Explicit metadata file path is specified {}, will read from this metadata file", explicit_metadata_path);
@@ -289,55 +402,37 @@ static std::pair<Int32, String> getLatestOrExplicitMetadataFileAndVersion(const
             auto prefix_storage_path = configuration.getPath();
             if (!explicit_metadata_path.starts_with(prefix_storage_path))
                 explicit_metadata_path = std::filesystem::path(prefix_storage_path) / explicit_metadata_path;
-            result = getMetadataFileAndVersion(explicit_metadata_path);
+            return getMetadataFileAndVersion(explicit_metadata_path);
         }
         catch (const std::exception & ex)
         {
             throw Exception(ErrorCodes::BAD_ARGUMENTS, "Invalid path {} specified for iceberg_metadata_file_path: '{}'", explicit_metadata_path, ex.what());
         }
     }
-    else
+    else if (configuration.getSettingsRef()[StorageObjectStorageSetting::iceberg_metadata_table_uuid].changed)
     {
-        result = getLatestMetadataFileAndVersion(object_storage, configuration);
+        std::optional<String> table_uuid = configuration.getSettingsRef()[StorageObjectStorageSetting::iceberg_metadata_table_uuid].value;
+        return getLatestMetadataFileAndVersion(object_storage, configuration, local_context, table_uuid);
     }
-
-    return result;
-}
-
-
-Poco::JSON::Object::Ptr IcebergMetadata::readJSON(const String & metadata_file_path, const ContextPtr & local_context) const
-{
-    auto configuration_ptr = configuration.lock();
-    auto create_fn = [&]()
-    {
-        ObjectInfo object_info(metadata_file_path);
-        auto buf = StorageObjectStorageSource::createReadBuffer(object_info, object_storage, local_context, log);
-
-        String json_str;
-        readJSONObjectPossiblyInvalid(json_str, *buf);
-
-        Poco::JSON::Parser parser; /// For some reason base/base/JSON.h can not parse this json file
-        Poco::Dynamic::Var json = parser.parse(json_str);
-        return std::make_pair(json.extract<Poco::JSON::Object::Ptr>(), json.size());
-    };
-    if (manifest_cache)
+    else
     {
-        return manifest_cache->getOrSetTableMetadata(IcebergMetadataFilesCache::getKey(configuration_ptr, metadata_file_path), create_fn);
+        return getLatestMetadataFileAndVersion(object_storage, configuration, local_context, std::nullopt);
     }
-    return create_fn().first;
 }
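For the explicit-path branch above, the patch keeps the existing behavior: a path that does not already start with the table path is joined onto it. A standalone sketch of that step, assuming plain strings instead of the storage configuration (`combineWithTablePath` and `combineExample` are hypothetical names), might look like this:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: return `explicit_path` unchanged if it already starts
// with `table_path`, otherwise join the two paths.
std::string combineWithTablePath(const std::string & table_path, const std::string & explicit_path)
{
    // rfind(prefix, 0) == 0 is a pre-C++20 spelling of starts_with(prefix).
    if (explicit_path.rfind(table_path, 0) == 0)
        return explicit_path;
    // generic_string() always uses '/' as the separator.
    return (std::filesystem::path(table_path) / explicit_path).generic_string();
}

// Usage example with made-up paths.
void combineExample()
{
    assert(combineWithTablePath("bucket/table", "metadata/v1.metadata.json") == "bucket/table/metadata/v1.metadata.json");
    assert(combineWithTablePath("bucket/table", "bucket/table/metadata/v1.metadata.json") == "bucket/table/metadata/v1.metadata.json");
}
```

One caveat of `std::filesystem::path::operator/`: an absolute right-hand side replaces the left-hand side instead of being appended, which is one reason the setting is described as a relative path.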
 
+
 bool IcebergMetadata::update(const ContextPtr & local_context)
 {
     auto configuration_ptr = configuration.lock();
 
-    const auto [metadata_version, metadata_file_path] = getLatestOrExplicitMetadataFileAndVersion(object_storage, *configuration_ptr, log.get());
+    const auto [metadata_version, metadata_file_path]
+        = getLatestOrExplicitMetadataFileAndVersion(object_storage, *configuration_ptr, local_context, log.get());
 
     bool metadata_file_changed = false;
     if (last_metadata_version != metadata_version)
     {
         last_metadata_version = metadata_version;
-        last_metadata_object = readJSON(metadata_file_path, local_context);
+        last_metadata_object = ::DB::readJSON(metadata_file_path, object_storage, local_context, log);
         metadata_file_changed = true;
     }
 
@@ -499,7 +594,7 @@ DataLakeMetadataPtr IcebergMetadata::create(
     else
         LOG_TRACE(log, "Not using in-memory cache for iceberg metadata files, because the setting use_iceberg_metadata_files_cache is false.");
 
-    const auto [metadata_version, metadata_file_path] = getLatestOrExplicitMetadataFileAndVersion(object_storage, *configuration_ptr, log.get());
+    const auto [metadata_version, metadata_file_path] = getLatestOrExplicitMetadataFileAndVersion(object_storage, *configuration_ptr, local_context, log.get());
 
     auto create_fn = [&]()
     {
 
@@ -135,7 +135,9 @@ class IcebergMetadata : public IDataLakeMetadata, private WithContext
 
     std::optional<String> getRelevantManifestList(const Poco::JSON::Object::Ptr & metadata);
 
-    Poco::JSON::Object::Ptr readJSON(const String & metadata_file_path, const ContextPtr & local_context) const;
+    Strings getDataFilesImpl(const ActionsDAG * filter_dag) const;
+
+    Iceberg::ManifestFilePtr tryGetManifestFile(const String & filename) const;
 };
 }
 
 
@@ -26,6 +26,12 @@ Whether delta-lake read schema is the same as table schema.
     DECLARE(String, iceberg_metadata_file_path, "", R"(
 Explicit path to desired Iceberg metadata file, should be relative to path in object storage. Make sense for table function use case only.
 )", 0) \
+    DECLARE(String, iceberg_metadata_table_uuid, "", R"(
+Explicit table UUID to read metadata for. Ignored if iceberg_metadata_file_path is set.
+)", 0) \
+    DECLARE(Bool, iceberg_recent_metadata_file_by_last_updated_ms_field, false, R"(
+If enabled, the engine uses the metadata file with the largest `last-updated-ms` JSON field value. Has no effect when iceberg_metadata_file_path is set.
+)", 0)
 
 // clang-format on
 
 
@@ -0,0 +1,11 @@
+1	test1
+2	test2
+3	test3
+4	test4
+5	test5
+6	test6
+0
+0
+0
+5	test5
+6	test6
@@ -0,0 +1,13 @@
+-- Tags: no-fasttest
+-- Tag no-fasttest: Depends on AWS
+
+SELECT * FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_metadata_table_uuid = 'ea8d1178-7756-4b89-b21f-00e9f31fe03e') ORDER BY id;
+SELECT * FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_metadata_table_uuid = 'A90EED4CF74B4E5BB630096FB9D09021') ORDER BY id;
+SELECT * FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_metadata_table_uuid = '6f6f6407_c6A5465f_A808ea8900_e35a38') ORDER BY id;
+
+SELECT count() FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_metadata_file_path = 'metadata/00001-aec4e034-3f73-48f7-87ad-51b7b42a8db7.metadata.json');
+SELECT count() FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_metadata_file_path = 'metadata/00001-2aad93a8-a893-4943-8504-f6021f83ecab.metadata.json');
+SELECT count() FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_metadata_file_path = 'metadata/00001-aec4e034-3f73-48f7-87ad-51b7b42a8db7.metadata.json');
+
+
+SELECT * FROM icebergS3(s3_conn, filename='merged_several_tables_test', SETTINGS iceberg_recent_metadata_file_by_last_updated_ms_field = true) ORDER BY id;
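The second and third queries in this test pass UUIDs without hyphens, in upper case, or with underscores. They are still expected to match because `normalizeUuid` in `IcebergMetadata.cpp` keeps only alphanumeric characters and lower-cases them before comparing. The sketch below repeats that normalization in standalone form (with the `unsigned char` cast that `<cctype>` functions require) and checks it against the canonical spellings used in the docs hunks; `sameTableUuid` and `uuidExample` are hypothetical helpers added for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Keep only letters and digits and lower-case them, so that differently
// punctuated or capitalized spellings of one UUID compare equal.
std::string normalizeUuid(const std::string & uuid)
{
    std::string result;
    result.reserve(uuid.size());
    for (unsigned char c : uuid) // unsigned: <cctype> is undefined for negative values
    {
        if (std::isalnum(c))
            result.push_back(static_cast<char>(std::tolower(c)));
    }
    return result;
}

// Hypothetical helper: compare a user-supplied UUID with a stored `table-uuid`.
bool sameTableUuid(const std::string & requested, const std::string & stored)
{
    return normalizeUuid(requested) == normalizeUuid(stored);
}

// Usage example with the spellings from the test and from the docs.
void uuidExample()
{
    assert(sameTableUuid("A90EED4CF74B4E5BB630096FB9D09021", "a90eed4c-f74b-4e5b-b630-096fb9d09021"));
    assert(sameTableUuid("6f6f6407_c6A5465f_A808ea8900_e35a38", "6f6f6407-c6a5-465f-a808-ea8900e35a38"));
    assert(!sameTableUuid("ea8d1178-7756-4b89-b21f-00e9f31fe03e", "a90eed4c-f74b-4e5b-b630-096fb9d09021"));
}
```

Note that this normalization does not validate the input: any string that reduces to the same 32 characters matches, which is why the underscore-separated spelling in the third query is accepted.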