Commit 2e19ad3

Merge branch '25.8' into backport/25.8/86357
2 parents b5b68f8 + d6e53e2 commit 2e19ad3

125 files changed, with 1808 additions and 428 deletions.

.gitmodules

Lines changed: 1 addition & 1 deletion
@@ -124,7 +124,7 @@
 	url = https://github.com/anrieff/libcpuid
 [submodule "contrib/openldap"]
 	path = contrib/openldap
-	url = https://github.com/ClickHouse/openldap
+	url = https://github.com/openldap/openldap
 [submodule "contrib/AMQP-CPP"]
 	path = contrib/AMQP-CPP
 	url = https://github.com/ClickHouse/AMQP-CPP

cmake/autogenerated_versions.txt

Lines changed: 5 additions & 5 deletions
@@ -2,11 +2,11 @@
 
 # NOTE: VERSION_REVISION has nothing common with DBMS_TCP_PROTOCOL_VERSION,
 # only DBMS_TCP_PROTOCOL_VERSION should be incremented on protocol changes.
-SET(VERSION_REVISION 54502)
+SET(VERSION_REVISION 54503)
 SET(VERSION_MAJOR 25)
 SET(VERSION_MINOR 8)
-SET(VERSION_PATCH 2)
-SET(VERSION_GITHASH 4f2b50b8c9293522f2e9b593de8f709290fa8312)
-SET(VERSION_DESCRIBE v25.8.2.1-lts)
-SET(VERSION_STRING 25.8.2.1)
+SET(VERSION_PATCH 3)
+SET(VERSION_GITHASH 874146507b059896a7ca4ca257ee847fe05dbe05)
+SET(VERSION_DESCRIBE v25.8.3.1-lts)
+SET(VERSION_STRING 25.8.3.1)
 # end of autochange

contrib/openldap

Submodule openldap updated 1443 files

docs/en/operations/server-configuration-parameters/_server_settings_outside_source.md

Lines changed: 2 additions & 0 deletions
@@ -969,6 +969,8 @@ To enable JSON logging support, use the following snippet:
 <logger>
     <formatting>
         <type>json</type>
+        <!-- Can be configured on a per-channel basis (log, errorlog, console, syslog), or globally for all channels (then just omit it). -->
+        <!-- <channel></channel> -->
         <names>
             <date_time>date_time</date_time>
             <thread_name>thread_name</thread_name>
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+---
+description: 'System table containing information about metadata files read from Iceberg tables. Each entry
+represents either a root metadata file, metadata extracted from an Avro file, or an entry of some Avro file.'
+keywords: ['system table', 'iceberg_metadata_log']
+slug: /operations/system-tables/iceberg_metadata_log
+title: 'system.iceberg_metadata_log'
+---
+
+import SystemTableCloud from '@site/docs/_snippets/_system_table_cloud.md';
+
+# system.iceberg_metadata_log
+
+The `system.iceberg_metadata_log` table records metadata access and parsing events for Iceberg tables read by ClickHouse. It provides detailed information about each metadata file or entry processed, which is useful for debugging, auditing, and understanding Iceberg table structure evolution.
+
+## Purpose {#purpose}
+
+This table logs every metadata file and entry read from Iceberg tables, including root metadata files, manifest lists, and manifest entries. It helps users trace how ClickHouse interprets Iceberg table metadata and diagnose issues related to schema evolution, file resolution, or query planning.
+
+:::note
+This table is primarily intended for debugging purposes.
+:::
+
+## Columns {#columns}
+| Name | Type | Description |
+|----------------|-----------|----------------------------------------------------------------------------------------------|
+| `event_date` | [Date](../../sql-reference/data-types/date.md) | Date of the log entry. |
+| `event_time` | [DateTime](../../sql-reference/data-types/datetime.md) | Timestamp of the event. |
+| `query_id` | [String](../../sql-reference/data-types/string.md) | Query ID that triggered the metadata read. |
+| `content_type` | [Enum8](../../sql-reference/data-types/enum.md) | Type of metadata content (see below). |
+| `table_path` | [String](../../sql-reference/data-types/string.md) | Path to the Iceberg table. |
+| `file_path` | [String](../../sql-reference/data-types/string.md) | Path to the root metadata JSON file, Avro manifest list, or manifest file. |
+| `content` | [String](../../sql-reference/data-types/string.md) | Content in JSON format (raw metadata from .json, Avro metadata, or Avro entry). |
+| `row_in_file` | [Nullable](../../sql-reference/data-types/nullable.md)([UInt64](../../sql-reference/data-types/int-uint.md)) | Row number in the file, if applicable. Present for `ManifestListEntry` and `ManifestFileEntry` content types. |
+
+## `content_type` values {#content-type-values}
+
+- `None`: No content.
+- `Metadata`: Root metadata file.
+- `ManifestListMetadata`: Manifest list metadata.
+- `ManifestListEntry`: Entry in a manifest list.
+- `ManifestFileMetadata`: Manifest file metadata.
+- `ManifestFileEntry`: Entry in a manifest file.
+
+<SystemTableCloud/>
+
+## Controlling log verbosity {#controlling-log-verbosity}
+
+You can control which metadata events are logged using the [`iceberg_metadata_log_level`](../../operations/settings/settings.md#iceberg_metadata_log_level) setting.
+
+To log all metadata used in the current query:
+
+```sql
+SELECT * FROM my_iceberg_table SETTINGS iceberg_metadata_log_level = 'manifest_file_entry';
+
+SYSTEM FLUSH LOGS iceberg_metadata_log;
+
+SELECT content_type, file_path, row_in_file
+FROM system.iceberg_metadata_log
+WHERE query_id = '{previous_query_id}';
+```
+
+To log only the root metadata JSON file used in the current query:
+
+```sql
+SELECT * FROM my_iceberg_table SETTINGS iceberg_metadata_log_level = 'metadata';
+
+SYSTEM FLUSH LOGS iceberg_metadata_log;
+
+SELECT content_type, file_path, row_in_file
+FROM system.iceberg_metadata_log
+WHERE query_id = '{previous_query_id}';
+```
+
+See more information in the description of the [`iceberg_metadata_log_level`](../../operations/settings/settings.md#iceberg_metadata_log_level) setting.
+
+### Good To Know {#good-to-know}
+
+- Use `iceberg_metadata_log_level` at the query level only when you need to investigate your Iceberg table in detail. Otherwise, you may populate the log table with excessive metadata and experience performance degradation.
+- The table may contain duplicate entries, as it is intended primarily for debugging and does not guarantee uniqueness per entity.
+- If you use a `content_type` more verbose than `ManifestListMetadata`, the Iceberg metadata cache is disabled for manifest lists.
+- Similarly, if you use a `content_type` more verbose than `ManifestFileMetadata`, the Iceberg metadata cache is disabled for manifest files.
+
+## See also {#see-also}
+- [Iceberg Table Engine](../../engines/table-engines/integrations/iceberg.md)
+- [Iceberg Table Function](../../sql-reference/table-functions/iceberg.md)
+- [system.iceberg_history](./iceberg_history.md)

programs/server/Server.cpp

Lines changed: 2 additions & 2 deletions
@@ -1223,7 +1223,7 @@ try
     std::vector<ProtocolServerAdapter> servers_to_start_before_tables;
 
     /// Wait for all threads to avoid possible use-after-free (for example logging objects can be already destroyed).
-    SCOPE_EXIT({
+    SCOPE_EXIT_SAFE({
         Stopwatch watch;
         LOG_INFO(log, "Waiting for background threads");
         GlobalThreadPool::instance().shutdown();
@@ -1274,7 +1274,7 @@ try
 
     /// NOTE: global context should be destroyed *before* GlobalThreadPool::shutdown()
     /// Otherwise GlobalThreadPool::shutdown() will hang, since Context holds some threads.
-    SCOPE_EXIT({
+    SCOPE_EXIT_SAFE({
         async_metrics.stop();
 
         /** Ask to cancel background jobs all table engines,
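
Note on the `SCOPE_EXIT` to `SCOPE_EXIT_SAFE` change: the diff does not show the macro definitions, so the following is only a minimal sketch of the general idea, assuming the `_SAFE` variant exists to keep exceptions thrown by the cleanup body from escaping during shutdown. `SafeScopeGuard` and `cleanupThatThrowsIsContained` are hypothetical names used for illustration and are not ClickHouse code.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <utility>

// Hypothetical guard (not the ClickHouse macro): runs `fn` when the scope
// ends and contains anything it throws, because an exception escaping a
// destructor would terminate the process.
class SafeScopeGuard
{
public:
    explicit SafeScopeGuard(std::function<void()> fn) : fn_(std::move(fn)) {}
    SafeScopeGuard(const SafeScopeGuard &) = delete;
    SafeScopeGuard & operator=(const SafeScopeGuard &) = delete;

    ~SafeScopeGuard()
    {
        try
        {
            fn_();
        }
        catch (const std::exception & e)
        {
            // Report the failure and keep unwinding normally.
            std::cerr << "cleanup failed: " << e.what() << '\n';
        }
        catch (...)
        {
            std::cerr << "cleanup failed: unknown exception\n";
        }
    }

private:
    std::function<void()> fn_;
};

// Returns true if the cleanup body ran and its exception did not escape the scope.
bool cleanupThatThrowsIsContained()
{
    bool cleanup_ran = false;
    {
        SafeScopeGuard guard([&] {
            cleanup_ran = true;
            throw std::runtime_error("shutdown error");
        });
    } // guard destructor runs here and catches the exception
    return cleanup_ran;
}
```

With a plain guard whose destructor lets the exception propagate, the same cleanup body would call `std::terminate`; the sketch trades that for a logged error.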

programs/server/config.xml

Lines changed: 20 additions & 5 deletions
@@ -72,10 +72,12 @@
           </logger>
         </levels>
         -->
+
         <!-- Structured log formatting:
-        You can specify log format(for now, JSON only). In that case, the console log will be printed
-        in specified format like JSON.
-        For example, as below:
+
+        You can specify log format (for now, JSON only).
+        It can be done either on a per-channel (log, errorlog, console, syslog) level (set `channel`, i.e. `<channel>console</channel>`) or for all channels (omit `channel`).
+        The log will be printed in specified format like JSON, example:
 
         {"date_time":"1650918987.180175","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
         {"date_time_utc":"2024-11-06T09:06:09Z","thread_name":"#1","thread_id":"254545","level":"Trace","query_id":"","logger_name":"BaseDaemon","message":"Received signal 2","source_file":"../base/daemon/BaseDaemon.cpp; virtual void SignalListener::run()","source_line":"192"}
@@ -91,8 +93,10 @@
         However, if you comment out all the tags under <names>, the program will print default values for as
         below.
         -->
-        <!-- <formatting>
+        <!--
+        <formatting>
             <type>json</type>
+            <channel></channel>
             <names>
                 <date_time>date_time</date_time>
                 <date_time_utc>date_time_utc</date_time_utc>
@@ -105,7 +109,8 @@
                 <source_file>source_file</source_file>
                 <source_line>source_line</source_line>
             </names>
-        </formatting> -->
+        </formatting>
+        -->
     </logger>
 
     <url_scheme_mappers>
@@ -1281,6 +1286,16 @@
         <flush_on_crash>false</flush_on_crash>
     </asynchronous_metric_log>
 
+    <iceberg_metadata_log>
+        <database>system</database>
+        <table>iceberg_metadata_log</table>
+        <flush_interval_milliseconds>2000</flush_interval_milliseconds>
+        <max_size_rows>1048576</max_size_rows>
+        <reserved_size_rows>8192</reserved_size_rows>
+        <buffer_size_rows_flush_threshold>524288</buffer_size_rows_flush_threshold>
+        <flush_on_crash>false</flush_on_crash>
+    </iceberg_metadata_log>
+
     <!--
         OpenTelemetry log contains OpenTelemetry trace spans.
 

src/Columns/ColumnArray.cpp

Lines changed: 19 additions & 0 deletions
@@ -255,6 +255,25 @@ char * ColumnArray::serializeValueIntoMemory(size_t n, char * memory) const
     return memory;
 }
 
+std::optional<size_t> ColumnArray::getSerializedValueSize(size_t n) const
+{
+    const auto & offsets_data = getOffsets();
+
+    size_t pos = offsets_data[n - 1];
+    size_t end = offsets_data[n];
+
+    size_t res = sizeof(offsets_data[0]);
+    for (; pos < end; ++pos)
+    {
+        auto element_size = getData().getSerializedValueSize(pos);
+        if (!element_size)
+            return std::nullopt;
+        res += *element_size;
+    }
+
+    return res;
+}
+
 
 const char * ColumnArray::deserializeAndInsertFromArena(const char * pos)
 {
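
Note on `ColumnArray::getSerializedValueSize`: the added method sums one offset-sized prefix plus the serialized size of every nested element, and gives up with `std::nullopt` as soon as one element cannot report its size (as `ColumnDynamic` does in this commit). A standalone sketch of that accumulation, using plain standard-library containers instead of ClickHouse columns, might look like the following. `ElementSizes` and `serializedArraySize` are hypothetical names, and row 0 is handled explicitly here because a `std::vector` has no element before index 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the nested column: each element either has a
// known serialized size or reports "unknown" as std::nullopt.
using ElementSizes = std::vector<std::optional<size_t>>;

// Serialized size of array row `row`, or std::nullopt if any element size is unknown.
// offsets[i] is the end position of row i in the flattened element vector.
std::optional<size_t> serializedArraySize(
    const std::vector<uint64_t> & offsets, const ElementSizes & elements, size_t row)
{
    assert(row < offsets.size());

    // Row 0 starts at position 0; later rows start where the previous row ended.
    size_t pos = row == 0 ? 0 : offsets[row - 1];
    size_t end = offsets[row];

    // One offset-sized prefix, mirroring sizeof(offsets_data[0]) in the diff.
    size_t res = sizeof(uint64_t);
    for (; pos < end; ++pos)
    {
        if (!elements[pos])
            return std::nullopt; // one unknown element makes the whole row unknown
        res += *elements[pos];
    }
    return res;
}
```

For example, with offsets `{2, 3}` and element sizes `{4, 4, 8}`, row 0 covers the first two elements and yields 8 + 4 + 4 = 16 bytes; replacing the second element size with `std::nullopt` makes row 0 return `std::nullopt` while row 1 is unaffected.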

src/Columns/ColumnArray.h

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ class ColumnArray final : public COWHelper<IColumnHelper<ColumnArray>, ColumnArr
     void insertData(const char * pos, size_t length) override;
     StringRef serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const override;
     char * serializeValueIntoMemory(size_t, char * memory) const override;
+    std::optional<size_t> getSerializedValueSize(size_t n) const override;
     const char * deserializeAndInsertFromArena(const char * pos) override;
     const char * skipSerializedInArena(const char * pos) const override;
     void updateHashWithValue(size_t n, SipHash & hash) const override;

src/Columns/ColumnDynamic.h

Lines changed: 1 addition & 0 deletions
@@ -196,6 +196,7 @@ class ColumnDynamic final : public COWHelper<IColumnHelper<ColumnDynamic>, Colum
     StringRef serializeValueIntoArena(size_t n, Arena & arena, char const *& begin) const override;
     const char * deserializeAndInsertFromArena(const char * pos) override;
     const char * skipSerializedInArena(const char * pos) const override;
+    std::optional<size_t> getSerializedValueSize(size_t) const override { return std::nullopt; }
 
     void updateHashWithValue(size_t n, SipHash & hash) const override;
 