Commit 9217a39

kbatuigas and micheleRP authored
Breaking changes for Iceberg in 25.3 (#1468)
Co-authored-by: Michele Cyran <[email protected]>

1 parent bd2b8ff commit 9217a39

File tree

7 files changed: +215 -13 lines changed

modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions

@@ -231,6 +231,7 @@
  ** xref:upgrade:k-compatibility.adoc[]
  ** xref:manage:kubernetes/k-upgrade-kubernetes.adoc[Migrate Node Pools]
  ** xref:upgrade:deprecated/index.adoc[Deprecated Features]
+ ** xref:upgrade:iceberg-schema-changes-and-migration-guide.adoc[Iceberg Schema Changes in v25.3]
  * xref:migrate:index.adoc[Migrate]
  ** xref:migrate:console-v3.adoc[Migrate to Redpanda Console v3.0.x]
  ** xref:migrate:data-migration.adoc[]

modules/get-started/pages/release-notes/redpanda.adoc

Lines changed: 2 additions & 0 deletions

@@ -7,6 +7,8 @@ This topic includes new content added in version {page-component-version}. For a
  * xref:redpanda-cloud:get-started:whats-new-cloud.adoc[]
  * xref:redpanda-cloud:get-started:cloud-overview.adoc#redpanda-cloud-vs-self-managed-feature-compatibility[Redpanda Cloud vs Self-Managed feature compatibility]

+ NOTE: Redpanda v25.3 introduces breaking schema changes for Iceberg topics. If you are using Iceberg topics and want to retain the data in the corresponding Iceberg tables, review xref:upgrade:iceberg-schema-changes-and-migration-guide.adoc[] before upgrading your cluster, and follow the required migration steps to avoid sending new records to a dead-letter queue table.
+
  == Iceberg topics with GCP BigLake

  A new xref:manage:iceberg/iceberg-topics-gcp-biglake.adoc[REST catalog integration] with Google Cloud BigLake allows you to add Redpanda topics as Iceberg tables in your data lakehouse.

modules/manage/pages/iceberg/query-iceberg-topics.adoc

Lines changed: 5 additions & 0 deletions

@@ -16,6 +16,11 @@ When you access Iceberg topics from a data lakehouse or other Iceberg-compatible
  == Access Iceberg tables

  ifndef::env-cloud[]
+ [IMPORTANT]
+ ====
+ include::upgrade:partial$iceberg-breaking-changes.adoc[]
+ ====
  Redpanda generates an Iceberg table with the same name as the topic. Depending on the processing engine and your Iceberg xref:manage:iceberg/use-iceberg-catalogs.adoc[catalog implementation], you may also need to define the table (for example using `CREATE TABLE`) to point the data lakehouse to its location in the catalog. For an example, see xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[].
  endif::[]

modules/manage/pages/iceberg/specify-iceberg-schema.adoc

Lines changed: 22 additions & 12 deletions

@@ -111,6 +111,13 @@ NOTE: If you don't specify the fully qualified Protobuf message name, Redpanda p

  == How Iceberg modes translate to table format

+ ifndef::env-cloud[]
+ [IMPORTANT]
+ ====
+ include::upgrade:partial$iceberg-breaking-changes.adoc[]
+ ====
+ endif::[]
+
  Redpanda generates an Iceberg table with the same name as the topic. In each mode, Redpanda writes to a `redpanda` table column that stores a single Iceberg https://iceberg.apache.org/spec/#nested-types[struct^] per record, containing nested columns of the metadata from each record, including the record key, headers, timestamp, the partition it belongs to, and its offset.

  For example, if you produce to a topic `ClickEvent` according to the following Avro schema:

@@ -143,11 +150,12 @@ The `key_value` mode writes to the following table format:
  ----
  CREATE TABLE ClickEvent (
      redpanda struct<
-         partition: integer NOT NULL,
-         timestamp: timestamp NOT NULL,
-         offset: long NOT NULL,
-         headers: array<struct<key: binary NOT NULL, value: binary>>,
-         key: binary
+         partition: integer,
+         timestamp: timestamptz,
+         offset: long,
+         headers: array<struct<key: string, value: binary>>,
+         key: binary,
+         timestamp_type: integer
      >,
      value binary
  )

@@ -161,11 +169,12 @@ The `value_schema_id_prefix` and `value_schema_latest` modes can use the schema
  ----
  CREATE TABLE ClickEvent (
      redpanda struct<
-         partition: integer NOT NULL,
-         timestamp: timestamp NOT NULL,
-         offset: long NOT NULL,
-         headers: array<struct<key: binary NOT NULL, value: binary>>,
-         key: binary
+         partition: integer,
+         timestamp: timestamptz,
+         offset: long,
+         headers: array<struct<key: string, value: binary>>,
+         key: binary,
+         timestamp_type: integer
      >,
      user_id integer NOT NULL,
      event_type string,

@@ -213,11 +222,12 @@ Avro::

  There are some cases where the Avro type does not map directly to an Iceberg type and Redpanda applies the following transformations:

+ * Enums are translated into the Iceberg `string` type.
  * Different flavors of time (such as `time-millis`) and timestamp (such as `timestamp-millis`) types are translated to the same Iceberg `time` and `timestamp` types, respectively.
  * Avro unions are flattened to Iceberg structs with optional fields. For example:
  ** The union `["int", "long", "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 LONG NULLABLE, 2 FLOAT NULLABLE>`.
  ** The union `["int", null, "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 FLOAT NULLABLE>`.
- * All fields are required by default. (Avro always sets a default in binary representation.)
+ * Two-field unions that contain `null` are represented as a single optional field only (no struct). For example, the union `["null", "long"]` is represented as `long`.

  Some Avro types are not supported:

@@ -250,7 +260,7 @@ Protobuf::

  There are some cases where the Protobuf type does not map directly to an Iceberg type and Redpanda applies the following transformations:

  * Repeated values are translated into Iceberg `list` types.
- * Enums are translated into Iceberg `int` types based on the integer value of the enumerated type.
+ * Enums are translated into the Iceberg `string` type.
  * `uint32` and `fixed32` are translated into Iceberg `long` types as that is the existing semantic for unsigned 32-bit values in Iceberg.
  * `uint64` and `fixed64` values are translated into their Base-10 string representation.
  * `google.protobuf.Timestamp` is translated into `timestamp` in Iceberg.
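The new union translation described in the Avro hunk above can be sketched in a few lines. This is an illustrative sketch, not Redpanda source code; the type mapping covers only a subset of Avro types for the example:

```python
# Illustrative sketch of the v25.3 Avro union translation described above:
# two-field unions containing "null" collapse to a single optional field;
# other unions become structs whose column names are the member type names.
AVRO_TO_ICEBERG = {"int": "int", "long": "bigint", "float": "float", "string": "string"}

def translate_union(members: list) -> str:
    non_null = [m for m in members if m != "null"]
    if len(members) == 2 and len(non_null) == 1:
        # e.g. ["null", "long"] -> a single optional bigint column
        return AVRO_TO_ICEBERG[non_null[0]]
    fields = ",".join(f"{m}:{AVRO_TO_ICEBERG[m]}" for m in non_null)
    return f"struct<{fields}>"
```

For example, `translate_union(["string", "long"])` reproduces the `struct<string:string,long:bigint>` column shown in the diff, while `translate_union(["null", "long"])` collapses to a plain `bigint`.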
modules/upgrade/pages/iceberg-schema-changes-and-migration-guide.adoc

Lines changed: 177 additions & 0 deletions

@@ -0,0 +1,177 @@
= Schema Changes and Migration Guide for Iceberg Topics in Redpanda v25.3
:description: Information about breaking schema changes for Iceberg topics in Redpanda v25.3, and actions to take when upgrading.

Redpanda v25.3 introduces changes that break table compatibility for Iceberg topics. If you have existing Iceberg topics and want to retain the data in the corresponding Iceberg tables, you must take specific actions while upgrading to v25.3 to ensure that your Iceberg topics and their associated tables continue to function correctly.

== Breaking changes

The following table lists the schema changes introduced in Redpanda v25.3.

|===
| Field | Iceberg type translation before v25.3 | Iceberg type translation starting in v25.3 | Impact

| `redpanda.timestamp` column
| `timestamp` type
| `timestamptz` (timestamp with time zone) type
| Affects all tables created by Iceberg topics, including dead-letter queue tables.

| `redpanda.headers.key` column
| `binary` type
| `string` type
| Affects all tables created by Iceberg topics, including dead-letter queue tables.

| Avro optionals (two-field union of `[null, <FIELD>]`)

Example: `"type": ["null", "long"]`

| Single-field struct type

Example: `struct<union_opt_1:bigint>`

| Optional `FIELD`

Example: `bigint`

| Affects tables created by Iceberg topics that use Avro optionals.

| Avro non-optional unions

Example: `"type": ["string", "long"]`

| Column names used a naming convention based on the ordering of the union fields

Example: `struct<union_opt_0:string,union_opt_1:bigint>`

| Column names use the type names

Example: `struct<string:string,long:bigint>`

| Affects tables created by Iceberg topics that use Avro unions.

| Avro and Protobuf enums
| `integer` type
| `string` type
| Affects tables created by Iceberg topics that use Avro or Protobuf enums.

|===
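The metadata-column changes in the table above can be encoded as data if you want to audit existing tables before upgrading. The following is an illustrative sketch (the mapping and helper are hypothetical, not a Redpanda API):

```python
# Illustrative: encode the v25.3 metadata-column type changes from the table
# above, and flag columns in an existing table that are affected.
TYPE_CHANGES = {
    "redpanda.timestamp": ("timestamp", "timestamptz"),
    "redpanda.headers.key": ("binary", "string"),
}

def affected_columns(columns: dict) -> list:
    """Return column names whose pre-v25.3 type matches a breaking change.

    `columns` maps fully qualified column names to their current Iceberg types.
    """
    return [
        name for name, type_ in columns.items()
        if name in TYPE_CHANGES and TYPE_CHANGES[name][0] == type_
    ]
```

For example, a table whose `redpanda.timestamp` column is still `timestamp` (rather than `timestamptz`) would be reported as affected.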
== Upgrade steps

When upgrading to Redpanda v25.3, you must perform these steps to migrate Iceberg topics to the new schema translation and ensure that your topics continue to function correctly. If you skip these steps, Redpanda sends new data to the dead-letter queue (DLQ) table until you make the Iceberg tables conform to the new schemas (step 4).

. Before upgrading to v25.3, disable Iceberg on all Iceberg topics by setting the `redpanda.iceberg.mode` topic property to `disabled`. This step ensures that Iceberg topics write no additional Parquet files during the upgrade.
+
NOTE: Don't set the `iceberg_enabled` cluster property to `false`. Disabling Iceberg at the cluster level would prevent pending Iceberg commits from being finalized post-upgrade.
. xref:upgrade:rolling-upgrade.adoc#perform-a-rolling-upgrade[Perform a rolling upgrade] to v25.3, restarting the cluster in the process.
. Query the `GetCoordinatorState` Admin API endpoint repeatedly for the Iceberg topics that you want to migrate to the new schema, until there are no more pending entries in the coordinator for those topics. This step confirms that all Parquet files written pre-upgrade have been committed to the Iceberg tables.
+
[,bash]
----
# Pass the comma-separated list of Iceberg topics into "topics_filter"
curl -s \
  --header 'Content-Type: application/json' \
  --data '{"topics_filter": ["<list-of-topics-to-migrate>"]}' \
  localhost:9644/redpanda.core.admin.internal.datalake.v1.DatalakeService/GetCoordinatorState | jq
----
+
.Sample output
[,json,.no-copy]
----
{
  "state": {
    "topicStates": {
      "topic_to_migrate": {
        "revision": "9",
        "partitionStates": {
          "0": {
            "pendingEntries": [
              {
                "data": {
                  "startOffset": "12",
                  "lastOffset": "15",
                  "dataFiles": [
                    {
                      "remotePath": "redpanda-iceberg-catalog/redpanda/topic_to_migrate/data/0-871734c9-e266-41fa-a34d-2afba2828c0d.parquet",
                      "rowCount": "4",
                      "fileSizeBytes": "1426",
                      "tableSchemaId": 0,
                      "partitionSpecId": 0,
                      "partitionKey": []
                    }
                  ],
                  "dlqFiles": [],
                  "kafkaProcessedBytes": "289"
                },
                "addedPendingAt": "6"
              }
            ],
            "lastCommitted": "11"
          }
        },
        "lifecycleState": "LIFECYCLE_STATE_LIVE",
        "totalKafkaProcessedBytes": "79"
      }
    }
  }
}
----
+
To check for remaining pending files:
+
[,bash]
----
curl -s \
  --header 'Content-Type: application/json' \
  --data '{}' \
  localhost:9644/redpanda.core.admin.internal.datalake.v1.DatalakeService/GetCoordinatorState \
  | jq '[.state.topicStates[].partitionStates[].pendingEntries | length] | any(. > 0)'
----
+
If the query returns `true`, there are pending files and you need to wait longer before proceeding to the next step.
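If you poll from automation, the `jq` check above can be re-implemented in a few lines. The following is an illustrative sketch, not a Redpanda-provided tool; it inspects a decoded `GetCoordinatorState` response and reports whether any partition still has pending entries:

```python
# Illustrative: check a parsed GetCoordinatorState response (see the sample
# output above) for partitions that still have pending entries.
def has_pending_entries(response: dict) -> bool:
    topic_states = response.get("state", {}).get("topicStates", {})
    return any(
        partition.get("pendingEntries")
        for topic in topic_states.values()
        for partition in topic.get("partitionStates", {}).values()
    )
```

Poll the endpoint, decode the JSON body, and call `has_pending_entries` on it until it returns `False` before proceeding to the next step.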
. Migrate Iceberg topics to the new schema translation and make them conform with the breaking changes.
+
Run SQL queries to rename the affected columns for each Iceberg table that you want to migrate to the <<breaking-changes,new schema>>. In addition to renaming the existing columns, Redpanda automatically adds new columns that use the original names, but with the new types:
+
[,sql]
----
/*
`redpanda.timestamp` renamed to `redpanda.timestamp_v1` (`timestamp` type),
new `redpanda.timestamp` (`timestamptz` type) column added
*/
ALTER TABLE redpanda.<name-of-topic-to-migrate>
RENAME COLUMN redpanda.timestamp TO timestamp_v1;

/*
`redpanda.headers.key` renamed to `key_v1` (`binary` type),
new `redpanda.headers.key` (`string` type) column added
*/
ALTER TABLE redpanda.<name-of-topic-to-migrate>
RENAME COLUMN redpanda.headers.key TO key_v1;

/*
Rename any additional affected columns according to the list of
breaking changes in the first section of this guide.
*/
ALTER TABLE redpanda.<name-of-topic-to-migrate>
RENAME COLUMN <column1> TO <column1-new-name>;
----
+
NOTE: Redpanda does not write new data to the renamed columns. Take care not to add fields to the Kafka schema that collide with the new column names.
+
You can continue to query the older data in the original columns, but only under their new column names. To query both older data and new data written with the new types, update your queries to account for both the renamed columns and the new columns that use the original names.
+
[,sql]
----
/*
Adjust the range condition as needed.

Tip: Using the same time range for both columns helps ensure that you capture
all data without needing to specify the exact cutoff point for the upgrade.
*/
SELECT count(*) FROM redpanda.<name-of-migrated-topic>
WHERE redpanda.timestamp >= '2025-01-01 00:00:00'
OR redpanda.timestamp_v1 >= '2025-01-01 00:00:00';
----
. Re-enable Iceberg on all Iceberg topics in your upgraded cluster.
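If you have many topics to migrate, the metadata-column rename statements in step 4 can be generated from a list of table names. A minimal sketch (the table list is hypothetical; the generated SQL follows the statements shown above, so adapt it to your SQL engine and any additional affected columns):

```python
# Illustrative: generate the metadata-column RENAME statements from step 4
# for a list of Iceberg table names.
RENAMES = [
    ("redpanda.timestamp", "timestamp_v1"),
    ("redpanda.headers.key", "key_v1"),
]

def rename_statements(table: str) -> list:
    return [
        f"ALTER TABLE redpanda.{table} RENAME COLUMN {old} TO {new};"
        for old, new in RENAMES
    ]

for table in ["topic_to_migrate"]:  # replace with your list of topics
    print("\n".join(rename_statements(table)))
```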
modules/upgrade/partials/iceberg-breaking-changes.adoc

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
Redpanda v25.3 introduces breaking schema changes for Iceberg topics. If you are using Iceberg topics and want to retain the data in the corresponding Iceberg tables, review xref:upgrade:iceberg-schema-changes-and-migration-guide.adoc[] before upgrading your cluster, and follow the required migration steps to avoid sending new records to a dead-letter queue table.

modules/upgrade/partials/incompat-changes.adoc

Lines changed: 7 additions & 1 deletion

@@ -1,6 +1,12 @@
  === Review incompatible changes

- * *Breaking change in Redpanda 25.3*: Schema Registry no longer allows specifying a schema ID and version when registering a schema in read-write mode. You must use import mode to register a schema with a specific ID and version. See xref:manage:schema-reg/schema-reg-api.adoc#set-schema-registry-mode[Use the Schema Registry API] for more information.
+ * *Breaking changes in Redpanda 25.3*:
+ ** {empty}
+ +
+ --
+ include::upgrade:partial$iceberg-breaking-changes.adoc[]
+ --
+ ** Schema Registry no longer allows specifying a schema ID and version when registering a schema in read-write mode. You must use import mode to register a schema with a specific ID and version. See xref:manage:schema-reg/schema-reg-api.adoc#set-schema-registry-mode[Use the Schema Registry API] for more information.

  * {empty}
  +
