Commit 9217a39

kbatuigas and micheleRP authored
Breaking changes for Iceberg in 25.3 (#1468)
Co-authored-by: Michele Cyran <[email protected]>

1 parent bd2b8ff commit 9217a39

File tree

7 files changed: +215 -13 lines changed

modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions

@@ -231,6 +231,7 @@
  ** xref:upgrade:k-compatibility.adoc[]
  ** xref:manage:kubernetes/k-upgrade-kubernetes.adoc[Migrate Node Pools]
  ** xref:upgrade:deprecated/index.adoc[Deprecated Features]
+ ** xref:upgrade:iceberg-schema-changes-and-migration-guide.adoc[Iceberg Schema Changes in v25.3]
  * xref:migrate:index.adoc[Migrate]
  ** xref:migrate:console-v3.adoc[Migrate to Redpanda Console v3.0.x]
  ** xref:migrate:data-migration.adoc[]

modules/get-started/pages/release-notes/redpanda.adoc

Lines changed: 2 additions & 0 deletions

@@ -7,6 +7,8 @@ This topic includes new content added in version {page-component-version}. For a
  * xref:redpanda-cloud:get-started:whats-new-cloud.adoc[]
  * xref:redpanda-cloud:get-started:cloud-overview.adoc#redpanda-cloud-vs-self-managed-feature-compatibility[Redpanda Cloud vs Self-Managed feature compatibility]

+ NOTE: Redpanda v25.3 introduces breaking schema changes for Iceberg topics. If you are using Iceberg topics and want to retain the data in the corresponding Iceberg tables, review xref:upgrade:iceberg-schema-changes-and-migration-guide.adoc[] before upgrading your cluster, and follow the required migration steps to avoid sending new records to a dead-letter queue table.
+
  == Iceberg topics with GCP BigLake

  A new xref:manage:iceberg/iceberg-topics-gcp-biglake.adoc[REST catalog integration] with Google Cloud BigLake allows you to add Redpanda topics as Iceberg tables in your data lakehouse.

modules/manage/pages/iceberg/query-iceberg-topics.adoc

Lines changed: 5 additions & 0 deletions

@@ -16,6 +16,11 @@ When you access Iceberg topics from a data lakehouse or other Iceberg-compatible
  == Access Iceberg tables

  ifndef::env-cloud[]
+ [IMPORTANT]
+ ====
+ include::upgrade:partial$iceberg-breaking-changes.adoc[]
+ ====
  Redpanda generates an Iceberg table with the same name as the topic. Depending on the processing engine and your Iceberg xref:manage:iceberg/use-iceberg-catalogs.adoc[catalog implementation], you may also need to define the table (for example using `CREATE TABLE`) to point the data lakehouse to its location in the catalog. For an example, see xref:manage:iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc[].
  endif::[]

modules/manage/pages/iceberg/specify-iceberg-schema.adoc

Lines changed: 22 additions & 12 deletions

@@ -111,6 +111,13 @@ NOTE: If you don't specify the fully qualified Protobuf message name, Redpanda p

  == How Iceberg modes translate to table format

+ ifndef::env-cloud[]
+ [IMPORTANT]
+ ====
+ include::upgrade:partial$iceberg-breaking-changes.adoc[]
+ ====
+ endif::[]
+
  Redpanda generates an Iceberg table with the same name as the topic. In each mode, Redpanda writes to a `redpanda` table column that stores a single Iceberg https://iceberg.apache.org/spec/#nested-types[struct^] per record, containing nested columns of the metadata from each record, including the record key, headers, timestamp, the partition it belongs to, and its offset.

  For example, if you produce to a topic `ClickEvent` according to the following Avro schema:

@@ -143,11 +150,12 @@ The `key_value` mode writes to the following table format:
  ----
  CREATE TABLE ClickEvent (
      redpanda struct<
-         partition: integer NOT NULL,
-         timestamp: timestamp NOT NULL,
-         offset: long NOT NULL,
-         headers: array<struct<key: binary NOT NULL, value: binary>>,
-         key: binary
+         partition: integer,
+         timestamp: timestamptz,
+         offset: long,
+         headers: array<struct<key: string, value: binary>>,
+         key: binary,
+         timestamp_type: integer
      >,
      value binary
  )

@@ -161,11 +169,12 @@ The `value_schema_id_prefix` and `value_schema_latest` modes can use the schema
  ----
  CREATE TABLE ClickEvent (
      redpanda struct<
-         partition: integer NOT NULL,
-         timestamp: timestamp NOT NULL,
-         offset: long NOT NULL,
-         headers: array<struct<key: binary NOT NULL, value: binary>>,
-         key: binary
+         partition: integer,
+         timestamp: timestamptz,
+         offset: long,
+         headers: array<struct<key: string, value: binary>>,
+         key: binary,
+         timestamp_type: integer
      >,
      user_id integer NOT NULL,
      event_type string,

@@ -213,11 +222,12 @@ Avro::

  There are some cases where the Avro type does not map directly to an Iceberg type and Redpanda applies the following transformations:

+ * Enums are translated into the Iceberg `string` type.
  * Different flavors of time (such as `time-millis`) and timestamp (such as `timestamp-millis`) types are translated to the same Iceberg `time` and `timestamp` types, respectively.
  * Avro unions are flattened to Iceberg structs with optional fields. For example:
  ** The union `["int", "long", "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 LONG NULLABLE, 2 FLOAT NULLABLE>`.
  ** The union `["int", null, "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 FLOAT NULLABLE>`.
- * All fields are required by default. (Avro always sets a default in binary representation.)
+ * Two-field unions that contain `null` are represented as a single optional field only (no struct). For example, the union `["null", "long"]` is represented as `long`.

  Some Avro types are not supported:

@@ -250,7 +260,7 @@ Protobuf::

  There are some cases where the Protobuf type does not map directly to an Iceberg type and Redpanda applies the following transformations:

  * Repeated values are translated into Iceberg `list` types.
- * Enums are translated into Iceberg `int` types based on the integer value of the enumerated type.
+ * Enums are translated into the Iceberg `string` type.
  * `uint32` and `fixed32` are translated into Iceberg `long` types as that is the existing semantic for unsigned 32-bit values in Iceberg.
  * `uint64` and `fixed64` values are translated into their Base-10 string representation.
  * `google.protobuf.Timestamp` is translated into `timestamp` in Iceberg.
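The new union translation described in the Avro hunk above can be sketched in a few lines. This is an illustrative sketch, not Redpanda source code; the type mapping covers only a subset of Avro types for the example:

```python
# Illustrative sketch of the v25.3 Avro union translation described above:
# two-field unions containing "null" collapse to a single optional field;
# other unions become structs whose column names are the member type names.
AVRO_TO_ICEBERG = {"int": "int", "long": "bigint", "float": "float", "string": "string"}

def translate_union(members: list) -> str:
    non_null = [m for m in members if m != "null"]
    if len(members) == 2 and len(non_null) == 1:
        # e.g. ["null", "long"] -> a single optional bigint column
        return AVRO_TO_ICEBERG[non_null[0]]
    fields = ",".join(f"{m}:{AVRO_TO_ICEBERG[m]}" for m in non_null)
    return f"struct<{fields}>"
```

For example, `translate_union(["string", "long"])` reproduces the `struct<string:string,long:bigint>` column shown in the diff, while `translate_union(["null", "long"])` collapses to a plain `bigint`.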
modules/upgrade/pages/iceberg-schema-changes-and-migration-guide.adoc

Lines changed: 177 additions & 0 deletions

@@ -0,0 +1,177 @@
= Schema Changes and Migration Guide for Iceberg Topics in Redpanda v25.3
:description: Information about breaking schema changes for Iceberg topics in Redpanda v25.3, and actions to take when upgrading.

Redpanda v25.3 introduces changes that break table compatibility for Iceberg topics. If you have existing Iceberg topics and want to retain the data in the corresponding Iceberg tables, you must take specific actions while upgrading to v25.3 to ensure that your Iceberg topics and their associated tables continue to function correctly.

== Breaking changes

The following table lists the schema changes introduced in Redpanda v25.3.

|===
| Field | Iceberg type translation before v25.3 | Iceberg type translation starting in v25.3 | Impact

| `redpanda.timestamp` column
| `timestamp` type
| `timestamptz` (timestamp with time zone) type
| Affects all tables created by Iceberg topics, including dead-letter queue tables.

| `redpanda.headers.key` column
| `binary` type
| `string` type
| Affects all tables created by Iceberg topics, including dead-letter queue tables.

| Avro optionals (two-field union of `[null, <FIELD>]`)

Example: `"type": ["null", "long"]`

| Single-field struct type

Example: `struct<union_opt_1:bigint>`

| Optional `FIELD`

Example: `bigint`

| Affects tables created by Iceberg topics that use Avro optionals.

| Avro non-optional unions

Example: `"type": ["string", "long"]`

| Column names used a naming convention based on the ordering of the union fields

Example: `struct<union_opt_0:string,union_opt_1:bigint>`

| Column names use the type names

Example: `struct<string:string,long:bigint>`

| Affects tables created by Iceberg topics that use Avro unions.

| Avro and Protobuf enums
| `integer` type
| `string` type
| Affects tables created by Iceberg topics that use Avro or Protobuf enums.

|===
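The metadata-column changes in the table above can be encoded as data if you want to audit existing tables before upgrading. The following is an illustrative sketch (the mapping and helper are hypothetical, not a Redpanda API):

```python
# Illustrative: encode the v25.3 metadata-column type changes from the table
# above, and flag columns in an existing table that are affected.
TYPE_CHANGES = {
    "redpanda.timestamp": ("timestamp", "timestamptz"),
    "redpanda.headers.key": ("binary", "string"),
}

def affected_columns(columns: dict) -> list:
    """Return column names whose pre-v25.3 type matches a breaking change.

    `columns` maps fully qualified column names to their current Iceberg types.
    """
    return [
        name for name, type_ in columns.items()
        if name in TYPE_CHANGES and TYPE_CHANGES[name][0] == type_
    ]
```

For example, a table whose `redpanda.timestamp` column is still `timestamp` (rather than `timestamptz`) would be reported as affected.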
== Upgrade steps

When upgrading to Redpanda v25.3, you must perform these steps to migrate Iceberg topics to the new schema translation and ensure that your topics continue to function correctly. If you skip these steps, Redpanda sends new data to the dead-letter queue (DLQ) table until you make the Iceberg tables conform to the new schemas (step 4).

. Before upgrading to v25.3, disable Iceberg on all Iceberg topics by setting the `redpanda.iceberg.mode` topic property to `disabled`. This step ensures that Iceberg topics write no additional Parquet files during the upgrade.
+
NOTE: Don't set the `iceberg_enabled` cluster property to `false`. Disabling Iceberg at the cluster level would prevent pending Iceberg commits from being finalized post-upgrade.
. xref:upgrade:rolling-upgrade.adoc#perform-a-rolling-upgrade[Perform a rolling upgrade] to v25.3, restarting the cluster in the process.
. Query the `GetCoordinatorState` Admin API endpoint repeatedly for the Iceberg topics that you want to migrate to the new schema, until there are no more pending entries in the coordinator for those topics. This step confirms that all Parquet files written pre-upgrade have been committed to the Iceberg tables.
+
[,bash]
----
# Pass the comma-separated list of Iceberg topics into "topics_filter"
curl -s \
  --header 'Content-Type: application/json' \
  --data '{"topics_filter": ["<list-of-topics-to-migrate>"]}' \
  localhost:9644/redpanda.core.admin.internal.datalake.v1.DatalakeService/GetCoordinatorState | jq
----
+
.Sample output
[,json,.no-copy]
----
{
  "state": {
    "topicStates": {
      "topic_to_migrate": {
        "revision": "9",
        "partitionStates": {
          "0": {
            "pendingEntries": [
              {
                "data": {
                  "startOffset": "12",
                  "lastOffset": "15",
                  "dataFiles": [
                    {
                      "remotePath": "redpanda-iceberg-catalog/redpanda/topic_to_migrate/data/0-871734c9-e266-41fa-a34d-2afba2828c0d.parquet",
                      "rowCount": "4",
                      "fileSizeBytes": "1426",
                      "tableSchemaId": 0,
                      "partitionSpecId": 0,
                      "partitionKey": []
                    }
                  ],
                  "dlqFiles": [],
                  "kafkaProcessedBytes": "289"
                },
                "addedPendingAt": "6"
              }
            ],
            "lastCommitted": "11"
          }
        },
        "lifecycleState": "LIFECYCLE_STATE_LIVE",
        "totalKafkaProcessedBytes": "79"
      }
    }
  }
}
----
+
To check for remaining pending files:
+
[,bash]
----
curl -s \
  --header 'Content-Type: application/json' \
  --data '{}' \
  localhost:9644/redpanda.core.admin.internal.datalake.v1.DatalakeService/GetCoordinatorState \
  | jq '[.state.topicStates[].partitionStates[].pendingEntries | length] | any(. > 0)'
----
+
If the query returns `true`, there are pending files and you need to wait longer before proceeding to the next step.
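If you poll from automation, the `jq` check above can be re-implemented in a few lines. The following is an illustrative sketch, not a Redpanda-provided tool; it inspects a decoded `GetCoordinatorState` response and reports whether any partition still has pending entries:

```python
# Illustrative: check a parsed GetCoordinatorState response (see the sample
# output above) for partitions that still have pending entries.
def has_pending_entries(response: dict) -> bool:
    topic_states = response.get("state", {}).get("topicStates", {})
    return any(
        partition.get("pendingEntries")
        for topic in topic_states.values()
        for partition in topic.get("partitionStates", {}).values()
    )
```

Poll the endpoint, decode the JSON body, and call `has_pending_entries` on it until it returns `False` before proceeding to the next step.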
. Migrate Iceberg topics to the new schema translation and make them conform with the breaking changes.
+
Run SQL queries to rename the affected columns for each Iceberg table that you want to migrate to the <<breaking-changes,new schema>>. In addition to renaming the existing columns, Redpanda automatically adds new columns that use the original names, but with the new types:
+
[,sql]
----
/*
`redpanda.timestamp` renamed to `redpanda.timestamp_v1` (`timestamp` type),
new `redpanda.timestamp` (`timestamptz` type) column added
*/
ALTER TABLE redpanda.<name-of-topic-to-migrate>
RENAME COLUMN redpanda.timestamp TO timestamp_v1;

/*
`redpanda.headers.key` renamed to `key_v1` (`binary` type),
new `redpanda.headers.key` (`string` type) column added
*/
ALTER TABLE redpanda.<name-of-topic-to-migrate>
RENAME COLUMN redpanda.headers.key TO key_v1;

/*
Rename any additional affected columns according to the list of
breaking changes in the first section of this guide.
*/
ALTER TABLE redpanda.<name-of-topic-to-migrate>
RENAME COLUMN <column1> TO <column1-new-name>;
----
+
NOTE: Redpanda does not write new data to the renamed columns. Take care not to add fields to the Kafka schema that collide with the new column names.
+
You can continue to query the older data in the original columns, but only under their new column names. To query both older data and new data written with the new types, update your queries to account for both the renamed columns and the new columns that use the original names.
+
[,sql]
----
/*
Adjust the range condition as needed.

Tip: Using the same time range for both columns helps ensure that you capture
all data without needing to specify the exact cutoff point for the upgrade.
*/
SELECT count(*) FROM redpanda.<name-of-migrated-topic>
WHERE redpanda.timestamp >= '2025-01-01 00:00:00'
OR redpanda.timestamp_v1 >= '2025-01-01 00:00:00';
----
. Re-enable Iceberg on all Iceberg topics in your upgraded cluster.
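If you have many topics to migrate, the metadata-column rename statements in step 4 can be generated from a list of table names. A minimal sketch (the table list is hypothetical; the generated SQL follows the statements shown above, so adapt it to your SQL engine and any additional affected columns):

```python
# Illustrative: generate the metadata-column RENAME statements from step 4
# for a list of Iceberg table names.
RENAMES = [
    ("redpanda.timestamp", "timestamp_v1"),
    ("redpanda.headers.key", "key_v1"),
]

def rename_statements(table: str) -> list:
    return [
        f"ALTER TABLE redpanda.{table} RENAME COLUMN {old} TO {new};"
        for old, new in RENAMES
    ]

for table in ["topic_to_migrate"]:  # replace with your list of topics
    print("\n".join(rename_statements(table)))
```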
modules/upgrade/partials/iceberg-breaking-changes.adoc

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
Redpanda v25.3 introduces breaking schema changes for Iceberg topics. If you are using Iceberg topics and want to retain the data in the corresponding Iceberg tables, review xref:upgrade:iceberg-schema-changes-and-migration-guide.adoc[] before upgrading your cluster, and follow the required migration steps to avoid sending new records to a dead-letter queue table.

modules/upgrade/partials/incompat-changes.adoc

Lines changed: 7 additions & 1 deletion

@@ -1,6 +1,12 @@
  === Review incompatible changes

- * *Breaking change in Redpanda 25.3*: Schema Registry no longer allows specifying a schema ID and version when registering a schema in read-write mode. You must use import mode to register a schema with a specific ID and version. See xref:manage:schema-reg/schema-reg-api.adoc#set-schema-registry-mode[Use the Schema Registry API] for more information.
+ * *Breaking changes in Redpanda 25.3*:
+ ** {empty}
+ +
+ --
+ include::upgrade:partial$iceberg-breaking-changes.adoc[]
+ --
+ ** Schema Registry no longer allows specifying a schema ID and version when registering a schema in read-write mode. You must use import mode to register a schema with a specific ID and version. See xref:manage:schema-reg/schema-reg-api.adoc#set-schema-registry-mode[Use the Schema Registry API] for more information.

  * {empty}
  +
