modules/manage/pages/topic-iceberg-integration.adoc
7 additions & 7 deletions
@@ -55,7 +55,7 @@ In the Redpanda Iceberg integration, the manifest files are in JSON format.
 
 image::shared:iceberg-integration.png[]
 
-When you enable the Iceberg integration for a Redpanda topic, Redpanda brokers store streaming data in the Iceberg-compatible format in Parquet files in object storage, in addition to the log segments uploaded via Tiered Storage. Storing the streaming data in Iceberg tables in the cloud allows you to derive real-time insights through many compatible data lakehouse, data engineering, and business intelligence https://iceberg.apache.org/vendors/[tools^].
+When you enable the Iceberg integration for a Redpanda topic, Redpanda brokers store streaming data in the Iceberg-compatible format in Parquet files in object storage, in addition to the log segments uploaded using Tiered Storage. Storing the streaming data in Iceberg tables in the cloud allows you to derive real-time insights through many compatible data lakehouse, data engineering, and business intelligence https://iceberg.apache.org/vendors/[tools^].
 
 == Enable Iceberg integration
 
@@ -89,7 +89,7 @@ new-topic-name OK
 . Enable the integration for the topic by configuring `redpanda.iceberg.mode`. You can choose one of the following modes (a sketch of the command follows this hunk):
 +
 --
-* `key_value`: Creates an Iceberg table using a simple schema, consisting two columns, one for the record metadata including the key, and another binary column for the record's value.
+* `key_value`: Creates an Iceberg table using a simple schema, consisting of two columns, one for the record metadata including the key, and another binary column for the record's value.
 * `value_schema_id_prefix`: Creates an Iceberg table whose structure matches the Redpanda schema for this topic, with columns corresponding to each field. You must register a schema in the Schema Registry (see next step), and producers must write to the topic using the Schema Registry wire format. Redpanda parses the schema used by the record based on the schema ID encoded in the payload header, and stores the topic values in the corresponding table columns.
 * `disabled` (default): Disables writing to an Iceberg table for this topic.
 --
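As a sketch of the step above: the mode can be set with `rpk`, either when creating the topic or later on an existing one. The property name `redpanda.iceberg.mode` and its values come from the docs in this hunk; the topic name is a placeholder:

```
# Create a topic with the Iceberg integration enabled (placeholder topic name)
rpk topic create my-topic -c redpanda.iceberg.mode=value_schema_id_prefix

# Or enable it on an existing topic
rpk topic alter-config my-topic --set redpanda.iceberg.mode=key_value
```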
@@ -222,7 +222,7 @@ Avro::
 | timestamp | timestamp
 |===
 
-* Different flavors of time (such as time-millis) and timestamp (such as timestamp-millis) types are translated to the same Iceberg `time` and `timestamp` types respectively.
+* Different flavors of time (such as `time-millis`) and timestamp (such as `timestamp-millis`) types are translated to the same Iceberg `time` and `timestamp` types respectively.
 * Avro unions are flattened to Iceberg structs with optional fields:
 ** For example, the union `["int", "long", "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 LONG NULLABLE, 2 FLOAT NULLABLE>`.
 ** The union `["int", null, "float"]` is represented as an Iceberg struct `struct<0 INT NULLABLE, 1 FLOAT NULLABLE>`.
@@ -322,9 +322,9 @@ SELECT * FROM streaming.redpanda.ClickEvent;
 
 Spark can use the REST catalog to automatically discover the topic's Iceberg table.
 
-==== Filesystem-based catalog (`object_storage`)
+==== File system-based catalog (`object_storage`)
 
-If using the `object_storage` catalog type, you must set up the catalog integration in your processing engine accordingly. For example, you can configure Spark to use a filesystem-based catalog with at least the following properties:
+If you are using the `object_storage` catalog type, you must set up the catalog integration in your processing engine accordingly. For example, you can configure Spark to use a file system-based catalog with at least the following properties:
 
 ```
 spark.sql.catalog.streaming.type = hadoop
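For reference, a `hadoop` (file system-based) Iceberg catalog in Spark typically also needs the catalog implementation class and a warehouse path pointing at the bucket. A hedged sketch using the `streaming` catalog name from the hunk above; the warehouse path and REST endpoint are placeholders, not values from these docs:

```
spark.sql.catalog.streaming = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.streaming.type = hadoop
spark.sql.catalog.streaming.warehouse = s3a://<your-bucket>/<catalog-prefix>

# For the REST catalog mentioned above, the type and endpoint differ:
# spark.sql.catalog.streaming.type = rest
# spark.sql.catalog.streaming.uri = https://<catalog-endpoint>
```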
@@ -374,7 +374,7 @@ You can register the schema under the `ClickEvent-value` subject:
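As an illustration of the registration this hunk refers to, a schema can be registered under the `ClickEvent-value` subject with `rpk`. A minimal sketch; the subject name comes from the doc, while the schema file name is a placeholder:

```
rpk registry schema create ClickEvent-value --schema ClickEvent.avsc
```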