Conversation

@kbatuigas
Contributor

@kbatuigas kbatuigas commented Oct 7, 2024

Description

Cluster config reference entries will be added in this PR #846

Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 26 November 2024

Page previews

Topics as Iceberg Tables

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

@kbatuigas kbatuigas requested a review from a team as a code owner October 7, 2024 18:31
@kbatuigas kbatuigas marked this pull request as draft October 7, 2024 18:32
@kbatuigas kbatuigas marked this pull request as ready for review October 10, 2024 19:26
@kbatuigas kbatuigas changed the title from "TS topics as Iceberg tables" to "DOC-232 TS topics as Iceberg tables" Oct 10, 2024
@Feediver1
Contributor

@kbatuigas Please update the issue this resolves (above in Description) and add a review deadline. Thx.

@netlify

netlify bot commented Oct 10, 2024

Deploy Preview for redpanda-docs-preview ready!

Name Link
🔨 Latest commit 903785c
🔍 Latest deploy log https://app.netlify.com/sites/redpanda-docs-preview/deploys/674dea9fa376df000892a3ff
😎 Deploy Preview https://deploy-preview-800--redpanda-docs-preview.netlify.app

@kbatuigas
Contributor Author

kbatuigas commented Oct 10, 2024

This preview currently has the new doc under the Manage > Tiered Storage section (with a page URL manage/topic-iceberg-integration). Is a different section, such as Develop, more appropriate?

And should the page URL be changed? For example, the blog post is at blog/apache-iceberg-topics-streaming-data

@kbatuigas kbatuigas requested a review from lf-rep October 10, 2024 20:50

@lf-rep lf-rep left a comment

Hi Kat -- I added my comments in-place, as answers to your questions.

@mattschumpert

@kbatuigas can you please add me as a reviewer? Thanks!

In the Redpanda Iceberg integration, the manifest files are in JSON format.
* Catalog: Contains the current metadata pointer for the table. Clients reading and writing data to the table see the same version of the current state of the table. You'll configure your Iceberg catalog to point to your object storage bucket or container where the Redpanda data in Iceberg format is located. Redpanda uses the https://iceberg.apache.org/concepts/catalog/#catalog-implementations[Iceberg REST catalog^] endpoint to update your catalog when there are changes to the Iceberg data and metadata.

image::shared:iceberg-integration.png[]
Contributor

I know there's probably not time for this but I could do with some numbering on this diagram.

Contributor Author

@kbatuigas kbatuigas Nov 26, 2024

@mattschumpert does adding numbering in the provided diagram (same as the one used in the blog post from a while back) sound good? Is there anything else we should add to the design request in Monday?

Not sure I understood this. If you want to narrate the diagram in the text and add numbers, fine with me.

makes sense to add numbers

@kbatuigas kbatuigas force-pushed the 2428_ts-topics-iceberg branch from 755c619 to 1710e31 December 2, 2024 16:13

== Enable Iceberg integration

To create an Iceberg table for a Redpanda topic, you must set the cluster configuration property `iceberg_enabled` to `true`, and also configure the topic property `redpanda.iceberg.mode`. You can choose to provide a schema if you need the Iceberg table to be structured with defined columns.
Contributor

@Deflaimun Deflaimun Dec 2, 2024

Should link to the cluster and topic property reference
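
As a rough sketch of the two steps in the paragraph under review, the Python helper below assembles the `rpk` invocations without running them. The function name `iceberg_enable_commands` and the topic name `demo-topic` are hypothetical. The `key_value` mode is used because it is the only mode value named in this thread; the property reference should be checked for the values a given Redpanda version accepts.

```python
import shlex


def iceberg_enable_commands(topic, mode="key_value"):
    """Return rpk argument lists for enabling Iceberg on a new topic.

    Hypothetical helper for illustration: it only builds the commands.
    """
    return [
        # Cluster-wide switch named in the doc text.
        ["rpk", "cluster", "config", "set", "iceberg_enabled", "true"],
        # Create the topic with the Iceberg mode set as a topic property.
        ["rpk", "topic", "create", topic,
         "--topic-config", f"redpanda.iceberg.mode={mode}"],
    ]


# Print the commands in copy-pasteable shell form.
for argv in iceberg_enable_commands("demo-topic"):
    print(shlex.join(argv))
```

Each argument list can be passed to `subprocess.run` on a host where `rpk` is installed and pointed at the cluster.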

+
[,bash,]
----
rpk topic create <new-topic-name> --partitions 1 --replicas 1
Contributor

Are the flags for partitions and replicas necessary? If not, I would not add them.

[,bash,role=no-copy]
----
TOPIC STATUS
new-topic-name OK
Contributor

@Deflaimun Deflaimun Dec 2, 2024

Suggested change
- new-topic-name OK
+ <new-topic-name> OK

Whenever mentioning the same variable, try to keep the tags. If you add this, check that the preview looks good before merging.

* `rest`: Connect to and update an Iceberg catalog using a REST API. See the https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml[Iceberg REST Catalog API specification].
* `object_storage`: Write catalog files to the same object storage bucket as the data files. Use the object storage URL with an Iceberg client to access the catalog and data files for your Redpanda Iceberg tables.

Switching catalog types is not supported.
Contributor

@Deflaimun Deflaimun Dec 2, 2024

Suggested change
Switching catalog types is not supported.
Switching catalog types is not supported.

Switching when? Mid-flight? That's what I assume, but we could clarify.
Consider adding this to the limitations section if that makes sense.

Contributor Author

@bharathv would you be able to confirm?

Contributor

Once a single topic has Iceberg enabled, you cannot change the catalog type. I don't think we enforce this.

Sorry, just checking. Right, we don't enforce it.


==== File system-based catalog (`object_storage`)

If you are using the `object_storage` catalog type, you must set up the catalog integration in your processing engine accordingly. For example, you can configure Spark to use a file system-based catalog with at least the following properties:
Contributor

@Deflaimun Deflaimun Dec 2, 2024

Should mention that the example is for AWS S3.
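
The property list that follows the sentence under review is not shown in this excerpt. As a hedged sketch of what a file system-based catalog setup generally looks like, the Python snippet below builds the three Spark properties that Iceberg's Spark integration uses for a Hadoop-type catalog. The catalog name `streaming`, the bucket `example-bucket`, the path, and the helper name `hadoop_catalog_conf` are hypothetical, and the `s3a://` scheme assumes AWS S3, in line with the comment above.

```python
def hadoop_catalog_conf(catalog, warehouse):
    """Build Spark properties for an Iceberg file system-based (Hadoop) catalog.

    Hypothetical helper for illustration; catalog and warehouse are placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Register the catalog name with Iceberg's Spark catalog implementation.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" selects the file system-based catalog type.
        f"{prefix}.type": "hadoop",
        # Root location in object storage that holds the tables.
        f"{prefix}.warehouse": warehouse,
    }


conf = hadoop_catalog_conf("streaming", "s3a://example-bucket/redpanda-iceberg")
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

The printed `--conf` arguments can be appended to a `spark-sql` or `spark-submit` command line. Credentials and the Iceberg runtime package still need to be supplied separately.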

* It is not possible to append topic data to an existing Iceberg table that is not created by Redpanda.
* If you enable the Iceberg integration on an existing Redpanda topic, Redpanda does not backfill the generated Iceberg table with topic data.
* JSON schemas are not currently supported. If the topic data is in JSON, use the `key_value` mode to store the JSON in Iceberg, which then can be parsed by most query engines.
* If you are using Avro or Protobuf data, you must use the Schema Registry wire format, where producers include the magic byte and schema ID in the message payload header. See also: xref:manage:schema-reg/schema-id-validation.adoc[] and the https://www.redpanda.com/blog/schema-registry-kafka-streaming#how-does-serialization-work-with-schema-registry-in-kafka[Understanding Apache Kafka Schema Registry^] blog post.
Contributor

which Schema Registry? Redpanda or any? Does it make a difference?

Contributor Author

@rockwotj could you clarify here as well please?

Contributor

Only the one built into Redpanda. External registries are not supported.
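
For readers unfamiliar with the wire format mentioned in the limitation above, the Python sketch below shows the header layout: a magic byte of `0` followed by the schema ID as a 4-byte big-endian integer, then the serialized payload. The function names and the schema ID `42` are hypothetical. Protobuf messages carry additional message-index bytes after the schema ID, which this sketch does not handle.

```python
import struct

MAGIC_BYTE = 0
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def encode_wire_format(schema_id, payload):
    """Prefix a serialized payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def decode_wire_format(message):
    """Split a message into (schema_id, payload), validating the magic byte."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER_SIZE:]


framed = encode_wire_format(42, b"serialized-record")
print(framed[:HEADER_SIZE].hex())  # 000000002a
print(decode_wire_format(framed))  # (42, b'serialized-record')
```

In practice a Schema Registry-aware serializer adds this header for the producer. The sketch is only meant to show what the header looks like on the wire.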


=== Query topic in key-value mode

You can also forgo using a schema, which means using semi-structured data in Iceberg.
Contributor

Suggested change
- You can also forgo using a schema, which means using semi-structured data in Iceberg.
+ You can also choose not to use a schema, allowing you to work with semi-structured data in Iceberg.

Consider using a simpler phrase. 'Forgo' is not a common word for non-native speakers.

Contributor

@Deflaimun Deflaimun left a comment

Approved. Check the most important topics as we discussed before merging

@kbatuigas kbatuigas merged commit 38b7b2d into v-WIP/24.3 Dec 2, 2024
7 checks passed
@kbatuigas kbatuigas deleted the 2428_ts-topics-iceberg branch December 2, 2024 17:19
Deflaimun added a commit that referenced this pull request Dec 2, 2024
Co-authored-by: Angela Simms <[email protected]>
Co-authored-by: Tyler Rockwood <[email protected]>
Co-authored-by: Paulo Borges <[email protected]>