DOC-232 TS topics as Iceberg tables #800
Conversation
@kbatuigas Please update the issue this resolves (above in Description) and add a review deadline. Thx.
✅ Deploy Preview for redpanda-docs-preview ready!
This preview currently has the new doc under the Manage > Tiered Storage section (with a page URL And should the page URL be changed? For example, the blog post is at |
lf-rep
left a comment
Hi Kat -- I added my comments in-place, as answers to your questions.
@kbatuigas can you please add me as a reviewer? Thanks!
| In the Redpanda Iceberg integration, the manifest files are in JSON format. | ||
| * Catalog: Contains the current metadata pointer for the table. Clients reading and writing data to the table see the same version of the current state of the table. You'll configure your Iceberg catalog to point to your object storage bucket or container where the Redpanda data in Iceberg format is located. Redpanda uses the https://iceberg.apache.org/concepts/catalog/#catalog-implementations[Iceberg REST catalog^] endpoint to update your catalog when there are changes to the Iceberg data and metadata. | ||
| image::shared:iceberg-integration.png[] |
I know there's probably not time for this but I could do with some numbering on this diagram.
@mattschumpert does adding numbering in the provided diagram (same as the one used in the blog post from a while back) sound good? Is there anything else we should add to the design request in Monday?
Not sure I understood this. If you want to narrate the diagram in the text and add numbers, fine with me.
makes sense to add numbers
Co-authored-by: Tyler Rockwood <[email protected]>
Co-authored-by: Angela Simms <[email protected]>
755c619 to 1710e31
| == Enable Iceberg integration | ||
| To create an Iceberg table for a Redpanda topic, you must set the cluster configuration property `iceberg_enabled` to `true`, and also configure the topic property `redpanda.iceberg.mode`. You can choose to provide a schema if you need the Iceberg table to be structured with defined columns. |
Should link to the cluster and topic property reference
| + | ||
| [,bash,] | ||
| ---- | ||
| rpk topic create <new-topic-name> --partitions 1 --replicas 1 |
Are the flags for partitions and replicas necessary? If not, I would not add them.
| [,bash,role=no-copy] | ||
| ---- | ||
| TOPIC STATUS | ||
| new-topic-name OK |
| new-topic-name OK | |
| <new-topic-name> OK |
Whenever you mention the same variable, try to keep the angle-bracket tags. If you add this, check that the preview looks good before merging.
| * `rest`: Connect to and update an Iceberg catalog using a REST API. See the https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml[Iceberg REST Catalog API specification]. | ||
| * `object_storage`: Write catalog files to the same object storage bucket as the data files. Use the object storage URL with an Iceberg client to access the catalog and data files for your Redpanda Iceberg tables. | ||
| Switching catalog types is not supported. |
| Switching catalog types is not supported. | |
| Switching catalog types is not supported. |
Switching when? Mid-flight? That's what I assume, but we could clarify.
Consider adding this to the limitations section if that makes sense.
@bharathv would you be able to confirm?
Once a single topic has iceberg enabled, then you cannot change the catalog type. I don't think we enforce this
Sorry, just checking. Right, we don't enforce it.
| ==== File system-based catalog (`object_storage`) | ||
| If you are using the `object_storage` catalog type, you must set up the catalog integration in your processing engine accordingly. For example, you can configure Spark to use a file system-based catalog with at least the following properties: |
Should mention that the example is for AWS S3.
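For reference while reviewing, here is a minimal sketch of the kind of Spark properties a file system-based catalog on AWS S3 might need. It assumes the `object_storage` catalog can be read through Iceberg's Hadoop catalog type (`type=hadoop`); that mapping, the catalog name `streaming`, and the `<bucket-name>`/`<catalog-path>` placeholders are illustrative assumptions, not values confirmed in this PR. S3 credentials and filesystem settings are out of scope here.

```python
def hadoop_catalog_conf(catalog_name: str, warehouse_uri: str) -> dict[str, str]:
    """Build Spark properties for an Iceberg file system-based (Hadoop) catalog.

    Assumption: the Redpanda `object_storage` catalog is readable as an
    Iceberg Hadoop catalog rooted at `warehouse_uri`.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Register the catalog implementation under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # "hadoop" selects Iceberg's file system-based catalog.
        f"{prefix}.type": "hadoop",
        # Root location of the catalog and data files (hypothetical placeholder).
        f"{prefix}.warehouse": warehouse_uri,
    }


def as_spark_submit_args(conf: dict[str, str]) -> list[str]:
    """Render the properties as repeated `--conf key=value` arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = hadoop_catalog_conf("streaming", "s3a://<bucket-name>/<catalog-path>")
spark_args = as_spark_submit_args(conf)
```

The resulting `spark_args` list can be appended to a `spark-submit` or `spark-sql` command line; the same keys can also go into `spark-defaults.conf`.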
| * It is not possible to append topic data to an existing Iceberg table that is not created by Redpanda. | ||
| * If you enable the Iceberg integration on an existing Redpanda topic, Redpanda does not backfill the generated Iceberg table with topic data. | ||
| * JSON schemas are not currently supported. If the topic data is in JSON, use the `key_value` mode to store the JSON in Iceberg, which then can be parsed by most query engines. | ||
| * If you are using Avro or Protobuf data, you must use the Schema Registry wire format, where producers include the magic byte and schema ID in the message payload header. See also: xref:manage:schema-reg/schema-id-validation.adoc[] and the https://www.redpanda.com/blog/schema-registry-kafka-streaming#how-does-serialization-work-with-schema-registry-in-kafka[Understanding Apache Kafka Schema Registry^] blog post. |
which Schema Registry? Redpanda or any? Does it make a difference?
@rockwotj could you clarify here as well please?
Only the Schema Registry built in to Redpanda. External registries are not supported.
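To make the wire-format requirement concrete for readers of the limitation above, here is a small sketch of the header layout as it is commonly implemented: one magic byte (`0`) followed by a 4-byte big-endian schema ID, then the serialized payload. The function names are hypothetical, and the sketch does not handle the extra message-index bytes that Protobuf messages carry after the schema ID.

```python
import struct

MAGIC_BYTE = 0
# ">bI": big-endian, 1-byte magic value, 4-byte unsigned schema ID (5 bytes total).
HEADER = struct.Struct(">bI")


def encode(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized payload with the magic byte and schema ID."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def decode(message: bytes) -> tuple[int, bytes]:
    """Split a wire-format message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is too short to contain a wire-format header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


framed = encode(7, b"abc")
schema_id, payload = decode(framed)
```

In practice a Schema Registry-aware serializer in the producer client adds this header, so hand-rolling it is rarely needed; the sketch only shows what the message is expected to look like.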
| === Query topic in key-value mode | ||
| You can also forgo using a schema, which means using semi-structured data in Iceberg. |
| You can also forgo using a schema, which means using semi-structured data in Iceberg. | |
| You can also choose not to use a schema, allowing you to work with semi-structured data in Iceberg. |
Consider using a simpler phrase. 'Forgo' is not a common word for non-native speakers.
Deflaimun
left a comment
Approved. Check the most important topics as we discussed before merging
Co-authored-by: Paulo Borges <[email protected]>
Co-authored-by: Angela Simms <[email protected]>
Co-authored-by: Tyler Rockwood <[email protected]>
Co-authored-by: Paulo Borges <[email protected]>
Description
Cluster config reference entries will be added in this PR #846
Resolves https://github.com/redpanda-data/documentation-private/issues/
Review deadline: 26 November 2024
Page previews
Topics as Iceberg Tables