Single source Iceberg #1032
✅ Deploy Preview for redpanda-docs-preview ready!
PR Change Summary: Introduced the Iceberg integration for Redpanda, enabling cloud storage of topic data in the Iceberg format for improved analytics.
c390cdd to 53a0809
simon0191
left a comment
LGTM
By default, Iceberg topics use the file-system based catalog (config_ref:iceberg_catalog_type,true,properties/cluster-properties[`iceberg_catalog_type`] cluster configuration set to `object_storage`). Redpanda stores the table metadata in https://iceberg.apache.org/javadoc/1.5.0/org/apache/iceberg/hadoop/HadoopCatalog.html[HadoopCatalog^] format in the same object storage bucket or container as the data files.
If using the `object_storage` catalog type, you provide the object storage URI of the table's metadata.json file to an Iceberg client so it can access the catalog and data files for your Redpanda Iceberg tables.
Suggested change:
- If using the `object_storage` catalog type, you provide the object storage URI of the table's metadata.json file to an Iceberg client so it can access the catalog and data files for your Redpanda Iceberg tables.
+ If using the `object_storage` catalog type, you provide the object storage URI of the table's `metadata.json` file to an Iceberg client so it can access the catalog and data files for your Redpanda Iceberg tables.
We should probably put a reminder note in here about the fact that this `metadata.json` file only points to a specific table snapshot. Due to the limitations of the object storage catalog specification in Apache Iceberg, tables must be updated any time a new snapshot is created using this catalog type (effectively, any time new data is written to the table). For more information, see the Apache Iceberg documentation.
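To illustrate the point about snapshots, here is a minimal sketch of how a client-side script might pick the most recent metadata file from a listing of object keys, so that it can re-point an Iceberg client after new data is written. It assumes the HadoopCatalog-style file naming `v<N>.metadata.json`; the key paths and the `latest_metadata_key` helper are hypothetical and only for illustration, not Redpanda's documented layout.

```python
import re

# Assumption: metadata files follow HadoopCatalog-style naming, "v<N>.metadata.json".
_METADATA_RE = re.compile(r"/metadata/v(\d+)\.metadata\.json$")


def latest_metadata_key(keys):
    """Return the key with the highest metadata version, or None if none match."""
    best_key, best_version = None, -1
    for key in keys:
        match = _METADATA_RE.search(key)
        if match and int(match.group(1)) > best_version:
            best_key, best_version = key, int(match.group(1))
    return best_key


# Hypothetical object keys, as a bucket listing might return them.
keys = [
    "redpanda-iceberg-catalog/redpanda/orders/metadata/v1.metadata.json",
    "redpanda-iceberg-catalog/redpanda/orders/metadata/v10.metadata.json",
    "redpanda-iceberg-catalog/redpanda/orders/metadata/v9.metadata.json",
    "redpanda-iceberg-catalog/redpanda/orders/data/part-0.parquet",
]
print(latest_metadata_key(keys))
```

The version is compared numerically rather than as a string, so `v10` sorts after `v9`. A client that registered an older file keeps reading the older snapshot until it is given the newer URI.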
Updated
=== Specify metadata location
The config_ref:iceberg_catalog_base_location,true,properties/cluster-properties[`iceberg_catalog_base_location`] property stores the base path for the file-system based catalog if using the `object_storage` catalog type. The default value is `redpanda-iceberg-catalog`.
Suggested change:
- The config_ref:iceberg_catalog_base_location,true,properties/cluster-properties[`iceberg_catalog_base_location`] property stores the base path for the file-system based catalog if using the `object_storage` catalog type. The default value is `redpanda-iceberg-catalog`.
+ The config_ref:iceberg_catalog_base_location,true,properties/cluster-properties[`iceberg_catalog_base_location`] property stores the base path for the file system-based catalog if using the `object_storage` catalog type. The default value is `redpanda-iceberg-catalog`.
This part shouldn't be in the Cloud docs, as the property will be read-only there (not editable).
Updated
@kbatuigas the beta badge is still showing in the preview docs!
----
+
The `value_schema_id_prefix` requires that you produce to a topic using the Schema Registry wire format, which includes the magic byte and schema ID in the prefix of the message payload. This allows Redpanda to identify the correct schema version in the Schema Registry for a record. See the https://www.redpanda.com/blog/schema-registry-kafka-streaming#how-does-serialization-work-with-schema-registry-in-kafka[Understanding Apache Kafka Schema Registry^] blog post to learn more about the wire format.
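As a rough illustration of the prefix described above, the sketch below packs and unpacks the commonly used Schema Registry framing: one magic byte (`0`) followed by a 4-byte big-endian schema ID, then the serialized payload. The function names are hypothetical, and the sketch does not handle the extra message-index bytes that Protobuf payloads carry after the schema ID.

```python
import struct

MAGIC_BYTE = 0   # first byte of the Schema Registry wire format
HEADER_SIZE = 5  # 1 magic byte + 4-byte big-endian schema ID


def encode_wire_format(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def decode_wire_format(message: bytes):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message is shorter than the 5-byte prefix")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER_SIZE:]


framed = encode_wire_format(42, b"abc")
print(framed)                      # b'\x00\x00\x00\x00*abc'
print(decode_wire_format(framed))  # (42, b'abc')
```

A record produced without this prefix has no schema ID to read, which is why the `value_schema_id_prefix` mode depends on producers using a Schema Registry-aware serializer.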
The link goes to a subsection of the blog, but maybe the whole sentence should change. I searched the blog for "wire format" and "wire" and got nothing.
Updated - still mention wire format, but not explicitly linking it to the referenced blog post
Co-authored-by: Michele Cyran <[email protected]>
18cbbb9 to 2c66f3d
@kbatuigas the beta badge still appears in these Self-Managed files
I'll look for that in the next PR!
micheleRP
left a comment
lgtm
Description
Resolves https://redpandadata.atlassian.net/browse/
Review deadline: 3 April
This pull request includes significant changes to the Iceberg documentation, with a focus on restructuring and updating content related to Iceberg table access, catalog integration, and query examples. The most important changes include the addition of new content and reorganization of existing content into partials for better modularity.
This PR reorganizes content so that it can be shared with Cloud docs. The `ifdef::env-byoc` and `ifndef::env-byoc` directives indicate when content should or should not display specifically on a Cloud doc page. The AsciiDoc files for Cloud are added in this PR https://github.com/redpanda-data/cloud-docs/pull/240/files and will contain `env-byoc` page attributes, which the directives evaluate so that Cloud-specific content displays.
Documentation Updates:
Iceberg Table Access and Query Examples:
Catalog Integration:
Branch Update:
`cloud-docs` to `DOC-805-Document-feature-Iceberg-Beta-on-Cloud` in the `local-antora-playbook.yml` file.
Page previews
Self-Managed:
https://deploy-preview-1032--redpanda-docs-preview.netlify.app/25.1/manage/iceberg/topic-iceberg-integration
https://deploy-preview-1032--redpanda-docs-preview.netlify.app/25.1/manage/iceberg/use-iceberg-catalogs
https://deploy-preview-1032--redpanda-docs-preview.netlify.app/25.1/manage/iceberg/query-iceberg-topics
Cloud doc previews available in redpanda-data/cloud-docs#240
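For readers unfamiliar with the `ifdef::env-byoc` and `ifndef::env-byoc` directives mentioned in the description, the following toy model shows the effect they have on a shared partial. It is only a sketch of the block form (`ifdef::attr[]` ... `endif::[]`) without nesting, not how Asciidoctor or Antora implement it, and the sample page lines are made up.

```python
def render(lines, attributes):
    """Toy evaluator for block-form ifdef/ifndef/endif directives (no nesting)."""
    out = []
    keep = True  # whether lines in the current region are emitted
    for line in lines:
        if line.startswith("ifdef::") and line.endswith("[]"):
            keep = line[len("ifdef::"):-2] in attributes
        elif line.startswith("ifndef::") and line.endswith("[]"):
            keep = line[len("ifndef::"):-2] not in attributes
        elif line.startswith("endif::"):
            keep = True
        elif keep:
            out.append(line)
    return out


# Hypothetical shared partial with Cloud-only and Self-Managed-only text.
page = [
    "Shared intro.",
    "ifdef::env-byoc[]",
    "Cloud-only text.",
    "endif::[]",
    "ifndef::env-byoc[]",
    "Self-Managed-only text.",
    "endif::[]",
]

print(render(page, {"env-byoc"}))  # Cloud page: attribute is set
print(render(page, set()))         # Self-Managed page: attribute is not set
```

The same source lines produce different output depending only on whether the page sets the `env-byoc` attribute, which is what lets one partial serve both doc sets.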