From 020f1941dbac4720551d574a0e996ff8dd8e8212 Mon Sep 17 00:00:00 2001
From: Shirshanka Das
Date: Wed, 29 Oct 2025 23:47:57 -0700
Subject: [PATCH] docs(kafka): clarify Kafka topic retention requirements

- Mark MetadataGraphEvent_v4 as deprecated (removed in 2021, PR #3659)
- Fix duplicate MetadataGraphEvent_v4 entry in kafka-config.md
- Add description for PlatformEvent_v1
- Update DataHubUpgradeHistory_v1 retention guidance:
  - Recommend 7-30 days instead of infinite retention
  - Clarify that infinite retention is not required when DATAHUB_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE=false
  - GMS only reads the last message on startup, not the entire history

These changes help users deploying DataHub in environments with Kafka retention restrictions while maintaining system reliability.
---
 docs/deploy/confluent-cloud.md | 6 +++---
 docs/how/kafka-config.md       | 9 ++++-----
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/docs/deploy/confluent-cloud.md b/docs/deploy/confluent-cloud.md
index 0146379339241e..fe79be3be8d6fe 100644
--- a/docs/deploy/confluent-cloud.md
+++ b/docs/deploy/confluent-cloud.md
@@ -16,10 +16,10 @@ First, you'll need to create following new topics in the [Confluent Control Cent
 6. (Deprecated) **MetadataChangeEvent_v4**: Metadata change proposal messages
 7. (Deprecated) **MetadataAuditEvent_v4**: Metadata change log messages
 8. (Deprecated) **FailedMetadataChangeEvent_v4**: Failed to process #1 event
-9. **MetadataGraphEvent_v4**:
-10. **PlatformEvent_v1**
+9. (Deprecated) **MetadataGraphEvent_v4**: Legacy topic deprecated since 2021, no longer actively used
+10. **PlatformEvent_v1**: High-level semantic events
 11. **DataHubUpgradeHistory_v1**: Notifies the end of DataHub Upgrade job so dependants can act accordingly (_eg_, startup).
-    Note this topic requires special configuration: **Infinite retention**. Also, 1 partition is enough for the occasional traffic.
+    Note this topic requires special configuration: **Recommended: 7-30 day retention** (default config shows infinite retention for safety, but not required if `DATAHUB_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE=false`). Also, 1 partition is enough for the occasional traffic.

 The first five are the most important, and are explained in more depth in [MCP/MCL](../advanced/mcp-mcl.md). The final topics are
 those which are deprecated but still used under certain circumstances. It is likely that in the future they will be completely
diff --git a/docs/how/kafka-config.md b/docs/how/kafka-config.md
index 9b8b6d7751e30b..4e61908a7bca28 100644
--- a/docs/how/kafka-config.md
+++ b/docs/how/kafka-config.md
@@ -62,11 +62,10 @@ By default, DataHub relies on the a set of Kafka topics to operate. By default,
 6. (Deprecated) **MetadataChangeEvent_v4**: Metadata change proposal messages
 7. (Deprecated) **MetadataAuditEvent_v4**: Metadata change log messages
 8. (Deprecated) **FailedMetadataChangeEvent_v4**: Failed to process #1 event
-9. **MetadataGraphEvent_v4**:
-10. **MetadataGraphEvent_v4**:
-11. **PlatformEvent_v1**:
-12. **DataHubUpgradeHistory_v1**: Notifies the end of DataHub Upgrade job so dependants can act accordingly (_eg_, startup).
-    Note this topic requires special configuration: **Infinite retention**. Also, 1 partition is enough for the occasional traffic.
+9. (Deprecated) **MetadataGraphEvent_v4**: Legacy topic deprecated since 2021, no longer actively used
+10. **PlatformEvent_v1**: High-level semantic events
+11. **DataHubUpgradeHistory_v1**: Notifies the end of DataHub Upgrade job so dependants can act accordingly (_eg_, startup).
+    Note this topic requires special configuration: **Recommended: 7-30 day retention** (default config shows infinite retention for safety, but not required if `DATAHUB_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE=false`). Also, 1 partition is enough for the occasional traffic.
How Metadata Events relate to these topics is discussed at more length in [Metadata Events](../what/mxe.md).