Commit 58ded83

kbatuigas and Feediver1 authored
Tombstone retention (#829)
Co-authored-by: Joyce Fee <[email protected]>
1 parent 16e9cda commit 58ded83

4 files changed: +83 -8 lines changed

modules/develop/pages/kafka-clients.adoc

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -54,7 +54,6 @@ ifdef::env-cloud[]
5454
endif::[]
5555
ifndef::env-cloud[]
5656
+
57-
* The `delete.retention.ms` topic configuration in Kafka is not supported. Tombstone markers are not removed for topics with a `compact` xref:develop:config-topics.adoc#change-the-cleanup-policy[cleanup policy]. Redpanda only deletes tombstone markers when topics with a cleanup policy of `compact,delete` have reached their xref:manage:cluster-maintenance/disk-utilization.adoc#configure-message-retention[retention limits].
5857
* Quotas per user for bandwidth and API request rates. However, xref:manage:cluster-maintenance/manage-throughput.adoc#client-throughput-limits[quotas per client and per client group] using AlterClientQuotas and DescribeClientQuotas APIs are supported.
5958
endif::[]
6059

modules/get-started/pages/whats-new.adoc

Lines changed: 5 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -7,6 +7,10 @@ This topic includes new content added in version {page-component-version}. For a
77
* xref:redpanda-cloud:get-started:whats-new-cloud.adoc[]
88
* xref:redpanda-cloud:get-started:cloud-overview.adoc#redpanda-cloud-vs-self-managed-feature-compatibility[Redpanda Cloud vs Self-Managed feature compatibility]
99
10+
== Tombstone removal
11+
12+
Redpanda now supports the Kafka `delete.retention.ms` topic configuration. You can specify how long Redpanda keeps xref:manage:cluster-maintenance/compaction-settings.adoc#tombstone-record-removal[tombstone records] for compacted topics by setting `delete.retention.ms` at the topic level, or `tombstone_retention_ms` at the cluster level.
13+
1014
== Mountable topics
1115

1216
For topics with Tiered Storage enabled, you can unmount a topic to safely detach it from a cluster and keep the topic data in the cluster's object storage bucket or container. You can mount the detached topic to either the same origin cluster, or a different one. This allows you to hibernate a topic and free up system resources taken up by the topic, or migrate a topic to a different cluster. See xref:manage:mountable-topics.adoc[Mountable topics] for details.
@@ -58,3 +62,4 @@ The following cluster properties are new in this version:
5862

5963
* xref:reference:properties/cluster-properties.adoc#default_leaders_preference[`default_leaders_preference`]
6064
* xref:reference:properties/cluster-properties.adoc#rpk_path[`rpk_path`]
65+
* xref:reference:properties/cluster-properties.adoc#tombstone_retention_ms[`tombstone_retention_ms`]

modules/manage/pages/cluster-maintenance/compaction-settings.adoc

Lines changed: 59 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -6,25 +6,77 @@ Configure compaction for your cluster to optimize storage utilization.
66

77
== Redpanda compaction overview
88

9-
Compaction is an optional mechanism intended to reduce the storage needs of Redpanda topics. You can enable compaction through configuration of a cluster or topic's cleanup policy. When compaction is enabled as part of the cleanup policy, a background process executes on a pre-set interval to perform compaction operations. When triggered for a partition, the process purges older versions of messages for a given key and only retains the most recent message in that partition. This is done by analyzing closed segments in the partition, copying the most recent messages for each key into a new segment, then deleting the source segments.
9+
Compaction is an optional mechanism intended to reduce the storage needs of Redpanda topics. You can enable compaction through configuration of a cluster or topic's cleanup policy. When compaction is enabled as part of the cleanup policy, a background process executes on a pre-set interval to perform compaction operations. When triggered for a partition, the process purges older versions of records for a given key and only retains the most recent record in that partition. This is done by analyzing closed segments in the partition, copying the most recent records for each key into a new segment, then deleting the source segments.
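The keep-the-latest-record-per-key behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not Redpanda's implementation: the `Record` type, the `compact` function, and the sample sensor data are hypothetical, and the sketch assumes records arrive in offset order within a single closed segment.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    """Hypothetical stand-in for a log record."""
    offset: int
    key: str
    value: Optional[str]


def compact(records: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, in offset order."""
    latest = {}
    for rec in records:
        # Records are assumed to be in offset order, so later ones win.
        latest[rec.key] = rec
    return sorted(latest.values(), key=lambda r: r.offset)


log = [
    Record(0, "sensor-1", "seen 09:00"),
    Record(1, "sensor-2", "seen 09:05"),
    Record(2, "sensor-1", "seen 11:30"),
]
compacted = compact(log)
```

Here the older `sensor-1` record at offset 0 is dropped, while the only record for `sensor-2` survives, matching the rule that a record representing the sole version of a key is not deleted.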
1010

1111
image::shared:compaction-example.png[Example of topic compaction]
1212

13-
This diagram provides an illustration of a compacted topic. Imagine a remote sensor network that uses image recognition to track appearances of red pandas in a geographic area. The sensor network employs special devices that send messages to a topic when they detect one. You might enable compaction to reduce topic storage while still maintaining a record in the topic of the last time each device saw a red panda, perhaps to see if they stop frequenting a given area. The left side of the diagram shows all messages sent across the topic. The right side illustrates the results of compaction; older messages for certain keys are deleted from the message log.
13+
This diagram illustrates a compacted topic. Imagine a remote sensor network that uses image recognition to track appearances of red pandas in a geographic area. The sensor network employs special devices that send records to a topic when they detect one. You might enable compaction to reduce topic storage while still maintaining a record in the topic of the last time each device saw a red panda, perhaps to see if they stop frequenting a given area. The left side of the diagram shows all records sent across the topic. The right side illustrates the results of compaction; older records for certain keys are deleted from the log.
1414

15-
NOTE: If your application requires consuming every message for a given key, consider using the `delete` xref:develop:config-topics#change-the-cleanup-policy.adoc[cleanup policy] instead.
15+
NOTE: If your application requires consuming every record for a given key, consider using the `delete` xref:develop:config-topics.adoc#change-the-cleanup-policy[cleanup policy] instead.
1616

17-
IMPORTANT: When using Tiered Storage, compaction functions at the local storage level. As long as a segment remains in local storage, its messages are eligible for compaction. Once a segment is uploaded to tiered storage and removed from local storage it is not retrieved for further compaction operations. A key may therefore appear in multiple segments between Tiered Storage and local storage.
17+
IMPORTANT: When using xref:manage:tiered-storage.adoc[Tiered Storage], compaction functions at the local storage level. As long as a segment remains in local storage, its records are eligible for compaction. Once a segment is uploaded to object storage and removed from local storage, it is not retrieved for further compaction operations. A key may therefore appear in multiple segments between Tiered Storage and local storage.
1818

1919
While compaction reduces storage needs, Redpanda's compaction (just like Kafka's) does not guarantee perfect de-duplication of a topic. It represents a best effort mechanism to reduce storage needs but duplicates of a key may still exist within a topic. Compaction is not a complete topic operation, either, since it operates on subsets of each partition within the topic.
2020

2121
== Configure cleanup policy
2222

2323
Compaction policy may be applied to a cluster or to an individual topic. If both are set, the topic-level policy overrides the cluster-level policy. The cluster-level xref:reference:cluster-properties.adoc#log_cleanup_policy[`log_cleanup_policy`] and the topic-level xref:reference:topic-properties.adoc#cleanuppolicy[`cleanup.policy`] support the following three options:
2424

25-
* `delete`: Messages are deleted from the topic once the specified retention period (time and/or size allocations) is exceeded. This is the default mechanism and is analogous to disabling compaction.
26-
* `compact`: This triggers only cleanup of messages with multiple versions. A message that represents the only version for a given key is not deleted.
27-
* `compact,delete`: This combines both policies, deleting messages exceeding the retention period while compacting multiple versions of messages.
25+
* `delete`: Records are deleted from the topic once the specified retention period (time and/or size allocations) is exceeded. This is the default mechanism and is analogous to disabling compaction.
26+
* `compact`: This triggers only cleanup of records with multiple versions. A record that represents the only version for a given key is not deleted.
27+
* `compact,delete`: This combines both policies, deleting records exceeding the retention period while compacting multiple versions of records.
28+
29+
== Tombstone record removal
30+
31+
Compaction also enables deletion of existing records through tombstones. For example, as data is deleted from a source system, clients produce a tombstone record to the log. A tombstone contains a key and the value `null`. A tombstone signals to brokers and consumers that earlier records with the same key in the log should be deleted.
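As a rough illustration of how a consumer might interpret tombstones, the following Python sketch rebuilds a key-value view from a stream of records. The `materialize` function and the sample events are hypothetical, and `None` stands in for the `null` tombstone value.

```python
from typing import Dict, Iterable, Optional, Tuple


def materialize(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Build the current state from (key, value) records read in log order."""
    view: Dict[str, str] = {}
    for key, value in records:
        if value is None:
            # Tombstone: forget any earlier record with this key.
            view.pop(key, None)
        else:
            view[key] = value
    return view


events = [("user-1", "alice"), ("user-2", "bob"), ("user-1", None)]
state = materialize(events)
```

After the tombstone for `user-1` is read, only `user-2` remains in the view. A consumer that never sees the tombstone would keep a stale `user-1` entry, which is why the retention time of tombstones matters.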
32+
33+
You can specify how long Redpanda keeps these tombstones for compacted topics using either the cluster configuration property config_ref:tombstone_retention_ms,true,properties/cluster-properties[] or the topic configuration property xref:reference:properties/topic-properties.adoc#deleteretentionms[`delete.retention.ms`]. If both are set, the topic-level tombstone retention policy overrides the cluster-level policy.
34+
35+
[NOTE]
36+
====
37+
Redpanda does not currently remove tombstone records for compacted topics with Tiered Storage enabled.
38+
39+
You cannot enable `tombstone_retention_ms` if you have enabled any of the Tiered Storage cluster properties `cloud_storage_enabled`, `cloud_storage_enable_remote_read`, or `cloud_storage_enable_remote_write`.
40+
41+
On the topic level, you cannot enable `delete.retention.ms` at the same time as the Tiered Storage topic configuration properties `redpanda.remote.read` and `redpanda.remote.write`.
42+
====
43+
44+
To set the cluster-level tombstone retention policy, run the command:
45+
46+
[,bash]
47+
----
48+
rpk cluster config set tombstone_retention_ms=100
49+
----
50+
51+
You can unset the tombstone retention policy for a topic so it inherits the cluster-wide default policy:
52+
53+
[,bash]
54+
----
55+
rpk topic alter-config <topic-name> --delete delete.retention.ms
56+
----
57+
58+
To override the cluster-wide default for a specific topic:
59+
60+
[,bash]
61+
----
62+
rpk topic alter-config <topic-name> --set delete.retention.ms=5
63+
----
64+
65+
To disable tombstone removal for a specific topic:
66+
67+
[,bash]
68+
----
69+
rpk topic alter-config <topic-name> --set delete.retention.ms=-1
70+
----
71+
72+
Redpanda removes tombstones as follows:
73+
74+
* For topics with a `compact` only cleanup policy: Tombstones are removed when the topic exceeds the tombstone retention limit. The `delete.retention.ms` or `tombstone_retention_ms` value therefore also bounds how long a consumer has to read a complete view of the log, tombstones included, before the tombstones are removed.
75+
* For topics with a `compact,delete` cleanup policy: Both the tombstone retention policy and standard garbage collection can remove tombstone records.
76+
77+
If obtaining a complete snapshot of the log, including tombstone records, is important to your consumers, set the tombstone retention value such that consumers have enough time for their reads to complete before tombstones are removed. Consumers may not see tombstones if their reads take longer than the configured `delete.retention.ms` or `tombstone_retention_ms` value. The trade-offs to ensuring tombstone visibility to consumers are increased disk usage and potentially slower compaction.
78+
79+
On the other hand, if more frequent cleanup of tombstones is important for optimizing workloads and space management, consider setting a shorter tombstone retention, for example the typical default of 24 hours (86400000 ms).
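The override and removal rules in this section can be modeled with a small Python sketch. It is a simplification under stated assumptions, not Redpanda's actual algorithm: the function names are hypothetical, a tombstone's age is measured from a per-record timestamp, and an unset (`None`) retention at both levels is assumed to mean tombstones are kept.

```python
from typing import List, Optional, Tuple

# (key, value, timestamp_ms); a value of None marks a tombstone.
Record = Tuple[str, Optional[str], int]


def effective_retention_ms(topic_ms: Optional[int],
                           cluster_ms: Optional[int]) -> Optional[int]:
    """Topic-level delete.retention.ms wins when set; otherwise the
    cluster-level tombstone_retention_ms is inherited."""
    return topic_ms if topic_ms is not None else cluster_ms


def drop_expired_tombstones(records: List[Record], now_ms: int,
                            retention_ms: Optional[int]) -> List[Record]:
    """Remove tombstones older than retention_ms; keep all other records."""
    # Assumption: None (unset) or a negative value such as -1 disables removal.
    if retention_ms is None or retention_ms < 0:
        return list(records)
    return [
        (key, value, ts)
        for key, value, ts in records
        if value is not None or now_ms - ts <= retention_ms
    ]


log = [("a", None, 0), ("b", "live", 0), ("c", None, 900)]
retention = effective_retention_ms(topic_ms=None, cluster_ms=500)
kept = drop_expired_tombstones(log, now_ms=1_000, retention_ms=retention)
```

With no topic-level value, the cluster-level 500 ms applies: the tombstone for `a` (1000 ms old) is dropped, while the tombstone for `c` (100 ms old) and the live record for `b` remain. Passing `-1` as the topic-level value leaves every tombstone in place.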
2880

2981
== Compaction policy settings
3082

modules/reference/pages/properties/topic-properties.adoc

Lines changed: 19 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -547,6 +547,25 @@ NOTE: Although `replication.factor` isn't returned or displayed by xref:referenc
547547

548548
---
549549

550+
[[deleteretentionms]]
551+
==== delete.retention.ms
552+
553+
The retention time for tombstone records in a compacted topic. Redpanda removes tombstone records after the retention limit is exceeded.
554+
555+
If you have enabled Tiered Storage and set <<redpandaremoteread,`redpanda.remote.read`>> or <<redpandaremotewrite,`redpanda.remote.write`>> for the topic, you cannot enable tombstone removal.
556+
557+
If both `delete.retention.ms` and the cluster property config_ref:tombstone_retention_ms,true,properties/cluster-properties[] are set, `delete.retention.ms` overrides the cluster-level tombstone retention for that topic.
558+
559+
*Unit:* milliseconds
560+
561+
**Default**: null
562+
563+
**Related topics**:
564+
565+
- xref:manage:cluster-maintenance/compaction-settings.adoc#tombstone-record-removal[Tombstone record removal]
566+
567+
---
568+
550569
== Related topics
551570

552571
- xref:develop:produce-data/configure-producers.adoc[Configure Producers]
