Commit 18590dc: Add log compaction docs (parent a74cfd8)

5 files changed: +139 −0 lines
articles/event-hubs/log-compaction.md

---
title: Log Compaction
description: This article describes how the log compaction feature works in Event Hubs.
ms.topic: article
ms.date: 10/7/2022
ms.custom: ignite-2022
---

# Log Compaction

Log compaction is a way of retaining data in Event Hubs using event key-based retention. By default, each event hub/Kafka topic is created with time-based retention (the *delete* cleanup policy), where events are purged upon the expiration of the retention time. Rather than using coarser-grained time-based retention, you can use a key-based retention mechanism, where Event Hubs retains the last known value for each event key of an event hub or a Kafka topic.
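To make the retention model concrete: compaction is equivalent to replaying the event log in order and keeping only the last value seen per key. The following Java sketch illustrates that model only; it is not the Event Hubs implementation, and all names in it are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {

    // Key-based retention modeled as "last write wins": replay the log in
    // order and keep only the most recent value for each key. LinkedHashMap
    // keeps keys in first-written order, which is enough for illustration.
    public static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Map.Entry<String, String> event : log) {
            latest.put(event.getKey(), event.getValue()); // newer event overwrites older one
        }
        return latest;
    }

    public static void main(String[] args) {
        // Two updates for "sensor-1": only the latest survives compaction.
        Map<String, String> compacted = compact(List.of(
                Map.entry("sensor-1", "20C"),
                Map.entry("sensor-2", "18C"),
                Map.entry("sensor-1", "22C")));
        System.out.println(compacted); // {sensor-1=22C, sensor-2=18C}
    }
}
```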

> [!NOTE]
> The log compaction feature is available only in the **premium** and **dedicated** tiers.

As shown below, an event log (of an event hub partition) may have multiple events with the same key. If you're using a compacted event hub, the Event Hubs service takes care of purging old events and keeps only the latest event for a given event key.

:::image type="content" source="./media/event-hubs-log-compaction/log-compaction.png" alt-text="Diagram showing an event hub partition log with multiple events per key before and after compaction." lightbox="./media/event-hubs-log-compaction/log-compaction.png":::

### Compaction key
The partition key that you set with each event is used as the compaction key.

### Tombstones
Client applications can mark existing events of an event hub for deletion during the compaction job. These markers are known as *tombstones*. Client applications set tombstones by sending a new event with an existing key and a `null` event payload.
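In the model used above, a tombstone is just a log entry whose payload is `null`: when compaction runs, both the older events for that key and (after the tombstone retention time elapses) the tombstone itself disappear. A minimal sketch of the deletion semantic, illustrative only and not the service's implementation:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TombstoneSketch {

    // Same last-write-wins replay as plain compaction, except that a null
    // payload (a tombstone) deletes the key entirely, so neither the old
    // value nor, eventually, the tombstone survives.
    public static Map<String, String> compact(List<Map.Entry<String, String>> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (Map.Entry<String, String> event : log) {
            if (event.getValue() == null) {
                latest.remove(event.getKey()); // tombstone: purge the key
            } else {
                latest.put(event.getKey(), event.getValue());
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // SimpleEntry is used because Map.entry rejects null values.
        Map<String, String> compacted = compact(List.of(
                new SimpleEntry<String, String>("Key-1", "Value-1"),
                new SimpleEntry<String, String>("Key-2", "Value-2"),
                new SimpleEntry<String, String>("Key-1", null))); // tombstone for Key-1
        System.out.println(compacted); // {Key-2=Value-2}
    }
}
```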

## How log compaction works

You can enable log compaction at the level of each event hub/Kafka topic. You can ingest events into a compacted event hub using any supported protocol. The Azure Event Hubs service runs a compaction job for each compacted event hub. The compaction job cleans each event hub partition log by retaining only the latest event of a given event key.

:::image type="content" source="./media/event-hubs-log-compaction/how-compaction-work.png" alt-text="Diagram showing how the compaction job cleans an event hub partition log." lightbox="./media/event-hubs-log-compaction/how-compaction-work.png":::

At a given time, the event log of a compacted event hub can have a *clean* portion and a *dirty* portion. The clean portion contains the events that have already been compacted by the compaction job, while the dirty portion comprises the events that are yet to be compacted.

The execution of the compaction job is managed by the Event Hubs service, and the user can't control it. The Event Hubs service determines when to start compaction and how fast it compacts a given compacted event hub.

## Compaction guarantees
The log compaction feature of Event Hubs provides the following guarantees:
- Ordering of messages is always maintained at the key and partition level. The compaction job doesn't alter the ordering of messages; it just discards the old events of the same key.
- The sequence number and offset of a message never change.
- Any consumer progressing from the start of the event log sees at least the final state of all events in the order they were written.
- Events that users mark for deletion can still be seen by consumers for the time defined by the *Tombstone Retention Time (hours)* setting.
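These guarantees can be exercised with a small model: compaction discards superseded events, but each surviving event keeps the offset it was written at, and survivors remain in write order. The sketch below is a model of the guarantee under those assumptions, not service code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GuaranteeSketch {

    // An event together with the offset it was written at.
    public record Event(long offset, String key, String value) {}

    // Compaction drops superseded events; the survivor for each key keeps
    // its original offset, and survivors stay in the order they were written.
    public static List<Event> compact(List<Event> log) {
        Map<String, Event> latest = new LinkedHashMap<>();
        for (Event e : log) {
            latest.remove(e.key()); // drop the superseded occurrence of this key
            latest.put(e.key(), e); // survivor sits at its own write position
        }
        return new ArrayList<>(latest.values());
    }

    public static void main(String[] args) {
        List<Event> compacted = compact(List.of(
                new Event(0, "k1", "a"),
                new Event(1, "k2", "b"),
                new Event(2, "k1", "c")));
        // Survivors keep their original offsets (1 and 2), in write order.
        System.out.println(compacted);
    }
}
```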

## Log compaction use cases

Log compaction can be useful in scenarios where you stream the same set of updatable events. As compacted event hubs only keep the latest events, you don't need to worry about the growth of event storage. Therefore, log compaction is commonly used in scenarios such as Change Data Capture (CDC), maintaining the latest event state in tables for stream processing applications, and event caching.

## Next steps

For instructions on how to use log compaction in Event Hubs, see [Use log compaction](user-log-compaction.md).
articles/event-hubs/user-log-compaction.md

---
title: Use log compaction
description: Learn how to use log compaction.
ms.topic: how-to
ms.custom: log-compaction
ms.date: 10/7/2022
---

# Use log compaction

This article shows you how to use the log compaction feature in Event Hubs. To understand the details of log compaction, see [Log Compaction](log-compaction.md).

In this article, you'll follow these key steps:

- Create a compacted event hub/Kafka topic.
- Publish events to a compacted event hub.
- Consume events from a compacted event hub.

> [!NOTE]
> The log compaction feature is available only in the **premium** and **dedicated** tiers.

## Create a compacted event hub/Kafka topic
This section shows you how to create a compacted event hub using the Azure portal or an Azure Resource Manager (ARM) template.

### [Azure portal](#tab/portal)
You can create a compacted event hub using the Azure portal by following these steps.

1. Navigate to your Event Hubs namespace.
1. On the **Event Hubs Namespace** page, select **Event Hubs** in the left menu.
1. At the top of the window, select **+ Event Hub**.
    :::image type="content" source="./media/event-hubs-quickstart-portal/create-event-hub4.png" alt-text="Screenshot showing the button for creating an event hub in the Azure portal.":::
1. Type a *name* for your event hub and specify the *partition count*. Since you're creating a compacted event hub, set the *cleanup policy* to *Compaction* and provide the desired value for *tombstone retention time*.
    :::image type="content" source="./media/event-hubs-log-compaction/enabling-compaction.png" alt-text="Screenshot showing the compaction settings when creating an event hub in the Azure portal.":::
1. Select **Create** to create the compacted event hub.

### [ARM template](#tab/arm)
The following example shows how to create a compacted event hub/Kafka topic using an ARM template.

```json
"resources": [
    {
        "apiVersion": "2017-04-01",
        "name": "[parameters('eventHubName')]",
        "type": "eventhubs",
        "dependsOn": [
            "[resourceId('Microsoft.EventHub/namespaces/', parameters('eventHubNamespaceName'))]"
        ],
        "properties": {
            "partitionCount": "[parameters('partitionCount')]",
            "retentionDescription": {
                "cleanupPolicy": "Compact",
                "tombstoneRetentionTimeInHours": "24"
            }
        }
    }
]
```

---
60+
## Triggering compaction
61+
Event Hubs service determines when the compaction job of a given compacted event hub should be executed. Compacted event hub reaches the compaction threshold when there are considerable number of events or the total size of a given event log grows significantly.
62+
63+
## Publish event to a compacted topic
64+
Publishing events to a compacted event hub is the same as publishing events to a regular event hub. As the client application you only need to determine the compaction key, which you set using partition key.

### Using the Event Hubs SDK (AMQP)
With the Event Hubs SDK, you can set the partition key and publish events as shown below:

```csharp
var enqueueOptions = new EnqueueEventOptions
{
    PartitionKey = "Key-1"
};
await producer.EnqueueEventAsync(eventData, enqueueOptions);
```

### Using Kafka
With Kafka, you can set the partition key when you create the `ProducerRecord`, as shown below:

```java
ProducerRecord<String, String> record = new ProducerRecord<String, String>(TOPIC, "Key-1", "Value-1");
```

## Consuming events from a compacted topic
There are no changes required on the consumer side to consume events from a compacted event hub. You can use any of your existing consumer applications to consume data from a compacted event hub.
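Since a compacted event hub delivers at least the final state of every key, a common consumer pattern is to fold the received events into a local latest-value table, treating a `null` payload as a deletion. The following protocol-agnostic sketch shows that fold; the class and method names are hypothetical, and receiving events from the service is out of scope here.

```java
import java.util.HashMap;
import java.util.Map;

public class LatestValueTable {
    private final Map<String, String> table = new HashMap<>();

    // Apply one received event to the local view: a null payload is a
    // tombstone and removes the key; anything else becomes the new value.
    public void apply(String key, String payload) {
        if (payload == null) {
            table.remove(key);
        } else {
            table.put(key, payload);
        }
    }

    public String get(String key) {
        return table.get(key);
    }

    public static void main(String[] args) {
        LatestValueTable view = new LatestValueTable();
        // Events as they might arrive from a compacted event hub partition.
        view.apply("Key-1", "Value-1");
        view.apply("Key-1", "Value-2"); // update
        view.apply("Key-2", null);      // tombstone
        System.out.println(view.get("Key-1")); // Value-2
    }
}
```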

## Next steps

- For conceptual information on how log compaction works, see [Log compaction](log-compaction.md).
