
Commit 37008e0

Merge pull request #214533 from kasun04/main
Log compaction support
2 parents 8b4877d + cb2afb3 commit 37008e0

8 files changed: +155 -7 lines changed

articles/event-hubs/TOC.yml

Lines changed: 7 additions & 2 deletions
@@ -92,6 +92,8 @@
      href: event-hubs-capture-overview.md
    - name: Explore captured Avro files
      href: explore-captured-avro-files.md
+   - name: Log compaction
+     href: log-compaction.md
    - name: Application groups
      href: resource-governance-overview.md
    - name: Tiers
@@ -173,8 +175,6 @@
      href: event-hubs-amqp-troubleshoot.md
    - name: How-to guides
      items:
-       - name: Create and manage application groups
-         href: resource-governance-with-app-groups.md
    - name: Develop
      items:
        - name: Get Event Hubs connection string
@@ -203,6 +203,11 @@
      href: https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/eventhub/azure-eventhub/README.md
    - name: JavaScript in GitHub (azure/event-hubs)
      href: https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/eventhub/event-hubs/README.md
+   - name: Use log compaction
+     href: use-log-compaction.md
+   - name: Create and manage application groups
+     href: resource-governance-with-app-groups.md
    - name: Process data
      items:
        - name: Capture Event Hubs data in Parquet format

articles/event-hubs/event-hubs-features.md

Lines changed: 9 additions & 1 deletion
@@ -33,7 +33,7 @@ Any entity that sends data to an event hub is an *event publisher* (synonymously

 You can publish an event via AMQP 1.0, the Kafka protocol, or HTTPS. The Event Hubs service provides [REST API](/rest/api/eventhub/) and [.NET](event-hubs-dotnet-standard-getstarted-send.md), [Java](event-hubs-java-get-started-send.md), [Python](event-hubs-python-get-started-send.md), [JavaScript](event-hubs-node-get-started-send.md), and [Go](event-hubs-go-get-started-send.md) client libraries for publishing events to an event hub. For other runtimes and platforms, you can use any AMQP 1.0 client, such as [Apache Qpid](https://qpid.apache.org/).

-The choice to use AMQP or HTTPS is specific to the usage scenario. AMQP requires the establishment of a persistent bidirectional socket in addition to transport level security (TLS) or SSL/TLS. AMQP has higher network costs when initializing the session, however HTTPS requires additional TLS overhead for every request. AMQP has significantly higher performance for frequent publishers and can achieve much lower latencies when used with asynchronous publishing code.
+The choice to use AMQP or HTTPS is specific to the usage scenario. AMQP requires the establishment of a persistent bidirectional socket in addition to transport-level security (TLS). AMQP has higher network costs when initializing the session; HTTPS, however, requires extra TLS overhead for every request. AMQP has higher performance for frequent publishers and can achieve much lower latencies when used with asynchronous publishing code.

 You can publish events individually or batched. A single publication has a limit of 1 MB, regardless of whether it's a single event or a batch. Publishing events larger than this threshold will be rejected.

@@ -163,6 +163,14 @@ If a reader disconnects from a partition, when it reconnects it begins reading a
 > - [JavaScript](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/eventhub/eventhubs-checkpointstore-blob/samples/v1/javascript) or [TypeScript](https://github.com/Azure/azure-sdk-for-js/tree/main/sdk/eventhub/eventhubs-checkpointstore-blob/samples/v1/typescript)
 > - [Python](https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/eventhub/azure-eventhub-checkpointstoreblob-aio/samples/)
+
+### Log compaction
+
+Azure Event Hubs supports compacting the event log to retain the latest event for a given event key. With compacted event hubs/Kafka topics, you can use key-based retention rather than the coarser-grained time-based retention.
+
+For more information on log compaction, see [Log compaction](log-compaction.md).
+
 ### Common consumer tasks

 All Event Hubs consumers connect via an AMQP 1.0 session, a state-aware bidirectional communication channel. Each partition has an AMQP 1.0 session that facilitates the transport of events segregated by partition.

articles/event-hubs/event-hubs-for-kafka-ecosystem-overview.md

Lines changed: 0 additions & 4 deletions
@@ -120,10 +120,6 @@ This feature is fundamentally at odds with Azure Event Hubs' multi-protocol mode

 The payload of any Event Hubs event is a byte stream and the content can be compressed with an algorithm of your choosing. The Apache Avro encoding format supports compression natively.

-### Log Compaction
-
-Apache Kafka log compaction is a feature that allows evicting all but the last record of each key from a partition, which effectively turns an Apache Kafka topic into a key-value store where the last value added overrides the previous one. This feature is presently not implemented by Azure Event Hubs. The key-value store pattern, even with frequent updates, is far better supported by database services like [Azure Cosmos DB](../cosmos-db/introduction.md). For more information, see [Log Projection](event-hubs-federation-overview.md#log-projections).
-
 ### Kafka Streams

 Kafka Streams is a client library for stream analytics that is part of the Apache Kafka open-source project, but is separate from the Apache Kafka event stream broker.

articles/event-hubs/log-compaction.md

Lines changed: 49 additions & 0 deletions
---
title: Log compaction
description: This article describes how the log compaction feature works in Event Hubs.
ms.topic: article
ms.date: 10/7/2022
ms.custom: ignite-2022
---

# Log compaction

Log compaction is a way of retaining data in Event Hubs by using event key-based retention. By default, each event hub or Kafka topic is created with time-based retention (the *delete* cleanup policy), where events are purged after the retention time expires. Rather than using the coarser-grained time-based retention, you can use a key-based retention mechanism, where Event Hubs retains the last known value for each event key of an event hub or a Kafka topic.

> [!NOTE]
> The log compaction feature is available only in the **premium** and **dedicated** tiers.

As shown in the following diagram, the event log of an event hub partition can have multiple events with the same key. If you use a compacted event hub, the Event Hubs service takes care of purging old events and keeps only the latest event for a given event key.

:::image type="content" source="./media/event-hubs-log-compaction/log-compaction.png" alt-text="Diagram showing how a topic gets compacted." lightbox="./media/event-hubs-log-compaction/log-compaction.png":::
## Compaction key

The partition key that you set with each event is used as the compaction key.

## Tombstones

Client applications can mark existing events of an event hub to be deleted during the compaction job. These markers are known as *tombstones*. Client applications set tombstones by sending a new event with an existing key and a `null` event payload.
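The effect of a tombstone can be sketched in a few lines of Python. This is an illustrative in-memory model only, not Event Hubs or Kafka code; the `compact` function and the sample log are hypothetical:

```python
# Conceptual model of tombstone semantics: an event with an existing key and
# a None payload marks that key for removal when compaction runs.

def compact(log):
    """Return the compacted view of a partition log: the latest payload per
    key, with keys whose latest event is a tombstone (None payload) removed."""
    latest = {}
    for key, payload in log:
        latest[key] = payload          # a later event overrides earlier ones
    return {k: v for k, v in latest.items() if v is not None}

log = [
    ("device-1", "temp=20"),
    ("device-2", "temp=31"),
    ("device-1", "temp=22"),   # update: supersedes temp=20
    ("device-2", None),        # tombstone: device-2 is removed on compaction
]

print(compact(log))            # {'device-1': 'temp=22'}
```

Until the compaction job runs (and for the configured tombstone retention time), consumers can still observe both the old events and the tombstone itself.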
## How log compaction works

You can enable log compaction at the level of each event hub or Kafka topic. You can ingest events into a compacted event hub via any supported protocol. The Azure Event Hubs service runs a compaction job for each compacted event hub. The compaction job cleans each event hub partition log by retaining only the latest event for a given event key.

:::image type="content" source="./media/event-hubs-log-compaction/how-compaction-work.png" alt-text="Diagram showing how log compaction works." lightbox="./media/event-hubs-log-compaction/how-compaction-work.png":::

At a given time, the event log of a compacted event hub can have a *cleaned* portion and a *dirty* portion. The cleaned portion contains the events that have been compacted by the compaction job, while the dirty portion comprises the events that are yet to be compacted.

The Event Hubs service manages the execution of the compaction job, and users can't control it. The service determines when to start compaction and how fast it compacts a given compacted event hub.
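A single compaction pass over one partition log can be modeled as follows. This is a conceptual sketch, not the actual service implementation; the `compaction_pass` function and the sample log are hypothetical:

```python
# Conceptual model of one compaction pass over a partition log.
# Each event keeps its original offset; compaction only discards events
# that a later event with the same key supersedes.

def compaction_pass(log):
    """log: list of (offset, key, payload) tuples. Returns the cleaned log,
    keeping only the last event per key, in original offset order."""
    last_offset = {}                   # key -> offset of its latest event
    for offset, key, _payload in log:
        last_offset[key] = offset
    return [e for e in log if last_offset[e[1]] == e[0]]

dirty = [
    (0, "k1", "v1"),
    (1, "k2", "v1"),
    (2, "k1", "v2"),   # supersedes the event at offset 0
    (3, "k3", "v1"),
]
print(compaction_pass(dirty))  # [(1, 'k2', 'v1'), (2, 'k1', 'v2'), (3, 'k3', 'v1')]
```

Note that the surviving events retain their original offsets and relative order, which is what makes compaction transparent to consumers.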
## Compaction guarantees

The log compaction feature of Event Hubs provides the following guarantees:

- Ordering of messages is always maintained at the key and partition level. The compaction job doesn't alter the ordering of messages; it only discards the old events of the same key.
- The sequence number and offset of a message never change.
- Any consumer progressing from the start of the event log sees at least the final state of all events in the order they were written.
- Events that the user marks to be deleted can still be seen by consumers for the time defined by *Tombstone Retention Time (hours)*.
## Log compaction use cases

Log compaction can be useful in scenarios where you stream the same set of updatable events. As compacted event hubs keep only the latest events, you don't need to worry about the growth of the event storage. Therefore, log compaction is commonly used in scenarios such as Change Data Capture (CDC), maintaining events in tables for stream processing applications, and event caching.
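For example, a consumer of a compacted change stream can fold the events into a latest-state lookup table. The sketch below is hypothetical consumer-side logic (the `materialize` function and sample stream are illustrative, not part of any SDK):

```python
# Consumer-side sketch of the CDC / event-caching pattern: fold a compacted
# change stream into a table holding only the latest state per key.

def materialize(events):
    """events: iterable of (key, payload). Returns the latest-state table."""
    table = {}
    for key, payload in events:
        if payload is None:            # tombstone: the row was deleted
            table.pop(key, None)
        else:
            table[key] = payload
    return table

stream = [
    ("order-1", {"status": "created"}),
    ("order-2", {"status": "created"}),
    ("order-1", {"status": "shipped"}),
    ("order-2", None),                 # order-2 was deleted upstream
]
print(materialize(stream))  # {'order-1': {'status': 'shipped'}}
```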
## Next steps

For instructions on how to use log compaction in Event Hubs, see [Use log compaction](./use-log-compaction.md).
articles/event-hubs/use-log-compaction.md

Lines changed: 90 additions & 0 deletions
---
title: Use log compaction
description: Learn how to use log compaction.
ms.topic: how-to
ms.custom: log-compaction
ms.date: 10/7/2022
---

# Use log compaction

This article shows you how to use the log compaction feature in Event Hubs. To understand the details of log compaction, see [Log compaction](log-compaction.md).

In this article, you follow these key steps:

- Create a compacted event hub/Kafka topic.
- Publish events to a compacted event hub.
- Consume events from a compacted event hub.

> [!NOTE]
> The log compaction feature is available only in the **premium** and **dedicated** tiers.
## Create a compacted event hub/Kafka topic

This section shows you how to create a compacted event hub by using the Azure portal and an Azure Resource Manager (ARM) template.

### [Azure portal](#tab/portal)

You can create a compacted event hub in the Azure portal by following these steps:

1. Navigate to your Event Hubs namespace.
1. On the **Event Hubs Namespace** page, select **Event Hubs** in the left menu.
1. At the top of the window, select **+ Event Hubs**.

    :::image type="content" source="./media/event-hubs-quickstart-portal/create-event-hub4.png" alt-text="Screenshot of event hub creation UI.":::
1. Type a *name* for your event hub and specify the *partition count*. Since you're creating a compacted event hub, set the *compaction policy* to *compaction* and provide the desired value for *tombstone retention time*.

    :::image type="content" source="./media/event-hubs-log-compaction/enabling-compaction.png" alt-text="Screenshot of the event hub creation UI with compaction-related attributes.":::
1. Select *Create* to create the compacted event hub.
### [ARM template](#tab/arm)

The following example shows how to create a compacted event hub/Kafka topic by using an ARM template:

```json
"resources": [
    {
        "apiVersion": "2017-04-01",
        "name": "[parameters('eventHubName')]",
        "type": "eventhubs",
        "dependsOn": [
            "[resourceId('Microsoft.EventHub/namespaces/', parameters('eventHubNamespaceName'))]"
        ],
        "properties": {
            "partitionCount": "[parameters('partitionCount')]",
            "retentionDescription": {
                "cleanupPolicy": "compact",
                "tombstoneRetentionTimeInHours": "24"
            }
        }
    }
]
```
---

## Triggering compaction

The Event Hubs service determines when the compaction job for a given compacted event hub runs. A compacted event hub reaches the compaction threshold when it accumulates a considerable number of events or when the total size of its event log grows significantly.
## Publish events to a compacted topic

Publishing events to a compacted event hub is the same as publishing events to a regular event hub. As the client application, you only need to determine the compaction key, which you set by using the partition key.

### Using the Event Hubs SDK (AMQP)

With the Event Hubs SDK, you can set the partition key and publish events as shown in the following example:

```csharp
var enqueueOptions = new EnqueueEventOptions
{
    PartitionKey = "Key-1"
};
await producer.EnqueueEventAsync(eventData, enqueueOptions);
```
### Using Kafka

With Kafka, you can set the partition key when you create the `ProducerRecord`:

```java
ProducerRecord<String, String> record = new ProducerRecord<String, String>(TOPIC, "Key-1", "Value-1");
```
## Consuming events from a compacted topic

No changes are required on the consumer side to consume events from a compacted event hub. You can use any of your existing consumer applications to consume data from a compacted event hub.

## Next steps

- For conceptual information on how log compaction works, see [Log compaction](log-compaction.md).
