Commit 27112ac

committed
Acrolynx
1 parent e07995c commit 27112ac

19 files changed (+82 additions, −82 deletions)

articles/event-hubs/apache-kafka-configurations.md

Lines changed: 9 additions & 9 deletions
@@ -4,7 +4,7 @@ description: This article provides recommended Apache Kafka configurations for c
 ms.topic: reference
 ms.subservice: kafka
 ms.custom: devx-track-extended-java
-ms.date: 03/30/2022
+ms.date: 03/06/2025
 ---

 # Recommended configurations for Apache Kafka clients
@@ -17,16 +17,16 @@ Here are the recommended configurations for using Azure Event Hubs from Apache K
 Property | Recommended values | Permitted range | Notes
 ---|---:|-----:|---
 `metadata.max.age.ms` | 180000 (approximate) | < 240000 | Can be lowered to pick up metadata changes sooner.
-`connections.max.idle.ms` | 180000 | < 240000 | Azure closes inbound Transmission Control Protocol (TCP) idle > 240,000 ms, which can result in sending on dead connections (shown as expired batches because of send timeout).
+`connections.max.idle.ms` | 180000 | < 240000 | Azure closes inbound Transmission Control Protocol (TCP) idle > 240,000 ms, which can result in sending on dead connections (shown as expired batches because of send time-out).

 ### Producer configurations only
 Producer configs can be found [here](https://kafka.apache.org/documentation/#producerconfigs).

 |Property | Recommended Values | Permitted Range | Notes|
 |---|---:|---:|---|
 |`max.request.size` | 1000000 | < 1046528 | The service closes connections if requests larger than 1,046,528 bytes are sent. *This value **must** be changed and causes issues in high-throughput produce scenarios.*|
-|`retries` | > 0 | | Might require increasing delivery.timeout.ms value, see documentation.|
-|`request.timeout.ms` | 30000 .. 60000 | > 20000| Event Hubs internally defaults to a minimum of 20,000 ms. *While requests with lower timeout values are accepted, client behavior isn't guaranteed.* <p>Make sure that your **request.timeout.ms** is at least the recommended value of 60000 and your **session.timeout.ms** is at least the recommended value of 30000. Having these settings too low could cause consumer timeouts, which then cause rebalances (which then cause more timeouts, which cause more rebalancing, and so on).</p>|
+|`retries` | > 0 | | Might require increasing `delivery.timeout.ms` value, see documentation.|
+|`request.timeout.ms` | 30000 .. 60000 | > 20000| Event Hubs internally defaults to a minimum of 20,000 ms. *While requests with lower time-out values are accepted, client behavior isn't guaranteed.* <p>Make sure that your **request.timeout.ms** is at least the recommended value of 60000 and your **session.timeout.ms** is at least the recommended value of 30000. Having these settings too low could cause consumer time-outs, which then cause rebalances (which then cause more time-outs, which cause more rebalancing, and so on).</p>|
 |`metadata.max.idle.ms` | 180000 | > 5000 | Controls how long the producer caches metadata for a topic that's idle. If the elapsed time since a topic was last produced exceeds the metadata idle duration, then the topic's metadata is forgotten and the next access to it will force a metadata fetch request.|
 |`linger.ms` | > 0 | | For high throughput scenarios, linger value should be equal to the highest tolerable value to take advantage of batching.|
 |`delivery.timeout.ms` | | | Set according to the formula (`request.timeout.ms` + `linger.ms`) * `retries`.|
@@ -38,8 +38,8 @@ Consumer configs can be found [here](https://kafka.apache.org/documentation/#con
 Property | Recommended Values | Permitted Range | Notes
 ---|---:|-----:|---
 `heartbeat.interval.ms` | 3000 | | 3000 is the default value and shouldn't be changed.
-`session.timeout.ms` | 30000 |6000 .. 300000| Start with 30000, increase if seeing frequent rebalancing because of missed heartbeats.<p>Make sure that your request.timeout.ms is at least the recommended value of 60000 and your session.timeout.ms is at least the recommended value of 30000. Having these settings too low could cause consumer timeouts, which then cause rebalances (which then cause more timeouts, which cause more rebalancing, and so on).</p>
-`max.poll.interval.ms` | 300000 (default) |>session.timeout.ms| Used for rebalance timeout, so it shouldn't be set too low. Must be greater than session.timeout.ms.
+`session.timeout.ms` | 30000 |6000 .. 300000| Start with 30000, increase if seeing frequent rebalancing because of missed heartbeats.<p>Make sure that your request.timeout.ms is at least the recommended value of 60000 and your session.timeout.ms is at least the recommended value of 30000. Having these settings too low could cause consumer time-outs, which then cause rebalances (which then cause more time-outs, which cause more rebalancing, and so on).</p>
+`max.poll.interval.ms` | 300000 (default) |>session.timeout.ms| Used for rebalance time-out, so it shouldn't be set too low. Must be greater than session.timeout.ms.

 ## librdkafka configuration properties
 The main `librdkafka` configuration file ([link](https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md)) contains extended descriptions for the properties described in the following sections.
@@ -56,7 +56,7 @@ Property | Recommended Values | Permitted Range | Notes
 |Property | Recommended Values | Permitted Range | Notes|
 |---|---:|-----:|---|
 |`retries` | > 0 | | Default is 2147483647.|
-|`request.timeout.ms` | 30000 .. 60000 | > 20000| Event Hubs internally defaults to a minimum of 20,000 ms. `librdkafka` default value is 5000, which can be problematic. *While requests with lower timeout values are accepted, client behavior isn't guaranteed.*|
+|`request.timeout.ms` | 30000 .. 60000 | > 20000| Event Hubs internally defaults to a minimum of 20,000 ms. `librdkafka` default value is 5000, which can be problematic. *While requests with lower time-out values are accepted, client behavior isn't guaranteed.*|
 |`partitioner` | `consistent_random` | See librdkafka documentation | `consistent_random` is default and best. Empty and null keys are handled ideally for most cases.|
 |`compression.codec` | `none, gzip` || Only gzip compression is currently supported.|

@@ -66,7 +66,7 @@ Property | Recommended Values | Permitted Range | Notes
 ---|---:|-----:|---
 `heartbeat.interval.ms` | 3000 || 3000 is the default value and shouldn't be changed.
 `session.timeout.ms` | 30000 |6000 .. 300000| Start with 30000, increase if seeing frequent rebalancing because of missed heartbeats.
-`max.poll.interval.ms` | 300000 (default) |>session.timeout.ms| Used for rebalance timeout, so it shouldn't be set too low. Must be greater than session.timeout.ms.
+`max.poll.interval.ms` | 300000 (default) |>session.timeout.ms| Used for rebalance time-out, so it shouldn't be set too low. Must be greater than session.timeout.ms.


 ## Further notes
@@ -75,7 +75,7 @@ Check the following table of common configuration-related error scenarios.

 Symptoms | Problem | Solution
 ----|---|-----
-Offset commit failures because of rebalancing | Your consumer is waiting too long in between calls to poll() and the service is kicking the consumer out of the group. | You have several options: <ul><li>Increase poll processing timeout (`max.poll.interval.ms`)</li><li>Decrease message batch size to speed up processing</li><li>Improve processing parallelization to avoid blocking consumer.poll()</li></ul> Applying some combination of the three is likely wisest.
+Offset commit failures because of rebalancing | Your consumer is waiting too long in between calls to poll() and the service is kicking the consumer out of the group. | You have several options: <ul><li>Increase poll processing time-out (`max.poll.interval.ms`)</li><li>Decrease message batch size to speed up processing</li><li>Improve processing parallelization to avoid blocking consumer.poll()</li></ul> Applying some combination of the three is likely wisest.
 Network exceptions at high produce throughput | If you're using Java client + default max.request.size, your requests might be too large. | See Java configs mentioned earlier.

 ## Next steps
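
For reference, the recommended Java client values in this file translate into a client configuration along the following lines. This is a minimal, hedged sketch rather than part of the article: the namespace, event hub name, and connection string are placeholders, and the SASL/PLAIN connection-string pattern shown is the common way to authenticate a Kafka client against Event Hubs.

```java
import java.util.Properties;

public class EventHubsKafkaConfigSketch {

    // Placeholder namespace and credentials; replace with your own values.
    static final String BOOTSTRAP = "mynamespace.servicebus.windows.net:9093";
    static final String JAAS = "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"$ConnectionString\" password=\"<event-hubs-connection-string>\";";

    static Properties common() {
        Properties p = new Properties();
        p.put("bootstrap.servers", BOOTSTRAP);
        p.put("security.protocol", "SASL_SSL");
        p.put("sasl.mechanism", "PLAIN");
        p.put("sasl.jaas.config", JAAS);
        p.put("metadata.max.age.ms", "180000");        // < 240000
        p.put("connections.max.idle.ms", "180000");    // < 240000, below the Azure idle TCP limit
        return p;
    }

    static Properties producer() {
        Properties p = common();
        p.put("max.request.size", "1000000");          // < 1046528
        p.put("retries", "3");                         // > 0
        p.put("request.timeout.ms", "60000");          // 30000 .. 60000
        p.put("linger.ms", "5");                       // > 0 to take advantage of batching
        p.put("metadata.max.idle.ms", "180000");       // > 5000
        // delivery.timeout.ms per the formula (request.timeout.ms + linger.ms) * retries
        p.put("delivery.timeout.ms", String.valueOf((60000 + 5) * 3));
        return p;
    }

    static Properties consumer() {
        Properties p = common();
        p.put("heartbeat.interval.ms", "3000");        // default; shouldn't be changed
        p.put("session.timeout.ms", "30000");          // 6000 .. 300000
        p.put("max.poll.interval.ms", "300000");       // must be > session.timeout.ms
        p.put("request.timeout.ms", "60000");          // > 20000
        return p;
    }
}
```

The librdkafka tables above use the same property names; librdkafka-based clients set them through that library's own configuration mechanism rather than Java `Properties`.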

articles/event-hubs/apache-kafka-developer-guide.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 title: Apache Kafka developer guide for Event Hubs
 description: This article provides links to articles that describe how to integrate your Kafka applications with Azure Event Hubs.
-ms.date: 12/18/2024
+ms.date: 03/06/2025
 ms.subservice: kafka
 ms.topic: article
 ---

articles/event-hubs/apache-kafka-frequently-asked-questions.yml

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ metadata:
 ms.topic: faq
 ms.subservice: kafka

-ms.date: 10/14/2022
+ms.date: 03/06/2025
 title: Frequently asked questions - Event Hubs for Apache Kafka
 summary: This article provides answers to some of the frequently asked questions on migrating to Event Hubs for Apache Kafka.

@@ -34,8 +34,8 @@ sections:
 - They're autocreated. Kafka groups can be managed via the Kafka consumer group APIs.
 - They can store offsets in the Event Hubs service.
 - They're used as keys in what is effectively an offset key-value store. For a unique pair of `group.id` and `topic-partition`, we store an offset in Azure Storage (3x replication). Event Hubs users don't incur extra storage costs from storing Kafka offsets. Offsets are manipulable via the Kafka consumer group APIs, but the offset storage *accounts* aren't directly visible or manipulable for Event Hubs users.
-- They span a namespace. Using the same Kafka group name for multiple applications on multiple topics means that all applications and their Kafka clients will be rebalanced whenever only a single application needs rebalancing. Choose your group names wisely.
-- They fully distinct from Event Hubs consumer groups. You **don't** need to use '$Default', nor do you need to worry about Kafka clients interfering with AMQP workloads.
+- They span a namespace. Using the same Kafka group name for multiple applications on multiple topics means that all applications and their Kafka clients are rebalanced whenever only a single application needs rebalancing. Choose your group names wisely.
+- They're fully distinct from Event Hubs consumer groups. You **don't** need to use `$Default`, nor do you need to worry about Kafka clients interfering with AMQP workloads.
 - They aren't viewable in the Azure portal. Consumer group info is accessible via Kafka APIs.

 - question: |
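
To make the group-naming guidance above concrete, here's a hedged Java sketch; the namespace, event hub name, group name, and connection string are placeholders, not taken from the article. Each application gets its own `group.id`, because the Kafka group spans the whole namespace and a shared name would pull unrelated applications into the same rebalances.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupPerApplication {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "mynamespace.servicebus.windows.net:9093");
        // One distinct group per application: the Kafka group is namespace-wide,
        // so reusing a name across unrelated apps couples their rebalances.
        props.put("group.id", "billing-app");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" password=\"<event-hubs-connection-string>\";");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));      // event hub used as a Kafka topic (placeholder)
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.printf("%s -> %s%n", r.key(), r.value()));
            consumer.commitSync();                      // offset stored by the service for this group
        }
    }
}
```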

articles/event-hubs/apache-kafka-migration-guide.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ title: Migrate to Azure Event Hubs for Apache Kafka
 description: This article explains how to migrate clients from Apache Kafka to Azure Event Hubs.
 ms.topic: article
 ms.subservice: kafka
-ms.date: 12/18/2024
+ms.date: 03/06/2025
 ---

 # Migrate to Azure Event Hubs for Apache Kafka Ecosystems

articles/event-hubs/apache-kafka-streams.md

Lines changed: 14 additions & 14 deletions
@@ -3,7 +3,7 @@ title: Kafka Streams for Apache Kafka in Event Hubs on Azure Cloud
 description: Learn about how to use the Apache Kafka Streams API with Event Hubs service on Azure Cloud.
 ms.topic: overview
 ms.subservice: kafka
-ms.date: 04/29/2024
+ms.date: 03/06/2025
 ---

 # Kafka Streams for Azure Event Hubs
@@ -27,21 +27,21 @@ Azure Event Hubs natively supports both the AMQP and Kafka protocol. However, to
 | Property | Default behavior for Event Hubs | Modified behavior for Kafka streams | Explanation |
 | ----- | ---- | ----| ---- |
 | `messageTimestampType` | set to `AppendTime` | should be set to `CreateTime` | Kafka Streams relies on creation timestamp rather than append timestamp |
-| `message.timestamp.difference.max.ms` | max allowed value is 90 days | Property is used to govern past timestamps only. Future time is set to 1 hour and can't be changed. | This is in line with the Kafka protocol specification |
+| `message.timestamp.difference.max.ms` | max allowed value is 90 days | Property is used to govern past timestamps only. Future time is set to 1 hour and can't be changed. | It is in line with the Kafka protocol specification |
 | `min.compaction.lag.ms` | | max allowed value is two days ||
 | Infinite retention topics | | size based truncation of 250 GB for each topic-partition||
-| Delete record API for infinite retention topics| | Not implemented. As a workaround, the topic can be updated and a finite retention time can be set.| This will be done in GA |
+| Delete record API for infinite retention topics| | Not implemented. As a workaround, the topic can be updated and a finite retention time can be set.| This functionality will be supported in GA |

 ### Other considerations

 Here are some of the other considerations to keep in mind.

 * Kafka streams client applications must be granted management, read, and write permissions for the entire namespaces to be able to create temporary topics for stream processing.
-* Temporary topics and partitions count towards the quota for the given namespace. These should be kept under consideration when provisioning the namespace or cluster.
-* Infinite retention time for "Offset" Store is limited by max message retention time of the SKU. Check [Event Hubs Quotas](event-hubs-quotas.md) for these tier specific values.
+* Temporary topics and partitions count towards the quota for the given namespace. They should be kept under consideration when provisioning the namespace or cluster.
+* Infinite retention time for "Offset" Store is limited by max message retention time of the Stock Keeping Unit (SKU). Check [Event Hubs Quotas](event-hubs-quotas.md) for these tier specific values.


-These include, updating the topic configuration in the `messageTimestampType` to use the `CreateTime` (that is, Event creation time) instead of the `AppendTime` (that is, log append time).
+They include updating the topic configuration in the `messageTimestampType` to use the `CreateTime` (that is, Event creation time) instead of the `AppendTime` (that is, log append time).

 To override the default behavior (required), the below setting must be set in Azure Resource Manager (ARM).

@@ -73,7 +73,7 @@ To override the default behavior (required), the below setting must be set in Az

 ## Kafka Streams concepts

-Kafka streams provides a simple abstraction layer over the Kafka producer and consumer APIs to help developers get started with real time streaming scenarios faster. The light-weight library depends on an **Apache Kafka compatible broker** (like Azure Event Hubs) for the internal messaging layer, and manages a **fault tolerant local state store**. With the transactional API, the Kafka streams library supports rich processing features such as **exactly once processing** and **one record at a time processing**.
+Kafka streams provide a simple abstraction layer over the Kafka producer and consumer APIs to help developers get started with real time streaming scenarios faster. The light-weight library depends on an **Apache Kafka compatible broker** (like Azure Event Hubs) for the internal messaging layer, and manages a **fault tolerant local state store**. With the transactional API, the Kafka streams library supports rich processing features such as **exactly once processing** and **one record at a time processing**.

 Records arriving out of order benefit from **event-time based windowing operations**.

@@ -100,7 +100,7 @@ Stream processing topology can be defined either with the [Kafka Streams DSL](ht

 Streams and tables are 2 different but useful abstractions provided by the [Kafka Streams DSL](https://kafka.apache.org/37/documentation/streams/developer-guide/dsl-api.html), modeling both time series and relational data formats that must coexist for stream processing use-cases.

-Kafka extends this further and introduces a duality between streams and tables, where a
+Kafka extends it further and introduces a duality between streams and tables, where a
 * A **stream** can be considered as a changelog of a **table**, and
 * A **table** can be considered as a snapshot of the latest value of each key in a **stream**.

@@ -113,11 +113,11 @@ For example

 ### Time

-Kafka Streams allows windowing and grace functions to allow for out of order data records to be ingested and still be included in the processing. To ensure that this behavior is deterministic, there are additional notions of time in Kafka streams. These include:
+Kafka Streams allows windowing and grace functions to allow for out of order data records to be ingested and still be included in the processing. To ensure that this behavior is deterministic, there are more notions of time in Kafka streams. They include:

-* Creation time (also known as 'Event time') - This is the time when the event occurred and the data record was created.
-* Processing time - This is the time when the data record is processed by the stream processing application (or when it's consumed).
-* Append time (also known as 'Creation time') - This is the time when the data is stored and committed to the storage of the Kafka broker. This differs from the creation time because of the time difference between the creation of the event and the actual ingestion by the broker.
+* Creation time (also known as 'Event time') - It's the time when the event occurred and the data record was created.
+* Processing time - It's the time when the data record is processed by the stream processing application (or when it's consumed).
+* Append time (also known as 'Creation time') - It's the time when the data is stored and committed to the storage of the Kafka broker. It differs from the creation time because of the time difference between the creation of the event and the actual ingestion by the broker.


@@ -130,7 +130,7 @@ Stateful transformations in the DSL include:
 * [Aggregating](https://kafka.apache.org/37/documentation/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-aggregating)
 * [Joining](https://kafka.apache.org/37/documentation/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-joins)
 * [Windowing (as part of aggregations and joins)](https://kafka.apache.org/37/documentation/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-windowing)
-* [Applying custom processors and transformers](https://kafka.apache.org/37/documentation/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-process), which may be stateful, for Processor API integration
+* [Applying custom processors and transformers](https://kafka.apache.org/37/documentation/streams/developer-guide/dsl-api.html#streams-developer-guide-dsl-process), which can be stateful, for Processor API integration

 ### Window and grace

@@ -143,7 +143,7 @@ Applications must utilize the windowing and grace period controls to improve fau

 ### Processing guarantees

-Business and technical users seek to extract key business insights from the output of stream processing workloads, which translate to high transactional guarantee requirements. Kafka streams works together with Kafka transactions to ensure transactional processing guarantees by integrating with the Kafka compatible brokers' (such as Azure Event Hubs) underlying storage system to ensure that offset commits and state store updates are written atomically.
+Business and technical users seek to extract key business insights from the output of stream processing workloads, which translate to high transactional guarantee requirements. Kafka streams work together with Kafka transactions to ensure transactional processing guarantees by integrating with the Kafka compatible brokers' (such as Azure Event Hubs) underlying storage system to ensure that offset commits and state store updates are written atomically.

 To ensure transactional processing guarantees, the `processing.guarantee` setting in the Kafka Streams configs must be updated from the default value of `at_least_once` to `exactly_once_v2` (for client versions at or after Apache Kafka 2.5) or `exactly_once` (for client versions before Apache Kafka 2.5.x).

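A hedged Java sketch of that `processing.guarantee` change follows; the application ID, namespace, topics, and omitted SASL settings are placeholders, and the topology is illustrative only.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceStreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");   // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "mynamespace.servicebus.windows.net:9093");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        // Default is at_least_once; switch to exactly_once_v2 on Apache Kafka 2.5+ clients
        // (exactly_once on older clients) for transactional processing guarantees.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once_v2");
        // SASL/SSL connection settings omitted; same pattern as the client configuration sketch above.

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic").to("output-topic");   // illustrative topology

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```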