Commit 8b84bb8

Merge pull request #251610 from v-akarnase/patch-21
Update apache-kafka-performance-tuning.md
2 parents 72590b0 + 85e683e commit 8b84bb8

1 file changed

articles/hdinsight/kafka/apache-kafka-performance-tuning.md

Lines changed: 15 additions & 15 deletions
@@ -3,32 +3,32 @@ title: Performance optimization for Apache Kafka HDInsight clusters
description: Provides an overview of techniques for optimizing Apache Kafka workloads on Azure HDInsight.
ms.service: hdinsight
ms.topic: conceptual
-ms.date: 08/21/2022
+ms.date: 09/15/2023
---

# Performance optimization for Apache Kafka HDInsight clusters

-This article gives some suggestions for optimizing the performance of your Apache Kafka workloads in HDInsight. The focus is on adjusting producer, broker and consumer configuration. Sometimes, you also need to adjust OS settings to tune the performance with heavy workload. There are different ways of measuring performance, and the optimizations that you apply will depend on your business needs.
+This article gives some suggestions for optimizing the performance of your Apache Kafka workloads in HDInsight. The focus is on adjusting producer, broker, and consumer configuration. Sometimes you also need to adjust OS settings to tune performance under heavy workloads. There are different ways of measuring performance, and the optimizations that you apply depend on your business needs.

## Architecture overview

-Kafka topics are used to organize records. Records are produced by producers, and consumed by consumers. Producers send records to Kafka brokers, which then store the data. Each worker node in your HDInsight cluster is a Kafka broker.
+Kafka topics are used to organize records. Producers produce records, and consumers consume them. Producers send records to Kafka brokers, which then store the data. Each worker node in your HDInsight cluster is a Kafka broker.

Topics partition records across brokers. When consuming records, you can use up to one consumer per partition to achieve parallel processing of the data.

-Replication is used to duplicate partitions across nodes. This protects against node (broker) outages. A single partition among the group of replicas is designated as the partition leader. Producer traffic is routed to the leader of each node, using the state managed by ZooKeeper.
+Replication is used to duplicate partitions across nodes. This replication protects against node (broker) outages. A single partition among the group of replicas is designated as the partition leader. Producer traffic is routed to the leader of each node, using the state managed by ZooKeeper.
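
To make the topic/partition/replica relationship concrete, here's a minimal sketch using Kafka's Java `AdminClient`. The broker address, topic name, and counts are hypothetical starting points, not recommendations:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical broker address; use your cluster's broker list.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "wn0-kafka:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 8 partitions allow up to 8 parallel consumers; 3 replicas let
            // the topic survive a single broker outage.
            NewTopic topic = new NewTopic("telemetry", 8, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```
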
## Identify your scenario

-Apache Kafka performance has two main aspects – throughput and latency. Throughput is the maximum rate at which data can be processed. Higher throughput is usually better. Latency is the time it takes for data to be stored or retrieved. Lower latency is usually better. Finding the right balance between throughput, latency and the cost of the application's infrastructure can be challenging. Your performance requirements will likely match one of the following three common situations, based on whether you require high throughput, low latency, or both:
+Apache Kafka performance has two main aspects: throughput and latency. Throughput is the maximum rate at which data can be processed. Higher throughput is better. Latency is the time it takes for data to be stored or retrieved. Lower latency is better. Finding the right balance between throughput, latency, and the cost of the application's infrastructure can be challenging. Your performance requirements likely match one of the following three common situations, based on whether you require high throughput, low latency, or both:

* High throughput, low latency. This scenario requires both high throughput and low latency (~100 milliseconds). An example of this type of application is service availability monitoring.
* High throughput, high latency. This scenario requires high throughput (~1.5 GBps) but can tolerate higher latency (< 250 ms). An example of this type of application is telemetry data ingestion for near real-time processes like security and intrusion detection applications.
* Low throughput, low latency. This scenario requires low latency (< 10 ms) for real-time processing, but can tolerate lower throughput. An example of this type of application is online spelling and grammar checks.

## Producer configurations

-The following sections will highlight some of the most important generic configuration properties to optimize performance of your Kafka producers. For a detailed explanation of all configuration properties, see [Apache Kafka documentation on producer configurations](https://kafka.apache.org/documentation/#producerconfigs).
+The following sections highlight some of the most important generic configuration properties to optimize performance of your Kafka producers. For a detailed explanation of all configuration properties, see [Apache Kafka documentation on producer configurations](https://kafka.apache.org/documentation/#producerconfigs).

### Batch size

@@ -44,11 +44,11 @@ A Kafka producer can be configured to compress messages before sending them to b
Among the two commonly used compression codecs, `gzip` and `snappy`, `gzip` has a higher compression ratio, which results in lower disk usage at the cost of higher CPU load. The `snappy` codec provides less compression with less CPU overhead. You can decide which codec to use based on broker disk or producer CPU limitations. `gzip` can compress data at a rate five times higher than `snappy`.

-Using data compression will increase the number of records that can be stored on a disk. It may also increase CPU overhead in cases where there's a mismatch between the compression formats being used by the producer and the broker. as the data must be compressed before sending and then decompressed before processing.
+Data compression increases the number of records that can be stored on a disk. It may also increase CPU overhead in cases where there's a mismatch between the compression formats being used by the producer and the broker, as the data must be compressed before sending and then decompressed before processing.
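
For illustration, here's a minimal producer sketch that applies compression and batching together. The broker address and topic name are hypothetical, and the values are starting points to tune rather than recommendations:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address; use your cluster's broker list.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "wn0-kafka:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // gzip: higher compression ratio, more CPU; "snappy" trades the other way.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        // Batch up to 64 KB per partition, waiting up to 10 ms for a batch to fill.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 10);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("telemetry", "key", "value"));
        }
    }
}
```
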
## Broker settings

-The following sections will highlight some of the most important settings to optimize performance of your Kafka brokers. For a detailed explanation of all broker settings, see [Apache Kafka documentation on broker configurations](https://kafka.apache.org/documentation/#brokerconfigs).
+The following sections highlight some of the most important settings to optimize performance of your Kafka brokers. For a detailed explanation of all broker settings, see [Apache Kafka documentation on broker configurations](https://kafka.apache.org/documentation/#brokerconfigs).

### Number of disks

@@ -62,7 +62,7 @@ Each Kafka partition is a log file on the system, and producer threads can write
Increasing the partition density (the number of partitions per broker) adds an overhead related to metadata operations and per partition request/response between the partition leader and its followers. Even in the absence of data flowing through, partition replicas still fetch data from leaders, which results in extra processing for send and receive requests over the network.

-For Apache Kafka clusters 2.1 and 2.4 and above in HDInsight, we recommend you to have a maximum of 2000 partitions per broker, including replicas. Increasing the number of partitions per broker decreases throughput and may also cause topic unavailability. For more information on Kafka partition support, see [the official Apache Kafka blog post on the increase in the number of supported partitions in version 1.1.0](https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions). For details on modifying topics, see [Apache Kafka: modifying topics](https://kafka.apache.org/documentation/#basic_ops_modify_topic).
+For Apache Kafka clusters 2.1 and 2.4 and above in HDInsight, we recommend a maximum of 2000 partitions per broker, including replicas. Increasing the number of partitions per broker decreases throughput and may also cause topic unavailability. For more information on Kafka partition support, see [the official Apache Kafka blog post on the increase in the number of supported partitions in version 1.1.0](https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions). For details on modifying topics, see [Apache Kafka: modifying topics](https://kafka.apache.org/documentation/#basic_ops_modify_topic).

### Number of replicas

@@ -74,25 +74,25 @@ For more information on replication, see [Apache Kafka: replication](https://kaf
## Consumer configurations

-The following section will highlight some important generic configurations to optimize the performance of your Kafka consumers. For a detailed explanation of all configurations, see [Apache Kafka documentation on consumer configurations](https://kafka.apache.org/documentation/#consumerconfigs).
+The following section highlights some important generic configurations to optimize the performance of your Kafka consumers. For a detailed explanation of all configurations, see [Apache Kafka documentation on consumer configurations](https://kafka.apache.org/documentation/#consumerconfigs).

### Number of consumers

-It is a good practice to have the number of partitions equal to the number of consumers. If the number of consumers is less than the number of partitions then a few of the consumers will read from multiple partitions, increasing consumer latency.
+It is a good practice to have the number of partitions equal to the number of consumers. If the number of consumers is less than the number of partitions, then some of the consumers read from multiple partitions, increasing consumer latency.

-If the number of consumers is greater than the number of partitions, then you will be wasting your consumer resources since those consumers will be idle.
+If the number of consumers is greater than the number of partitions, then you are wasting consumer resources, since those extra consumers are idle.
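
As a sketch of this rule (broker address, group name, and topic are hypothetical), each copy of the following process joins the same consumer group, and Kafka assigns each partition of the topic to at most one of them:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address and group name.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "wn0-kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "telemetry-readers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // Run at most one copy of this process per partition; extra copies sit idle.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("telemetry"));
            while (true) {
                consumer.poll(Duration.ofMillis(500)).forEach(record ->
                        System.out.printf("partition=%d offset=%d%n",
                                record.partition(), record.offset()));
            }
        }
    }
}
```
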
### Avoid frequent consumer rebalance

Consumer rebalance is triggered by partition ownership changes (that is, consumers scale out or in), a broker crash (since brokers are the group coordinators for consumer groups), a consumer crash, adding a new topic, or adding new partitions. During rebalancing, consumers cannot consume, which increases latency.

-Consumers are considered alive if it can send a heartbeat to a broker within `session.timeout.ms`. Otherwise, the consumer will be considered dead or failed. This will lead to a consumer rebalance. The lower the consumer `session.timeout.ms` the faster we will be able to detect those failures.
+A consumer is considered alive if it can send a heartbeat to a broker within `session.timeout.ms`. Otherwise, the consumer is considered dead or failed, which leads to a consumer rebalance. The lower the consumer `session.timeout.ms`, the faster those failures can be detected.

If the `session.timeout.ms` is too low, a consumer could experience repeated unnecessary rebalances, due to scenarios such as when a batch of messages takes longer to process or when a JVM GC pause takes too long. If you have a consumer that spends too much time processing messages, you can address this either by increasing the upper bound on the amount of time that a consumer can be idle before fetching more records with `max.poll.interval.ms` or by reducing the maximum size of batches returned with the configuration parameter `max.poll.records`.
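
Continuing the consumer sketch above, a rough illustration of these knobs (the values are illustrative, not recommendations) might be:

```java
// Illustrative values only; tune against your workload.
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10000);    // detect failed consumers within ~10 s
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300000); // allow up to 5 min between polls
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 200);        // smaller batches finish before the deadline
```
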
### Batching

-Like producers, we can add batching for consumers. The amount of data consumers can get in each fetch request can be configured by changing the configuration `fetch.min.bytes`. This parameter defines the minimum bytes expected from a fetch response of a consumer. Increasing this value will reduce the number of fetch requests made to the broker, therefore reducing extra overhead. By default, this value is 1. Similarly, there is another configuration `fetch.max.wait.ms`. If a fetch request doesn’t have enough messages as per the size of `fetch.min.bytes`, it will wait until the expiration of the wait time based on this config `fetch.max.wait.ms`.
+Like producers, we can add batching for consumers. The amount of data consumers get in each fetch request can be configured by changing the configuration `fetch.min.bytes`. This parameter defines the minimum bytes expected from a fetch response of a consumer. Increasing this value reduces the number of fetch requests made to the broker, therefore reducing extra overhead. By default, this value is 1. Similarly, there is another configuration, `fetch.max.wait.ms`. If a fetch request doesn't have enough messages to satisfy `fetch.min.bytes`, the broker waits up to `fetch.max.wait.ms` before responding.
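
A sketch of these two settings, again extending the consumer configuration above (values are illustrative):

```java
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 65536);   // ask the broker for at least 64 KB per fetch
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);   // but respond after 500 ms even if smaller
```
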
> [!NOTE]
> In a few scenarios, a consumer may seem slow when it fails to process a message. If you don't commit the offset after an exception, the consumer gets stuck at a particular offset in an infinite loop and doesn't move forward, increasing the lag on the consumer side as a result.
@@ -103,7 +103,7 @@ Like producers, we can add batching for consumers. The amount of data consumers
`vm.max_map_count` defines the maximum number of mmaps a process can have. By default, the value is 65535 on the Linux VMs of an HDInsight Apache Kafka cluster.

-In Apache Kafka, each log segment requires a pair of index/timeindex files, and each of these files consumes 1 mmap. In other words, each log segment uses 2 mmap. Thus, if each partition hosts a single log segment, it requires minimum 2 mmap. The number of log segments per partition varies depending on the **segment size, load intensity, retention policy, rolling period** and, generally tends to be more than one. `Mmap value = 2*((partition size)/(segment size))*(partitions)`
+In Apache Kafka, each log segment requires a pair of index/timeindex files, and each of these files consumes one mmap. In other words, each log segment uses two mmaps. Thus, if each partition hosts a single log segment, it requires a minimum of two mmaps. The number of log segments per partition varies depending on the **segment size, load intensity, retention policy, and rolling period**, and generally tends to be more than one. `Mmap value = 2*((partition size)/(segment size))*(partitions)`

If the required mmap value exceeds `vm.max_map_count`, the broker raises a **"Map failed"** exception.
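
As a hypothetical worked example of this formula: with a 1-GB segment size, 50 GB of data per partition, and 1,000 partitions hosted on a broker, the estimate is 2 × (50/1) × 1,000 = 100,000 mmaps, which exceeds the default `vm.max_map_count` of 65535 and would require raising that limit.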