update: improve single-AZ Kafka docs #1193
Merged: harshini-rangaswamy merged 6 commits into main from FLEET-6228-clarify-data-loss-risk-for-single-az-kafka-with-rf-1-and-rf-3 on Jan 20, 2026
Commits:
b5badb1 update: improve single-AZ Kafka docs (harshini-rangaswamy)
736b429 update: add line break (harshini-rangaswamy)
b2734a7 Review RF guidelines. Mention diskless (Stuzanna)
fd07606 update: add diskless topics link (harshini-rangaswamy)
a82329c update: content to align with style guide (harshini-rangaswamy)
8a47a41 update: content (harshini-rangaswamy)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,17 +1,27 @@ | ||
| --- | ||
| title: Get the best from Apache Kafka® | ||
| title: Optimize Apache Kafka® performance | ||
| --- | ||
|
|
||
| Follow these best practices to ensure that your Aiven for Apache Kafka® service is fast and reliable. | ||
| Follow these best practices to optimize the performance and reliability of your Aiven for Apache Kafka® service. | ||
|
|
||
| ## Check your topic replication factors | ||
|
|
||
| Apache Kafka services use replication between brokers to preserve data in case of a | ||
| node failure. Consider how critical the data in each topic is to your business, and set | ||
| a replication factor high enough to ensure data protection. | ||
| Apache Kafka uses replication between brokers to protect data in case of node failures. | ||
| The replication factor (RF) determines how many copies of each partition are maintained | ||
| across the cluster. | ||
|
|
||
| Evaluate the importance of each topic and set a replication factor that balances | ||
| durability requirements with cost and performance. An RF of 3 is recommended | ||
| for production because it improves durability and availability. In multi-AZ deployments, | ||
| replication traffic across availability zones can increase network costs, especially | ||
| for high-throughput workloads. | ||
|
|
||
| For Diskless Topics architecture and considerations, see | ||
| [Diskless Topics overview](/docs/products/kafka/diskless/concepts/diskless-overview). | ||
|
|
||
| Set the replication factor when creating or editing a | ||
| [topic](/docs/products/kafka/howto/create-topic) in the [Aiven Console](https://console.aiven.io/). | ||
| [topic](/docs/products/kafka/howto/create-topic) in the | ||
| [Aiven Console](https://console.aiven.io/). | ||
|
|
||
| :::note | ||
| Replication factors below 2 are not allowed to prevent data loss from unexpected node | ||
|
|
@@ -20,77 +30,109 @@ terminations. | |
|
|
||
| ## Choose a reasonable number of partitions for a topic | ||
|
|
||
| Too few partitions can cause bottlenecks in data processing. In the extreme case, a | ||
| single partition means that messages are processed sequentially. Too many | ||
| partitions strain the cluster due to overhead. Since partition numbers cannot be reduced, | ||
| start with a low number that supports efficient processing and increase as needed. | ||
| Too few partitions can create processing bottlenecks. A single partition processes | ||
| messages sequentially, which limits throughput. Too many partitions increase overhead | ||
| and reduce cluster efficiency. Because partition counts cannot be reduced, start with a | ||
| number that supports parallel processing and increase it as needed. | ||
|
|
||
| A maximum of 4,000 partitions per broker and 200,000 per cluster is recommended. For more | ||
| details, see this [Apache Kafka blog post](https://blogsarchive.apache.org/kafka/entry/apache-kafka-supports-more-partitions). | ||
| In addition, the total number of topics per cluster should remain under 7,000. | ||
| A maximum of 4,000 partitions per broker and 200,000 per cluster is recommended. | ||
| For details, see this | ||
| [Apache Kafka blog post](https://blogsarchive.apache.org/kafka/entry/apache-kafka-supports-more-partitions). | ||
| Keep the total number of topics under 7,000. | ||
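
The recommended ceilings above can be checked mechanically. The following is a minimal sketch, not an Aiven or Apache Kafka tool: the function name `check_limits` and its inputs are hypothetical, and the per-broker counts are assumed to be supplied by the caller from whatever inventory is available.

```python
# Recommended ceilings quoted in the text above.
MAX_PARTITIONS_PER_BROKER = 4_000
MAX_PARTITIONS_PER_CLUSTER = 200_000
MAX_TOPICS_PER_CLUSTER = 7_000


def check_limits(partitions_per_broker, total_partitions, total_topics):
    """Return a list of warnings; an empty list means all counts are within limits.

    partitions_per_broker: dict mapping a broker name to its partition count
    (hypothetical input shape, assumed for this sketch).
    """
    warnings = []
    for broker, count in sorted(partitions_per_broker.items()):
        if count > MAX_PARTITIONS_PER_BROKER:
            warnings.append(
                f"{broker}: {count} partitions exceeds {MAX_PARTITIONS_PER_BROKER}"
            )
    if total_partitions > MAX_PARTITIONS_PER_CLUSTER:
        warnings.append(
            f"cluster: {total_partitions} partitions exceeds {MAX_PARTITIONS_PER_CLUSTER}"
        )
    if total_topics > MAX_TOPICS_PER_CLUSTER:
        warnings.append(
            f"cluster: {total_topics} topics exceeds {MAX_TOPICS_PER_CLUSTER}"
        )
    return warnings
```

For example, `check_limits({"broker-1": 5000}, 5000, 8000)` reports the per-broker and topic-count violations but not a cluster-wide partition violation.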
|
|
||
| :::note | ||
| Ordering is only guaranteed within a partition. To maintain the order of related records, | ||
| Ordering is guaranteed only within a partition. To maintain ordering of related records, | ||
| place them in the same partition. | ||
| ::: | ||
|
|
||
| ## Check entity-based partitions for imbalances | ||
|
|
||
| Partitioning messages based on an entity ID (such as a user ID) can lead to | ||
| imbalanced partitions. This results in uneven load distribution and reduces the | ||
| cluster's efficiency in processing messages in parallel. | ||
| Partitioning messages by an entity identifier, such as a user ID, can create imbalanced | ||
| partitions. This results in uneven load distribution and reduces parallel processing | ||
| efficiency. | ||
|
|
||
| You can view the size of each partition in the **Partitions** tab under | ||
| [topic](/docs/products/kafka/howto/create-topic) details in the | ||
| [Aiven Console](https://console.aiven.io/). | ||
| You can view the size of each partition by selecting the | ||
| [topic](/docs/products/kafka/howto/create-topic) in the Topics list and opening | ||
| the **Partitions** tab in the [Aiven Console](https://console.aiven.io/). | ||
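
To illustrate how keying by entity ID produces imbalance, here is a small simulation sketch. It makes several assumptions: the workload (one very active user among many light ones) is invented, and `zlib.crc32` stands in for the hash, whereas Kafka's default partitioner uses murmur2. The effect shown, that all records for one key land in one partition, holds regardless of the hash function.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    # Stand-in hash for illustration only; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(counts):
    """Largest partition divided by the mean partition size (1.0 means balanced)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# Hypothetical workload: one hot user plus 100 users with a single record each.
keys = ["user-42"] * 900 + [f"user-{i}" for i in range(100)]
counts = partition_counts(keys, 6)
print(counts, round(skew_ratio(counts), 2))
```

A skew ratio well above 1.0 suggests that one consumer in the group is doing most of the work.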
|
|
||
| ## Balance between throughput and latency | ||
|
|
||
| To find the right balance between throughput and latency, adjust the batch sizes in | ||
| your producer and consumer settings. Larger batches improve throughput but can increase | ||
| the time it takes to process individual messages. Smaller batches reduce this time | ||
| but increase the overhead, which may lower overall throughput. | ||
| Adjust producer and consumer batch sizes to balance throughput and latency. Larger | ||
| batches increase throughput but add latency. Smaller batches reduce latency but increase | ||
| overhead, which can lower throughput. | ||
|
|
||
| You can change settings like `batch.size` and `linger.ms` in your producer | ||
| configuration. For more details, refer to the | ||
| Settings such as `batch.size` and `linger.ms` can be configured in the producer. For | ||
| more details, refer to the | ||
| [Apache Kafka documentation](https://kafka.apache.org/documentation/). | ||
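
The trade-off can be estimated with simple arithmetic. The sketch below is a simplified model, not producer code: it assumes a batch is sent when it reaches `batch.size` bytes or when `linger.ms` elapses, whichever comes first, and it ignores compression, network time, and broker-side delays. The function name and the per-partition traffic figures are hypothetical.

```python
def max_batching_delay_ms(batch_size_bytes, linger_ms, msgs_per_sec, msg_bytes):
    """Rough upper bound on how long a record waits in the producer for its batch.

    msgs_per_sec and msg_bytes describe traffic to a single partition
    (an assumption of this model, since batches are built per partition).
    """
    bytes_per_ms = msgs_per_sec * msg_bytes / 1000.0
    if bytes_per_ms <= 0:
        # No traffic to fill the batch, so only the linger timer applies.
        return float(linger_ms)
    fill_time_ms = batch_size_bytes / bytes_per_ms
    return min(float(linger_ms), fill_time_ms)


# 1,000 records per second of 1 KiB each, batch.size of 16 KiB:
print(max_batching_delay_ms(16_384, 5, 1_000, 1_024))    # linger.ms is the limit
print(max_batching_delay_ms(16_384, 100, 1_000, 1_024))  # batch fills first
```

In this model, raising `linger.ms` beyond the time it takes to fill a batch adds no further latency, and raising `batch.size` only helps throughput if `linger.ms` is long enough for larger batches to fill.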
|
|
||
| ## Configure acknowledgments for received data | ||
|
|
||
| The `acks` parameter in the producer configuration controls how the success of a | ||
| write operation is determined. Choose the appropriate setting based on your data | ||
| reliability needs: | ||
| The `acks` parameter in the producer configuration controls how write operations are | ||
| acknowledged. Choose a setting that matches your reliability requirements: | ||
|
|
||
| - **`acks=0`**: The producer sends data without waiting for confirmation from the | ||
| broker. This speeds up communication, but there’s a risk of data loss if the broker | ||
| goes down during transmission. Use this setting only if some data loss is acceptable. | ||
| - **`acks=0`**: The producer does not wait for confirmation. This minimizes latency but | ||
| increases the risk of data loss if the broker fails during transmission. Use this | ||
| setting only when some data loss is acceptable. | ||
|
|
||
| - **`acks=1` (default and recommended setting)**: The producer waits for the leader | ||
| broker to confirm receipt of the data. This reduces the chance of data loss, but | ||
| data can still be lost if the leader fails before the data is fully replicated. | ||
| - **`acks=1`** (default and recommended): The producer waits for the leader broker to | ||
| confirm receipt. This reduces the risk of data loss but does not protect against | ||
| leader failure before replication completes. | ||
|
|
||
| - **`acks=all`**: The producer waits for acknowledgment from both the leader and all | ||
| replicas. This ensures no data loss but can slow down communication. | ||
| - **`acks=all`**: The producer waits for acknowledgment from the leader and all in-sync | ||
| replicas. This prevents data loss but increases latency. | ||
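
As a compact restatement of the three settings, the sketch below maps an `acks` value to the number of replicas that must confirm a write before the producer treats it as successful. The function name is hypothetical, and `in_sync_replicas` is assumed to count the leader as well as its in-sync followers.

```python
def required_acknowledgments(acks, in_sync_replicas):
    """Number of replicas that must confirm a write for a given acks setting."""
    if acks == "0":
        return 0  # fire and forget: no broker confirmation
    if acks == "1":
        return 1  # leader only
    if acks in ("all", "-1"):  # "-1" is an alias for "all"
        return in_sync_replicas  # leader plus all in-sync followers
    raise ValueError(f"unsupported acks value: {acks!r}")
```

With three in-sync replicas, `acks=all` waits for three confirmations, while `acks=1` waits for one regardless of the replication factor.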
|
|
||
| ## Configure single availability zone (AZ) for BYOC | ||
|
|
||
| For Bring Your Own Cloud (BYOC) customers, deploying Aiven for Apache Kafka in a single | ||
| AZ can reduce costs by removing inter-zone data transfer fees. | ||
| However, using a single AZ removes Kafka's resiliency, as data is not replicated across | ||
| zones. This increases the risk of downtime if the AZ fails. | ||
| Deploying Aiven for Apache Kafka in a single availability zone (AZ) reduces inter-zone | ||
| data transfer costs. Single-AZ deployment places all brokers and replicas in one | ||
| failure domain, so the cluster cannot tolerate an AZ outage. If the zone becomes | ||
| unavailable, the service cannot recover until the zone is restored. | ||
|
|
||
| :::note | ||
| Before enabling this configuration, contact your account team to discuss your use case | ||
| and agree on the reduced SLA. The standard uptime SLA does not apply to services | ||
| deployed in a single AZ. | ||
| ::: | ||
|
|
||
| ### Replication factor considerations in a single AZ | ||
|
|
||
| **Replication factor 1 (RF=1):** | ||
| Creates a single copy of each partition. In a single-AZ deployment, a broker or AZ | ||
| failure results in data loss. Use RF=1 only when losing data is acceptable. | ||
|
|
||
| **Replication factor 3 (RF=3):** | ||
|
||
| Protects against individual broker failures. It does not protect against an AZ failure | ||
| when all replicas are in the same zone. If the AZ becomes unavailable, all replicas can | ||
| be lost, and the cluster cannot recover until the zone is restored. | ||
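
The RF=1 and RF=3 cases above can be summarized as a small calculation. This is a sketch under stated assumptions: it only counts surviving data copies, it does not model availability or producer settings, and the function name and the `zones_with_replicas` parameter are hypothetical.

```python
def tolerated_failures(replication_factor, zones_with_replicas):
    """Summarize which failures leave at least one copy of a partition intact.

    zones_with_replicas: number of distinct availability zones that hold a
    replica of the partition (1 for a single-AZ deployment).
    """
    # Replicas cannot span more zones than there are copies.
    zones = min(zones_with_replicas, replication_factor)
    return {
        # Brokers that can be lost before the last copy is gone.
        "broker_failures": max(replication_factor - 1, 0),
        # A copy survives an AZ outage only if another zone also holds one.
        "survives_az_outage": zones > 1,
    }


print(tolerated_failures(1, 1))  # single AZ, RF=1
print(tolerated_failures(3, 1))  # single AZ, RF=3
print(tolerated_failures(3, 3))  # multi-AZ, RF=3
```

In a single AZ, raising the replication factor from 1 to 3 improves tolerance of broker failures but leaves the AZ-outage result unchanged.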
|
|
||
| ### When to use single AZ | ||
|
|
||
| Avoid single-AZ deployment for production workloads or any data that cannot be | ||
| recreated. Use single AZ only for: | ||
|
|
||
| - Development, QA, or test workloads | ||
| - Temporary proof-of-concept environments | ||
| - Workloads where data can be recreated | ||
|
||
|
|
||
| ### Risks and considerations | ||
|
|
||
| - All brokers and replicas exist within one failure domain, increasing the impact of an | ||
| AZ outage. | ||
| - Recovery options are limited because the cluster cannot fail over to another zone. | ||
| - Service downtime may increase during an AZ failure because no cross-zone redundancy | ||
| exists. | ||
| - SLA terms for single-AZ deployments must be agreed with your account team. | ||
|
||
|
|
||
| When considering a single AZ allocation, evaluate your organization's risk tolerance, | ||
| as Aiven's standard uptime SLA does not apply to services deployed in a single AZ. | ||
| ### Enable single-AZ allocation | ||
|
|
||
| - To enable this option for your project, contact | ||
| [Aiven support](mailto:[email protected]) or your account team. | ||
| - You must configure single AZ allocation during service creation. It cannot be applied | ||
| to existing services. | ||
| Single-AZ allocation must be configured during service creation. It cannot be enabled | ||
| for existing Kafka services. | ||
|
|
||
| To enable single AZ allocation, use the [Aiven CLI](/docs/tools/cli) and | ||
| set `single_zone.enabled=true`. | ||
| To enable this option for your project, contact [Aiven support](mailto:[email protected]) | ||
| or your account team. | ||
|
||
|
|
||
| Example command: | ||
| To create a single-AZ Kafka service using the [Aiven CLI](/docs/tools/cli), | ||
| set `single_zone.enabled=true`: | ||
|
|
||
| ```bash | ||
| avn service create SERVICE_NAME \ | ||
|
|
||