articles/hdinsight/kafka/apache-kafka-mirror-maker-2.md
Lines changed: 66 additions & 22 deletions
@@ -1,20 +1,25 @@
1
1
---
2
-
title: Use MirrorMaker 2 to replicate Apache Kafka topics - Azure HDInsight
3
-
description: Learn how to use Use MirrorMaker 2 to replicate Apache Kafka topics
2
+
title: Use MirrorMaker 2 to migrate Kafka clusters between different Azure HDInsight versions - Azure HDInsight
3
+
description: Learn how to use MirrorMaker 2 to migrate Kafka clusters between different Azure HDInsight versions
4
4
ms.service: hdinsight
5
5
ms.topic: how-to
6
6
ms.custom: hdinsightactive
7
-
ms.date: 03/10/2023
7
+
ms.date: 04/25/2023
8
8
---
9
9
10
-
# Use MirrorMaker 2 to replicate Apache Kafka topics with Kafka on HDInsight
10
+
# Use MirrorMaker 2 to migrate Kafka clusters between different Azure HDInsight versions
11
11
12
12
Learn how to use Apache Kafka's mirroring feature to replicate topics to a secondary cluster. You can run mirroring as a continuous process, or intermittently, to migrate data from one cluster to another.
13
13
14
14
In this article, you use mirroring to replicate topics between two HDInsight clusters. These clusters are in different virtual networks in different datacenters.
15
15
16
-
> [!WARNING]
17
-
> Don't use mirroring as a means to achieve fault-tolerance. The offset to items within a topic are different between the primary and secondary clusters, so clients can't use the two interchangeably. If you are concerned about fault tolerance, you should set replication for the topics within your cluster. For more information, see [Get started with Apache Kafka on HDInsight](apache-kafka-get-started.md).
16
+
> [!NOTE]
17
+
> 1. You can use the mirrored cluster for fault tolerance.
18
+
> 2. This is valid only if the primary cluster runs HDI Kafka 2.4.1 or 3.2.0 and the secondary cluster runs HDI Kafka 3.2.0.
19
+
> 3. The secondary cluster works seamlessly if your primary cluster goes down.
20
+
> 4. Consumer group offsets are automatically translated to the secondary cluster.
21
+
> 5. Point your primary cluster consumers to the secondary cluster with the same consumer group, and the consumer group starts consuming from the offset where it left off in the primary cluster.
22
+
> 6. The only difference is that the topic name in the backup cluster changes from `TOPIC_NAME` to `primary-cluster-name.TOPIC_NAME`.
18
23
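The topic renaming described in item 6 of the note can be sketched in a few lines of Python. This is a minimal illustration, assuming the MirrorMaker 2 default replication policy, which prefixes the source cluster alias and a `.` separator to each replicated topic. The function names `remote_topic_name` and `source_topic_name` are hypothetical helpers, not part of any Kafka API.

```python
def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Return the name a replicated topic gets on the secondary cluster.

    Assumption: the default replication policy (alias + separator + topic).
    """
    return f"{source_alias}{separator}{topic}"


def source_topic_name(remote_topic: str, source_alias: str, separator: str = ".") -> str:
    """Strip the alias prefix to recover the topic name on the primary cluster."""
    prefix = f"{source_alias}{separator}"
    if not remote_topic.startswith(prefix):
        raise ValueError(f"{remote_topic!r} was not replicated from {source_alias!r}")
    return remote_topic[len(prefix):]


# Hypothetical alias and topic, taken from the placeholders in the note.
print(remote_topic_name("primary-cluster-name", "TOPIC_NAME"))
# prints: primary-cluster-name.TOPIC_NAME
```

Consumers that move to the secondary cluster would subscribe to the prefixed name, which is why the note calls out the rename as the only visible difference.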
19
24
## How Apache Kafka mirroring works
20
25
@@ -73,10 +78,10 @@ This architecture features two clusters in different resource groups and virtual
73
78
74
79
1. Create two new Kafka clusters:
75
80
76
-
| Cluster name | Resource group | Virtual network | Storage account |
81
+
| Cluster name | HDInsight version | Resource group | Virtual network | Storage account |
@@ -152,7 +157,7 @@ This architecture features two clusters in different resource groups and virtual
152
157
```
153
158
154
159
1. Here, source is your `PRIMARYCLUSTER` and destination is your `SECONDARYCLUSTER`. Replace these names everywhere with the correct cluster names, and replace `source.bootstrap.servers` and `destination.bootstrap.servers` with the correct FQDN or IP address of their respective worker nodes.
155
-
1. You can control the topics that you want to replicate along with configurations using regular expressions. `replication.factor=3` makes the replication factor = 3 for all the topic which Mirror maker script creates by itself.
160
+
1. You can use regular expressions to specify the topics and their configurations that you want to replicate. By setting the `replication.factor` parameter to 3, you ensure that all topics created by the MirrorMaker script have a replication factor of 3.
156
161
1. Increase the replication factor from 1 to 3 for these topics:
157
162
```
158
163
checkpoints.topic.replication.factor=1
@@ -172,6 +177,23 @@ This architecture features two clusters in different resource groups and virtual
172
177
destination->source.enabled=true
173
178
destination->source.topics = .*
174
179
```
180
+
1. For automated consumer offset sync, enable replication and set the sync interval. The following property syncs offsets every 30 seconds. For an active-active scenario, configure the sync in both directions.
Notice that in the secondary cluster, consumer group `my-group` can't consume any messages because they were already consumed by the primary cluster consumer group. Now produce more messages in the primary cluster and try to consume them in the secondary cluster. You're able to consume from `SECONDARYCLUSTER`.
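The topic patterns in the configuration earlier in this article, such as `destination->source.topics = .*`, select which topics are replicated. The following Python sketch shows how such a pattern picks topics from a list. It assumes that a pattern must match the whole topic name and that several patterns can be given as a comma-separated list. The function `select_topics` and the topic names are hypothetical and serve only as illustration. Python's `re` syntax also differs in places from the Java regular expressions that Kafka uses.

```python
import re
from typing import List


def select_topics(patterns: str, topics: List[str]) -> List[str]:
    """Return the topics that fully match any of the comma-separated patterns."""
    compiled = [re.compile(p.strip()) for p in patterns.split(",") if p.strip()]
    # fullmatch: the pattern must cover the entire topic name (an assumption).
    return [t for t in topics if any(c.fullmatch(t) for c in compiled)]


# Hypothetical topic names on the source cluster.
existing = ["orders", "orders-retry", "payments"]

print(select_topics(".*", existing))                # every topic
print(select_topics("orders.*", existing))          # topics starting with "orders"
print(select_topics("orders, payments", existing))  # two exact names
```

A pattern of `.*` replicates everything, so a narrower pattern is worth considering when only some topics need to reach the secondary cluster.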