Commit 2e05608

Merge pull request #234627 from sreekzz/patch-157
Modified the contents
2 parents 2415540 + d0358a8 commit 2e05608

1 file changed

+66
-22
lines changed

articles/hdinsight/kafka/apache-kafka-mirror-maker-2.md

Lines changed: 66 additions & 22 deletions
@@ -1,20 +1,25 @@
11
---
2-
title: Use MirrorMaker 2 to replicate Apache Kafka topics - Azure HDInsight
3-
description: Learn how to use Use MirrorMaker 2 to replicate Apache Kafka topics
2+
title: Use MirrorMaker 2 to migrate Kafka clusters between different Azure HDInsight versions - Azure HDInsight
3+
description: Learn how to use MirrorMaker 2 to migrate Kafka clusters between different Azure HDInsight versions
44
ms.service: hdinsight
55
ms.topic: how-to
66
ms.custom: hdinsightactive
7-
ms.date: 03/10/2023
7+
ms.date: 04/25/2023
88
---
99

10-
# Use MirrorMaker 2 to replicate Apache Kafka topics with Kafka on HDInsight
10+
# Use MirrorMaker 2 to migrate Kafka clusters between different Azure HDInsight versions
1111

1212
Learn how to use Apache Kafka's mirroring feature to replicate topics to a secondary cluster. You can run mirroring as a continuous process, or intermittently, to migrate data from one cluster to another.
1313

1414
In this article, you use mirroring to replicate topics between two HDInsight clusters. These clusters are in different virtual networks in different datacenters.
1515

16-
> [!WARNING]
17-
> Don't use mirroring as a means to achieve fault-tolerance. The offset to items within a topic are different between the primary and secondary clusters, so clients can't use the two interchangeably. If you are concerned about fault tolerance, you should set replication for the topics within your cluster. For more information, see [Get started with Apache Kafka on HDInsight](apache-kafka-get-started.md).
16+
> [!NOTE]
17+
> 1. You can use a mirrored cluster for fault tolerance.
18+
> 2. This approach is valid only when the primary cluster runs HDInsight Kafka 2.4.1 or 3.2.0 and the secondary cluster runs HDInsight Kafka 3.2.0.
19+
> 3. The secondary cluster continues to work seamlessly if your primary cluster goes down.
20+
> 4. Consumer group offsets are automatically translated to the secondary cluster.
21+
> 5. Point your primary cluster consumers to the secondary cluster with the same consumer group, and the consumer group resumes from the offset where it left off in the primary cluster.
22+
> 6. The only difference is that the topic name in the backup cluster changes from `TOPIC_NAME` to `primary-cluster-name.TOPIC_NAME`.
1823
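Item 6 of the note means that consumers moved to the secondary cluster must subscribe to the prefixed topic name. The following is a minimal sketch of that naming rule, assuming the source cluster alias and a `.` separator as the note describes. `remote_topic_name` is a hypothetical helper used only for illustration; it isn't part of Kafka or HDInsight.

```shell
# Hypothetical helper: prints the topic name expected on the secondary
# cluster, following the "<source-alias>.<topic>" rule from the note.
# $1 = source cluster alias, $2 = topic name on the source cluster
remote_topic_name() {
    printf '%s.%s\n' "$1" "$2"
}

# Example: the test topic used later in this article.
remote_topic_name 'primary-kafka-cluster' 'TestMirrorMakerTopic'
```

The printed value matches the `TOPICNAME` that the article exports on the secondary cluster before starting the consumer there.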
1924
## How Apache Kafka mirroring works
2025

@@ -73,10 +78,10 @@ This architecture features two clusters in different resource groups and virtual
7378

7479
1. Create two new Kafka clusters:
7580

76-
| Cluster name | Resource group | Virtual network | Storage account |
81+
| Cluster name | HDInsight version | Resource group | Virtual network | Storage account |
7782
|---|---|---|---|---|
78-
| primary-kafka-cluster | kafka-primary-rg | kafka-primary-vnet | kafkaprimarystorage |
79-
| secondary-kafka-cluster | kafka-secondary-rg | kafka-secondary-vnet | kafkasecondarystorage |
83+
| primary-kafka-cluster | 5.0 | kafka-primary-rg | kafka-primary-vnet | kafkaprimarystorage |
84+
| secondary-kafka-cluster | 5.1 | kafka-secondary-rg | kafka-secondary-vnet | kafkasecondarystorage |
8085

8186
> [!NOTE]
8287
> From now onwards we will use `primary-kafka-cluster` as `PRIMARYCLUSTER` and `secondary-kafka-cluster` as `SECONDARYCLUSTER`.
@@ -98,7 +103,7 @@ This architecture features two clusters in different resource groups and virtual
98103
```
99104
1. Edit the `/etc/hosts` file of secondary cluster and add those entries here.
100105

101-
1. After making the changes, the `/etc/hosts` file for `SECONDARYCLUSTER` looks like the given image.
106+
1. After you make the changes, the `/etc/hosts` file for `SECONDARYCLUSTER` looks like the given image.
102107

103108
:::image type="content" source="./media/apache-kafka-mirror-maker2/ect-host.png" lightbox="./media/apache-kafka-mirror-maker2/ect-host.png" alt-text="Screenshot that shows etc hosts file output." border="false":::
104109

@@ -152,7 +157,7 @@ This architecture features two clusters in different resource groups and virtual
152157
```
153158

154159
1. Here, source is your `PRIMARYCLUSTER` and destination is your `SECONDARYCLUSTER`. Replace them everywhere with the correct names, and replace `source.bootstrap.servers` and `destination.bootstrap.servers` with the correct FQDN or IP address of their respective worker nodes.
155-
1. You can control the topics that you want to replicate along with configurations using regular expressions. `replication.factor=3` makes the replication factor = 3 for all the topic which Mirror maker script creates by itself.
160+
1. You can use regular expressions to specify the topics, along with their configurations, that you want to replicate. Setting the `replication.factor` parameter to 3 ensures that all topics created by the MirrorMaker script have a replication factor of 3.
156161
1. Increase the replication factor from 1 to 3 for these topics
157162
```
158163
checkpoints.topic.replication.factor=1
@@ -172,6 +177,23 @@ This architecture features two clusters in different resource groups and virtual
172177
destination->source.enabled=true
173178
destination->source.topics = .*
174179
```
180+
1. For automated consumer offset sync, enable checkpoint emission and group offset sync, and set the sync interval. The following properties sync offsets every 30 seconds. For an active-active scenario, configure the sync in both directions.
181+
```
182+
groups=.*
183+
184+
emit.checkpoints.enabled = true
185+
source->destination.sync.group.offsets.enabled = true
186+
source->destination.sync.group.offsets.interval.ms=30000
187+
188+
destination->source.sync.group.offsets.enabled = true
189+
destination->source.sync.group.offsets.interval.ms=30000
190+
```
191+
1. If you don’t want to replicate internal topics across clusters, use the following property:
192+
193+
```
194+
topics.blacklist="*.internal,__.*"
195+
```
196+
175197
1. The final configuration file after the changes should look like this:
176198
```
177199
# specify any number of cluster aliases
@@ -194,6 +216,11 @@ This architecture features two clusters in different resource groups and virtual
194216
secondary-kafka-cluster->primary-kafka-cluster.topics = .*
195217
196218
groups=.*
219+
emit.checkpoints.enabled = true
220+
primary-kafka-cluster->secondary-kafka-cluster.sync.group.offsets.enabled=true
221+
primary-kafka-cluster->secondary-kafka-cluster.sync.group.offsets.interval.ms=30000
222+
secondary-kafka-cluster->primary-kafka-cluster.sync.group.offsets.enabled = true
223+
secondary-kafka-cluster->primary-kafka-cluster.sync.group.offsets.interval.ms=30000
197224
topics.blacklist="*.internal,__.*"
198225
199226
# Setting replication factor of newly created remote topics
@@ -220,25 +247,42 @@ This architecture features two clusters in different resource groups and virtual
220247
export clusterName='primary-kafka-cluster'
221248
export TOPICNAME='TestMirrorMakerTopic'
222249
export KAFKABROKERS='wn0-primar:9092'
223-
export KAFKAZKHOSTS='zk0-primar:2181'
224-
250+
export KAFKAZKHOSTS='zk0-primar:2181'
251+
225252
# Start Producer
226-
bash /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $KAFKABROKERS --topic $TOPICNAME
253+
254+
# For Kafka 2.4
255+
bash /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $KAFKABROKERS --topic $TOPICNAME
256+
# For Kafka 3.2
257+
bash /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --bootstrap-server $KAFKABROKERS --topic $TOPICNAME
227258
```
228-
1. Now start consumer in `SECONDARYCLUSTER`
229-
259+
1. Now start the consumer in `PRIMARYCLUSTER` with a consumer group:
260+
```
261+
# Start Consumer
262+
263+
# For Kafka 2.4
264+
bash /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic $TOPICNAME --group my-group --from-beginning
265+
266+
# For Kafka 3.2
267+
bash /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic $TOPICNAME --group my-group --from-beginning
230268
```
231-
export clusterName='secondary-kafka-cluster'
232-
export TOPICNAME='TestMirrorMakerTopic'
233-
export KAFKABROKERS='wn0-second:9092'
234-
export KAFKAZKHOSTS='zk0-second:2181'
235-
236-
# List all the topics whether they are replicated or not
269+
1. Now stop the consumer in `PRIMARYCLUSTER` and start the consumer in `SECONDARYCLUSTER` with the same consumer group:
270+
```
271+
export clusterName='secondary-kafka-cluster'
272+
273+
export TOPICNAME='primary-kafka-cluster.TestMirrorMakerTopic'
274+
275+
export KAFKABROKERS='wn0-second:9092'
276+
277+
export KAFKAZKHOSTS='zk0-second:2181'
278+
279+
# List all the topics whether they're replicated or not
237280
bash /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper $KAFKAZKHOSTS --list
238281
239282
# Start Consumer
240283
bash /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic $TOPICNAME --group my-group --from-beginning
241284
```
285+
Notice that the consumer group `my-group` in the secondary cluster can't consume any messages, because the same consumer group already consumed them in the primary cluster. Now produce more messages in `PRIMARYCLUSTER` and try to consume them in `SECONDARYCLUSTER`. You can consume the new messages from `SECONDARYCLUSTER`.
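
The consumer handoff above relies on the offset-sync properties being set for each replication direction. As a sketch only, the following shell function prints those properties for one direction, using the property names from the configuration in this article. `offset_sync_properties` is a hypothetical helper, and the spacing around `=` is an arbitrary choice.

```shell
# Hypothetical helper: prints the group offset sync properties for one
# replication direction. Property names are taken from the MirrorMaker 2
# configuration shown earlier in this article.
# $1 = source alias, $2 = destination alias, $3 = sync interval in ms
offset_sync_properties() {
    printf '%s->%s.sync.group.offsets.enabled = true\n' "$1" "$2"
    printf '%s->%s.sync.group.offsets.interval.ms = %s\n' "$1" "$2" "$3"
}

# Active-active scenario: emit the properties for both directions.
offset_sync_properties 'primary-kafka-cluster' 'secondary-kafka-cluster' 30000
offset_sync_properties 'secondary-kafka-cluster' 'primary-kafka-cluster' 30000
```

You could compare the printed lines against the final configuration file to confirm that both directions are covered before testing the consumer handoff.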
242286

243287
## Delete cluster
244288
