
Commit a48bb27

Merge pull request #107705 from dagiro/kafka3
kafka3
2 parents 9d9ed71 + 72eb075

File tree

5 files changed: +66 −86 lines changed


articles/hdinsight/TOC.yml

Lines changed: 2 additions & 2 deletions
@@ -547,8 +547,8 @@
       href: ./kafka/apache-kafka-get-started.md
     - name: Create Apache Kafka cluster - PowerShell
       href: ./kafka/apache-kafka-quickstart-powershell.md
-    - name: Create Apache Kafka cluster - Template
-      displayName: resource manager template, arm template, resource manager group
+    - name: Create Apache Kafka cluster - ARM Template
+      displayName: Resource Manager
       href: ./kafka/apache-kafka-quickstart-resource-manager-template.md
     - name: Tutorials
       items:

articles/hdinsight/kafka/apache-kafka-quickstart-resource-manager-template.md

Lines changed: 64 additions & 84 deletions
@@ -5,114 +5,97 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: mvc
 ms.topic: quickstart
-ms.date: 06/12/2019
+ms.custom: subject-armqs
+ms.date: 03/13/2020
 #Customer intent: I need to create a Kafka cluster so that I can use it to process streaming data
 ---
 
 # Quickstart: Create Apache Kafka cluster in Azure HDInsight using Resource Manager template
 
-[Apache Kafka](https://kafka.apache.org/) is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue.
+In this quickstart, you use an Azure Resource Manager template to create an [Apache Kafka](./apache-kafka-introduction.md) cluster in Azure HDInsight. Kafka is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue.
 
-In this quickstart, you learn how to create an [Apache Kafka](https://kafka.apache.org) cluster using an Azure Resource Manager template. You also learn how to use included utilities to send and receive messages using Kafka. Similar templates can be viewed at [Azure quickstart templates](https://azure.microsoft.com/resources/templates/?resourceType=Microsoft.Hdinsight&pageNumber=1&sort=Popular). The template reference can be found [here](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/allversions).
-
-[!INCLUDE [delete-cluster-warning](../../../includes/hdinsight-delete-cluster-warning.md)]
+[!INCLUDE [About Azure Resource Manager](../../../includes/resource-manager-quickstart-introduction.md)]
 
 The Kafka API can only be accessed by resources inside the same virtual network. In this quickstart, you access the cluster directly using SSH. To connect other services, networks, or virtual machines to Kafka, you must first create a virtual network and then create the resources within the network. For more information, see the [Connect to Apache Kafka using a virtual network](apache-kafka-connect-vpn-gateway.md) document.
 
 If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
 
-## Prerequisites
+## Create an Apache Kafka cluster
 
-An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
+### Review the template
 
-## Create an Apache Kafka cluster
+The template used in this quickstart is from [Azure Quickstart templates](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-kafka).
 
-1. Click the following image to open the template in the Azure portal.
+:::code language="json" source="~/quickstart-templates/101-hdinsight-kafka/azuredeploy.json" range="1-150":::
 
-   <a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-kafka-java-get-started%2Fmaster%2Fazuredeploy.json" target="_blank"><img src="./media/apache-kafka-quickstart-resource-manager-template/hdi-deploy-to-azure1.png" alt="Deploy to Azure button for new cluster"></a>
+Two Azure resources are defined in the template:
 
-2. To create the Kafka cluster, use the following values:
+* [Microsoft.Storage/storageAccounts](https://docs.microsoft.com/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
+* [Microsoft.HDInsight/cluster](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.
 
-   | Property | Value |
-   | --- | --- |
-   | Subscription | Your Azure subscription. |
-   | Resource group | The resource group that the cluster is created in. |
-   | Location | The Azure region that the cluster is created in. |
-   | Cluster Name | The name of the Kafka cluster. |
-   | Cluster Login User Name | The account name used to login to HTTPs-based services on hosted on the cluster. |
-   | Cluster Login Password | The password for the login user name. |
-   | SSH User Name | The SSH user name. This account can access the cluster using SSH. |
-   | SSH Password | The password for the SSH user. |
+### Deploy the template
 
-   ![A screenshot of the template properties](./media/apache-kafka-quickstart-resource-manager-template/kafka-template-parameters.png)
+1. Select the **Deploy to Azure** button below to sign in to Azure and open the Resource Manager template.
 
-3. Select **I agree to the terms and conditions stated above**, select **Pin to dashboard**, and then click **Purchase**. It can take up to 20 minutes to create the cluster.
+   <a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-kafka-java-get-started%2Fmaster%2Fazuredeploy.json" target="_blank"><img src="./media/apache-kafka-quickstart-resource-manager-template/hdi-deploy-to-azure1.png" alt="Deploy to Azure button for new cluster"></a>
 
-## Connect to the cluster
+1. Enter or select the following values:
 
-1. To connect to the primary head node of the Kafka cluster, use the following command. Replace `sshuser` with the SSH user name. Replace `mykafka` with the name of your Kafka cluster
+   |Property |Description |
+   |---|---|
+   |Subscription|From the drop-down list, select the Azure subscription that's used for the cluster.|
+   |Resource group|From the drop-down list, select your existing resource group, or select **Create new**.|
+   |Location|The value will autopopulate with the location used for the resource group.|
+   |Cluster Name|Enter a globally unique name. For this template, use only lowercase letters, and numbers.|
+   |Cluster Login User Name|Provide the username, default is **admin**.|
+   |Cluster Login Password|Provide a password. The password must be at least 10 characters in length and must contain at least one digit, one uppercase, and one lower case letter, one non-alphanumeric character (except characters ' " ` ). |
+   |Ssh User Name|Provide the username, default is **sshuser**|
+   |Ssh Password|Provide the password.|
 
-   ```bash
-   ssh sshuser@mykafka-ssh.azurehdinsight.net
-   ```
+   ![A screenshot of the template properties](./media/apache-kafka-quickstart-resource-manager-template/resource-manager-template-kafka.png)
 
-2. When you first connect to the cluster, your SSH client may display a warning that the authenticity of the host can't be established. When prompted type __yes__, and then press __Enter__ to add the host to your SSH client's trusted server list.
-
-3. When prompted, enter the password for the SSH user.
-
-   Once connected, you see information similar to the following text:
-
-   ```output
-   Authorized uses only. All activity may be monitored and reported.
-   Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-1011-azure x86_64)
-
-    * Documentation:  https://help.ubuntu.com
-    * Management:     https://landscape.canonical.com
-    * Support:        https://ubuntu.com/advantage
-
-   Get cloud support with Ubuntu Advantage Cloud Guest:
-     https://www.ubuntu.com/business/services/cloud
-
-   83 packages can be updated.
-   37 updates are security updates.
-
-
-   Welcome to Kafka on HDInsight.
-
-   Last login: Thu Mar 29 13:25:27 2018 from 108.252.109.241
-   ```
+1. Review the **TERMS AND CONDITIONS**. Then select **I agree to the terms and conditions stated above**, then **Purchase**. You'll receive a notification that your deployment is in progress. It takes about 20 minutes to create a cluster.
+
+## Review deployed resources
 
-## <a id="getkafkainfo"></a>Get the Apache Zookeeper and Broker host information
+Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Storage](../hdinsight-hadoop-use-blob-storage.md) account or an [Azure Data Lake Storage account](../hdinsight-hadoop-use-data-lake-store.md) dependency. It's referred as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.
+
+## Get the Apache Zookeeper and Broker host information
 
 When working with Kafka, you must know the *Apache Zookeeper* and *Broker* hosts. These hosts are used with the Kafka API and many of the utilities that ship with Kafka.
 
 In this section, you get the host information from the Ambari REST API on the cluster.
 
-1. From the SSH connection to the cluster, use the following command to install the `jq` utility. This utility is used to parse JSON documents, and is useful in retrieving the host information:
-
+1. Use [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
+
+   ```cmd
+   ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
+   ```
+
+1. From the SSH connection, use the following command to install the `jq` utility. This utility is used to parse JSON documents, and is useful in retrieving the host information:
+
    ```bash
   sudo apt -y install jq
   ```
 
-2. To set an environment variable to the cluster name, use the following command:
+1. To set an environment variable to the cluster name, use the following command:
 
   ```bash
   read -p "Enter the Kafka on HDInsight cluster name: " CLUSTERNAME
   ```
 
   When prompted, enter the name of the Kafka cluster.
 
-3. To set an environment variable with Zookeeper host information, use the command below. The command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.
+1. To set an environment variable with Zookeeper host information, use the command below. The command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.
 
   ```bash
   export KAFKAZKHOSTS=`curl -sS -u admin -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' | cut -d',' -f1,2`
   ```
 
  When prompted, enter the password for the cluster login account (not the SSH account).
 
-4. To verify that the environment variable is set correctly, use the following command:
+1. To verify that the environment variable is set correctly, use the following command:
 
   ```bash
   echo '$KAFKAZKHOSTS='$KAFKAZKHOSTS
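The `KAFKAZKHOSTS` pipeline in the hunk above works in three steps: `jq` pulls every host name out of the Ambari response and appends the Zookeeper port, `join(",")` produces a comma-separated list, and `cut -d',' -f1,2` keeps only the first two entries for redundancy. A standalone Python sketch mirrors that logic; the JSON below is a hypothetical, minimal Ambari response, and the host names are made up for illustration:

```python
import json

# Hypothetical Ambari response, shaped like the one the jq filter parses.
sample = json.dumps({
    "host_components": [
        {"HostRoles": {"host_name": "zk0-kafka.internal"}},
        {"HostRoles": {"host_name": "zk1-kafka.internal"}},
        {"HostRoles": {"host_name": "zk2-kafka.internal"}},
    ]
})

def zookeeper_hosts(ambari_json: str, port: int = 2181, keep: int = 2) -> str:
    """Mirror the jq/cut pipeline: append the port to every host name,
    join with commas, then keep only the first `keep` entries."""
    doc = json.loads(ambari_json)
    hosts = [f"{hc['HostRoles']['host_name']}:{port}" for hc in doc["host_components"]]
    return ",".join(hosts[:keep])

print(zookeeper_hosts(sample))
# → zk0-kafka.internal:2181,zk1-kafka.internal:2181
```

The same function with port 9092 and the `KAFKA_BROKER` component gives the `KAFKABROKERS` value built later in this diff.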
@@ -122,22 +105,22 @@ In this section, you get the host information from the Ambari REST API on the cl
 
   `zk0-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181,zk2-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181`
 
-5. To set an environment variable with Kafka broker host information, use the following command:
+1. To set an environment variable with Kafka broker host information, use the following command:
 
   ```bash
   export KAFKABROKERS=`curl -sS -u admin -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2`
   ```
 
  When prompted, enter the password for the cluster login account (not the SSH account).
 
-6. To verify that the environment variable is set correctly, use the following command:
+1. To verify that the environment variable is set correctly, use the following command:
 
   ```bash
   echo '$KAFKABROKERS='$KAFKABROKERS
   ```
 
  This command returns information similar to the following text:
 
  `wn1-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092,wn0-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092`
 
 ## Manage Apache Kafka topics
@@ -150,7 +133,7 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util
   /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 3 --partitions 8 --topic test --zookeeper $KAFKAZKHOSTS
   ```
 
-   This command connects to Zookeeper using the host information stored in `$KAFKAZKHOSTS`. It then creates a Kafka topic named **test**. 
+   This command connects to Zookeeper using the host information stored in `$KAFKAZKHOSTS`. It then creates a Kafka topic named **test**.
 
   * Data stored in this topic is partitioned across eight partitions.
 
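To illustrate what the eight partitions in the `kafka-topics.sh` command above mean: a keyed record is always routed to exactly one partition, so ordering is guaranteed per key, not across the whole topic. The hash below is a deliberately simplified stand-in for Kafka's murmur2-based default partitioner, purely for illustration:

```python
def pick_partition(key: bytes, num_partitions: int = 8) -> int:
    # Simplified stand-in for Kafka's default partitioner (the real one
    # uses murmur2): hash the key, then take it modulo the partition count.
    return sum(key) % num_partitions

# The same key always lands in the same partition, so per-key order is kept.
p1 = pick_partition(b"sensor-42")
p2 = pick_partition(b"sensor-42")
print(p1 == p2, 0 <= p1 < 8)
# → True True
```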
@@ -162,7 +145,7 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util
 
   For information on the number of fault domains in a region, see the [Availability of Linux virtual machines](../../virtual-machines/windows/manage-availability.md#use-managed-disks-for-vms-in-an-availability-set) document.
 
-   Kafka is not aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.
+   Kafka isn't aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.
 
   To ensure high availability, use the [Apache Kafka partition rebalance tool](https://github.com/hdinsight/hdinsight-kafka-tools). This tool must be ran from an SSH connection to the head node of your Kafka cluster.
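The availability problem the hunk above describes can be made concrete with a toy placement check. The broker-to-fault-domain mapping below is hypothetical, sketching a region with two fault domains:

```python
# Hypothetical mapping of 5 brokers onto the 2 fault domains of a region.
FAULT_DOMAIN = {0: "FD0", 1: "FD1", 2: "FD0", 3: "FD1", 4: "FD0"}

def domains_used(replica_brokers):
    """Return the set of fault domains covered by a partition's replica set."""
    return {FAULT_DOMAIN[b] for b in replica_brokers}

# Rack-unaware placement may put all three replicas in a single domain,
# so one domain outage takes out every copy of the partition:
print(domains_used([0, 2, 4]))
# → {'FD0'}

# A rebalanced assignment spans both domains and survives a domain outage:
print(domains_used([0, 1, 2]) == {"FD0", "FD1"})
# → True
```

This is the imbalance the partition rebalance tool mentioned in the diff is meant to correct.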
@@ -204,45 +187,42 @@ Kafka stores *records* in topics. Records are produced by *producers*, and consu
 To store records into the test topic you created earlier, and then read them using a consumer, use the following steps:
 
 1. To write records to the topic, use the `kafka-console-producer.sh` utility from the SSH connection:
 
   ```bash
   /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $KAFKABROKERS --topic test
   ```
 
   After this command, you arrive at an empty line.
 
-2. Type a text message on the empty line and hit enter. Enter a few messages this way, and then use **Ctrl + C** to return to the normal prompt. Each line is sent as a separate record to the Kafka topic.
+1. Type a text message on the empty line and hit enter. Enter a few messages this way, and then use **Ctrl + C** to return to the normal prompt. Each line is sent as a separate record to the Kafka topic.
+
+1. To read records from the topic, use the `kafka-console-consumer.sh` utility from the SSH connection:
 
-3. To read records from the topic, use the `kafka-console-consumer.sh` utility from the SSH connection:
-
   ```bash
   /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic test --from-beginning
   ```
 
   This command retrieves the records from the topic and displays them. Using `--from-beginning` tells the consumer to start from the beginning of the stream, so all records are retrieved.
 
-   If you are using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.
+   If you're using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.
 
-4. Use __Ctrl + C__ to stop the consumer.
+1. Use __Ctrl + C__ to stop the consumer.
 
 You can also programmatically create producers and consumers. For an example of using this API, see the [Apache Kafka Producer and Consumer API with HDInsight](apache-kafka-producer-consumer-api.md) document.
 
 ## Clean up resources
 
-If you wish to clean up the resources created by this quickstart, you can delete the resource group. Deleting the resource group also deletes the associated HDInsight cluster, and any other resources associated with the resource group.
+After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.
 
-To remove the resource group using the Azure portal:
+From the Azure portal, navigate to your cluster, and select **Delete**.
 
-1. In the Azure portal, expand the menu on the left side to open the menu of services, and then choose __Resource Groups__ to display the list of your resource groups.
-2. Locate the resource group to delete, and then right-click the __More__ button (...) on the right side of the listing.
-3. Select __Delete resource group__, and then confirm.
+![Resource Manager template HBase](./media/apache-kafka-quickstart-resource-manager-template/azure-portal-delete-kafka.png)
 
-> [!WARNING]
-> HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use.
->
-> Deleting a Kafka on HDInsight cluster deletes any data stored in Kafka.
+You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight cluster, and the default storage account.
 
 ## Next steps
 
+In this quickstart, you learned how to create an Apache Kafka cluster in HDInsight using a Resource Manager template. In the next article, you learn how to create an application that uses the Apache Kafka Streams API and run it with Kafka on HDInsight.
+
 > [!div class="nextstepaction"]
-> [Use Apache Spark with Apache Kafka](../hdinsight-apache-kafka-spark-structured-streaming.md)
+> [Use Apache Kafka streams API in Azure HDInsight](./apache-kafka-streams-api.md)
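The `--from-beginning` flag in the consumer step of this diff can be pictured with a toy in-memory log, a stand-in for a real topic partition; the record values and the committed offset below are illustrative:

```python
log = ["msg1", "msg2", "msg3"]  # records already stored in the topic

def consume(log, from_beginning, committed_offset=2):
    # A real consumer resumes from its committed offset unless it is told
    # to start from the beginning of the partition.
    start = 0 if from_beginning else committed_offset
    return log[start:]

print(consume(log, from_beginning=True))
# → ['msg1', 'msg2', 'msg3']
print(consume(log, from_beginning=False))
# → ['msg3']
```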
