articles/hdinsight/kafka/apache-kafka-quickstart-resource-manager-template.md
@@ -5,114 +5,97 @@
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.service: hdinsight
ms.topic: quickstart
ms.custom: subject-armqs
ms.date: 03/13/2020
#Customer intent: I need to create a Kafka cluster so that I can use it to process streaming data
---
# Quickstart: Create Apache Kafka cluster in Azure HDInsight using Resource Manager template

In this quickstart, you use an Azure Resource Manager template to create an [Apache Kafka](./apache-kafka-introduction.md) cluster in Azure HDInsight. Kafka is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue.
The Kafka API can only be accessed by resources inside the same virtual network. In this quickstart, you access the cluster directly using SSH. To connect other services, networks, or virtual machines to Kafka, you must first create a virtual network and then create the resources within the network. For more information, see the [Connect to Apache Kafka using a virtual network](apache-kafka-connect-vpn-gateway.md) document.
If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
## Create an Apache Kafka cluster

### Review the template

The template used in this quickstart is from [Azure Quickstart templates](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-kafka).

Two Azure resources are defined in the template:
* [Microsoft.Storage/storageAccounts](https://docs.microsoft.com/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
* [Microsoft.HDInsight/cluster](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.
### Deploy the template

1. Select the **Deploy to Azure** button below to sign in to Azure and open the Resource Manager template.

    <a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-kafka-java-get-started%2Fmaster%2Fazuredeploy.json" target="_blank"><img src="./media/apache-kafka-quickstart-resource-manager-template/hdi-deploy-to-azure1.png" alt="Deploy to Azure button for new cluster"></a>
1. Enter or select the following values:

    |Property |Description |
    |---|---|
    |Subscription|From the drop-down list, select the Azure subscription that's used for the cluster.|
    |Resource group|From the drop-down list, select your existing resource group, or select **Create new**.|
    |Location|The value will autopopulate with the location used for the resource group.|
    |Cluster Name|Enter a globally unique name. For this template, use only lowercase letters and numbers.|
    |Cluster Login User Name|Provide the username; the default is **admin**.|
    |Cluster Login Password|Provide a password. The password must be at least 10 characters long and must contain at least one digit, one uppercase letter, one lowercase letter, and one non-alphanumeric character (except the characters ' " ` ).|
    |Ssh User Name|Provide the username; the default is **sshuser**.|
1. Review the **TERMS AND CONDITIONS**. Then select **I agree to the terms and conditions stated above**, then **Purchase**. You'll receive a notification that your deployment is in progress. It takes about 20 minutes to create a cluster.

## Review deployed resources
Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Storage](../hdinsight-hadoop-use-blob-storage.md) account or an [Azure Data Lake Storage account](../hdinsight-hadoop-use-data-lake-store.md) dependency. It's referred to as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.

## Get the Apache Zookeeper and Broker host information
88
65
89
66
When working with Kafka, you must know the *Apache Zookeeper* and *Broker* hosts. These hosts are used with the Kafka API and many of the utilities that ship with Kafka.
In this section, you get the host information from the Ambari REST API on the cluster.
1. Use the [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
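    The ssh command itself is elided in this diff hunk. A minimal sketch of what it typically looks like, assuming the standard HDInsight SSH endpoint pattern (`<clustername>-ssh.azurehdinsight.net`) and the default **sshuser** account; verify the exact endpoint in the Azure portal:

    ```shell
    # Hypothetical sketch: HDInsight clusters typically expose SSH at
    # <clustername>-ssh.azurehdinsight.net (assumption; verify in the portal).
    CLUSTERNAME=mykafka   # replace with your cluster name
    SSHUSER=sshuser       # the SSH user name you chose at deployment
    echo "ssh ${SSHUSER}@${CLUSTERNAME}-ssh.azurehdinsight.net"
    # Run the printed command to open the SSH session.
    ```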
1. From the SSH connection, use the following command to install the `jq` utility. This utility is used to parse JSON documents, and is useful in retrieving the host information:
    ```bash
    sudo apt -y install jq
    ```
1. To set an environment variable to the cluster name, use the following command:
    ```bash
    read -p "Enter the Kafka on HDInsight cluster name: " CLUSTERNAME
    ```
    When prompted, enter the name of the Kafka cluster.
1. To set an environment variable with Zookeeper host information, use the command below. The command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.
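    The command that builds `$KAFKAZKHOSTS` is elided in this diff. The demo below shows the parsing stage it relies on, run against a sample of Ambari-style JSON (the host names `zk0`..`zk2` are hypothetical). In the quickstart, this JSON comes from a curl call to the cluster's Ambari REST API:

    ```shell
    # Parse Zookeeper host names out of Ambari-style JSON, append the client
    # port 2181, join with commas, and keep only the first two entries.
    SAMPLE='{"host_components":[{"HostRoles":{"host_name":"zk0"}},{"HostRoles":{"host_name":"zk1"}},{"HostRoles":{"host_name":"zk2"}}]}'
    KAFKAZKHOSTS=$(echo "$SAMPLE" \
      | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' \
      | cut -d',' -f1,2)
    echo "$KAFKAZKHOSTS"
    ```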
@@ -150,7 +133,7 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util
```bash
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 3 --partitions 8 --topic test --zookeeper $KAFKAZKHOSTS
```
This command connects to Zookeeper using the host information stored in `$KAFKAZKHOSTS`. It then creates a Kafka topic named **test**.

* Data stored in this topic is partitioned across eight partitions.
@@ -162,7 +145,7 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util
For information on the number of fault domains in a region, see the [Availability of Linux virtual machines](../../virtual-machines/windows/manage-availability.md#use-managed-disks-for-vms-in-an-availability-set) document.
Kafka isn't aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.
To ensure high availability, use the [Apache Kafka partition rebalance tool](https://github.com/hdinsight/hdinsight-kafka-tools). This tool must be run from an SSH connection to the head node of your Kafka cluster.
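To see why naive placement can hurt availability, consider the sketch below (not the rebalance tool itself): it mimics round-robin replica assignment over brokers while ignoring fault domains. The broker-to-domain mapping is hypothetical; with four brokers split across two fault domains, partition 0 gets two of its three replicas in the same domain:

```shell
# Simulate round-robin replica placement that ignores fault domains.
domains=(FD0 FD1 FD0 FD1)    # fault domain of brokers 0..3 (assumption)
brokers=4
for p in 0 1; do
  out="partition $p ->"
  for i in 0 1 2; do
    b=$(( (p + i) % brokers ))          # round-robin broker for replica i
    out="$out broker$b(${domains[$b]})"
  done
  echo "$out"
done
# partition 0 lands on brokers 0 and 2, both in FD0: losing that
# fault domain leaves only one surviving replica.
```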
@@ -204,45 +187,42 @@ Kafka stores *records* in topics. Records are produced by *producers*, and consu
To store records into the test topic you created earlier, and then read them using a consumer, use the following steps:
1. To write records to the topic, use the `kafka-console-producer.sh` utility from the SSH connection:
```bash
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $KAFKABROKERS --topic test
```
After this command, you arrive at an empty line.
1. Type a text message on the empty line and hit enter. Enter a few messages this way, and then use **Ctrl + C** to return to the normal prompt. Each line is sent as a separate record to the Kafka topic.

1. To read records from the topic, use the `kafka-console-consumer.sh` utility from the SSH connection:
```bash
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic test --from-beginning
```
This command retrieves the records from the topic and displays them. Using `--from-beginning` tells the consumer to start from the beginning of the stream, so all records are retrieved.
If you're using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.
225
208
226
-
4. Use __Ctrl + C__ to stop the consumer.
209
+
1. Use __Ctrl + C__ to stop the consumer.
You can also programmatically create producers and consumers. For an example of using this API, see the [Apache Kafka Producer and Consumer API with HDInsight](apache-kafka-producer-consumer-api.md) document.
## Clean up resources
After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.
From the Azure portal, navigate to your cluster, and select **Delete**.
> HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use.

You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight cluster and the default storage account.
## Next steps
In this quickstart, you learned how to create an Apache Kafka cluster in HDInsight using a Resource Manager template. In the next article, you learn how to create an application that uses the Apache Kafka Streams API and run it with Kafka on HDInsight.
226
+
247
227
> [!div class="nextstepaction"]
> [Use Apache Kafka streams API in Azure HDInsight](./apache-kafka-streams-api.md)