Commit f43e0da (kafka3)
1 parent 5872334

File tree: 5 files changed, +68 −86 lines


articles/hdinsight/TOC.yml

Lines changed: 2 additions & 2 deletions
@@ -545,8 +545,8 @@
       href: ./kafka/apache-kafka-get-started.md
     - name: Create Apache Kafka cluster - PowerShell
       href: ./kafka/apache-kafka-quickstart-powershell.md
-    - name: Create Apache Kafka cluster - Template
-      displayName: resource manager template, arm template, resource manager group
+    - name: Create Apache Kafka cluster - ARM Template
+      displayName: Resource Manager
       href: ./kafka/apache-kafka-quickstart-resource-manager-template.md
     - name: Tutorials
       items:

articles/hdinsight/kafka/apache-kafka-quickstart-resource-manager-template.md

Lines changed: 66 additions & 84 deletions
@@ -5,114 +5,99 @@ author: hrasheed-msft
 ms.author: hrasheed
 ms.reviewer: jasonh
 ms.service: hdinsight
-ms.custom: mvc
 ms.topic: quickstart
-ms.date: 06/12/2019
+ms.custom: subject-armqs
+ms.date: 03/13/2020
 #Customer intent: I need to create a Kafka cluster so that I can use it to process streaming data
 ---

 # Quickstart: Create Apache Kafka cluster in Azure HDInsight using Resource Manager template

-[Apache Kafka](https://kafka.apache.org/) is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue.
+In this quickstart, you use an Azure Resource Manager template to create an [Apache Kafka](./apache-kafka-introduction.md) cluster in Azure HDInsight. Kafka is an open-source, distributed streaming platform. It's often used as a message broker, as it provides functionality similar to a publish-subscribe message queue.

-In this quickstart, you learn how to create an [Apache Kafka](https://kafka.apache.org) cluster using an Azure Resource Manager template. You also learn how to use included utilities to send and receive messages using Kafka. Similar templates can be viewed at [Azure quickstart templates](https://azure.microsoft.com/resources/templates/?resourceType=Microsoft.Hdinsight&pageNumber=1&sort=Popular). The template reference can be found [here](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/allversions).
-
-[!INCLUDE [delete-cluster-warning](../../../includes/hdinsight-delete-cluster-warning.md)]
+[!INCLUDE [About Azure Resource Manager](../../../includes/resource-manager-quickstart-introduction.md)]

 The Kafka API can only be accessed by resources inside the same virtual network. In this quickstart, you access the cluster directly using SSH. To connect other services, networks, or virtual machines to Kafka, you must first create a virtual network and then create the resources within the network. For more information, see the [Connect to Apache Kafka using a virtual network](apache-kafka-connect-vpn-gateway.md) document.

 If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.

-## Prerequisites
+## Create an Apache Kafka cluster

-An SSH client. For more information, see [Connect to HDInsight (Apache Hadoop) using SSH](../hdinsight-hadoop-linux-use-ssh-unix.md).
+### Review the template

-## Create an Apache Kafka cluster
+The template used in this quickstart is from [Azure Quickstart templates](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-kafka).

-1. Click the following image to open the template in the Azure portal.
+:::code language="json" source="~/quickstart-templates/101-hdinsight-kafka/azuredeploy.json" range="1-150":::

-   <a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-kafka-java-get-started%2Fmaster%2Fazuredeploy.json" target="_blank"><img src="./media/apache-kafka-quickstart-resource-manager-template/hdi-deploy-to-azure1.png" alt="Deploy to Azure button for new cluster"></a>

-2. To create the Kafka cluster, use the following values:
+Two Azure resources are defined in the template:

-   | Property | Value |
-   | --- | --- |
-   | Subscription | Your Azure subscription. |
-   | Resource group | The resource group that the cluster is created in. |
-   | Location | The Azure region that the cluster is created in. |
-   | Cluster Name | The name of the Kafka cluster. |
-   | Cluster Login User Name | The account name used to sign in to HTTPS-based services hosted on the cluster. |
-   | Cluster Login Password | The password for the login user name. |
-   | SSH User Name | The SSH user name. This account can access the cluster using SSH. |
-   | SSH Password | The password for the SSH user. |
+* [Microsoft.Storage/storageAccounts](https://docs.microsoft.com/azure/templates/microsoft.storage/storageaccounts): create an Azure Storage Account.
+* [Microsoft.HDInsight/cluster](https://docs.microsoft.com/azure/templates/microsoft.hdinsight/clusters): create an HDInsight cluster.

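As a side note for readers who prefer the command line, a quickstart template like this can, in principle, also be deployed with the Azure CLI. The sketch below is an assumption-laden illustration, not part of the original article: it assumes the `az` CLI is installed and you're signed in, and `my-hdinsight-rg` is a hypothetical resource group name. It only assembles and prints the deployment command (a dry run), since actually running it creates billable resources.

```shell
# Hypothetical sketch (not from the article): deploy the quickstart template
# with the Azure CLI. The resource group name is a placeholder.
TEMPLATE_URI="https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-hdinsight-kafka/azuredeploy.json"
RESOURCE_GROUP="my-hdinsight-rg"

# Build the command as a string and print it (dry run) instead of running it,
# because executing it would start a real, billable deployment.
CMD="az deployment group create --resource-group $RESOURCE_GROUP --template-uri $TEMPLATE_URI"
echo "$CMD"
```

Dropping the `echo` and executing the command directly would start the deployment; the CLI then prompts for the same parameters as the portal form.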
-![A screenshot of the template properties](./media/apache-kafka-quickstart-resource-manager-template/kafka-template-parameters.png)
+### Deploy the template

-3. Select **I agree to the terms and conditions stated above**, select **Pin to dashboard**, and then click **Purchase**. It can take up to 20 minutes to create the cluster.
+1. Select the **Deploy to Azure** button below to sign in to Azure and open the Resource Manager template.

-## Connect to the cluster
+   <a href="https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure-Samples%2Fhdinsight-kafka-java-get-started%2Fmaster%2Fazuredeploy.json" target="_blank"><img src="./media/apache-kafka-quickstart-resource-manager-template/hdi-deploy-to-azure1.png" alt="Deploy to Azure button for new cluster"></a>

-1. To connect to the primary head node of the Kafka cluster, use the following command. Replace `sshuser` with the SSH user name. Replace `mykafka` with the name of your Kafka cluster.
-
-   ```bash
-   ssh sshuser@mykafka-ssh.azurehdinsight.net
-   ```
+1. Enter or select the following values:

+   |Property |Description |
+   |---|---|
+   |Subscription|From the drop-down list, select the Azure subscription that's used for the cluster.|
+   |Resource group|From the drop-down list, select your existing resource group, or select **Create new**.|
+   |Location|The value will autopopulate with the location used for the resource group.|
+   |Cluster Name|Enter a globally unique name. For this template, use only lowercase letters and numbers.|
+   |Cluster Login User Name|Provide the username; the default is **admin**.|
+   |Cluster Login Password|Provide a password. The password must be at least 10 characters in length and must contain at least one digit, one uppercase letter, one lowercase letter, and one non-alphanumeric character (except the characters ' " ` ).|
+   |Ssh User Name|Provide the username; the default is **sshuser**.|
+   |Ssh Password|Provide the password.|

-2. When you first connect to the cluster, your SSH client may display a warning that the authenticity of the host can't be established. When prompted, type __yes__, and then press __Enter__ to add the host to your SSH client's trusted server list.
-
-3. When prompted, enter the password for the SSH user.
-
-   Once connected, you see information similar to the following text:
-
-   ```output
-   Authorized uses only. All activity may be monitored and reported.
-   Welcome to Ubuntu 16.04.4 LTS (GNU/Linux 4.13.0-1011-azure x86_64)
-
-    * Documentation:  https://help.ubuntu.com
-    * Management:     https://landscape.canonical.com
-    * Support:        https://ubuntu.com/advantage
-
-     Get cloud support with Ubuntu Advantage Cloud Guest:
-       https://www.ubuntu.com/business/services/cloud
-
-   83 packages can be updated.
-   37 updates are security updates.
-
-
-   Welcome to Kafka on HDInsight.
-
-   Last login: Thu Mar 29 13:25:27 2018 from 108.252.109.241
-   ```
+   ![A screenshot of the template properties](./media/apache-kafka-quickstart-resource-manager-template/resource-manager-template-kafka.png)

+1. Review the **TERMS AND CONDITIONS**. Then select **I agree to the terms and conditions stated above**, then **Purchase**. You'll receive a notification that your deployment is in progress. It takes about 20 minutes to create a cluster.

+## Review deployed resources

+Once the cluster is created, you'll receive a **Deployment succeeded** notification with a **Go to resource** link. Your Resource group page will list your new HDInsight cluster and the default storage associated with the cluster. Each cluster has an [Azure Storage](../hdinsight-hadoop-use-blob-storage.md) account or an [Azure Data Lake Storage account](../hdinsight-hadoop-use-data-lake-store.md) dependency. It's referred to as the default storage account. The HDInsight cluster and its default storage account must be colocated in the same Azure region. Deleting clusters doesn't delete the storage account.

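The parameter table above states the cluster login password rules. As an illustration only, the hypothetical helper below (not part of the quickstart) encodes those rules in plain bash, so a candidate password can be sanity-checked locally before deployment.

```shell
# Hypothetical helper (not part of the quickstart): check a candidate cluster
# login password against the stated rules: at least 10 characters, at least
# one digit, one uppercase letter, one lowercase letter, and one
# non-alphanumeric character, excluding the characters ' " and `.
check_password() {
  local pw=$1
  [[ ${#pw} -ge 10 ]]        || { echo "too short"; return 1; }
  [[ $pw == *[0-9]* ]]       || { echo "needs a digit"; return 1; }
  [[ $pw == *[A-Z]* ]]       || { echo "needs an uppercase letter"; return 1; }
  [[ $pw == *[a-z]* ]]       || { echo "needs a lowercase letter"; return 1; }
  [[ $pw == *[^a-zA-Z0-9]* ]] || { echo "needs a non-alphanumeric character"; return 1; }
  case $pw in
    *\'*|*\"*|*\`*) echo "contains a forbidden character"; return 1 ;;
  esac
  echo "ok"
}
```

For example, `check_password 'Passw0rd!xy'` prints `ok`, while an 8-character candidate prints `too short`.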
-## <a id="getkafkainfo"></a>Get the Apache Zookeeper and Broker host information
+## Get the Apache Zookeeper and Broker host information

 When working with Kafka, you must know the *Apache Zookeeper* and *Broker* hosts. These hosts are used with the Kafka API and many of the utilities that ship with Kafka.

 In this section, you get the host information from the Ambari REST API on the cluster.

-1. From the SSH connection to the cluster, use the following command to install the `jq` utility. This utility is used to parse JSON documents, and is useful in retrieving the host information:
+1. Use the [ssh command](../hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
+
+   ```cmd
+   ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
+   ```
+
+1. From the SSH connection, use the following command to install the `jq` utility. This utility is used to parse JSON documents, and is useful in retrieving the host information:

    ```bash
    sudo apt -y install jq
    ```

-2. To set an environment variable to the cluster name, use the following command:
+1. To set an environment variable to the cluster name, use the following command:

    ```bash
    read -p "Enter the Kafka on HDInsight cluster name: " CLUSTERNAME
    ```

    When prompted, enter the name of the Kafka cluster.

-3. To set an environment variable with Zookeeper host information, use the command below. The command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.
+1. To set an environment variable with Zookeeper host information, use the command below. The command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.

    ```bash
    export KAFKAZKHOSTS=`curl -sS -u admin -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' | cut -d',' -f1,2`
    ```

    When prompted, enter the password for the cluster login account (not the SSH account).

-4. To verify that the environment variable is set correctly, use the following command:
+1. To verify that the environment variable is set correctly, use the following command:

    ```bash
    echo '$KAFKAZKHOSTS='$KAFKAZKHOSTS
@@ -122,22 +107,22 @@ In this section, you get the host information from the Ambari REST API on the cl

    `zk0-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181,zk2-kafka.eahjefxxp1netdbyklgqj5y1ud.ex.internal.cloudapp.net:2181`
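The curl-to-jq pipeline above can be hard to parse at a glance. The offline sketch below assumes only that `jq` is installed; the JSON is a mock of the Ambari response shape with made-up host names. It applies the same jq filter and `cut` step to show how the host list is built without needing a live cluster.

```shell
# Mock Ambari response (made-up host names, same shape as the real API reply).
cat > /tmp/mock_zookeeper.json <<'EOF'
{
  "host_components": [
    {"HostRoles": {"host_name": "zk0-kafka.example.internal"}},
    {"HostRoles": {"host_name": "zk1-kafka.example.internal"}},
    {"HostRoles": {"host_name": "zk2-kafka.example.internal"}}
  ]
}
EOF

# Same filter as the quickstart: interpolate each host name with :2181 into an
# array, join the array with commas, then keep only the first two entries.
KAFKAZKHOSTS=$(jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' /tmp/mock_zookeeper.json | cut -d',' -f1,2)
echo "$KAFKAZKHOSTS"
```

This prints the first two hosts with port 2181 appended, mirroring the shape of the `$KAFKAZKHOSTS` value shown above.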
-5. To set an environment variable with Kafka broker host information, use the following command:
+1. To set an environment variable with Kafka broker host information, use the following command:

    ```bash
    export KAFKABROKERS=`curl -sS -u admin -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/KAFKA/components/KAFKA_BROKER | jq -r '["\(.host_components[].HostRoles.host_name):9092"] | join(",")' | cut -d',' -f1,2`
    ```

    When prompted, enter the password for the cluster login account (not the SSH account).

-6. To verify that the environment variable is set correctly, use the following command:
+1. To verify that the environment variable is set correctly, use the following command:

-    ```bash
+   ```bash
    echo '$KAFKABROKERS='$KAFKABROKERS
    ```

    This command returns information similar to the following text:

    `wn1-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092,wn0-kafka.eahjefxxp1netdbyklgqj5y1ud.cx.internal.cloudapp.net:9092`

 ## Manage Apache Kafka topics
@@ -150,7 +135,7 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util
    /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --replication-factor 3 --partitions 8 --topic test --zookeeper $KAFKAZKHOSTS
    ```

-   This command connects to Zookeeper using the host information stored in `$KAFKAZKHOSTS`. It then creates a Kafka topic named **test**. 
+   This command connects to Zookeeper using the host information stored in `$KAFKAZKHOSTS`. It then creates a Kafka topic named **test**.

    * Data stored in this topic is partitioned across eight partitions.
@@ -162,7 +147,7 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util

    For information on the number of fault domains in a region, see the [Availability of Linux virtual machines](../../virtual-machines/windows/manage-availability.md#use-managed-disks-for-vms-in-an-availability-set) document.

-   Kafka is not aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.
+   Kafka isn't aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.

    To ensure high availability, use the [Apache Kafka partition rebalance tool](https://github.com/hdinsight/hdinsight-kafka-tools). This tool must be run from an SSH connection to the head node of your Kafka cluster.
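To make the replica-placement concern concrete, here is a toy sketch in bash. This is an illustration only, not Kafka's actual assignment algorithm, and the broker count is invented: it shows naive round-robin assignment of 8 partitions with replication factor 3 across 4 brokers. An assignment computed this way knows nothing about which brokers share an Azure fault domain, which is exactly the gap the rebalance tool addresses.

```shell
# Toy illustration only (NOT Kafka's real algorithm): round-robin assignment
# of 8 partitions, replication factor 3, across 4 hypothetical brokers.
BROKERS=4
PARTITIONS=8
REPLICATION=3

assignment=""
for ((p = 0; p < PARTITIONS; p++)); do
  replicas=""
  for ((r = 0; r < REPLICATION; r++)); do
    # Replica r of partition p lands on broker (p + r) mod BROKERS.
    replicas+="$(( (p + r) % BROKERS )),"
  done
  line="partition $p -> replicas on brokers ${replicas%,}"
  assignment+="$line"$'\n'
  echo "$line"
done
```

Each line lists the three brokers holding copies of one partition; if two of those brokers happened to share a fault domain, a single hardware failure could take out two of the three replicas.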
@@ -204,45 +189,42 @@ Kafka stores *records* in topics. Records are produced by *producers*, and consu
 To store records into the test topic you created earlier, and then read them using a consumer, use the following steps:

 1. To write records to the topic, use the `kafka-console-producer.sh` utility from the SSH connection:

    ```bash
    /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list $KAFKABROKERS --topic test
    ```

    After this command, you arrive at an empty line.

-2. Type a text message on the empty line and hit enter. Enter a few messages this way, and then use **Ctrl + C** to return to the normal prompt. Each line is sent as a separate record to the Kafka topic.
+1. Type a text message on the empty line and hit enter. Enter a few messages this way, and then use **Ctrl + C** to return to the normal prompt. Each line is sent as a separate record to the Kafka topic.

-3. To read records from the topic, use the `kafka-console-consumer.sh` utility from the SSH connection:
+1. To read records from the topic, use the `kafka-console-consumer.sh` utility from the SSH connection:

    ```bash
    /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic test --from-beginning
    ```

    This command retrieves the records from the topic and displays them. Using `--from-beginning` tells the consumer to start from the beginning of the stream, so all records are retrieved.

-   If you are using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.
+   If you're using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.

-4. Use __Ctrl + C__ to stop the consumer.
+1. Use __Ctrl + C__ to stop the consumer.

 You can also programmatically create producers and consumers. For an example of using this API, see the [Apache Kafka Producer and Consumer API with HDInsight](apache-kafka-producer-consumer-api.md) document.

 ## Clean up resources

-If you wish to clean up the resources created by this quickstart, you can delete the resource group. Deleting the resource group also deletes the associated HDInsight cluster, and any other resources associated with the resource group.
+After you complete the quickstart, you may want to delete the cluster. With HDInsight, your data is stored in Azure Storage, so you can safely delete a cluster when it isn't in use. You're also charged for an HDInsight cluster, even when it isn't in use. Since the charges for the cluster are many times more than the charges for storage, it makes economic sense to delete clusters when they aren't in use.

-To remove the resource group using the Azure portal:
-
-1. In the Azure portal, expand the menu on the left side to open the menu of services, and then choose __Resource Groups__ to display the list of your resource groups.
-2. Locate the resource group to delete, and then right-click the __More__ button (...) on the right side of the listing.
-3. Select __Delete resource group__, and then confirm.
-
-> [!WARNING]
-> HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use.
->
-> Deleting a Kafka on HDInsight cluster deletes any data stored in Kafka.
+From the Azure portal, navigate to your cluster, and select **Delete**.
+
+![Resource Manager template Kafka delete](./media/apache-kafka-quickstart-resource-manager-template/azure-portal-delete-kafka.png)
+
+You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight cluster and the default storage account.

 ## Next steps

+In this quickstart, you learned how to create an Apache Kafka cluster in HDInsight using a Resource Manager template. In the next article, you learn how to create an application that uses the Apache Kafka Streams API and run it with Kafka on HDInsight.
+
 > [!div class="nextstepaction"]
-> [Use Apache Spark with Apache Kafka](../hdinsight-apache-kafka-spark-structured-streaming.md)
+> [Use Apache Kafka Streams API in Azure HDInsight](../apache-kafka-streams-api.md)