
Commit d015f1b

Merge pull request #79540 from dagiro/mvc11
mvc11
2 parents 1a826cb + 4bf4095

1 file changed: 9 additions, 24 deletions
articles/hdinsight/kafka/apache-kafka-quickstart-powershell.md

@@ -19,10 +19,7 @@ In this quickstart, you learn how to create an [Apache Kafka](https://kafka.apac
 [!INCLUDE [delete-cluster-warning](../../../includes/hdinsight-delete-cluster-warning.md)]
 
-> [!IMPORTANT]
-> The Kafka API can only be accessed by resources inside the same virtual network. In this quickstart, you access the cluster directly using SSH. To connect other services, networks, or virtual machines to Kafka, you must first create a virtual network and then create the resources within the network.
->
-> For more information, see the [Connect to Apache Kafka using a virtual network](apache-kafka-connect-vpn-gateway.md) document.
+The Kafka API can only be accessed by resources inside the same virtual network. In this quickstart, you access the cluster directly using SSH. To connect other services, networks, or virtual machines to Kafka, you must first create a virtual network and then create the resources within the network. For more information, see the [Connect to Apache Kafka using a virtual network](apache-kafka-connect-vpn-gateway.md) document.
 
 If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.

@@ -128,19 +125,13 @@ New-AzHDInsightCluster `
     -DisksPerWorkerNode $disksPerNode
 ```
 
-> [!WARNING]
-> It can take up to 20 minutes to create the HDInsight cluster.
+It can take up to 20 minutes to create the HDInsight cluster.
 
-> [!TIP]
-> The `-DisksPerWorkerNode` parameter configures the scalability of Kafka on HDInsight. Kafka on HDInsight uses the local disk of the virtual machines in the cluster to store data. Kafka is I/O heavy, so [Azure Managed Disks](../../virtual-machines/windows/managed-disks-overview.md) are used to provide high throughput and more storage per node.
->
-> The type of managed disk can be either __Standard__ (HDD) or __Premium__ (SSD). The type of disk depends on the VM size used by the worker nodes (Kafka brokers). Premium disks are used automatically with DS and GS series VMs. All other VM types use standard. You can set the VM type by using the `-WorkerNodeSize` parameter. For more information on parameters, see the [New-AzHDInsightCluster](/powershell/module/az.HDInsight/New-azHDInsightCluster) documentation.
+The `-DisksPerWorkerNode` parameter configures the scalability of Kafka on HDInsight. Kafka on HDInsight uses the local disk of the virtual machines in the cluster to store data. Kafka is I/O heavy, so [Azure Managed Disks](../../virtual-machines/windows/managed-disks-overview.md) are used to provide high throughput and more storage per node.
+
+The type of managed disk can be either __Standard__ (HDD) or __Premium__ (SSD). The type of disk depends on the VM size used by the worker nodes (Kafka brokers). Premium disks are used automatically with DS and GS series VMs. All other VM types use standard. You can set the VM type by using the `-WorkerNodeSize` parameter. For more information on parameters, see the [New-AzHDInsightCluster](/powershell/module/az.HDInsight/New-azHDInsightCluster) documentation.
 
-> [!IMPORTANT]
-> If you plan to use more than 32 worker nodes (either at cluster creation or by scaling the cluster after creation), you must use the `-HeadNodeSize` parameter to specify a VM size with at least 8 cores and 14 GB of RAM.
->
-> For more information on node sizes and associated costs, see [HDInsight pricing](https://azure.microsoft.com/pricing/details/hdinsight/).
+If you plan to use more than 32 worker nodes (either at cluster creation or by scaling the cluster after creation), you must use the `-HeadNodeSize` parameter to specify a VM size with at least 8 cores and 14 GB of RAM. For more information on node sizes and associated costs, see [HDInsight pricing](https://azure.microsoft.com/pricing/details/hdinsight/).
 
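The disk-type rule described in this hunk (DS and GS series worker nodes get Premium disks, everything else gets Standard) is deterministic, so it can be expressed as a small shell sketch. The VM size names below are illustrative examples, not an exhaustive list; the authoritative mapping is in the `New-AzHDInsightCluster` documentation.

```shell
# Sketch of the rule above: DS and GS series VM sizes get Premium (SSD)
# managed disks automatically; every other VM size gets Standard (HDD).
# The size names passed in are hypothetical examples.
disk_type_for() {
  case "$1" in
    Standard_DS*|Standard_GS*) echo "Premium" ;;
    *) echo "Standard" ;;
  esac
}

disk_type_for "Standard_DS3_v2"   # Premium
disk_type_for "Standard_D3_v2"    # Standard
```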
## Connect to the cluster

@@ -198,17 +189,14 @@ In this section, you get the host information from the Apache Ambari REST API on
 
 When prompted, enter the name of the Kafka cluster.
 
-3. To set an environment variable with Zookeeper host information, use the following command:
+3. To set an environment variable with Zookeeper host information, use the command below. The command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.
 
     ```bash
     export KAFKAZKHOSTS=`curl -sS -u admin -G https://$CLUSTERNAME.azurehdinsight.net/api/v1/clusters/$CLUSTERNAME/services/ZOOKEEPER/components/ZOOKEEPER_SERVER | jq -r '["\(.host_components[].HostRoles.host_name):2181"] | join(",")' | cut -d',' -f1,2`
     ```
 
     When prompted, enter the password for the cluster login account (not the SSH account).
 
-    > [!NOTE]
-    > This command retrieves all Zookeeper hosts, then returns only the first two entries. This is because you want some redundancy in case one host is unreachable.
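The trimming step in the command above is `cut -d',' -f1,2`, which keeps only the first two comma-separated fields of the host list. A minimal sketch of that behavior, using hypothetical host names in place of the jq output from a real cluster:

```shell
# Simulated output of the jq step: a comma-joined list of Zookeeper hosts.
# The host names here are hypothetical; a real cluster returns its own FQDNs.
hosts="zk0-example:2181,zk1-example:2181,zk2-example:2181"

# Keep only the first two entries, as the quickstart command does,
# which still leaves redundancy if one host is unreachable.
KAFKAZKHOSTS=$(printf '%s' "$hosts" | cut -d',' -f1,2)
echo "$KAFKAZKHOSTS"   # zk0-example:2181,zk1-example:2181
```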
 
 4. To verify that the environment variable is set correctly, use the following command:
 
     ```bash
@@ -253,15 +241,13 @@ Kafka stores streams of data in *topics*. You can use the `kafka-topics.sh` util
 
 * Each partition is replicated across three worker nodes in the cluster.
 
-> [!IMPORTANT]
-> If you created the cluster in an Azure region that provides three fault domains, use a replication factor of 3. Otherwise, use a replication factor of 4.
+If you created the cluster in an Azure region that provides three fault domains, use a replication factor of 3. Otherwise, use a replication factor of 4.
 
 In regions with three fault domains, a replication factor of 3 allows replicas to be spread across the fault domains. In regions with two fault domains, a replication factor of 4 spreads the replicas evenly across the domains.
 
 For information on the number of fault domains in a region, see the [Availability of Linux virtual machines](../../virtual-machines/windows/manage-availability.md#use-managed-disks-for-vms-in-an-availability-set) document.
 
-> [!IMPORTANT]
-> Kafka is not aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.
+Kafka is not aware of Azure fault domains. When creating partition replicas for topics, it may not distribute replicas properly for high availability.
 
 To ensure high availability, use the [Apache Kafka partition rebalance tool](https://github.com/hdinsight/hdinsight-kafka-tools). This tool must be run from an SSH connection to the head node of your Kafka cluster.
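The fault-domain rule in this hunk (three fault domains → replication factor 3, otherwise 4) can be sketched as a small shell helper; the fault-domain count is a hypothetical input you would look up for your region:

```shell
# Pick a replication factor from the region's fault-domain count,
# per the rule above: 3 fault domains -> factor 3, otherwise factor 4.
replication_factor() {
  if [ "$1" -ge 3 ]; then echo 3; else echo 4; fi
}

replication_factor 3   # 3
replication_factor 2   # 4
```

The result would then be passed to `kafka-topics.sh` via its `--replication-factor` option.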
@@ -320,8 +306,7 @@ To store records into the test topic you created earlier, and then read them usi
 
 This command retrieves the records from the topic and displays them. Using `--from-beginning` tells the consumer to start from the beginning of the stream, so all records are retrieved.
 
-    > [!NOTE]
-    > If you are using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.
+    If you are using an older version of Kafka, replace `--bootstrap-server $KAFKABROKERS` with `--zookeeper $KAFKAZKHOSTS`.
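The flag swap described above can be illustrated as follows. The environment variable names are the quickstart's own; the topic name and the `sed` rewrite are just a sketch of the substitution, not a command from the article:

```shell
# Consumer invocation for newer Kafka, as used in this quickstart
# (a literal string, so the variables are not expanded here).
new_cmd='kafka-console-consumer.sh --bootstrap-server $KAFKABROKERS --topic test --from-beginning'

# For older Kafka, connect through Zookeeper instead of the brokers.
old_cmd=$(printf '%s' "$new_cmd" | sed 's/--bootstrap-server \$KAFKABROKERS/--zookeeper $KAFKAZKHOSTS/')
echo "$old_cmd"   # kafka-console-consumer.sh --zookeeper $KAFKAZKHOSTS --topic test --from-beginning
```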
 
 4. Use __Ctrl + C__ to stop the consumer.
