Commit 18df271

Merge pull request #88970 from dagiro/cats130
cats130
2 parents 06b152a + 484a28c commit 18df271

1 file changed

+21
-22
lines changed

articles/hdinsight/hdinsight-hadoop-provision-linux-clusters.md

Lines changed: 21 additions & 22 deletions
Original file line number, diff line number, diff line change
@@ -36,14 +36,14 @@ The following table shows the different methods you can use to set up an HDInsig
3636
| [Azure Resource Manager templates](hdinsight-hadoop-create-linux-clusters-arm-templates.md) |  ||  |  |
3737

3838
## Quick create: Basic cluster setup
39-
This article walks you through setup in the [Azure portal](https://portal.azure.com), where you can create an HDInsight cluster using *Quick create* or *Custom*.
39+
This article walks you through setup in the [Azure portal](https://portal.azure.com), where you can create an HDInsight cluster using *Quick create* or *Custom*.
4040

4141
![hdinsight create options custom quick create](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-creation-options.png)
4242

4343
Follow instructions on the screen to do a basic cluster setup. Details are provided below for:
4444

4545
* [Resource group name](#resource-group-name)
46-
* [Cluster types and configuration](#cluster-types)
46+
* [Cluster types and configuration](#cluster-types)
4747
* [Cluster name](#cluster-name)
4848
* [Cluster login and SSH username](#cluster-login-and-ssh-username)
4949
* [Location](#location)
@@ -56,7 +56,7 @@ Follow instructions on the screen to do a basic cluster setup. Details are provi
5656
Azure HDInsight currently provides the following cluster types, each with a set of components to provide certain functionalities.
5757

5858
> [!IMPORTANT]
59-
> HDInsight clusters are available in various types, each for a single workload or technology. There is no supported method to create a cluster that combines multiple types, such as Storm and HBase on one cluster. If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](https://docs.microsoft.com/azure/virtual-network) can connect the required cluster types.
59+
> HDInsight clusters are available in various types, each for a single workload or technology. There is no supported method to create a cluster that combines multiple types, such as Storm and HBase on one cluster. If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](https://docs.microsoft.com/azure/virtual-network) can connect the required cluster types.
6060
6161
| Cluster type | Functionality |
6262
| --- | --- |
@@ -68,7 +68,6 @@ Azure HDInsight currently provides the following cluster types, each with a set
6868
| [Spark](spark/apache-spark-overview.md) |In-memory processing, interactive queries, micro-batch stream processing |
6969
| [Storm](storm/apache-storm-overview.md) |Real-time event processing |
7070

71-
7271
### HDInsight version
7372
Choose the version of HDInsight for this cluster. For more information, see [Supported HDInsight versions](hdinsight-component-versioning.md#supported-hdinsight-versions).
7473

@@ -121,16 +120,15 @@ For more information on storage options with HDInsight, see [Compare storage opt
121120
> [!WARNING]
122121
> Using an additional storage account in a different location from the HDInsight cluster is not supported.
123122
124-
125123
During configuration, for the default storage endpoint you specify a blob container of an Azure Storage account or Data Lake Storage. The default storage contains application and system logs. Optionally, you can specify additional linked Azure Storage accounts and Data Lake Storage accounts that the cluster can access. The HDInsight cluster and the dependent storage accounts must be in the same Azure location.
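The same-location requirement above can be checked before a cluster is created. The following is a minimal sketch, not an HDInsight API: `StorageAccount`, `mismatched_storage`, and the location normalization (treating "East US" and "eastus" as equal) are hypothetical names and assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageAccount:
    """Hypothetical record for a storage account linked to the cluster."""
    name: str
    location: str


def _normalize(location: str) -> str:
    # Assumption: display names ("East US") and short names ("eastus")
    # refer to the same Azure location once spaces and case are ignored.
    return location.replace(" ", "").lower()


def mismatched_storage(cluster_location: str,
                       accounts: list[StorageAccount]) -> list[str]:
    """Return names of accounts that are NOT in the cluster's location."""
    target = _normalize(cluster_location)
    return [a.name for a in accounts if _normalize(a.location) != target]


if __name__ == "__main__":
    linked = [
        StorageAccount("defaultstore", "East US"),
        StorageAccount("archive", "westeurope"),
    ]
    # Any name printed here would violate the same-location requirement.
    print(mismatched_storage("eastus", linked))
```

An empty result means every linked account shares the cluster's location; any returned name identifies an account to move or drop before deployment.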
126124

127125
![Cluster storage settings: HDFS-compatible storage endpoints](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-cluster-creation-storage.png)
128126

129127
[!INCLUDE [secure-transfer-enabled-storage-account](../../includes/hdinsight-secure-transfer.md)]
130128

131-
132129
### Optional metastores
133-
You can create optional Hive or Apache Oozie metastores. However, not all cluster types support metastores, and Azure SQL Data Warehouse isn't compatible with metastores.
130+
131+
You can create optional Hive or Apache Oozie metastores. However, not all cluster types support metastores, and Azure SQL Data Warehouse isn't compatible with metastores.
134132

135133
For more information, see [Use external metadata stores in Azure HDInsight](./hdinsight-use-external-metadata-stores.md).
136134

@@ -145,27 +143,26 @@ An HDInsight metastore that is created for one HDInsight cluster version cannot
145143

146144
### Oozie metastore
147145

148-
To increase performance when using Oozie, use a custom metastore. A metastore can also provide access to Oozie job data after you delete your cluster.
146+
To increase performance when using Oozie, use a custom metastore. A metastore can also provide access to Oozie job data after you delete your cluster.
149147

150148
> [!IMPORTANT]
151149
> You cannot reuse a custom Oozie metastore. To use a custom Oozie metastore, you must provide an empty Azure SQL Database when creating the HDInsight cluster.
152150
153-
154151
## Custom cluster setup
155152
Custom cluster setup builds on the Quick create settings, and adds the following options:
156153
- [Enterprise security package](#enterprise-security-package)
157154
- [HDInsight applications](#install-hdinsight-applications-on-clusters)
158155
- [Cluster size](#configure-cluster-size)
159156
- [Script actions](#advanced-settings-script-actions)
160157
- [Virtual network](#advanced-settings-extend-clusters-with-a-virtual-network)
161-
158+
162159
## Enterprise security package
163160

164161
For Hadoop, Spark, HBase, Kafka, and Interactive Query cluster types, you can choose to enable the **Enterprise Security Package**. This package provides the option of a more secure cluster setup by using Apache Ranger and integrating with Azure Active Directory. For more information, see [Overview of enterprise security in Azure HDInsight](./domain-joined/hdinsight-security-overview.md).
165162

166163
![hdinsight create options choose enterprise security package](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-creation-enterprise-security-package.png)
167164

168-
For more information on creating domain-joined HDInsight cluster, see [Create domain-joined HDInsight sandbox environment](./domain-joined/apache-domain-joined-configure.md).
165+
For more information on creating domain-joined HDInsight cluster, see [Create domain-joined HDInsight sandbox environment](./domain-joined/apache-domain-joined-configure.md).
169166

170167
## Install HDInsight applications on clusters
171168

@@ -178,38 +175,40 @@ Most of the HDInsight applications are installed on an empty edge node. An empt
178175
You are billed for node usage for as long as the cluster exists. Billing starts when a cluster is created and stops when the cluster is deleted. Clusters can’t be de-allocated or put on hold.
179176

180177
### Number of nodes for each cluster type
178+
181179
Each cluster type has its own number of nodes, terminology for nodes, and default VM size. In the following table, the number of nodes for each node type is in parentheses.
182180

183181
| Type | Nodes | Diagram |
184182
| --- | --- | --- |
185183
| Hadoop |Head node (2), Worker node (1+) |![HDInsight Hadoop cluster nodes](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-hadoop-cluster-type-nodes.png) |
186-
| HBase |Head server (2), region server (1+), master/ZooKeeper node (3) |![HDInsight HBase cluster nodes](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-hbase-cluster-type-setup.png) |
187-
| Storm |Nimbus node (2), supervisor server (1+), ZooKeeper node (3) |![HDInsight Storm cluster nodes](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-storm-cluster-type-setup.png) |
188-
| Spark |Head node (2), Worker node (1+), ZooKeeper node (3) (free for A1 ZooKeeper VM size) |![HDInsight Spark cluster nodes](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-spark-cluster-type-setup.png) |
184+
| HBase |Head server (2), region server (1+), master/ZooKeeper node (3) |![HDInsight HBase cluster type setup](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-hbase-cluster-type-setup.png) |
185+
| Storm |Nimbus node (2), supervisor server (1+), ZooKeeper node (3) |![HDInsight storm cluster type setup](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-storm-cluster-type-setup.png) |
186+
| Spark |Head node (2), Worker node (1+), ZooKeeper node (3) (free for A1 ZooKeeper VM size) |![HDInsight spark cluster type setup](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-spark-cluster-type-setup.png) |
189187

190188
For more information, see [Default node configuration and virtual machine sizes for clusters](hdinsight-component-versioning.md#default-node-configuration-and-virtual-machine-sizes-for-clusters) in "What are the Hadoop components and versions in HDInsight?"
191189

192-
The cost of HDInsight clusters is determined by the number of nodes and the virtual machines sizes for the nodes.
190+
The cost of HDInsight clusters is determined by the number of nodes and the virtual machines sizes for the nodes.
193191

194192
Different cluster types have different node types, numbers of nodes, and node sizes:
195-
* Hadoop cluster type default:
193+
* Hadoop cluster type default:
196194
* Two *head nodes*
197195
* Four *Worker nodes*
198-
* Storm cluster type default:
196+
* Storm cluster type default:
199197
* Two *Nimbus nodes*
200198
* Three *ZooKeeper nodes*
201-
* Four *supervisor nodes*
199+
* Four *supervisor nodes*
202200

203201
If you are just trying out HDInsight, we recommend you use one Worker node. For more information about HDInsight pricing, see [HDInsight pricing](https://go.microsoft.com/fwLink/?LinkID=282635&clcid=0x409).
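Because cost depends on the number of nodes and the VM size of each node, a rough hourly estimate is a sum over node roles. The sketch below uses the default node counts listed above. The hourly rates are placeholder values invented for the example, not real HDInsight prices, and `hourly_cost` is a hypothetical helper.

```python
# Default node counts taken from the list above.
DEFAULT_NODES = {
    "hadoop": {"head": 2, "worker": 4},
    "storm": {"nimbus": 2, "zookeeper": 3, "supervisor": 4},
}


def hourly_cost(node_counts: dict[str, int],
                rate_per_node: dict[str, float]) -> float:
    """Sum count * hourly rate over every node role in the cluster."""
    return sum(count * rate_per_node[role]
               for role, count in node_counts.items())


if __name__ == "__main__":
    # Placeholder rates (currency units per node-hour); assumptions only.
    rates = {"head": 0.50, "worker": 0.25}

    default_hadoop = DEFAULT_NODES["hadoop"]
    # Trial setup suggested in the text: a single Worker node.
    trial_hadoop = {**default_hadoop, "worker": 1}

    print("default:", hourly_cost(default_hadoop, rates))
    print("trial:  ", hourly_cost(trial_hadoop, rates))
```

Since billing runs for as long as the cluster exists, multiplying the hourly figure by the cluster's expected lifetime gives a first approximation of the total.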
204202

205203
> [!NOTE]
206204
> The cluster size limit varies among Azure subscriptions. Contact [Azure billing support](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request) to increase the limit.
207205
208-
When you use the Azure portal to configure the cluster, the node size is available through the **Node Pricing Tiers** blade. In the portal, you can also see the cost associated with the different node sizes.
206+
When you use the Azure portal to configure the cluster, the node size is available through the **Node Pricing Tiers** blade. In the portal, you can also see the cost associated with the different node sizes.
209207

210-
![HDInsight VM node sizes](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-node-sizes.png)
208+
![HDInsight choose your node size](./media/hdinsight-hadoop-provision-linux-clusters/hdinsight-node-sizes.png)
211209

212210
### Virtual machine sizes
211+
213212
When you deploy clusters, choose compute resources based on the solution you plan to deploy. The following VMs are used for HDInsight clusters:
214213
* A and D1-4 series VMs: [General-purpose Linux VM sizes](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-general)
215214
* D11-14 series VM: [Memory-optimized Linux VM sizes](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-memory)
@@ -219,7 +218,7 @@ To find out what value you should use to specify a VM size while creating a clus
219218
> [!IMPORTANT]
220219
> If you need more than 32 Worker nodes in a cluster, you must select a head node size with at least 8 cores and 14 GB of RAM.
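The head node sizing rule above can be expressed as a simple pre-deployment check. This is an illustrative sketch only: the function name and parameters are hypothetical, and the thresholds are taken directly from the note.

```python
def head_node_is_large_enough(worker_count: int,
                              head_cores: int,
                              head_ram_gb: float) -> bool:
    """Apply the rule: more than 32 Worker nodes requires a head node
    with at least 8 cores and 14 GB of RAM."""
    if worker_count <= 32:
        # The note places no extra requirement on smaller clusters.
        return True
    return head_cores >= 8 and head_ram_gb >= 14


if __name__ == "__main__":
    print(head_node_is_large_enough(40, head_cores=4, head_ram_gb=28))
    print(head_node_is_large_enough(40, head_cores=8, head_ram_gb=14))
```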
221220
222-
For more information, see [Sizes for virtual machines](../virtual-machines/windows/sizes.md). For information about pricing of the various sizes, see [HDInsight pricing](https://azure.microsoft.com/pricing/details/hdinsight).
221+
For more information, see [Sizes for virtual machines](../virtual-machines/windows/sizes.md). For information about pricing of the various sizes, see [HDInsight pricing](https://azure.microsoft.com/pricing/details/hdinsight).
223222

224223
## Advanced settings: Script actions
225224

@@ -253,13 +252,13 @@ Sometimes, you want to configure the following configuration files during the cr
253252
For more information, see [Customize HDInsight clusters using Bootstrap](hdinsight-hadoop-customize-cluster-bootstrap.md).
254253

255254
## Advanced settings: Extend clusters with a virtual network
255+
256256
If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](https://docs.microsoft.com/azure/virtual-network) can connect the required cluster types. This configuration allows the clusters, and any code you deploy to them, to directly communicate with each other.
257257

258258
For more information on using an Azure virtual network with HDInsight, see [Plan a virtual network for HDInsight](hdinsight-plan-virtual-network-deployment.md).
259259

260260
For an example of using two cluster types within an Azure virtual network, see [Use Apache Spark Structured Streaming with Apache Kafka](hdinsight-apache-kafka-spark-structured-streaming.md). For more information about using HDInsight with a virtual network, including specific configuration requirements for the virtual network, see [Plan a virtual network for HDInsight](hdinsight-plan-virtual-network-deployment.md).
261261

262-
263262
## Next steps
264263

265264
- [Troubleshoot cluster creation failures with Azure HDInsight](./hadoop/hdinsight-troubleshoot-cluster-creation-fails.md)
