This article walks you through setup in the [Azure portal](https://portal.azure.com), where you can create an HDInsight cluster using *Quick create* or *Custom*.

Follow the on-screen instructions to do a basic cluster setup. Details are provided below for:

* [Resource group name](#resource-group-name)
* [Cluster types and configuration](#cluster-types)
* [Cluster name](#cluster-name)
* [Cluster login and SSH username](#cluster-login-and-ssh-username)
* [Location](#location)

@@ -56,7 +56,7 @@ Follow instructions on the screen to do a basic cluster setup. Details are provi
Azure HDInsight currently provides the following cluster types, each with a set of components that provides specific functionality.

> [!IMPORTANT]
> HDInsight clusters are available in various types, each for a single workload or technology. There is no supported method to create a cluster that combines multiple types, such as Storm and HBase on one cluster. If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](https://docs.microsoft.com/azure/virtual-network) can connect the required cluster types.
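
The one-type-per-cluster rule above can be illustrated with a short planning sketch. It assumes nothing about Azure APIs: `plan_clusters`, the generated cluster names, and the `example-vnet` name are all hypothetical and exist only for illustration.

```python
def plan_clusters(required_types, vnet_name="example-vnet"):
    """Plan one cluster per distinct cluster type, all on one virtual network.

    A cluster cannot combine types, so each requested type gets its own
    cluster entry. Names and the vnet value are illustrative placeholders.
    """
    plan = []
    seen = set()
    for cluster_type in required_types:
        key = cluster_type.strip().lower()
        if key in seen:
            continue  # the same type was already requested; one cluster is enough
        seen.add(key)
        plan.append({
            "name": f"{key}-cluster",      # hypothetical naming scheme
            "type": cluster_type.strip(),
            "vnet": vnet_name,             # shared network so clusters can communicate
        })
    return plan


plan = plan_clusters(["Storm", "HBase", "storm"])
for cluster in plan:
    print(cluster)
```

A solution that needs both Storm and HBase therefore ends up with two planned clusters on the same virtual network, rather than one combined cluster.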

| Cluster type | Functionality |
| --- | --- |

@@ -68,7 +68,6 @@ Azure HDInsight currently provides the following cluster types, each with a set
Choose the version of HDInsight for this cluster. For more information, see [Supported HDInsight versions](hdinsight-component-versioning.md#supported-hdinsight-versions).
@@ -121,16 +120,15 @@ For more information on storage options with HDInsight, see [Compare storage opt
> [!WARNING]
> Using an additional storage account in a different location from the HDInsight cluster is not supported.

During configuration, you specify a blob container of an Azure Storage account or Data Lake Storage as the default storage endpoint. The default storage contains application and system logs. Optionally, you can specify additional linked Azure Storage accounts and Data Lake Storage accounts that the cluster can access. The HDInsight cluster and the dependent storage accounts must be in the same Azure location.
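
As a minimal sketch of the same-location requirement, the following check compares a cluster's location with the locations of its storage accounts before you start the setup. The function name, the account names, and the way locations are normalized (ignoring case and spaces) are assumptions made for this example, not part of any Azure SDK.

```python
def find_mismatched_storage(cluster_location, storage_accounts):
    """Return the names of storage accounts outside the cluster's location.

    storage_accounts maps an account name to its location string.
    """
    def normalize(location):
        # Assumption: treat "East US" and "eastus" as the same location.
        return location.replace(" ", "").lower()

    target = normalize(cluster_location)
    return [
        name
        for name, location in storage_accounts.items()
        if normalize(location) != target
    ]


# Hypothetical account names and locations, for illustration only.
accounts = {
    "defaultstore": "East US",
    "logsarchive": "eastus",
    "remotedata": "West Europe",
}
mismatched = find_mismatched_storage("East US", accounts)
print(mismatched)
```

An empty result means every listed account is in the cluster's location; any name returned would need to be moved or left out of the cluster configuration.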

You can create optional Hive or Apache Oozie metastores. However, not all cluster types support metastores, and Azure SQL Data Warehouse isn't compatible with metastores.

For more information, see [Use external metadata stores in Azure HDInsight](./hdinsight-use-external-metadata-stores.md).
@@ -145,27 +143,26 @@ An HDInsight metastore that is created for one HDInsight cluster version cannot
### Oozie metastore

To increase performance when using Oozie, use a custom metastore. A metastore can also provide access to Oozie job data after you delete your cluster.

> [!IMPORTANT]
> You cannot reuse a custom Oozie metastore. To use a custom Oozie metastore, you must provide an empty Azure SQL Database when creating the HDInsight cluster.

## Custom cluster setup
Custom cluster setup builds on the Quick create settings, and adds the following options:

For Hadoop, Spark, HBase, Kafka, and Interactive Query cluster types, you can choose to enable the **Enterprise Security Package**. This package provides an option for a more secure cluster setup by using Apache Ranger and integrating with Azure Active Directory. For more information, see [Overview of enterprise security in Azure HDInsight](./domain-joined/hdinsight-security-overview.md).

For more information on creating a domain-joined HDInsight cluster, see [Create domain-joined HDInsight sandbox environment](./domain-joined/apache-domain-joined-configure.md).

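
If you script your planning, the list of cluster types named above for the Enterprise Security Package can be captured in a small lookup. This is only a sketch: `ESP_CLUSTER_TYPES` and `supports_esp` are hypothetical names, and the set contains exactly the types listed in this article.

```python
# Cluster types this article lists as able to enable the Enterprise Security Package.
ESP_CLUSTER_TYPES = {"hadoop", "spark", "hbase", "kafka", "interactive query"}


def supports_esp(cluster_type):
    """Return True if the cluster type appears in the article's ESP list."""
    return cluster_type.strip().lower() in ESP_CLUSTER_TYPES


for candidate in ["Kafka", "Interactive Query", "Storm"]:
    print(candidate, supports_esp(candidate))
```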
## Install HDInsight applications on clusters
@@ -178,38 +175,40 @@ Most of the HDInsight applications are installed on an empty edge node. An empt
You are billed for node usage for as long as the cluster exists. Billing starts when a cluster is created and stops when the cluster is deleted. Clusters can’t be de-allocated or put on hold.

### Number of nodes for each cluster type

Each cluster type has its own number of nodes, terminology for nodes, and default VM size. In the following table, the number of nodes for each node type is in parentheses.

| HBase |Head server (2), region server (1+), master/ZooKeeper node (3) ||
| Storm |Nimbus node (2), supervisor server (1+), ZooKeeper node (3) ||
| Spark |Head node (2), Worker node (1+), ZooKeeper node (3) (free for A1 ZooKeeper VM size) ||

For more information, see [Default node configuration and virtual machine sizes for clusters](hdinsight-component-versioning.md#default-node-configuration-and-virtual-machine-sizes-for-clusters) in "What are the Hadoop components and versions in HDInsight?"

The cost of HDInsight clusters is determined by the number of nodes and the virtual machine sizes for the nodes.

Different cluster types have different node types, numbers of nodes, and node sizes:

* Hadoop cluster type default:
    * Two *head nodes*
    * Four *Worker nodes*
* Storm cluster type default:
    * Two *Nimbus nodes*
    * Three *ZooKeeper nodes*
    * Four *supervisor nodes*

If you are just trying out HDInsight, we recommend you use one Worker node. For more information about HDInsight pricing, see [HDInsight pricing](https://go.microsoft.com/fwLink/?LinkID=282635&clcid=0x409).
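
Because cost depends on the number of nodes and their VM sizes, a rough estimate is a sum of node count times hourly rate per node role. The sketch below uses the default node counts listed above. The rates are made-up placeholders, not real prices, and `estimate_hourly_cost` is a hypothetical helper; look up actual rates on the pricing page.

```python
# Default node counts per cluster type, taken from the list in this article.
DEFAULT_NODES = {
    "Hadoop": {"head": 2, "worker": 4},
    "Storm": {"nimbus": 2, "zookeeper": 3, "supervisor": 4},
}


def estimate_hourly_cost(node_counts, hourly_rate_per_node):
    """Sum (node count * hourly rate) over every node role in the cluster."""
    return sum(
        count * hourly_rate_per_node[role]
        for role, count in node_counts.items()
    )


# Placeholder rates for illustration only; these are NOT real prices.
placeholder_rates = {"head": 1.0, "worker": 0.5}

default_cost = estimate_hourly_cost(DEFAULT_NODES["Hadoop"], placeholder_rates)

# Trial setup: the same cluster type with a single Worker node.
trial_nodes = dict(DEFAULT_NODES["Hadoop"], worker=1)
trial_cost = estimate_hourly_cost(trial_nodes, placeholder_rates)

print(default_cost, trial_cost)
```

With these placeholder rates, dropping from four Worker nodes to one reduces the estimate, which is the reason a single Worker node is suggested for trying out the service.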

> [!NOTE]
> The cluster size limit varies among Azure subscriptions. Contact [Azure billing support](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request) to increase the limit.

When you use the Azure portal to configure the cluster, the node size is available through the **Node Pricing Tiers** blade. In the portal, you can also see the cost associated with the different node sizes.

### Virtual machine sizes

When you deploy clusters, choose compute resources based on the solution you plan to deploy. The following VMs are used for HDInsight clusters:

* A and D1-4 series VMs: [General-purpose Linux VM sizes](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-general)
* D11-14 series VMs: [Memory-optimized Linux VM sizes](https://docs.microsoft.com/azure/virtual-machines/linux/sizes-memory)

@@ -219,7 +218,7 @@ To find out what value you should use to specify a VM size while creating a clus
> [!IMPORTANT]
> If you need more than 32 Worker nodes in a cluster, you must select a head node size with at least 8 cores and 14 GB of RAM.

For more information, see [Sizes for virtual machines](../virtual-machines/windows/sizes.md). For information about pricing of the various sizes, see [HDInsight pricing](https://azure.microsoft.com/pricing/details/hdinsight).

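
The head node sizing rule in the note above can be expressed as a small pre-check. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and for 32 or fewer Worker nodes it simply reports that this particular rule imposes no requirement.

```python
def head_node_is_sufficient(worker_count, head_cores, head_ram_gb):
    """Check the rule: more than 32 Worker nodes needs a head node
    with at least 8 cores and 14 GB of RAM."""
    if worker_count <= 32:
        return True  # the rule only applies above 32 Worker nodes
    return head_cores >= 8 and head_ram_gb >= 14


# Example checks with illustrative head node sizes.
print(head_node_is_sufficient(32, head_cores=4, head_ram_gb=7))
print(head_node_is_sufficient(33, head_cores=4, head_ram_gb=7))
print(head_node_is_sufficient(33, head_cores=8, head_ram_gb=14))
```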
## Advanced settings: Script actions
@@ -253,13 +252,13 @@ Sometimes, you want to configure the following configuration files during the cr
For more information, see [Customize HDInsight clusters using Bootstrap](hdinsight-hadoop-customize-cluster-bootstrap.md).

## Advanced settings: Extend clusters with a virtual network

If your solution requires technologies that are spread across multiple HDInsight cluster types, an [Azure virtual network](https://docs.microsoft.com/azure/virtual-network) can connect the required cluster types. This configuration allows the clusters, and any code you deploy to them, to directly communicate with each other.

For an example of using two cluster types within an Azure virtual network, see [Use Apache Spark Structured Streaming with Apache Kafka](hdinsight-apache-kafka-spark-structured-streaming.md). For more information about using HDInsight with a virtual network, including specific configuration requirements for the virtual network, see [Plan a virtual network for HDInsight](hdinsight-plan-virtual-network-deployment.md).

## Next steps

- [Troubleshoot cluster creation failures with Azure HDInsight](./hadoop/hdinsight-troubleshoot-cluster-creation-fails.md)