articles/hdinsight/hdinsight-autoscale-clusters.md: 10 additions & 10 deletions
@@ -80,7 +80,7 @@ To enable the Autoscale feature with load-based scaling, complete the following
1. On the **Configuration + pricing** tab, select the **Enable autoscale** checkbox.
1. Select **Load-based** under **Autoscale type**.
-1. Enter the intended values for the following properties:
+1. Enter the intended values for the following properties:
* Initial **Number of nodes** for **Worker node**.
* **Min** number of worker nodes.
@@ -115,16 +115,16 @@ Select the VM type for worker nodes by selecting a VM from the drop-down list un
Your subscription has a capacity quota for each region. The total number of cores used by your head nodes and your maximum number of worker nodes can't exceed the capacity quota. However, this quota is a soft limit; you can always create a support ticket to have it increased.
-> [!Note]
+> [!NOTE]
> If you exceed the total core quota limit, you'll receive an error message that says 'the maximum node exceeded the available cores in this region, please choose another region or contact the support to increase the quota.'
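As a rough arithmetic check (the node counts and VM core counts here are illustrative examples, not values from this article): two head nodes with 4 cores each plus a maximum of 20 worker nodes with 8 cores each require 2 × 4 + 20 × 8 = 168 cores of available quota in the region.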
-For more information on HDInsight cluster creation using the Azure portal, see [Create Linux-based clusters in HDInsight using the Azure portal](hdinsight-hadoop-create-linux-clusters-portal.md).
+For more information on HDInsight cluster creation using the Azure portal, see [Create Linux-based clusters in HDInsight using the Azure portal](hdinsight-hadoop-create-linux-clusters-portal.md).
### Create a cluster with a Resource Manager template
#### Load-based autoscaling
-You can create an HDInsight cluster with load-based Autoscaling an Azure Resource Manager template, by adding an `autoscale` node to the `computeProfile` > `workernode` section with the properties `minInstanceCount` and `maxInstanceCount` as shown in the json snippet below. For a complete Resource Manager template see [Quickstart template: Deploy Spark Cluster with load-based autoscale enabled](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-autoscale-loadbased).
+You can create an HDInsight cluster with load-based Autoscale by using an Azure Resource Manager template: add an `autoscale` node to the `computeProfile` > `workernode` section with the properties `minInstanceCount` and `maxInstanceCount`, as shown in the JSON snippet below. For a complete Resource Manager template, see [Quickstart template: Deploy Spark Cluster with load-based autoscale enabled](https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.hdinsight/hdinsight-autoscale-loadbased).
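Reduced to just the autoscale-related properties, the `workernode` role looks roughly like the following sketch. Only `autoscale`, `minInstanceCount`, and `maxInstanceCount` come from the paragraph above; the `capacity` wrapper, `targetInstanceCount`, and the example values are illustrative assumptions, so refer to the linked quickstart template for the authoritative structure.

```json
{
  "name": "workernode",
  "targetInstanceCount": 4,
  "autoscale": {
    "capacity": {
      "minInstanceCount": 3,
      "maxInstanceCount": 10
    }
  }
}
```

With load-based Autoscale, the service then varies the worker node count between `minInstanceCount` and `maxInstanceCount` based on cluster load.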
```json
{
@@ -152,7 +152,7 @@ You can create an HDInsight cluster with load-based Autoscaling an Azure Resourc
#### Schedule-based autoscaling
-You can create an HDInsight cluster with schedule-based Autoscaling an Azure Resource Manager template, by adding an `autoscale` node to the `computeProfile` > `workernode` section. The `autoscale` node contains a `recurrence` that has a `timezone` and `schedule` that describes when the change will take place. For a complete Resource Manager template, see [Deploy Spark Cluster with schedule-based Autoscale Enabled](https://github.com/Azure/azure-quickstart-templates/tree/master/101-hdinsight-autoscale-schedulebased).
+You can create an HDInsight cluster with schedule-based Autoscale by using an Azure Resource Manager template: add an `autoscale` node to the `computeProfile` > `workernode` section. The `autoscale` node contains a `recurrence` that has a `timezone` and a `schedule` that describe when the change takes place. For a complete Resource Manager template, see [Deploy Spark Cluster with schedule-based Autoscale Enabled](https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.hdinsight/hdinsight-autoscale-schedulebased).
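A minimal sketch of that fragment follows. The `autoscale`, `recurrence`, `timezone` (spelled `timeZone` here), and `schedule` names come from the paragraph above; the `days` and `timeAndCapacity` properties and the example values are illustrative assumptions, so refer to the linked quickstart template for the authoritative structure.

```json
{
  "autoscale": {
    "recurrence": {
      "timeZone": "Pacific Standard Time",
      "schedule": [
        {
          "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
          "timeAndCapacity": {
            "time": "09:00",
            "minInstanceCount": 10,
            "maxInstanceCount": 10
          }
        },
        {
          "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
          "timeAndCapacity": {
            "time": "18:00",
            "minInstanceCount": 3,
            "maxInstanceCount": 3
          }
        }
      ]
    }
  }
}
```

In this sketch, the cluster scales out to 10 worker nodes at 09:00 and back down to 3 at 18:00 on weekdays.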
```json
{
@@ -248,15 +248,15 @@ The running jobs will continue. The pending jobs will wait for scheduling with f
### Configure schedule-based Autoscale based on usage pattern
-You need to understand your cluster usage pattern when you configure schedule based Autoscale. [Grafana dashboard](https://docs.microsoft.com/azure/hdinsight/interactive-query/hdinsight-grafana) can help you understand your query load and execution slots. You can get the available executor slots and total executor slots from the dashboard.
+You need to understand your cluster usage pattern when you configure schedule-based Autoscale. The [Grafana dashboard](./interactive-query/hdinsight-grafana.md) can help you understand your query load and execution slots. You can get the available executor slots and the total executor slots from the dashboard.
-Here is a way you can estimate how many worker nodes will be needed. We recommend giving additional 10% buffer to handle the variation of the workload.
+Here's one way to estimate how many worker nodes you'll need. We recommend adding a 10% buffer to handle variation in the workload.
Number of executor slots actually used = Total executor slots – Total available executor slots.
Number of worker nodes required = Number of executor slots actually used / (hive.llap.daemon.num.executors + hive.llap.daemon.task.scheduler.wait.queue.size)
-*hive.llap.daemon.num.executors is configurable and default is 4
+*hive.llap.daemon.num.executors* is configurable and its default is 4
*hive.llap.daemon.task.scheduler.wait.queue.size* is configurable and its default is 10
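As a worked example with the defaults above (the slot counts are illustrative, not values from this article): if the Grafana dashboard shows 224 total executor slots and 84 available executor slots, then 224 – 84 = 140 slots are actually used, so 140 / (4 + 10) = 10 worker nodes are required; with the recommended 10% buffer, plan for 11 worker nodes.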
@@ -267,13 +267,13 @@ Don't scale your cluster down to fewer than three nodes. Scaling your cluster to
### Increase the number of mappers and reducers
-Autoscale for Hadoop clusters also monitors HDFS usage. If the HDFS is busy, it assumes the cluster still needs the current resources. When there is massive data involved in the query, you can increase the number of mappers and reducers to increase the parallelism and accelerate the HDFS operations. In this way, proper scaling down will be triggered when there are extra resources.
+Autoscale for Hadoop clusters also monitors HDFS usage. If HDFS is busy, Autoscale assumes the cluster still needs its current resources. When a query involves a large amount of data, you can increase the number of mappers and reducers to increase parallelism and accelerate HDFS operations. That way, scale-down is triggered appropriately when there are spare resources.
### Set the Hive configuration Maximum Total Concurrent Queries for the peak usage scenario
Autoscale events don't change the Hive configuration *Maximum Total Concurrent Queries* in Ambari. This means that the Hive Server 2 Interactive Service can handle only the given number of concurrent queries at any point in time, even if the Interactive Query daemon count is scaled up and down based on load and schedule. The general recommendation is to set this configuration for the peak usage scenario to avoid manual intervention.
-However, you may experience a Hive Server 2 restart failure if there are only a small number of worker nodes and the value for maximum total concurrent queries is configured too high. At a minimum, you need the minimum number of worker nodes that can accommodate the given number of Tez Ams (equal to the Maximum Total Concurrent Queries configuration).
+However, you may experience a Hive Server 2 restart failure if there are only a small number of worker nodes and the value for *Maximum Total Concurrent Queries* is configured too high. At a minimum, the minimum number of worker nodes must be able to accommodate the given number of Tez Application Masters (AMs), which is equal to the *Maximum Total Concurrent Queries* configuration.
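As an illustrative check (the container size is an assumption, not a value from this article): if *Maximum Total Concurrent Queries* is set to 10 and each Tez AM runs in a 4-GB YARN container, the minimum worker node count in your Autoscale range must leave at least 10 × 4 GB = 40 GB of YARN capacity free for the AMs, in addition to the memory used by the Interactive Query daemons.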