Commit 681279a

Merge pull request #90692 from laurenhughes/node-deallocation
Update NodedDeallocationOption
2 parents 4428110 + 0e310a6 commit 681279a

1 file changed: 26 additions, 7 deletions

articles/batch/batch-automatic-scaling.md

@@ -12,14 +12,14 @@ ms.service: batch
 ms.topic: article
 ms.tgt_pltfrm:
 ms.workload: multiple
-ms.date: 06/20/2017
+ms.date: 10/08/2019
 ms.author: lahugh
 ms.custom: H1Hack27Feb2017
 
 ---
-# Create an automatic scaling formula for scaling compute nodes in a Batch pool
+# Create an automatic formula for scaling compute nodes in a Batch pool
 
-Azure Batch can automatically scale pools based on parameters that you define. With automatic scaling, Batch dynamically adds nodes to a pool as task demands increase, and removes compute nodes as they decrease. You can save both time and money by automatically adjusting the number of compute nodes used by your Batch application.
+Azure Batch can automatically scale pools based on parameters that you define. With automatic scaling, Batch dynamically adds nodes to a pool as task demands increase, and removes compute nodes as they decrease. You can save both time and money by automatically adjusting the number of compute nodes used by your Batch application.
 
 You enable automatic scaling on a pool of compute nodes by associating with it an *autoscale formula* that you define. The Batch service uses the autoscale formula to determine the number of compute nodes that are needed to execute your workload. Compute nodes may be dedicated nodes or [low-priority nodes](batch-low-pri-vms.md). Batch responds to service metrics data that is collected periodically. Using this metrics data, Batch adjusts the number of compute nodes in the pool based on your formula and at a configurable interval.
 
@@ -35,6 +35,7 @@ This article discusses the various entities that make up your autoscale formulas
 >
 
 ## Automatic scaling formulas
+
 An automatic scaling formula is a string value that you define that contains one or more statements. The autoscale formula is assigned to a pool's [autoScaleFormula][rest_autoscaleformula] element (Batch REST) or [CloudPool.AutoScaleFormula][net_cloudpool_autoscaleformula] property (Batch .NET). The Batch service uses your formula to determine the target number of compute nodes in the pool for the next interval of processing. The formula string cannot exceed 8 KB, can include up to 100 statements that are separated by semicolons, and can include line breaks and comments.
 
 You can think of automatic scaling formulas as a Batch autoscale "language." Formula statements are free-formed expressions that can include both service-defined variables (variables defined by the Batch service) and user-defined variables (variables that you define). They can perform various operations on these values by using built-in types, operators, and functions. For example, a statement might take the following form:
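The size and statement limits stated in the context lines above can be checked on the client before a formula is submitted. Those limits are a maximum of 8 KB, up to 100 semicolon-separated statements, and comments allowed. The Python sketch below is a hypothetical pre-flight check and is not part of any Batch SDK. It makes three assumptions: "8 KB" means 8,192 bytes of UTF-8, comments are the `//` line comments used in the article's Example 2, and statements can be counted by splitting on semicolons. The last assumption miscounts if a comment or string happens to contain a semicolon.

```python
MAX_FORMULA_BYTES = 8 * 1024  # assumption: "8 KB" read as 8,192 bytes
MAX_STATEMENTS = 100


def count_statements(formula: str) -> int:
    """Naively count semicolon-separated statements, ignoring // comments."""
    code = "\n".join(line.split("//", 1)[0] for line in formula.splitlines())
    return sum(1 for piece in code.split(";") if piece.strip())


def check_formula_limits(formula: str) -> list[str]:
    """Return the limit violations found (an empty list if none)."""
    problems = []
    size = len(formula.encode("utf-8"))
    if size > MAX_FORMULA_BYTES:
        problems.append(f"formula is {size} bytes (max {MAX_FORMULA_BYTES})")
    statements = count_statements(formula)
    if statements > MAX_STATEMENTS:
        problems.append(f"formula has {statements} statements (max {MAX_STATEMENTS})")
    return problems


formula = "maxNumberofVMs = 25;\n// cap the pool\n$TargetDedicatedNodes = min(maxNumberofVMs, 10);"
print(count_statements(formula))      # 2
print(check_formula_limits(formula))  # []
```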
@@ -59,28 +60,33 @@ The target number of nodes may be higher, lower, or the same as the current numb
 Below are examples of two autoscale formulas, which can be adjusted to work for most scenarios. The variables `startingNumberOfVMs` and `maxNumberofVMs` in the example formulas can be adjusted to your needs.
 
 #### Pending tasks
+
 ```
 startingNumberOfVMs = 1;
 maxNumberofVMs = 25;
 pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
 pendingTaskSamples = pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
 $TargetDedicatedNodes=min(maxNumberofVMs, pendingTaskSamples);
+$NodeDeallocationOption = taskcompletion;
 ```
 
 With this autoscale formula, the pool is initially created with a single VM. The `$PendingTasks` metric defines the number of tasks that are running or queued. The formula finds the average number of pending tasks in the last 180 seconds and sets the `$TargetDedicatedNodes` variable accordingly. The formula ensures that the target number of dedicated nodes never exceeds 25 VMs. As new tasks are submitted, the pool automatically grows. As tasks complete, VMs become free one by one and the autoscaling formula shrinks the pool.
 
 This formula scales dedicated nodes, but can be modified to apply to scale low-priority nodes as well.
 
 #### Preempted nodes
+
 ```
 maxNumberofVMs = 25;
 $TargetDedicatedNodes = min(maxNumberofVMs, $PreemptedNodeCount.GetSample(180 * TimeInterval_Second));
 $TargetLowPriorityNodes = min(maxNumberofVMs , maxNumberofVMs - $TargetDedicatedNodes);
+$NodeDeallocationOption = taskcompletion;
 ```
 
 This example creates a pool that starts with 25 low-priority nodes. Every time a low-priority node is preempted, it is replaced with a dedicated node. As with the first example, the `maxNumberofVMs` variable prevents the pool from exceeding 25 VMs. This example is useful for taking advantage of low-priority VMs while also ensuring that only a fixed number of preemptions will occur for the lifetime of the pool.
 
 ## Variables
+
 You can use both **service-defined** and **user-defined** variables in your autoscale formulas. The service-defined variables are built in to the Batch service. Some service-defined variables are read-write, and some are read-only. User-defined variables are variables that you define. In the example formula shown in the previous section, `$TargetDedicatedNodes` and `$PendingTasks` are service-defined variables. Variables `startingNumberOfVMs` and `maxNumberofVMs` are user-defined variables.
 
 > [!NOTE]
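To see what the "Pending tasks" and "Preempted nodes" formulas in this hunk compute, here is a Python sketch that mirrors their arithmetic outside of Batch. The sample values are passed in by hand. They are hypothetical stand-ins for `$PendingTasks.GetSamplePercent(...)`, `$PendingTasks.GetSample(...)`, and a single `$PreemptedNodeCount` reading. The formula itself passes a whole sample vector to `min`, but this sketch uses one reading. It also does not model how the Batch service turns a fractional target into a node count.

```python
from statistics import mean

STARTING_NUMBER_OF_VMS = 1
MAX_NUMBER_OF_VMS = 25


def pending_tasks_target(sample_percent: float, pending_samples: list[float]) -> float:
    """Mirror of the 'Pending tasks' formula's $TargetDedicatedNodes."""
    # With fewer than 70% of the expected samples available, fall back to the
    # starting size instead of trusting a sparse average.
    if sample_percent < 70:
        pending = STARTING_NUMBER_OF_VMS
    else:
        pending = mean(pending_samples)
    # Never exceed the maximum pool size.
    return min(MAX_NUMBER_OF_VMS, pending)


def preempted_nodes_targets(preempted_count: float) -> tuple[float, float]:
    """Mirror of the 'Preempted nodes' formula: (dedicated, low-priority) targets."""
    # Replace each preempted low-priority node with a dedicated one...
    dedicated = min(MAX_NUMBER_OF_VMS, preempted_count)
    # ...and fill the rest of the 25-node budget with low-priority nodes.
    low_priority = min(MAX_NUMBER_OF_VMS, MAX_NUMBER_OF_VMS - dedicated)
    return dedicated, low_priority
```

For example, `pending_tasks_target(90, [10, 20])` returns 15. `preempted_nodes_targets(5)` returns `(5, 20)`: five dedicated replacement nodes plus 20 low-priority nodes.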
@@ -96,7 +102,7 @@ You can get and set the values of these service-defined variables to manage the
 | --- | --- |
 | $TargetDedicatedNodes |The target number of dedicated compute nodes for the pool. The number of dedicated nodes is specified as a target because a pool may not always achieve the desired number of nodes. For example, if the target number of dedicated nodes is modified by an autoscale evaluation before the pool has reached the initial target, then the pool may not reach the target. <br /><br /> A pool in an account created with the Batch Service configuration may not achieve its target if the target exceeds a Batch account node or core quota. A pool in an account created with the User Subscription configuration may not achieve its target if the target exceeds the shared core quota for the subscription.|
 | $TargetLowPriorityNodes |The target number of low-priority compute nodes for the pool. The number of low-priority nodes is specified as a target because a pool may not always achieve the desired number of nodes. For example, if the target number of low-priority nodes is modified by an autoscale evaluation before the pool has reached the initial target, then the pool may not reach the target. A pool may also not achieve its target if the target exceeds a Batch account node or core quota. <br /><br /> For more information on low-priority compute nodes, see [Use low-priority VMs with Batch (Preview)](batch-low-pri-vms.md). |
-| $NodeDeallocationOption |The action that occurs when compute nodes are removed from a pool. Possible values are:<ul><li>**requeue**--Terminates tasks immediately and puts them back on the job queue so that they are rescheduled.<li>**terminate**--Terminates tasks immediately and removes them from the job queue.<li>**taskcompletion**--Waits for currently running tasks to finish and then removes the node from the pool.<li>**retaineddata**--Waits for all the local task-retained data on the node to be cleaned up before removing the node from the pool.</ul> |
+| $NodeDeallocationOption |The action that occurs when compute nodes are removed from a pool. Possible values are:<ul><li>**requeue**-- The default value. Terminates tasks immediately and puts them back on the job queue so that they are rescheduled. This action ensures the target number of nodes is reach as quickly as possible, but may be less efficient, as any running tasks will be interrupted and have to be restarted, wasting any work they had already done. <li>**terminate**--Terminates tasks immediately and removes them from the job queue.<li>**taskcompletion**--Waits for currently running tasks to finish and then removes the node from the pool. Use this option to avoid tasks being interrupted and requeued, wasting any work the task has done. <li>**retaineddata**--Waits for all the local task-retained data on the node to be cleaned up before removing the node from the pool.</ul> |
 
 You can get the value of these service-defined variables to make adjustments that are based on metrics from the Batch service:
 
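The practical difference between the old and new `$NodeDeallocationOption` descriptions is what happens to tasks that are still running on a node when a scale-down removes it. The Python sketch below is a toy model built only from the table text. It covers the three options the new text contrasts: `requeue` (the default), `terminate`, and `taskcompletion`. It is not how the Batch service is implemented, and it leaves out `retaineddata`.

```python
def shrink_outcome(running_tasks: list[str], option: str = "requeue") -> dict:
    """Toy model of removing one node under a given $NodeDeallocationOption.

    Reports whether the node leaves the pool right away, which running tasks
    go back on the job queue, and which tasks lose the work done so far.
    """
    tasks = list(running_tasks)
    if option == "requeue":
        # Default: stop tasks now and requeue them. The target is reached
        # fastest, but the interrupted tasks must restart from scratch.
        return {"removed_now": True, "requeued": list(tasks), "interrupted": list(tasks)}
    if option == "terminate":
        # Stop tasks now and drop them from the job queue.
        return {"removed_now": True, "requeued": [], "interrupted": list(tasks)}
    if option == "taskcompletion":
        # Keep the node until its running tasks finish; nothing is interrupted.
        return {"removed_now": not tasks, "requeued": [], "interrupted": []}
    raise ValueError(f"option not modeled in this sketch: {option!r}")
```

Suppose a node running two tasks is removed. Under `requeue` or `terminate`, both tasks are interrupted. Under `taskcompletion`, the node stays in the pool until both tasks finish. That last behavior is what the example formulas opt into with `$NodeDeallocationOption = taskcompletion;`.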
@@ -128,6 +134,7 @@ You can get the value of these service-defined variables to make adjustments tha
 >
 
 ## Types
+
 These types are supported in a formula:
 
 * double
@@ -157,6 +164,7 @@ These types are supported in a formula:
 * TimeInterval_Year
 
 ## Operations
+
 These operations are allowed on the types that are listed in the previous section.
 
 | Operation | Supported operators | Result type |
@@ -212,6 +220,7 @@ Some of the functions that are described in the previous table can accept a list
 The *doubleVecList* value is converted to a single *doubleVec* before evaluation. For example, if `v = [1,2,3]`, then calling `avg(v)` is equivalent to calling `avg(1,2,3)`. Calling `avg(v, 7)` is equivalent to calling `avg(1,2,3,7)`.
 
 ## <a name="getsampledata"></a>Obtain sample data
+
 Autoscale formulas act on metrics data (samples) that is provided by the Batch service. A formula grows or shrinks pool size based on the values that it obtains from the service. The service-defined variables that were described previously are objects that provide various methods to access data that is associated with that object. For example, the following expression shows a request to get the last five minutes of CPU usage:
 
 ```
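The *doubleVecList* rule in the first context line of this hunk is plain flattening. The Python sketch below mimics it with hypothetical `to_double_vec` and `avg` helpers that accept any mix of numbers and lists. It only illustrates the rule and is not the Batch formula evaluator.

```python
def to_double_vec(*args) -> list[float]:
    """Flatten a mix of scalars and lists into one vector of doubles."""
    flat: list[float] = []
    for arg in args:
        if isinstance(arg, (list, tuple)):
            flat.extend(float(x) for x in arg)
        else:
            flat.append(float(arg))
    return flat


def avg(*args) -> float:
    """Average over the flattened arguments, like avg(v, 7) in a formula."""
    values = to_double_vec(*args)
    return sum(values) / len(values)


v = [1, 2, 3]
print(avg(v))     # 2.0, same as avg(1, 2, 3)
print(avg(v, 7))  # 3.25, same as avg(1, 2, 3, 7)
```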
@@ -271,6 +280,7 @@ Because there may be a delay in sample availability, it is important to always s
 >
 
 ## Metrics
+
 You can use both resource and task metrics when you're defining a formula. You adjust the target number of dedicated nodes in the pool based on the metrics data that you obtain and evaluate. See the [Variables](#variables) section above for more information on each metric.
 
 <table>
@@ -317,13 +327,15 @@ You can use both resource and task metrics when you're defining a formula. You a
 </table>
 
 ## Write an autoscale formula
+
 You build an autoscale formula by forming statements that use the above components, then combine those statements into a complete formula. In this section, we create an example autoscale formula that can perform some real-world scaling decisions.
 
 First, let's define the requirements for our new autoscale formula. The formula should:
 
 1. Increase the target number of dedicated compute nodes in a pool if CPU usage is high.
-2. Decrease the target number of dedicated compute nodes in a pool when CPU usage is low.
-3. Always restrict the maximum number of dedicated nodes to 400.
+1. Decrease the target number of dedicated compute nodes in a pool when CPU usage is low.
+1. Always restrict the maximum number of dedicated nodes to 400.
+1. When reducing the number of nodes, do not remove nodes that are running tasks; if necessary, wait until tasks have finished to remove nodes.
 
 To increase the number of nodes during high CPU usage, define the statement that populates a user-defined variable (`$totalDedicatedNodes`) with a value that is 110 percent of the current target number of dedicated nodes, but only if the minimum average CPU usage during the last 10 minutes was above 70 percent. Otherwise, use the value for the current number of dedicated nodes.
 
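The scale-up statement described in the last context line, combined with the 400-node cap from the requirements list, can be written as a small Python function for checking expected targets by hand. The sketch makes two assumptions. First, CPU samples are percentages from 0 to 100. Second, "the minimum average CPU usage during the last 10 minutes" means the smallest sample in that window. This hunk does not spell out the scale-down rule, so the sketch leaves it out.

```python
MAX_DEDICATED_NODES = 400


def scale_up_target(current_dedicated: float, cpu_percent_last_10_min: list[float]) -> float:
    """Grow by 10% only when CPU stayed above 70% for the whole window, then cap at 400."""
    if min(cpu_percent_last_10_min) > 70:
        total_dedicated = current_dedicated * 1.1
    else:
        total_dedicated = current_dedicated
    return min(MAX_DEDICATED_NODES, total_dedicated)


print(scale_up_target(100, [75, 82, 90]))  # about 110 (100 * 1.1, floating point)
print(scale_up_target(100, [65, 95]))      # 100: one sample is not above 70%
print(scale_up_target(390, [88, 91]))      # 400: roughly 429 is capped
```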
@@ -630,9 +642,11 @@ Error:
 ```
 
 ## Example autoscale formulas
+
 Let's look at a few formulas that show different ways to adjust the amount of compute resources in a pool.
 
 ### Example 1: Time-based adjustment
+
 Suppose you want to adjust the pool size based on the day of the week and time of day. This example shows how to increase or decrease the number of nodes in the pool accordingly.
 
 The formula first obtains the current time. If it's a weekday (1-5) and within working hours (8 AM to 6 PM), the target pool size is set to 20 nodes. Otherwise, it's set to 10 nodes.
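The time-based rule just described can be checked with the Python sketch below. It assumes the formula's weekday numbers 1-5 mean Monday to Friday, which is also how Python's `datetime.isoweekday()` numbers them. It takes the time as given and does not model which time zone Batch uses for `$curTime`.

```python
from datetime import datetime


def time_based_target(cur_time: datetime) -> int:
    """20 dedicated nodes during weekday working hours (8 AM to 6 PM), else 10."""
    work_hours = 8 <= cur_time.hour < 18
    # isoweekday(): Monday = 1 ... Sunday = 7, so 1-5 covers Monday to Friday.
    is_weekday = 1 <= cur_time.isoweekday() <= 5
    return 20 if work_hours and is_weekday else 10


print(time_based_target(datetime(2019, 10, 8, 9, 30)))   # 20 (Tuesday morning)
print(time_based_target(datetime(2019, 10, 8, 18, 0)))   # 10 (6 PM is outside working hours)
print(time_based_target(datetime(2019, 10, 12, 10, 0)))  # 10 (Saturday)
```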
@@ -643,9 +657,11 @@ $workHours = $curTime.hour >= 8 && $curTime.hour < 18;
 $isWeekday = $curTime.weekday >= 1 && $curTime.weekday <= 5;
 $isWorkingWeekdayHour = $workHours && $isWeekday;
 $TargetDedicatedNodes = $isWorkingWeekdayHour ? 20:10;
+$NodeDeallocationOption = taskcompletion;
 ```
 
 ### Example 2: Task-based adjustment
+
 In this example, the pool size is adjusted based on the number of tasks in the queue. Both comments and line breaks are acceptable in formula strings.
 
 ```csharp
@@ -660,11 +676,12 @@ $targetVMs = $tasks > 0? $tasks:max(0, $TargetDedicatedNodes/2);
 // The pool size is capped at 20, if target VM value is more than that, set it
 // to 20. This value should be adjusted according to your use case.
 $TargetDedicatedNodes = max(0, min($targetVMs, 20));
-// Set node deallocation mode - keep nodes active only until tasks finish
+// Set node deallocation mode - let running tasks finish before removing a node
 $NodeDeallocationOption = taskcompletion;
 ```
 
 ### Example 3: Accounting for parallel tasks
+
 This example adjusts the pool size based on the number of tasks. This formula also takes into account the [MaxTasksPerComputeNode][net_maxtasks] value that has been set for the pool. This approach is useful in situations where [parallel task execution](batch-parallel-node-tasks.md) has been enabled on your pool.
 
 ```csharp
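The last statements of Example 2 appear in this hunk and its header, and they reduce to the Python function below. The formula's `$tasks` value is defined outside this hunk, so the sketch takes it as a `tasks` parameter. It mirrors only the arithmetic and the 0 to 20 clamp.

```python
def task_based_target(tasks: float, current_target_dedicated: float) -> float:
    """Mirror of Example 2: follow the task count, else halve the current target."""
    # $targetVMs = $tasks > 0? $tasks:max(0, $TargetDedicatedNodes/2);
    target_vms = tasks if tasks > 0 else max(0, current_target_dedicated / 2)
    # $TargetDedicatedNodes = max(0, min($targetVMs, 20));
    return max(0, min(target_vms, 20))


print(task_based_target(5, 10))   # 5: target follows the task count
print(task_based_target(0, 10))   # 5.0: no tasks, so halve the current target
print(task_based_target(50, 10))  # 20: capped
```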
@@ -686,6 +703,7 @@ $NodeDeallocationOption = taskcompletion;
 ```
 
 ### Example 4: Setting an initial pool size
+
 This example shows a C# code snippet with an autoscale formula that sets the pool size to a specified number of nodes for an initial time period. Then it adjusts the pool size based on the number of running and active tasks after the initial time period has elapsed.
 
 The formula in the following code snippet:
@@ -710,6 +728,7 @@ string formula = string.Format(@"
 ```
 
 ## Next steps
+
 * [Maximize Azure Batch compute resource usage with concurrent node tasks](batch-parallel-node-tasks.md) contains details about how you can execute multiple tasks simultaneously on the compute nodes in your pool. In addition to autoscaling, this feature may help to lower job duration for some workloads, saving you money.
 * For another efficiency booster, ensure that your Batch application queries the Batch service in the most optimal way. See [Query the Azure Batch service efficiently](batch-efficient-list-queries.md) to learn how to limit the amount of data that crosses the wire when you query the status of potentially thousands of compute nodes or tasks.
 