articles/machine-learning/how-to-autoscale-endpoints.md
@@ -9,7 +9,7 @@ author: msakande
ms.author: mopeakande
ms.reviewer: sehan
ms.custom: devplatv2, cliv2, update-code
ms.date: 08/07/2024

#customer intent: As a developer, I want to autoscale online endpoints in Azure Machine Learning so I can control resource usage in my deployment based on metrics or schedules.
In this article, you learn to manage resource usage in a deployment by configuring autoscaling based on metrics and schedules. The autoscale process lets you automatically run the right amount of resources to handle the load on your application. [Online endpoints](concept-endpoints.md) in Azure Machine Learning support autoscaling through integration with the autoscale feature in Azure Monitor.
Azure Monitor autoscale allows you to set rules that trigger one or more autoscale actions when conditions of the rules are met. You can configure metrics-based scaling (such as CPU utilization greater than 70%), schedule-based scaling (such as scaling rules for peak business hours), or a combination of the two. For more information, see [Overview of autoscale in Microsoft Azure](../azure-monitor/autoscale/autoscale-overview.md).
:::image type="content" source="media/how-to-autoscale-endpoints/concept-autoscale.png" border="false" alt-text="Diagram that shows how autoscale adds and removes instances as needed.":::
You can currently manage autoscaling by using the Azure CLI, the REST APIs, Azure Resource Manager, the Python SDK, or the browser-based Azure portal.
## Prerequisites
- A deployed endpoint. For more information, see [Deploy and score a machine learning model by using an online endpoint](how-to-deploy-online-endpoints.md).
- To use autoscale, the role `microsoft.insights/autoscalesettings/write` must be assigned to the identity that manages autoscale. You can use any built-in or custom roles that allow this action. For general guidance on managing roles for Azure Machine Learning, see [Manage users and roles](how-to-assign-roles.md). For more on autoscale settings from Azure Monitor, see [Microsoft.Insights autoscalesettings](/azure/templates/microsoft.insights/autoscalesettings).
- To use the Python SDK to manage the Azure Monitor service, install the `azure-mgmt-monitor` package with the following command:
```console
pip install azure-mgmt-monitor
```
## Define autoscale profile
To enable autoscale for an online endpoint, you first define an autoscale profile. The profile specifies the default, minimum, and maximum scale set capacity. The following example shows how to set the number of virtual machine (VM) instances for the default, minimum, and maximum scale capacity.
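As a rough illustration of what the profile holds (my own sketch, not the article's sample; the field names follow the general shape of Azure Monitor autoscale settings, and the names and counts are placeholders):

```python
# Illustrative sketch of an autoscale profile's capacity settings.
# The profile name and instance counts are placeholders.
profile = {
    "name": "my-scale-settings",
    "capacity": {
        "minimum": "2",  # never scale in below two instances
        "maximum": "5",  # never scale out above five instances
        "default": "2",  # capacity used when no rule has fired
    },
    "rules": [],  # metric- and schedule-based rules are added later
}

# The default capacity must sit inside the [minimum, maximum] band.
cap = {k: int(v) for k, v in profile["capacity"].items()}
assert cap["minimum"] <= cap["default"] <= cap["maximum"]
```

Whichever tool you use to manage autoscale, the rules you define later all operate inside this capacity band.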
@@ -160,11 +166,11 @@ Leave the configuration pane open. In the next section, you configure the __Rule
---
## Create scale-out rule based on deployment metrics
A common scale-out rule is to increase the number of VM instances when the average CPU load is high. The following example shows how to allocate two more nodes (up to the maximum) if the CPU average load is greater than 70% for 5 minutes:
@@ -232,7 +238,7 @@ The rule is part of the `my-scale-settings` profile, where `autoscale-name` matc
# [Studio](#tab/azure-studio)
The following steps continue with the autoscale configuration.
1. For the __Rules__ option, select the __Add a rule__ link. The __Scale rule__ page opens.
@@ -254,17 +260,17 @@ Leave the configuration pane open. In the next section, you adjust the __Rules__
---
## Create scale-in rule based on deployment metrics
When the average CPU load is light, a scale-in rule can reduce the number of VM instances. The following example shows how to release a single node, down to a minimum of two, if the CPU load is less than 30% for 5 minutes.
@@ -338,7 +344,7 @@ The following steps adjust the __Rules__ configuration to support a scale in rul
:::image type="content" source="media/how-to-autoscale-endpoints/scale-in-rule.png" lightbox="media/how-to-autoscale-endpoints/scale-in-rule.png" alt-text="Screenshot that shows how to configure the scale in rule for less than 30% CPU for 5 minutes.":::
If you configure both scale-out and scale-in rules, your rules look similar to the following screenshot. The rules specify that if average CPU load exceeds 70% for 5 minutes, two more nodes should be allocated, up to the limit of five. If CPU load is less than 30% for 5 minutes, a single node should be released, down to the minimum of two.
:::image type="content" source="media/how-to-autoscale-endpoints/autoscale-rules-final.png" lightbox="media/how-to-autoscale-endpoints/autoscale-rules-final.png" alt-text="Screenshot that shows the autoscale settings, including the scale-in and scale-out rules.":::
@@ -348,15 +354,15 @@ Leave the configuration pane open. In the next section, you specify other scale
## Create scale rule based on endpoint metrics
In the previous sections, you created rules to scale in or out based on deployment metrics. You can also create a rule that applies to the deployment endpoint. In this section, you learn how to allocate another node when the request latency is greater than an average of 70 milliseconds for 5 minutes.
@@ -440,21 +446,21 @@ The following steps continue the rule configuration on the __Custom autoscale__
---
## Find IDs for supported metrics
If you want to use other metrics in code to set up autoscale rules by using the Azure CLI or the SDK, see the table in [Available metrics](how-to-monitor-online-endpoints.md#available-metrics).
## Create scale rule based on schedule
You can also create rules that apply only on certain days or at certain times. In this section, you create a rule that sets the node count to 2 on weekends.