articles/machine-learning/how-to-autoscale-endpoints.md (6 additions, 2 deletions)
@@ -150,7 +150,7 @@ Under __Choose how to scale your resources__, select __Custom autoscale__ to beg

---

-## Create a rule to scale out using metrics
+## Create a rule to scale out using deployment metrics

A common scaling out rule is one that increases the number of VM instances when the average CPU load is high. The following example allocates two more nodes (up to the maximum) if the average CPU load is greater than 70% for five minutes:

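The example itself is elided in this hunk. For illustration only, here is a minimal sketch of such a scale-out rule using the `azure-mgmt-monitor` Python package; the subscription ID and deployment resource ID are placeholders, and the variable names are hypothetical:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import MetricTrigger, ScaleAction, ScaleRule

# Hypothetical identifiers: replace with your own subscription and the ARM resource ID
# of the online deployment that the autoscale setting targets.
subscription_id = "<subscription-id>"
deployment_resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/"
    "Microsoft.MachineLearningServices/workspaces/<workspace>/onlineEndpoints/"
    "<endpoint>/deployments/<deployment>"
)

monitor_client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# Scale out by 2 instances when average CPU utilization stays above 70% for 5 minutes.
scale_out_rule = ScaleRule(
    metric_trigger=MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri=deployment_resource_id,
        time_grain="PT1M",            # metric is sampled every minute
        statistic="Average",
        time_window="PT5M",           # evaluated over a 5-minute window
        time_aggregation="Average",
        operator="GreaterThan",
        threshold=70,
    ),
    scale_action=ScaleAction(
        direction="Increase",
        type="ChangeCount",
        value="2",                    # add two instances, capped by the profile maximum
        cooldown="PT5M",
    ),
)
```

The rule only takes effect once it is added to an autoscale profile and applied with `monitor_client.autoscale_settings.create_or_update(...)`.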
@@ -234,7 +234,7 @@ Finally, select the __Add__ button to create the rule.

---

-## Create a rule to scale in using metrics
+## Create a rule to scale in using deployment metrics

When load is light, a scale-in rule can reduce the number of VM instances. The following example releases a single node, down to a minimum of two, if the CPU load is less than 30% for five minutes:

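Again the diff elides the example. As a hedged sketch continuing the snippet above (reusing the hypothetical `deployment_resource_id`), the matching scale-in rule might look like this:

```python
from azure.mgmt.monitor.models import MetricTrigger, ScaleAction, ScaleRule

# `deployment_resource_id` is the same hypothetical deployment resource ID used in the
# scale-out sketch above.
scale_in_rule = ScaleRule(
    metric_trigger=MetricTrigger(
        metric_name="CpuUtilizationPercentage",
        metric_resource_uri=deployment_resource_id,
        time_grain="PT1M",
        statistic="Average",
        time_window="PT5M",
        time_aggregation="Average",
        operator="LessThan",
        threshold=30,
    ),
    scale_action=ScaleAction(
        direction="Decrease",
        type="ChangeCount",
        value="1",                    # remove one instance at a time
        cooldown="PT5M",
    ),
)
```

Note that the floor of two instances comes from the `ScaleCapacity` minimum on the autoscale profile, not from the rule itself.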
@@ -401,6 +401,10 @@ Select __Scale based on metric__, and then select __Add a rule__. The __Scale ru

---

+## Find supported metric IDs
+
+If you want to use other metrics in code (either the CLI or the SDK) to set up autoscale rules, see the table in [Available metrics](how-to-monitor-online-endpoints.md#available-metrics).
+
## Create scaling rules based on a schedule

You can also create rules that apply only on certain days or at certain times. In this example, the node count is set to 2 on the weekend.
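The weekend example is not shown in this diff. As an illustrative sketch only (same hypothetical `azure-mgmt-monitor` setup as above, with a made-up profile name and time zone), a recurrence-based profile that pins the node count at 2 on weekends could look like this:

```python
from azure.mgmt.monitor.models import (
    AutoscaleProfile,
    Recurrence,
    RecurrentSchedule,
    ScaleCapacity,
)

# Pin the deployment at 2 instances on Saturdays and Sundays.
weekend_profile = AutoscaleProfile(
    name="weekend-profile",          # hypothetical profile name
    capacity=ScaleCapacity(minimum="2", maximum="2", default="2"),
    rules=[],                        # no metric rules; the capacity is fixed
    recurrence=Recurrence(
        frequency="Week",
        schedule=RecurrentSchedule(
            time_zone="Pacific Standard Time",
            days=["Saturday", "Sunday"],
            hours=[0],               # the profile starts at 00:00 on the listed days
            minutes=[0],
        ),
    ),
)
```

The profile would then be passed in the `profiles` list of `autoscale_settings.create_or_update`, alongside the default profile that holds the metric rules.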
articles/machine-learning/how-to-monitor-online-endpoints.md (57 additions, 20 deletions)
@@ -67,20 +67,31 @@ Depending on the resource that you select, the metrics that you see will be diff

#### Metrics at endpoint scope

-- Request Latency
-- Request Latency P50 (Request latency at the 50th percentile)
-- Request Latency P90 (Request latency at the 90th percentile)
-- Request Latency P95 (Request latency at the 95th percentile)
-- Requests per minute
-- New connections per second
-- Active connection count
-- Network bytes
+__Traffic__

-Split on the following dimensions:
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| RequestsPerMinute | Count | The number of requests sent to the endpoint within a minute | Average | Deployment, ModelStatusCode, StatusCode, StatusCodeClass | Alert me when I have <= 0 transactions in the system |
+| RequestLatency | Milliseconds | The complete interval of time taken to respond to a request | Average | Deployment | Alert me when average latency > 2 sec |
+| RequestLatency_P50 | Milliseconds | The request latency at the 50th percentile, aggregated over all request latency values collected during a 60-second period | Average | Deployment | Alert me when average latency > 2 sec |
+| RequestLatency_P90 | Milliseconds | The request latency at the 90th percentile, aggregated over all request latency values collected during a 60-second period | Average | Deployment | Alert me when average latency > 2 sec |
+| RequestLatency_P95 | Milliseconds | The request latency at the 95th percentile, aggregated over all request latency values collected during a 60-second period | Average | Deployment | Alert me when average latency > 2 sec |
+| RequestLatency_P99 | Milliseconds | The request latency at the 99th percentile, aggregated over all request latency values collected during a 60-second period | Average | Deployment | Alert me when average latency > 2 sec |

-- Deployment
-- Status Code
-- Status Code Class
+__Network__
+
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| NetworkBytes | Bytes per second | The bytes per second served for the endpoint | Average | - | - |
+| ConnectionsActive | Count | The total number of concurrent TCP connections active from clients | Average | - | - |
+| NewConnectionsPerSecond | Count | The average number of new TCP connections per second established from clients | Average | - | - |
+
+__Model Data Collection__
+
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | Deployment, Type | - |
+| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | Deployment, Type, Reason | - |

For example, you can split along the deployment dimension to compare the request latency of different deployments under an endpoint.

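For readers who want to pull these endpoint metrics programmatically rather than in the portal, here is a minimal, hedged sketch using the `azure-monitor-query` Python package. The endpoint resource ID is a placeholder, and the dimension filter string is an assumption based on standard Azure Monitor dimension filter syntax:

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Hypothetical ARM resource ID of the online endpoint.
endpoint_resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/"
    "Microsoft.MachineLearningServices/workspaces/<workspace>/onlineEndpoints/<endpoint>"
)

metrics_client = MetricsQueryClient(DefaultAzureCredential())

# Query request latency over the last hour, split by the Deployment dimension.
response = metrics_client.query_resource(
    endpoint_resource_id,
    metric_names=["RequestLatency"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE],
    filter="Deployment eq '*'",      # '*' returns one time series per deployment
)

for metric in response.metrics:
    for series in metric.timeseries:
        averages = [point.average for point in series.data if point.average is not None]
        # metadata_values holds the dimension values, e.g. the deployment name
        print(metric.name, series.metadata_values, averages[-5:])
```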
@@ -93,16 +104,39 @@ For more information, see [Bandwidth limit issues](how-to-troubleshoot-online-en

#### Metrics at deployment scope

-- CPU Utilization Percentage
-- Deployment Capacity (the number of instances of the requested instance type)
-- Disk Utilization
-- GPU Memory Utilization (only applicable to GPU instances)
-- GPU Utilization (only applicable to GPU instances)
-- Memory Utilization Percentage
+__Saturation__
+
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| CpuUtilizationPercentage | Percent | The percentage of CPU utilized on an instance | Minimum, Maximum, Average | InstanceId | Alert me when % Capacity Used > 75% |
+| CpuMemoryUtilizationPercentage | Percent | The percentage of memory utilized on an instance | Minimum, Maximum, Average | InstanceId | |
+| DiskUtilization | Percent | The percentage of disk space utilized on an instance | Minimum, Maximum, Average | InstanceId, Disk | |
+| GpuUtilizationPercentage | Percent | The percentage of GPU utilization on an instance. Utilization is reported at one-minute intervals. | Minimum, Maximum, Average | InstanceId | |
+| GpuMemoryUtilizationPercentage | Percent | The percentage of GPU memory utilization on an instance. Utilization is reported at one-minute intervals. | Minimum, Maximum, Average | InstanceId | |
+| GpuEnergyJoules | Joule | Interval energy in joules on a GPU node. Energy is reported at one-minute intervals. | Minimum, Maximum, Average | InstanceId | |
+
+__Availability__

-Split on the following dimension:
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| DeploymentCapacity | Count | The number of instances in the deployment | Minimum, Maximum, Average | InstanceId, State | Alert me when the % Availability of my service drops below 100% |

-- Instance Id
+__Traffic__
+
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| RequestsPerMinute | Count | The number of requests sent to the online deployment within a minute | Average | StatusCode | Alert me when I have <= 0 transactions in the system |
+| RequestLatency_P50 | Milliseconds | The average P50 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
+| RequestLatency_P90 | Milliseconds | The average P90 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
+| RequestLatency_P95 | Milliseconds | The average P95 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
+| RequestLatency_P99 | Milliseconds | The average P99 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
+
+__Model Data Collection__
+
+| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| ---- | --- | --- | --- | --- | --- |
+| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | InstanceId, Type | - |
+| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | InstanceId, Type, Reason | - |

For instance, you can compare CPU and memory utilization across different instances for an online deployment.

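As a hedged continuation of the query sketch shown after the endpoint-scope tables (same hypothetical client, with the deployment's resource ID from the autoscale sketches instead of the endpoint's), splitting on the InstanceId dimension gives that per-instance comparison:

```python
# Compare CPU and memory utilization per instance for the deployment over the last hour.
response = metrics_client.query_resource(
    deployment_resource_id,
    metric_names=["CpuUtilizationPercentage", "CpuMemoryUtilizationPercentage"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE],
    filter="InstanceId eq '*'",      # one time series per instance
)

for metric in response.metrics:
    for series in metric.timeseries:
        print(metric.name, series.metadata_values)
```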
@@ -132,6 +166,9 @@ You can also create custom alerts to notify you of important status updates to y

For more information, see [Create Azure Monitor alert rules](../azure-monitor/alerts/alerts-create-new-alert-rule.md).

+### Enable autoscale based on metrics
+
+You can enable autoscale for deployments based on metrics by using the UI or code. When you use code (either the CLI or the SDK), use the metric IDs listed in the [Available metrics](#available-metrics) table in the condition that triggers autoscaling. For more information, see [Autoscaling online endpoints](how-to-autoscale-endpoints.md).
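To make that concrete in a non-authoritative way, reusing the hypothetical `azure-mgmt-monitor` objects and placeholder resource IDs from the autoscale sketches earlier in this diff, a trigger keyed on one of the metric IDs above could look like this:

```python
from azure.mgmt.monitor.models import MetricTrigger

# Hypothetical trigger: react when P90 request latency exceeds 2000 ms over 5 minutes.
# `endpoint_resource_id` is the placeholder endpoint resource ID from the query sketch above.
latency_trigger = MetricTrigger(
    metric_name="RequestLatency_P90",    # metric ID taken from the endpoint-scope table
    metric_resource_uri=endpoint_resource_id,
    time_grain="PT1M",
    statistic="Average",
    time_window="PT5M",
    time_aggregation="Average",
    operator="GreaterThan",
    threshold=2000,
)
```

The trigger would then be paired with a `ScaleAction` inside a `ScaleRule`, exactly as in the CPU-based examples sketched for how-to-autoscale-endpoints.md.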
0 commit comments