Commit 6bd27b8

Update available metrics as table
1 parent 23f3aa8 commit 6bd27b8

1 file changed: +32 -24 lines changed

articles/machine-learning/how-to-monitor-online-endpoints.md

Lines changed: 32 additions & 24 deletions
@@ -67,20 +67,19 @@ Depending on the resource that you select, the metrics that you see will be diff
 
 #### Metrics at endpoint scope
 
-- Request Latency
-- Request Latency P50 (Request latency at the 50th percentile)
-- Request Latency P90 (Request latency at the 90th percentile)
-- Request Latency P95 (Request latency at the 95th percentile)
-- Requests per minute
-- New connections per second
-- Active connection count
-- Network bytes
-
-Split on the following dimensions:
-
-- Deployment
-- Status Code
-- Status Code Class
+| Metric Category | Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| --------------- | ---- | --- | --- | --- | --- | --- |
+| Traffic | RequestsPerMinute | Count | The number of requests sent to Endpoint within a minute | Average | Deployment, ModelStatusCode, StatusCode, StatusCodeClass | Alert me when I have <= 0 transactions in the system |
+| | RequestLatency | Milliseconds | The complete interval of time taken for a request to be responded | Average | Deployment | Alert me when average latency > 2 sec |
+| | RequestLatency_P50 | Milliseconds | The request latency at the 50th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
+| | RequestLatency_P90 | Milliseconds | The request latency at the 90th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
+| | RequestLatency_P95 | Milliseconds | The request latency at the 95th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
+| | RequestLatency_P99 | Milliseconds | The request latency at the 99th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
+| Network | NetworkBytes | Bytes per second | The bytes per second served for the endpoint | Average | - | - |
+| | ConnectionsActive | Count | The total number of concurrent TCP connections active from clients | Average | - | - |
+| | NewConnectionsPerSecond | Count | The average number of new TCP connections per second established from clients | Average | - | - |
+| Model Data Collection | DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | Deployment, Type | - |
+| | DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | Deployment, Type, Reason | - |
 
 For example, you can split along the deployment dimension to compare the request latency of different deployments under an endpoint.
 
@@ -93,16 +92,22 @@ For more information, see [Bandwidth limit issues](how-to-troubleshoot-online-en
 
 #### Metrics at deployment scope
 
-- CPU Utilization Percentage
-- Deployment Capacity (the number of instances of the requested instance type)
-- Disk Utilization
-- GPU Memory Utilization (only applicable to GPU instances)
-- GPU Utilization (only applicable to GPU instances)
-- Memory Utilization Percentage
-
-Split on the following dimension:
-
-- Instance Id
+| Metric Category | Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
+| --------------- | ---- | --- | --- | --- | --- | --- |
+| Saturation | CpuUtilizationPercentage | Percent | How much percentage of CPU was utilized | Minimum, Maximum, Average | InstanceId | Alert me when % Capacity Used > 75% |
+| | CpuMemoryUtilizationPercentage | Percent | How much percent of Memory was utilized | Minimum, Maximum, Average | InstanceId | |
+| | DiskUtilization | Percent | How much disk space was utilized | Minimum, Maximum, Average | InstanceId, Disk | |
+| | GpuUtilizationPercentage | Percent | Percentage of GPU utilization on an instance - Utilization is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
+| | GpuMemoryUtilizationPercentage | Percent | Percentage of GPU memory utilization on an instance - Utilization is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
+| | GpuEnergyJoules | Joule | Interval energy in Joules on a GPU node - Energy is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
+| Availability | DeploymentCapacity | Count | The number of instances in the deployment | Minimum, Maximum, Average | InstanceId, State | Alert me when the % Availability of my service drops below 100% |
+| Traffic | RequestsPerMinute | Count | The number of requests sent to online deployment within a minute | Average | StatusCode | Alert me when I have <= 0 transactions in the system |
+| | RequestLatency_P50 | Milliseconds | The average P50 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
+| | RequestLatency_P90 | Milliseconds | The average P90 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
+| | RequestLatency_P95 | Milliseconds | The average P95 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
+| | RequestLatency_P99 | Milliseconds | The average P99 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
+| Model Data Collection | DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | InstanceId, Type | - |
+| | DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | InstanceId, Type, Reason | - |
 
 For instance, you can compare CPU and/or memory utilization between different instances for an online deployment.
 
@@ -132,6 +137,9 @@ You can also create custom alerts to notify you of important status updates to y
 
 For more information, see [Create Azure Monitor alert rules](../azure-monitor/alerts/alerts-create-new-alert-rule.md).
 
+### Enable autoscale based on metrics
+
+You can enable autoscale for deployments based on metrics, using either the UI or code. When you use code (either CLI or SDK), you can use the metric IDs listed in the table of [available metrics](#available-metrics) in the condition that triggers autoscaling. For more information, see [Autoscaling online endpoints](how-to-autoscale-endpoints.md).
 
 ## Logs
 
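The autoscale paragraph added in this commit says that metric IDs from the tables can be used in the condition that triggers scaling. As a rough illustration of what such a condition amounts to, the sketch below evaluates a threshold rule against a window of per-minute samples. The `ScaleRule` class, the `evaluate` function, the thresholds, and the sample values are all hypothetical and are not part of the Azure CLI or SDK; only the metric ID `CpuUtilizationPercentage` and the aggregation names come from the tables above. Real autoscale rules are configured through Azure Monitor, as the linked autoscaling article describes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, Sequence

# Aggregation names mirror the "Aggregate Method" column of the metric tables.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "Minimum": min,
    "Maximum": max,
    "Average": mean,
}


@dataclass(frozen=True)
class ScaleRule:
    """Hypothetical stand-in for an autoscale rule keyed on a metric ID."""
    metric_id: str     # e.g. "CpuUtilizationPercentage" from the table
    aggregation: str   # "Minimum", "Maximum", or "Average"
    threshold: float   # value the aggregated metric is compared against
    direction: str     # "out": fire when above threshold; "in": fire when below
    change: int        # number of instances to add or remove


def evaluate(rule: ScaleRule, samples: Sequence[float]) -> int:
    """Return the instance delta the rule asks for (0 if it does not fire)."""
    if not samples:
        return 0  # no data in the window: leave capacity unchanged
    value = AGGREGATIONS[rule.aggregation](samples)
    if rule.direction == "out" and value > rule.threshold:
        return rule.change
    if rule.direction == "in" and value < rule.threshold:
        return -rule.change
    return 0


# Assumed thresholds, chosen only for illustration.
scale_out = ScaleRule("CpuUtilizationPercentage", "Average", 70.0, "out", 2)
scale_in = ScaleRule("CpuUtilizationPercentage", "Average", 30.0, "in", 1)

# Made-up per-minute CPU samples for a five-minute window.
cpu_window = [82.0, 76.5, 71.0, 90.0, 68.5]
delta = evaluate(scale_out, cpu_window) + evaluate(scale_in, cpu_window)
print(f"instance delta: {delta}")
```

Pairing a scale-out rule with a scale-in rule on the same metric, with a gap between the two thresholds, keeps the deployment from oscillating when utilization hovers near a single value.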