__Traffic__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| RequestsPerMinute | Count | The number of requests sent to the endpoint within a minute | Average | Deployment, ModelStatusCode, StatusCode, StatusCodeClass | Alert me when I have <= 0 transactions in the system |
| RequestLatency | Milliseconds | The complete interval of time taken for a request to be responded to | Average | Deployment | Alert me when average latency > 2 sec |
| RequestLatency_P50 | Milliseconds | The request latency at the 50th percentile, aggregated over all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
| RequestLatency_P90 | Milliseconds | The request latency at the 90th percentile, aggregated over all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
| RequestLatency_P95 | Milliseconds | The request latency at the 95th percentile, aggregated over all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
| RequestLatency_P99 | Milliseconds | The request latency at the 99th percentile, aggregated over all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |

__Network__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| NetworkBytes | Bytes per second | The bytes per second served for the endpoint | Average | - | - |
| ConnectionsActive | Count | The total number of concurrent TCP connections active from clients | Average | - | - |
| NewConnectionsPerSecond | Count | The average number of new TCP connections per second established from clients | Average | - | - |

__Model Data Collection__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | Deployment, Type | - |
| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | Deployment, Type, Reason | - |

For example, you can split along the deployment dimension to compare the request latency of different deployments under an endpoint.
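The `RequestLatency_P50` through `RequestLatency_P99` metrics each summarize all request latencies collected in a 60-second window. The sketch below illustrates what such a summary computes, assuming the nearest-rank percentile definition; the documentation doesn't state which percentile method Azure Monitor actually uses, so treat the exact values as illustrative. The helper names and the sample latencies are hypothetical.

```python
def nearest_rank_percentile(values, p):
    """Return the p-th percentile (integer p, 0 < p <= 100) of values by nearest rank."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    # Nearest rank = ceil(p / 100 * n), computed with integer arithmetic so that
    # exact boundaries (for example 90% of 10 samples) are not affected by float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_percentiles(latencies_ms):
    """Summarize one 60-second window of request latencies (milliseconds).

    The keys mirror the metric IDs in the table; the nearest-rank method is an assumption.
    """
    return {
        f"RequestLatency_P{p}": nearest_rank_percentile(latencies_ms, p)
        for p in (50, 90, 95, 99)
    }


# Hypothetical latencies (ms) collected during a single 60-second window.
window = [120, 95, 130, 180, 2400, 110, 140, 150, 105, 160]
summary = latency_percentiles(window)
print(summary)
# {'RequestLatency_P50': 130, 'RequestLatency_P90': 180, 'RequestLatency_P95': 2400, 'RequestLatency_P99': 2400}
print(summary["RequestLatency_P99"] > 2000)  # True: crosses the 2-second example threshold
```

With only ten samples, a single 2.4-second request is enough to push P95 and P99 above the 2-second example threshold while P50 stays at 130 ms, which illustrates why an alert on a tail percentile reacts to outliers that the median hides.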
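Splitting a metric along the Deployment dimension produces one series per deployment, each aggregated separately. The following sketch mimics that on in-memory data: it groups hypothetical `(deployment, latency_ms)` samples, averages each group as the Average aggregation would, and flags deployments above the 2-second example alert. The record layout, the deployment names `blue` and `green`, and the helper functions are illustrative assumptions, not the Azure Monitor API; real metric values would come from Azure Monitor itself (for example through its metrics REST API or the `azure-monitor-query` SDK).

```python
from collections import defaultdict
from statistics import mean

ALERT_THRESHOLD_MS = 2000  # "Alert me when average latency > 2 sec"


def average_latency_by_deployment(samples):
    """Group hypothetical (deployment, latency_ms) samples and average each group."""
    groups = defaultdict(list)
    for deployment, latency_ms in samples:
        groups[deployment].append(latency_ms)
    return {name: mean(values) for name, values in groups.items()}


def deployments_over_threshold(averages, threshold_ms=ALERT_THRESHOLD_MS):
    """Return the deployments whose average latency exceeds the threshold, sorted by name."""
    return sorted(name for name, avg in averages.items() if avg > threshold_ms)


# Hypothetical RequestLatency samples for an endpoint with two deployments.
samples = [
    ("blue", 150), ("blue", 250), ("blue", 200),
    ("green", 1800), ("green", 2600), ("green", 2200),
]
averages = average_latency_by_deployment(samples)
print(averages)                              # {'blue': 200, 'green': 2200}
print(deployments_over_threshold(averages))  # ['green']
```

Averaging per deployment rather than across the whole endpoint keeps a slow deployment from being masked by a fast one: the endpoint-wide average of these six samples is 1200 ms, below the threshold, even though `green` alone exceeds it.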
#### Metrics at deployment scope
__Saturation__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| CpuUtilizationPercentage | Percent | Percentage of CPU utilized | Minimum, Maximum, Average | InstanceId | Alert me when % Capacity Used > 75% |
| CpuMemoryUtilizationPercentage | Percent | Percentage of memory utilized | Minimum, Maximum, Average | InstanceId | - |
| DiskUtilization | Percent | Percentage of disk space utilized | Minimum, Maximum, Average | InstanceId, Disk | - |
| GpuUtilizationPercentage | Percent | Percentage of GPU utilization on an instance; utilization is reported at one-minute intervals | Minimum, Maximum, Average | InstanceId | - |
| GpuMemoryUtilizationPercentage | Percent | Percentage of GPU memory utilization on an instance; utilization is reported at one-minute intervals | Minimum, Maximum, Average | InstanceId | - |
| GpuEnergyJoules | Joule | Interval energy, in joules, on a GPU node; energy is reported at one-minute intervals | Minimum, Maximum, Average | InstanceId | - |

__Availability__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| DeploymentCapacity | Count | The number of instances in the deployment | Minimum, Maximum, Average | InstanceId, State | Alert me when the % Availability of my service drops below 100% |

__Traffic__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| RequestsPerMinute | Count | The number of requests sent to the online deployment within a minute | Average | StatusCode | Alert me when I have <= 0 transactions in the system |
| RequestLatency_P50 | Milliseconds | The average P50 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
| RequestLatency_P90 | Milliseconds | The average P90 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
| RequestLatency_P95 | Milliseconds | The average P95 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |
| RequestLatency_P99 | Milliseconds | The average P99 request latency, aggregated over all request latency values collected during the selected time period | Average | - | Alert me when average latency > 2 sec |

__Model Data Collection__

| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
| ---- | --- | --- | --- | --- | --- |
| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | InstanceId, Type | - |
| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | InstanceId, Type, Reason | - |

For instance, you can compare CPU and/or memory utilization between different instances of an online deployment.
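Comparing instances amounts to splitting `CpuUtilizationPercentage` (or `CpuMemoryUtilizationPercentage`) by `InstanceId` and applying the Minimum, Maximum, and Average aggregations to each resulting series. The sketch below does this over hypothetical per-minute samples and flags instances whose average exceeds the 75% example alert. The instance names, the sample data, the helper functions, and the choice to alert on the Average (rather than the Maximum) are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

CAPACITY_ALERT_PERCENT = 75  # "Alert me when % Capacity Used > 75%"


def aggregate_by_instance(samples):
    """Return {instance_id: {"Minimum", "Maximum", "Average"}} for (instance_id, percent) samples."""
    series = defaultdict(list)
    for instance_id, percent in samples:
        series[instance_id].append(percent)
    return {
        instance_id: {"Minimum": min(values), "Maximum": max(values), "Average": mean(values)}
        for instance_id, values in series.items()
    }


def instances_over_capacity(aggregates, threshold=CAPACITY_ALERT_PERCENT):
    """Return instances whose average utilization exceeds the threshold, sorted by ID."""
    return sorted(i for i, agg in aggregates.items() if agg["Average"] > threshold)


# Hypothetical one-minute CpuUtilizationPercentage samples; instance IDs are made up.
samples = [
    ("instance-0", 40), ("instance-0", 50), ("instance-0", 60),
    ("instance-1", 70), ("instance-1", 90), ("instance-1", 80),
]
aggregates = aggregate_by_instance(samples)
print(aggregates["instance-1"])             # {'Minimum': 70, 'Maximum': 90, 'Average': 80}
print(instances_over_capacity(aggregates))  # ['instance-1']
```

Keeping all three aggregations per instance makes uneven load visible: here `instance-0` averages 50% while `instance-1` averages 80%, a skew that a single deployment-wide average (65%) would not reveal.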