Commit 71e8505

Table reformatting for better readability
1 parent 67b3ecc commit 71e8505

1 file changed

+58
-29
lines changed

articles/machine-learning/how-to-monitor-online-endpoints.md

Lines changed: 58 additions & 29 deletions
@@ -67,19 +67,31 @@ Depending on the resource that you select, the metrics that you see will be diff
6767

6868
#### Metrics at endpoint scope
6969

70-
| Metric Category | Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
71-
| --------------- | ---- | --- | --- | --- | --- | --- |
72-
| Traffic | RequestsPerMinute | Count | The number of requests sent to Endpoint within a minute | Average | Deployment, ModelStatusCode, StatusCode, StatusCodeClass | Alert me when I have <= 0 transactions in the system |
73-
| | RequestLatency | Milliseconds | The complete interval of time taken for a request to be responded | Average | Deployment | Alert me when average latency > 2 sec |
74-
| | RequestLatency_P50 | Milliseconds | The request latency at the 50th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
75-
| | RequestLatency_P90 | Milliseconds | The request latency at the 90th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
76-
| | RequestLatency_P95 | Milliseconds | The request latency at the 95th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
77-
| | RequestLatency_P99 | Milliseconds | The request latency at the 99th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
78-
| Network | NetworkBytes | Bytes per second | The bytes per second served for the endpoint | Average | - | - |
79-
| | ConnectionsActive | Count | The total number of concurrent TCP connections active from clients | Average | - | - |
80-
| | NewConnectionsPerSecond | Count | The average number of new TCP connections per second established from clients | Average | - | - |
81-
| Model Data Collection | DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | Deployment, Type | - |
82-
| | DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | Deployment, Type, Reason | - |
70+
- __Traffic__
71+
72+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
73+
| ---- | --- | --- | --- | --- | --- |
74+
| RequestsPerMinute | Count | The number of requests sent to Endpoint within a minute | Average | Deployment, ModelStatusCode, StatusCode, StatusCodeClass | Alert me when I have <= 0 transactions in the system |
75+
| RequestLatency | Milliseconds | The complete interval of time taken for a request to be responded | Average | Deployment | Alert me when average latency > 2 sec |
76+
| RequestLatency_P50 | Milliseconds | The request latency at the 50th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
77+
| RequestLatency_P90 | Milliseconds | The request latency at the 90th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
78+
| RequestLatency_P95 | Milliseconds | The request latency at the 95th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
79+
| RequestLatency_P99 | Milliseconds | The request latency at the 99th percentile aggregated by all request latency values collected over a period of 60 seconds | Average | Deployment | Alert me when average latency > 2 sec |
80+
81+
- __Network__
82+
83+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
84+
| ---- | --- | --- | --- | --- | --- |
85+
| NetworkBytes | Bytes per second | The bytes per second served for the endpoint | Average | - | - |
86+
| ConnectionsActive | Count | The total number of concurrent TCP connections active from clients | Average | - | - |
87+
| NewConnectionsPerSecond | Count | The average number of new TCP connections per second established from clients | Average | - | - |
88+
89+
- __Model Data Collection__
90+
91+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
92+
| ---- | --- | --- | --- | --- | --- |
93+
| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | Deployment, Type | - |
94+
| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | Deployment, Type, Reason | - |
8395

8496
For example, you can split along the deployment dimension to compare the request latency of different deployments under an endpoint.
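As a rough illustration of what splitting by the deployment dimension amounts to, the sketch below groups latency samples by deployment and averages each group. The sample values, the deployment names `blue` and `green`, and both helper functions are hypothetical; the 2000 ms threshold mirrors the "average latency > 2 sec" example alert from the table. It does not call Azure Monitor and only shows the grouping logic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical samples: (deployment name, RequestLatency in milliseconds).
SAMPLES = [
    ("blue", 120.0),
    ("blue", 180.0),
    ("green", 2400.0),
    ("green", 2600.0),
]

# Mirrors the "average latency > 2 sec" example alert; assumed value in ms.
ALERT_THRESHOLD_MS = 2000.0


def average_latency_by_deployment(samples):
    """Group latency samples by deployment and return the mean of each group."""
    grouped = defaultdict(list)
    for deployment, latency_ms in samples:
        grouped[deployment].append(latency_ms)
    return {name: mean(values) for name, values in grouped.items()}


def deployments_over_threshold(averages, threshold_ms=ALERT_THRESHOLD_MS):
    """Return the sorted names of deployments whose average exceeds the threshold."""
    return sorted(name for name, avg in averages.items() if avg > threshold_ms)


averages = average_latency_by_deployment(SAMPLES)
print(averages)
print(deployments_over_threshold(averages))
```

Comparing the per-deployment averages side by side is what makes a slow deployment stand out, which an endpoint-wide average would hide.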
8597

@@ -92,22 +104,39 @@ For more information, see [Bandwidth limit issues](how-to-troubleshoot-online-en
92104

93105
#### Metrics at deployment scope
94106

95-
| Metric Category | Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
96-
| --------------- | ---- | --- | --- | --- | --- | --- |
97-
| Saturation | CpuUtilizationPercentage | Percent | How much percentage of CPU was utilized | Minimum, Maximum, Average | InstanceId | Alert me when % Capacity Used > 75% |
98-
| | CpuMemoryUtilizationPercentage | Percent | How much percent of Memory was utilized | Minimum, Maximum, Average | InstanceId | |
99-
| | DiskUtilization | Percent | How much disk space was utilized | Minimum, Maximum, Average | InstanceId, Disk | |
100-
| | GpuUtilizationPercentage | Percent | Percentage of GPU utilization on an instance - Utilization is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
101-
| | GpuMemoryUtilizationPercentage | Percent | Percentage of GPU memory utilization on an instance - Utilization is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
102-
| | GpuEnergyJoules | Joule | Interval energy in Joules on a GPU node - Energy is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
103-
| Availability | DeploymentCapacity | Count | The number of instances in the deployment | Minimum, Maximum, Average | InstanceId, State | Alert me when the % Availability of my service drops below 100% |
104-
| Traffic | RequestsPerMinute | Count | The number of requests sent to online deployment within a minute | Average | StatusCode | Alert me when I have <= 0 transactions in the system |
105-
| | RequestLatency_P50 | Milliseconds | The average P50 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
106-
| | RequestLatency_P90 | Milliseconds | The average P90 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
107-
| | RequestLatency_P95 | Milliseconds | The average P95 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
108-
| | RequestLatency_P99 | Milliseconds | The average P99 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
109-
| Model Data Collection | DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | InstanceId, Type | - |
110-
| | DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | InstanceId, Type, Reason | - |
107+
- __Saturation__
108+
109+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
110+
| ---- | --- | --- | --- | --- | --- |
111+
| CpuUtilizationPercentage | Percent | How much percentage of CPU was utilized | Minimum, Maximum, Average | InstanceId | Alert me when % Capacity Used > 75% |
112+
| CpuMemoryUtilizationPercentage | Percent | How much percent of Memory was utilized | Minimum, Maximum, Average | InstanceId | |
113+
| DiskUtilization | Percent | How much disk space was utilized | Minimum, Maximum, Average | InstanceId, Disk | |
114+
| GpuUtilizationPercentage | Percent | Percentage of GPU utilization on an instance - Utilization is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
115+
| GpuMemoryUtilizationPercentage | Percent | Percentage of GPU memory utilization on an instance - Utilization is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
116+
| GpuEnergyJoules | Joule | Interval energy in Joules on a GPU node - Energy is reported at one minute intervals | Minimum, Maximum, Average | InstanceId | |
117+
118+
- __Availability__
119+
120+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
121+
| ---- | --- | --- | --- | --- | --- |
122+
| DeploymentCapacity | Count | The number of instances in the deployment | Minimum, Maximum, Average | InstanceId, State | Alert me when the % Availability of my service drops below 100% |
123+
124+
- __Traffic__
125+
126+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
127+
| ---- | --- | --- | --- | --- | --- |
128+
| RequestsPerMinute | Count | The number of requests sent to online deployment within a minute | Average | StatusCode | Alert me when I have <= 0 transactions in the system |
129+
| RequestLatency_P50 | Milliseconds | The average P50 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
130+
| RequestLatency_P90 | Milliseconds | The average P90 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
131+
| RequestLatency_P95 | Milliseconds | The average P95 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
132+
| RequestLatency_P99 | Milliseconds | The average P99 request latency aggregated by all request latency values collected over the selected time period | Average | - | Alert me when average latency > 2 sec |
133+
134+
- __Model Data Collection__
135+
136+
| Metric ID | Unit | Description | Aggregate Method | Splittable By | Example Metric Alerts |
137+
| ---- | --- | --- | --- | --- | --- |
138+
| DataCollectionEventsPerMinute | Count | The number of data collection events processed per minute | Average | InstanceId, Type | - |
139+
| DataCollectionErrorsPerMinute | Count | The number of data collection events dropped per minute | Average | InstanceId, Type, Reason | - |
111140

112141
For instance, you can compare CPU and/or memory utilization between different instances for an online deployment.
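A comparison along the `InstanceId` dimension could be sketched as follows. The instance names, sample values, and the `summarize_by_instance` helper are hypothetical; the three aggregates match the Minimum, Maximum, and Average methods listed for the saturation metrics, and the 75% cutoff is taken from the example alert. This is only a local illustration of the per-instance aggregation, not a query against the metrics service.

```python
from collections import defaultdict

# Hypothetical samples: (InstanceId, CpuUtilizationPercentage).
CPU_SAMPLES = [
    ("instance-0", 40.0),
    ("instance-0", 60.0),
    ("instance-1", 70.0),
    ("instance-1", 90.0),
]

# Taken from the "% Capacity Used > 75%" example alert.
CPU_ALERT_PERCENT = 75.0


def summarize_by_instance(samples):
    """Return minimum, maximum, and average utilization for each instance."""
    grouped = defaultdict(list)
    for instance_id, percent in samples:
        grouped[instance_id].append(percent)
    return {
        instance_id: {
            "minimum": min(values),
            "maximum": max(values),
            "average": sum(values) / len(values),
        }
        for instance_id, values in grouped.items()
    }


summary = summarize_by_instance(CPU_SAMPLES)

# Instances whose average utilization exceeds the example alert threshold.
hot_instances = sorted(
    instance_id
    for instance_id, stats in summary.items()
    if stats["average"] > CPU_ALERT_PERCENT
)

print(summary)
print(hot_instances)
```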
113142
