````diff
@@ -115,6 +115,6 @@
-  title: 'High latency (P99) for model {{ $labels.model_name }}'
-  description: 'The 99th percentile request duration for model {{ $labels.model_name }}and target model {{ $labels.target_model_name }} has been consistently above 10.0 seconds for 5 minutes.'
+  title: 'High latency (P99) for model {% raw %}{{ $labels.model_name }}{% endraw %}'
+  description: 'The 99th percentile request duration for model {% raw %}{{ $labels.model_name }}{% endraw %} and target model {% raw %}{{ $labels.target_model_name }}{% endraw %} has been consistently above 10.0 seconds for 5 minutes.'
 labels:
   severity: 'warning'
 ```
 
````
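Prometheus expands `{{ $labels.<name> }}` in annotations with the label values of the alerting series, so the missing space in the removed description glued the model name onto the following word. The sketch below illustrates this with a hypothetical, regex-based `render` helper that is only a simplified stand-in for Prometheus's Go-template engine (it understands nothing but `$labels` placeholders), made-up label values, and the placeholders as a reader would copy them once the docs renderer has dropped the `{% raw %}` wrappers.

```python
import re

# Hypothetical, simplified stand-in for Prometheus annotation templating:
# it only substitutes "{{ $labels.<name> }}" placeholders and nothing else.
LABEL_PLACEHOLDER = re.compile(r"\{\{\s*\$labels\.(\w+)\s*\}\}")


def render(template: str, labels: dict) -> str:
    """Replace each {{ $labels.<name> }} with that label's value.

    Missing labels become "" here; the real Go-template engine may differ.
    """
    return LABEL_PLACEHOLDER.sub(lambda m: labels.get(m.group(1), ""), template)


# Made-up label values for one alerting series.
labels = {"model_name": "chatbot", "target_model_name": "chatbot-lora"}

removed = "for model {{ $labels.model_name }}and target model {{ $labels.target_model_name }}"
added = "for model {{ $labels.model_name }} and target model {{ $labels.target_model_name }}"

print(render(removed, labels))  # for model chatbotand target model chatbot-lora
print(render(added, labels))    # for model chatbot and target model chatbot-lora
```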
````diff
@@ -121,14 +121,14 @@
 #### High Inference Error Rate
 
-```
+```yaml
 alert: HighInferenceErrorRate
 expr: sum by (model_name) (rate(inference_model_request_error_total[5m])) / sum by (model_name) (rate(inference_model_request_total[5m])) > 0.05 # Adjust threshold as needed (e.g., 5% error rate)
 for: 5m
 annotations:
-  title: 'High error rate for model {{ $labels.model_name }}'
-  description: 'The error rate for model {{ $labels.model_name }}and target model {{ $labels.target_model_name }} has been consistently above 5% for 5 minutes.'
+  title: 'High error rate for model {% raw %}{{ $labels.model_name }}{% endraw %}'
+  description: 'The error rate for model {% raw %}{{ $labels.model_name }}{% endraw %} and target model {% raw %}{{ $labels.target_model_name }}{% endraw %} has been consistently above 5% for 5 minutes.'
 labels:
   severity: 'critical'
   impact: 'availability'
 ```
 
````
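The error-rate expression divides two `sum by (model_name)` aggregations and keeps only the models whose ratio exceeds 0.05. A minimal Python sketch of that arithmetic follows, using invented per-series rates keyed by hypothetical `(model_name, target_model_name)` pairs; `sum_by_model` and `error_ratio_alerts` are illustrative names, not part of any Prometheus API. Note that the grouping keeps only `model_name`, so `target_model_name` does not survive into the result series.

```python
from collections import defaultdict

# Invented per-series rates (requests/second), standing in for what
# rate(...[5m]) might return. Keys are hypothetical (model_name, target_model_name) pairs.
error_rates = {
    ("chatbot", "chatbot-lora-a"): 0.5,
    ("chatbot", "chatbot-lora-b"): 0.25,
    ("summarizer", "summarizer"): 0.25,
}
request_rates = {
    ("chatbot", "chatbot-lora-a"): 5.0,
    ("chatbot", "chatbot-lora-b"): 5.0,
    ("summarizer", "summarizer"): 10.0,
}


def sum_by_model(rates: dict) -> dict:
    """Mimic `sum by (model_name)`: drop every other label and add the values."""
    sums = defaultdict(float)
    for (model_name, _target_model_name), value in rates.items():
        sums[model_name] += value
    return dict(sums)


def error_ratio_alerts(errors: dict, requests: dict, threshold: float = 0.05) -> dict:
    """Return {model_name: error_ratio} for models above the threshold."""
    err, req = sum_by_model(errors), sum_by_model(requests)
    ratios = {
        model: err[model] / req[model]
        # Like PromQL's `/`, only models present on both sides are kept.
        # Models with zero traffic are skipped here (PromQL would yield NaN or +Inf).
        for model in err.keys() & req.keys()
        if req[model] > 0
    }
    # Like PromQL's `> 0.05`, keep only the series that pass the comparison.
    return {model: r for model, r in ratios.items() if r > threshold}


print(error_ratio_alerts(error_rates, request_rates))  # {'chatbot': 0.075}
```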
````diff
@@ -135,9 +135,9 @@
 #### High Inference Pool Queue Average Size
 
-```
+```yaml
 alert: HighInferencePoolAvgQueueSize
 expr: inference_pool_average_queue_size > 50 # Adjust threshold based on expected queue size
 for: 5m
 annotations:
-  title: 'High average queue size for inference pool {{ $labels.name }}'
-  description: 'The average number of requests pending in the queue for inference pool {{ $labels.name }} has been consistently above 50 for 5 minutes.'
+  title: 'High average queue size for inference pool {% raw %}{{ $labels.name }}{% endraw %}'
+  description: 'The average number of requests pending in the queue for inference pool {% raw %}{{ $labels.name }}{% endraw %} has been consistently above 50 for 5 minutes.'
````
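Each rule uses `for: 5m`: the expression has to stay true on every evaluation for five minutes before the alert moves from pending to firing, and a single false evaluation resets it. The sketch below models that lifecycle for one series with a hypothetical `PendingAlert` class, assuming a 60-second evaluation interval and invented `inference_pool_average_queue_size` samples checked against the threshold of 50. It is a simplification for illustration, not Prometheus's rule manager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingAlert:
    """Simplified lifecycle of one alerting series under a `for:` clause."""

    for_seconds: float
    active_since: Optional[float] = None  # when the expression first became true

    def evaluate(self, now: float, condition: bool) -> str:
        if not condition:
            # A single false evaluation resets the alert.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        # Fire once the condition has held for at least `for_seconds`.
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# Hypothetical samples of inference_pool_average_queue_size, one per 60 s evaluation.
queue_sizes = [60, 70, 55, 40, 80, 90, 75, 65, 70, 85]

alert = PendingAlert(for_seconds=5 * 60)
states = [alert.evaluate(i * 60, size > 50) for i, size in enumerate(queue_sizes)]
print(states)
```

The dip to 40 at the fourth sample resets the timer, so only the final sample, five minutes after the queue rose above 50 again, reaches the firing state.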
````diff
@@ -156,2 +156,2 @@
-  title: 'High KV cache utilization for inference pool {{ $labels.name }}'
-  description: 'The average KV cache utilization for inference pool {{ $labels.name }} has been consistently above 90% for 5 minutes, indicating potential resource exhaustion.'
+  title: 'High KV cache utilization for inference pool {% raw %}{{ $labels.name }}{% endraw %}'
+  description: 'The average KV cache utilization for inference pool {% raw %}{{ $labels.name }}{% endraw %} has been consistently above 90% for 5 minutes, indicating potential resource exhaustion.'
````
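The `{% raw %}...{% endraw %}` wrappers in this change stop the documentation site's template engine from evaluating the Prometheus placeholders, so the rendered page shows `{{ $labels.name }}` literally; both Liquid and Jinja2 use this `raw` syntax, and which one the site uses is an assumption here. A hypothetical helper, `escape_for_docs`, that applies the same wrapping to a line while skipping placeholders that are already wrapped could be sketched as:

```python
import re

# One "{{ ... }}" template action not already preceded by "{% raw %}".
UNWRAPPED_ACTION = re.compile(r"(?<!\{% raw %\})\{\{.*?\}\}")


def escape_for_docs(line: str) -> str:
    """Wrap each unescaped {{ ... }} in {% raw %}...{% endraw %} so a
    Liquid- or Jinja-style renderer prints it literally."""
    return UNWRAPPED_ACTION.sub(
        lambda m: "{% raw %}" + m.group(0) + "{% endraw %}", line
    )


old_title = "title: 'High KV cache utilization for inference pool {{ $labels.name }}'"
new_title = escape_for_docs(old_title)
print(new_title)
# Already-wrapped placeholders are left alone, so the helper is idempotent.
assert escape_for_docs(new_title) == new_title
```

The negative lookbehind is what makes a second pass harmless; without it, rerunning the helper over an edited file would nest the wrappers.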