Commit cd83f1c

fix: Mark alert block as yaml to fix syntax error (#954)
TESTED=preview with local mkdocs
1 parent e958cbd commit cd83f1c

1 file changed: +12 -12 lines changed
site-src/guides/metrics.md

Lines changed: 12 additions & 12 deletions
@@ -107,54 +107,54 @@ A template alert rule is available at [alert.yaml](../../tools/alerts/alert.yaml
 
 #### High Inference Request Latency P99
 
-```
+```yaml
 alert: HighInferenceRequestLatencyP99
 expr: histogram_quantile(0.99, rate(inference_model_request_duration_seconds_bucket[5m])) > 10.0 # Adjust threshold as needed (e.g., 10.0 seconds)
 for: 5m
 annotations:
-  title: 'High latency (P99) for model {{ $labels.model_name }}'
-  description: 'The 99th percentile request duration for model {{ $labels.model_name }} and target model {{ $labels.target_model_name }} has been consistently above 10.0 seconds for 5 minutes.'
+  title: 'High latency (P99) for model {% raw %}{{ $labels.model_name }}{% endraw %}'
+  description: 'The 99th percentile request duration for model {% raw %}{{ $labels.model_name }}{% endraw %} and target model {% raw %}{{ $labels.target_model_name }}{% endraw %} has been consistently above 10.0 seconds for 5 minutes.'
 labels:
   severity: 'warning'
 ```
 
 #### High Inference Error Rate
 
-```
+```yaml
 alert: HighInferenceErrorRate
 expr: sum by (model_name) (rate(inference_model_request_error_total[5m])) / sum by (model_name) (rate(inference_model_request_total[5m])) > 0.05 # Adjust threshold as needed (e.g., 5% error rate)
 for: 5m
 annotations:
-  title: 'High error rate for model {{ $labels.model_name }}'
-  description: 'The error rate for model {{ $labels.model_name }} and target model {{ $labels.target_model_name }} has been consistently above 5% for 5 minutes.'
+  title: 'High error rate for model {% raw %}{{ $labels.model_name }}{% endraw %}'
+  description: 'The error rate for model {% raw %}{{ $labels.model_name }}{% endraw %} and target model {% raw %}{{ $labels.target_model_name }}{% endraw %} has been consistently above 5% for 5 minutes.'
 labels:
   severity: 'critical'
   impact: 'availability'
 ```
 
 #### High Inference Pool Queue Average Size
 
-```
+```yaml
 alert: HighInferencePoolAvgQueueSize
 expr: inference_pool_average_queue_size > 50 # Adjust threshold based on expected queue size
 for: 5m
 annotations:
-  title: 'High average queue size for inference pool {{ $labels.name }}'
-  description: 'The average number of requests pending in the queue for inference pool {{ $labels.name }} has been consistently above 50 for 5 minutes.'
+  title: 'High average queue size for inference pool {% raw %}{{ $labels.name }}{% endraw %}'
+  description: 'The average number of requests pending in the queue for inference pool {% raw %}{{ $labels.name }}{% endraw %} has been consistently above 50 for 5 minutes.'
 labels:
   severity: 'critical'
   impact: 'performance'
 ```
 
 #### High Inference Pool Average KV Cache
 
-```
+```yaml
 alert: HighInferencePoolAvgKVCacheUtilization
 expr: inference_pool_average_kv_cache_utilization > 0.9 # 90% utilization
 for: 5m
 annotations:
-  title: 'High KV cache utilization for inference pool {{ $labels.name }}'
-  description: 'The average KV cache utilization for inference pool {{ $labels.name }} has been consistently above 90% for 5 minutes, indicating potential resource exhaustion.'
+  title: 'High KV cache utilization for inference pool {% raw %}{{ $labels.name }}{% endraw %}'
+  description: 'The average KV cache utilization for inference pool {% raw %}{{ $labels.name }}{% endraw %} has been consistently above 90% for 5 minutes, indicating potential resource exhaustion.'
 labels:
   severity: 'critical'
   impact: 'resource_exhaustion'
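
Note on the change pattern: besides tagging each fence as `yaml`, every changed annotation line wraps its `{{ ... }}` expression in `{% raw %}` and `{% endraw %}`, presumably so that a Jinja-style templating step in the docs build leaves the Prometheus label references untouched. The same transformation could be scripted rather than applied by hand. The following is a minimal Python sketch of that idea, not something taken from this commit; the helper name `wrap_templates` is hypothetical, and it assumes each expression fits on one line.

```python
import re

RAW_OPEN = "{% raw %}"
RAW_CLOSE = "{% endraw %}"

# Match a {{ ... }} expression that is not already directly preceded by
# the raw opener, so running the helper twice does not double-wrap.
_TEMPLATE = re.compile(r"(?<!\{% raw %\})(\{\{.*?\}\})")


def wrap_templates(text: str) -> str:
    """Wrap each bare {{ ... }} expression in raw/endraw tags (hypothetical helper)."""
    return _TEMPLATE.sub(lambda m: RAW_OPEN + m.group(1) + RAW_CLOSE, text)


if __name__ == "__main__":
    line = "title: 'High latency (P99) for model {{ $labels.model_name }}'"
    print(wrap_templates(line))
```

Applied to the old `title:` line of the first alert, this sketch should produce the corresponding new line from the diff. It only recognizes an opener that sits immediately before the expression, so a block-level `{% raw %}` around a whole snippet would not be detected.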
