@@ -173,6 +173,12 @@ semantic_router_cache_size 1247
# Security metrics
semantic_router_pii_detections_total{action="block"} 23
semantic_router_jailbreak_attempts_total{action="block"} 5
+
+ # Error metrics
+ llm_request_errors_total{model="gpt-4",reason="timeout"} 12
+ llm_request_errors_total{model="claude-3",reason="upstream_5xx"} 3
+ llm_request_errors_total{model="phi4",reason="upstream_4xx"} 5
+ llm_request_errors_total{model="phi4",reason="pii_policy_denied"} 8
```

### Reasoning Mode Metrics
@@ -247,6 +253,35 @@ sum by (model) (increase(llm_model_cost_total{currency="USD"}[1h]))
sum by (reason_code) (increase(llm_routing_reason_codes_total[15m]))
```

+ ### Request Error Metrics
+
+ The router tracks request-level failures by model and reason so you can monitor both absolute error throughput and the share of requests that fail.
+
+ - `llm_request_errors_total{model, reason}`
+   - Description: Total number of request errors categorized by failure reason
+   - Labels:
+     - model: target model name for the failed request
+     - reason: error category (timeout, upstream_4xx, upstream_5xx, pii_policy_denied, jailbreak_block, parse_error, serialization_error, cancellation, classification_failed, unknown)
+
+ Example PromQL queries:
+
+ ```prometheus
+ # Total errors by reason over the last hour
+ sum by (reason) (increase(llm_request_errors_total[1h]))
+
+ # Error throughput (errors/sec) by model over the last 15 minutes.
+ # Helpful for incident response because it shows how many failing requests are impacting users.
+ sum by (model) (rate(llm_request_errors_total[15m]))
+
+ # Error ratio (% of requests failing) by model over the last 15 minutes.
+ # Use increase() to align numerator and denominator with the same lookback window.
+ 100 * sum by (model) (increase(llm_request_errors_total[15m])) /
+   sum by (model) (increase(llm_model_requests_total[15m]))
+
+ # PII policy blocks over the last 24 hours
+ sum(increase(llm_request_errors_total{reason="pii_policy_denied"}[24h]))
+ ```
+
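+ The error-ratio query above can back a standard Prometheus alerting rule. A minimal sketch — the group name, alert name, threshold, and durations below are illustrative, not part of the router's shipped configuration:
+
+ ```yaml
+ groups:
+   - name: llm-router-errors
+     rules:
+       - alert: HighModelErrorRatio
+         # Fire when more than 5% of a model's requests fail, sustained for 10 minutes.
+         expr: |
+           100 * sum by (model) (increase(llm_request_errors_total[15m]))
+             / sum by (model) (increase(llm_model_requests_total[15m])) > 5
+         for: 10m
+         labels:
+           severity: page
+         annotations:
+           summary: "High error ratio for model {{ $labels.model }}"
+ ```
+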
### Pricing Configuration
Provide per-1M pricing for your models so the router can compute request cost and emit metrics/logs.
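As an illustration only — the key names below are hypothetical, so consult the router's configuration reference for the exact schema — a per-1M pricing block typically pairs each model with prompt and completion rates and a currency:

```yaml
# Hypothetical schema: key names and rates are illustrative, not the router's actual config.
model_pricing:
  gpt-4:
    currency: USD
    prompt_per_1m: 30.0       # USD per 1M prompt tokens
    completion_per_1m: 60.0   # USD per 1M completion tokens
  phi4:
    currency: USD
    prompt_per_1m: 0.07
    completion_per_1m: 0.14
```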