@@ -173,6 +173,12 @@ semantic_router_cache_size 1247
 # Security metrics
 semantic_router_pii_detections_total{action="block"} 23
 semantic_router_jailbreak_attempts_total{action="block"} 5
+
+# Error metrics
+llm_request_errors_total{model="gpt-4",reason="timeout"} 12
+llm_request_errors_total{model="claude-3",reason="upstream_5xx"} 3
+llm_request_errors_total{model="phi4",reason="upstream_4xx"} 5
+llm_request_errors_total{model="phi4",reason="pii_policy_denied"} 8
 ```
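+
+The security and error counters above are plain Prometheus counters, so the usual `rate()`/`increase()` patterns apply directly. For example, to count PII blocks over the last hour:
+
+```prometheus
+sum(increase(semantic_router_pii_detections_total{action="block"}[1h]))
+```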
 
 ### Reasoning Mode Metrics
@@ -247,6 +253,35 @@ sum by (model) (increase(llm_model_cost_total{currency="USD"}[1h]))
 sum by (reason_code) (increase(llm_routing_reason_codes_total[15m]))
 ```
 
+### Request Error Metrics
+
+The router tracks request-level failures by model and reason so you can monitor both absolute error throughput and the share of requests that fail.
+
+- `llm_request_errors_total{model, reason}`
+  - Description: Total number of request errors categorized by failure reason
+  - Labels:
+    - `model`: target model name for the failed request
+    - `reason`: error category (`timeout`, `upstream_4xx`, `upstream_5xx`, `pii_policy_denied`, `jailbreak_block`, `parse_error`, `serialization_error`, `cancellation`, `classification_failed`, `unknown`)
+
+Example PromQL queries:
+
+```prometheus
+# Total errors by reason over the last hour
+sum by (reason) (increase(llm_request_errors_total[1h]))
+
+# Error throughput (errors/sec) by model over the last 15 minutes.
+# Helpful for incident response because it shows how many failing requests are impacting users.
+sum by (model) (rate(llm_request_errors_total[15m]))
+
+# Error ratio (% of requests failing) by model over the last 15 minutes.
+# Use increase() to align numerator and denominator with the same lookback window.
+100 * sum by (model) (increase(llm_request_errors_total[15m])) /
+  sum by (model) (increase(llm_model_requests_total[15m]))
+
+# PII policy blocks over the last 24 hours
+sum(increase(llm_request_errors_total{reason="pii_policy_denied"}[24h]))
+```
+
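+The error-ratio query can also drive alerting. A minimal sketch of a Prometheus alerting rule, assuming a standard rule file; the 5% threshold, `for:` window, and `severity` label are illustrative choices, not shipped defaults:
+
+```yaml
+groups:
+  - name: llm-request-errors
+    rules:
+      - alert: LLMHighErrorRatio
+        # Fire when more than 5% of a model's requests failed over the last 15 minutes.
+        expr: |
+          100 * sum by (model) (increase(llm_request_errors_total[15m]))
+            / sum by (model) (increase(llm_model_requests_total[15m])) > 5
+        for: 10m
+        labels:
+          severity: warning
+        annotations:
+          summary: "{{ $labels.model }} error ratio above 5% for 15 minutes"
+```
+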
 ### Pricing Configuration
 
 Provide per-1M pricing for your models so the router can compute request cost and emit metrics/logs.
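+As an illustrative sketch only — the exact schema depends on your router's configuration file, and the `pricing`, `prompt_per_1m`, and `completion_per_1m` keys here are hypothetical names — a per-1M pricing block might look like:
+
+```yaml
+# Hypothetical pricing block: USD per 1M tokens, split by prompt vs. completion.
+pricing:
+  gpt-4:
+    currency: USD
+    prompt_per_1m: 30.00
+    completion_per_1m: 60.00
+  phi4:
+    currency: USD
+    prompt_per_1m: 0.07
+    completion_per_1m: 0.14
+```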