35 changes: 35 additions & 0 deletions website/docs/api/router.md
@@ -173,6 +173,12 @@ semantic_router_cache_size 1247
# Security metrics
semantic_router_pii_detections_total{action="block"} 23
semantic_router_jailbreak_attempts_total{action="block"} 5

# Error metrics
llm_request_errors_total{model="gpt-4",reason="timeout"} 12
llm_request_errors_total{model="claude-3",reason="upstream_5xx"} 3
llm_request_errors_total{model="phi4",reason="upstream_4xx"} 5
llm_request_errors_total{model="phi4",reason="pii_policy_denied"} 8
```

### Reasoning Mode Metrics
@@ -247,6 +253,35 @@ sum by (model) (increase(llm_model_cost_total{currency="USD"}[1h]))
sum by (reason_code) (increase(llm_routing_reason_codes_total[15m]))
```

### Request Error Metrics

The router tracks request-level failures by model and reason so you can monitor both absolute error throughput and the share of requests that fail.

- `llm_request_errors_total{model, reason}`
  - Description: total number of request errors, categorized by failure reason
  - Labels:
    - `model`: target model name for the failed request
    - `reason`: error category (`timeout`, `upstream_4xx`, `upstream_5xx`, `pii_policy_denied`, `jailbreak_block`, `parse_error`, `serialization_error`, `cancellation`, `classification_failed`, `unknown`)
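For reference, a counter with this shape can be registered with the Python `prometheus_client` library. This is an illustrative sketch of the metric's structure, not the router's actual instrumentation code:

```python
from prometheus_client import CollectorRegistry, Counter, generate_latest

registry = CollectorRegistry()

# Counter mirroring the documented metric; prometheus_client appends
# the "_total" suffix to the name automatically on exposition.
llm_request_errors = Counter(
    "llm_request_errors",
    "Total number of request errors categorized by failure reason",
    ["model", "reason"],
    registry=registry,
)

# Record one timeout against gpt-4, as in the sample output above.
llm_request_errors.labels(model="gpt-4", reason="timeout").inc()

print(generate_latest(registry).decode())
```

Exposition then yields a line of the form `llm_request_errors_total{model="gpt-4",reason="timeout"} 1.0`, matching the samples shown earlier.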

Example PromQL queries:

```prometheus
# Total errors by reason over the last hour
sum by (reason) (increase(llm_request_errors_total[1h]))

# Error throughput (errors/sec) by model over the last 15 minutes.
# Helpful for incident response because it shows how many failing requests are impacting users.
sum by (model) (rate(llm_request_errors_total[15m]))

# Error ratio (% of requests failing) by model over the last 15 minutes.
# Use increase() to align numerator and denominator with the same lookback window.
100 * sum by (model) (increase(llm_request_errors_total[15m])) /
sum by (model) (increase(llm_model_requests_total[15m]))

# PII policy blocks over the last 24 hours
sum(increase(llm_request_errors_total{reason="pii_policy_denied"}[24h]))
```
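The arithmetic behind the error-ratio query can be sketched in plain Python over two counter snapshots taken at the ends of a 15-minute window. All model names and numbers below are made up for illustration, and counter resets are ignored:

```python
# Counter values at the start and end of the window, keyed by model.
errors_start = {"gpt-4": 100.0, "claude-3": 40.0}
errors_end = {"gpt-4": 112.0, "claude-3": 43.0}
requests_start = {"gpt-4": 5000.0, "claude-3": 2000.0}
requests_end = {"gpt-4": 5600.0, "claude-3": 2100.0}

def increase(start, end):
    # PromQL increase(): growth of a counter over the lookback window
    # (this sketch ignores counter resets).
    return {m: end[m] - start[m] for m in end}

err = increase(errors_start, errors_end)       # errors per model in window
req = increase(requests_start, requests_end)   # requests per model in window

# 100 * errors / requests, mirroring the PromQL ratio query above.
error_ratio_pct = {m: 100.0 * err[m] / req[m] for m in err}
print(error_ratio_pct)  # ~2% for gpt-4, ~3% for claude-3
```

Using `increase()` for both numerator and denominator keeps the two sides aligned on the same window, which is why the documented query does the same rather than mixing `rate()` and `increase()`.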

### Pricing Configuration

Provide per-1M-token pricing for your models so the router can compute per-request cost and emit the corresponding metrics and logs.