Metrics and Algorithms to Observe and Classify LLM Backend Load #87

@rootfs

Is your feature request related to a problem? Please describe.
Explore and expose additional Prometheus metrics from the Semantic Router that describe workload characteristics (prompt and response sizes, category distribution, cache hit ratio) and backend load (endpoint utilization, token throughput, time to first token (TTFT), and time per output token (TPOT)). With these metrics, the control plane can implement algorithms that dynamically adapt the router configuration to meet latency, accuracy, and cost objectives.
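To make the request concrete, here is a minimal sketch of how the proposed signals could be derived from per-request observations and folded into a coarse load class. Everything in it is an assumption for illustration: the `RequestSample` record, the `summarize` and `classify_load` helpers, and the SLO thresholds are hypothetical and are not part of the Semantic Router. It uses only the Python standard library; a real implementation would publish the same values through a Prometheus client rather than return a dict.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestSample:
    """One completed request as seen by the router (hypothetical record)."""
    category: str
    prompt_tokens: int
    completion_tokens: int
    cache_hit: bool
    start_s: float        # time the request was received
    first_token_s: float  # time the first response token was streamed
    end_s: float          # time the last response token was streamed

    @property
    def ttft(self) -> float:
        """Time to first token, in seconds."""
        return self.first_token_s - self.start_s

    @property
    def tpot(self) -> float:
        """Time per output token: decode time averaged over tokens after the first."""
        decode_tokens = max(self.completion_tokens - 1, 1)
        return (self.end_s - self.first_token_s) / decode_tokens


def summarize(samples: list[RequestSample], window_s: float) -> dict:
    """Aggregate workload and backend-load signals over one observation window."""
    if not samples or window_s <= 0:
        raise ValueError("need at least one sample and a positive window")
    return {
        "category_counts": Counter(s.category for s in samples),
        "cache_hit_ratio": sum(s.cache_hit for s in samples) / len(samples),
        "prompt_tokens_mean": mean(s.prompt_tokens for s in samples),
        "completion_tokens_mean": mean(s.completion_tokens for s in samples),
        "output_tokens_per_s": sum(s.completion_tokens for s in samples) / window_s,
        "ttft_mean_s": mean(s.ttft for s in samples),
        "tpot_mean_s": mean(s.tpot for s in samples),
    }


def classify_load(summary: dict, ttft_slo_s: float = 0.5, tpot_slo_s: float = 0.05) -> str:
    """Classify backend load by how close mean latency is to its SLO.

    The SLO defaults and the 0.5 / 1.0 cut-offs are illustrative assumptions.
    """
    pressure = max(summary["ttft_mean_s"] / ttft_slo_s,
                   summary["tpot_mean_s"] / tpot_slo_s)
    if pressure < 0.5:
        return "light"
    if pressure <= 1.0:
        return "moderate"
    return "overloaded"


if __name__ == "__main__":
    window = [
        RequestSample("math", 100, 11, False, 0.0, 0.2, 0.6),
        RequestSample("code", 300, 21, True, 1.0, 1.4, 2.0),
    ]
    stats = summarize(window, window_s=10.0)
    print(stats)
    print(classify_load(stats))
```

A control plane could evaluate something like `classify_load` per endpoint and per window, then use the resulting class to shift traffic or adjust routing thresholds. Percentiles (for example p95 TTFT) would likely be a better trigger than the mean used here.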

Metadata

Projects

Status

Backlog
