Metrics and Algorithms to Observe and Classify LLM Backend Load #87

@rootfs

Description

Is your feature request related to a problem? Please describe.
Explore and expose additional Prometheus metrics from the Semantic Router that describe workload characteristics (prompt and response sizes, category distribution, cache hit ratio) and backend load (endpoint utilization, token throughput, time to first token (TTFT), and time per output token (TPOT)). These metrics would let the control plane implement algorithms that dynamically adapt router configs to meet latency, accuracy, and cost objectives.
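As a rough illustration of how TTFT and TPOT could feed a load classifier, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `RequestTiming` fields, the `classify_load` helper, the threshold values, and the load labels are hypothetical and are not part of the Semantic Router today.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Timestamps in seconds for one completed request (hypothetical fields)."""
    start: float            # request received by the router
    first_token: float      # first output token returned by the backend
    end: float              # last output token returned
    completion_tokens: int  # number of output tokens generated


def ttft(t: RequestTiming) -> float:
    """Time to first token: delay between request start and first output token."""
    return t.first_token - t.start


def tpot(t: RequestTiming) -> float:
    """Time per output token: mean gap between tokens after the first one."""
    if t.completion_tokens <= 1:
        return 0.0  # a single token has no inter-token gap
    return (t.end - t.first_token) / (t.completion_tokens - 1)


def classify_load(timings, ttft_limit=0.5, tpot_limit=0.05):
    """Label a backend from mean TTFT and TPOT over a window of requests.

    The thresholds (in seconds) are illustrative assumptions, not tuned values.
    """
    if not timings:
        return "idle"
    mean_ttft = sum(ttft(t) for t in timings) / len(timings)
    mean_tpot = sum(tpot(t) for t in timings) / len(timings)
    if mean_ttft > ttft_limit and mean_tpot > tpot_limit:
        return "saturated"
    if mean_ttft > ttft_limit or mean_tpot > tpot_limit:
        return "busy"
    return "normal"
```

In a real deployment these values would more likely be exported as Prometheus histograms and the classification done over a query window by the control plane; the in-memory list here only keeps the sketch self-contained.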
