Is your feature request related to a problem? Please describe.
#85 The control plane needs “prompt/response length distributions” and a “classified category mix.” The existing CategoryClassifications metric is a Gauge, which only reports the last observed value; it can express neither a distribution nor a monotonically increasing count of classifications per category.
Describe the solution you'd like
• Add request-level token histograms:
  • llm_prompt_tokens_per_request (HistogramVec; labels: model)
  • llm_completion_tokens_per_request (HistogramVec; labels: model)
• Introduce a Counter for the category mix:
  • llm_category_classifications_count (CounterVec; labels: category)
• Mark the existing Gauge llm_category_classifications_total as deprecated in the docs, or remove it outright.
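To make the Gauge-vs-Histogram/Counter distinction concrete, here is a minimal, dependency-free sketch of the semantics the proposed metrics would have (mirroring Prometheus cumulative-bucket behavior; the bucket bounds, “model-a”, and “benign” are illustrative assumptions, not decided values):

```python
from collections import defaultdict
from math import inf

# Illustrative exponential token buckets: 16, 32, ..., 32768, +Inf.
# Real bounds would be chosen from observed traffic.
BUCKETS = [16 * 2 ** i for i in range(12)] + [inf]

class HistogramVec:
    """Cumulative per-label histogram, mirroring Prometheus semantics."""
    def __init__(self, buckets=BUCKETS):
        self._bounds = buckets
        self.buckets = defaultdict(lambda: [0] * len(buckets))
        self.sum = defaultdict(float)
        self.count = defaultdict(int)

    def observe(self, model, value):
        # Cumulative buckets: every bucket whose upper bound covers the
        # value is incremented, so counts are monotone in the bound.
        for i, bound in enumerate(self._bounds):
            if value <= bound:
                self.buckets[model][i] += 1
        self.sum[model] += value
        self.count[model] += 1

class CounterVec:
    """Monotonic per-label counter: what a category mix needs, and what a
    Gauge cannot guarantee (a Gauge may go down or be overwritten)."""
    def __init__(self):
        self.counts = defaultdict(int)

    def inc(self, category, amount=1):
        self.counts[category] += amount

# Hypothetical usage for the proposed metrics:
prompt_tokens = HistogramVec()         # llm_prompt_tokens_per_request
prompt_tokens.observe("model-a", 950)  # 950 prompt tokens for one request

categories = CounterVec()              # llm_category_classifications_count
categories.inc("benign")
```

The histogram gives the control plane quantile-friendly length distributions per model, while the counter lets it compute the category mix as a rate over time, which the current Gauge cannot.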
Describe alternatives you've considered
Additional context