
Prometheus metrics endpoint for sessions, tokens, and tool usage #28


Summary

Add a Prometheus metrics endpoint to the LLM server for monitoring sessions, token usage, request latency, tool calls, and provider health.

Requirements

Metrics Endpoint

  • Expose /metrics endpoint on the server in Prometheus exposition format
  • Use the standard prometheus/client_golang library
  • Endpoint should be available alongside the existing API on the same port (see the sketch below)
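
A minimal sketch of that wiring, assuming the existing API hangs off an http.ServeMux (the commented-out handler registration is a placeholder; port 8085 is taken from the scrape target in the Integration section):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	mux := http.NewServeMux()

	// Existing API routes would be registered here, e.g.:
	// mux.Handle("/v1/", apiHandler)

	// Serve Prometheus metrics from the same listener as the API.
	mux.Handle("/metrics", promhttp.Handler())

	log.Fatal(http.ListenAndServe(":8085", mux))
}
```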

Metrics to Expose

Session Metrics

  • llm_sessions_active (gauge) — number of active sessions
  • llm_sessions_created_total (counter) — total sessions created
  • llm_sessions_deleted_total (counter) — total sessions deleted (including pruned)
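
A sketch of how these three metrics could be declared in a small metrics package and driven from the session manager (the lifecycle hook names are hypothetical):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	SessionsActive = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "llm_sessions_active",
		Help: "Number of active sessions.",
	})
	SessionsCreated = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "llm_sessions_created_total",
		Help: "Total sessions created.",
	})
	SessionsDeleted = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "llm_sessions_deleted_total",
		Help: "Total sessions deleted, including pruned.",
	})
)

func init() {
	prometheus.MustRegister(SessionsActive, SessionsCreated, SessionsDeleted)
}

// Hypothetical lifecycle hooks called by the session manager:
func OnSessionCreated() { SessionsCreated.Inc(); SessionsActive.Inc() }
func OnSessionDeleted() { SessionsDeleted.Inc(); SessionsActive.Dec() }
```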

Token Usage

  • llm_tokens_input_total (counter, labels: provider, model) — total input tokens consumed
  • llm_tokens_output_total (counter, labels: provider, model) — total output tokens generated
  • llm_tokens_thinking_total (counter, labels: provider, model) — total thinking tokens (for reasoning models)
  • llm_tokens_cached_total (counter, labels: provider, model) — total cached input tokens
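
Since all four counters share the provider/model label pair, they fall out naturally as CounterVecs; a sketch, continuing the same metrics package:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var tokenLabels = []string{"provider", "model"}

var (
	TokensInput = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_tokens_input_total",
		Help: "Total input tokens consumed.",
	}, tokenLabels)
	TokensOutput = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_tokens_output_total",
		Help: "Total output tokens generated.",
	}, tokenLabels)
	TokensThinking = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_tokens_thinking_total",
		Help: "Total thinking tokens (reasoning models).",
	}, tokenLabels)
	TokensCached = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_tokens_cached_total",
		Help: "Total cached input tokens.",
	}, tokenLabels)
)

func init() {
	prometheus.MustRegister(TokensInput, TokensOutput, TokensThinking, TokensCached)
}
```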

Request Metrics

  • llm_requests_total (counter, labels: provider, model, status) — total chat completion requests
  • llm_request_duration_seconds (histogram, labels: provider, model) — request latency
  • llm_request_errors_total (counter, labels: provider, model, error_type) — failed requests
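
One way to keep this instrumentation out of the business logic is a small wrapper around the completion call; a sketch, where the complete callback and the bucket layout are assumptions to be tuned against observed latencies:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	RequestsTotal = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_requests_total",
		Help: "Total chat completion requests.",
	}, []string{"provider", "model", "status"})
	RequestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "llm_request_duration_seconds",
		Help: "Chat completion request latency.",
		// ~100ms up to ~3.5min across 12 buckets (an assumption).
		Buckets: prometheus.ExponentialBuckets(0.1, 2, 12),
	}, []string{"provider", "model"})
)

func init() {
	prometheus.MustRegister(RequestsTotal, RequestDuration)
}

// InstrumentRequest times a completion call and records its outcome.
func InstrumentRequest(provider, model string, complete func() error) error {
	start := time.Now()
	err := complete()
	RequestDuration.WithLabelValues(provider, model).Observe(time.Since(start).Seconds())
	status := "ok"
	if err != nil {
		status = "error"
	}
	RequestsTotal.WithLabelValues(provider, model, status).Inc()
	return err
}
```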

Tool Metrics

  • llm_tool_calls_total (counter, labels: tool) — total tool invocations
  • llm_tool_call_duration_seconds (histogram, labels: tool) — tool execution latency
  • llm_tool_call_errors_total (counter, labels: tool) — failed tool calls
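
Tool metrics follow the same pattern with a single tool label; prometheus.NewTimer is a convenient idiom for the histogram. A sketch, with a hypothetical runTool callback:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	ToolCalls = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_tool_calls_total",
		Help: "Total tool invocations.",
	}, []string{"tool"})
	ToolDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
		Name: "llm_tool_call_duration_seconds",
		Help: "Tool execution latency.",
	}, []string{"tool"})
	ToolErrors = prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "llm_tool_call_errors_total",
		Help: "Failed tool calls.",
	}, []string{"tool"})
)

func init() {
	prometheus.MustRegister(ToolCalls, ToolDuration, ToolErrors)
}

// InstrumentToolCall wraps a tool execution with timing and error counting.
func InstrumentToolCall(tool string, runTool func() error) error {
	ToolCalls.WithLabelValues(tool).Inc()
	timer := prometheus.NewTimer(ToolDuration.WithLabelValues(tool))
	defer timer.ObserveDuration()
	if err := runTool(); err != nil {
		ToolErrors.WithLabelValues(tool).Inc()
		return err
	}
	return nil
}
```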

Provider Metrics

  • llm_provider_up (gauge, labels: provider) — provider health status (1=up, 0=down)
  • llm_models_available (gauge, labels: provider) — number of models available per provider
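
These gauges would naturally be refreshed from a periodic health-check loop; a sketch, assuming a hypothetical Provider interface with Ping and Models methods (the real provider API will differ):

```go
package metrics

import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var (
	ProviderUp = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "llm_provider_up",
		Help: "Provider health status (1=up, 0=down).",
	}, []string{"provider"})
	ModelsAvailable = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "llm_models_available",
		Help: "Number of models available per provider.",
	}, []string{"provider"})
)

func init() {
	prometheus.MustRegister(ProviderUp, ModelsAvailable)
}

// Provider is a hypothetical stand-in for the server's provider interface.
type Provider interface {
	Ping(ctx context.Context) error
	Models(ctx context.Context) ([]string, error)
}

// PollProviders refreshes the health gauges until ctx is cancelled.
func PollProviders(ctx context.Context, providers map[string]Provider, every time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for name, p := range providers {
				if err := p.Ping(ctx); err != nil {
					ProviderUp.WithLabelValues(name).Set(0)
					continue
				}
				ProviderUp.WithLabelValues(name).Set(1)
				if models, err := p.Models(ctx); err == nil {
					ModelsAvailable.WithLabelValues(name).Set(float64(len(models)))
				}
			}
		}
	}
}
```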

Telegram Metrics (when the Telegram sidecar is running)

  • llm_telegram_messages_received_total (counter) — messages received from users
  • llm_telegram_messages_sent_total (counter) — responses sent to users
  • llm_telegram_active_users (gauge) — unique active users
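
The two message counters are plain Counters like the session ones; the active-users gauge additionally needs a definition of "active", e.g. users seen within a sliding window. A sketch, where the one-hour window is an assumption:

```go
package metrics

import (
	"sync"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var TelegramActiveUsers = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "llm_telegram_active_users",
	Help: "Unique active users.",
})

func init() {
	prometheus.MustRegister(TelegramActiveUsers)
}

var (
	mu       sync.Mutex
	lastSeen = map[int64]time.Time{} // user ID -> last message time
)

// TouchUser records activity for a user and refreshes the gauge,
// treating anyone seen in the past hour as active (an assumption).
func TouchUser(userID int64) {
	mu.Lock()
	defer mu.Unlock()
	lastSeen[userID] = time.Now()
	cutoff := time.Now().Add(-time.Hour)
	for id, t := range lastSeen {
		if t.Before(cutoff) {
			delete(lastSeen, id)
		}
	}
	TelegramActiveUsers.Set(float64(len(lastSeen)))
}
```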

Implementation

  • Instrument the existing request pipeline (agent/manager, session, tool execution) with metric collectors
  • Metrics should have minimal performance overhead
  • Token counts are available from provider responses (UsageMetadata / usage fields); see the sketch after this list
  • Consider a --metrics flag to enable/disable the endpoint (default: enabled)
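
Providers expose usage under different names (UsageMetadata for Gemini-style APIs, usage for OpenAI-style APIs), so a small normalization step can feed the token counters defined earlier; a sketch with hypothetical field names:

```go
package metrics

// Usage is a hypothetical normalized view of a provider's usage block;
// each provider adapter would map its own response fields onto it.
type Usage struct {
	InputTokens    int
	OutputTokens   int
	ThinkingTokens int
	CachedTokens   int
}

// RecordUsage adds one response's token counts to the counters above.
func RecordUsage(provider, model string, u Usage) {
	TokensInput.WithLabelValues(provider, model).Add(float64(u.InputTokens))
	TokensOutput.WithLabelValues(provider, model).Add(float64(u.OutputTokens))
	TokensThinking.WithLabelValues(provider, model).Add(float64(u.ThinkingTokens))
	TokensCached.WithLabelValues(provider, model).Add(float64(u.CachedTokens))
}
```

The --metrics flag would then only need to gate the mux.Handle("/metrics", ...) registration shown in the first sketch; collectors can stay registered either way.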

Grafana Dashboard

  • Provide a sample Grafana dashboard JSON for visualizing the metrics
  • Panels: token usage over time, request rate by provider, latency percentiles, active sessions, tool call breakdown, error rate
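
The PromQL behind those panels might look like the following (query shapes only; intervals and label names must match the final implementation):

```promql
# Request rate by provider
sum by (provider) (rate(llm_requests_total[5m]))

# p95 request latency by provider
histogram_quantile(0.95,
  sum by (provider, le) (rate(llm_request_duration_seconds_bucket[5m])))

# Token throughput by model (input + output)
sum by (model) (rate(llm_tokens_input_total[5m]) + rate(llm_tokens_output_total[5m]))

# Error ratio
sum(rate(llm_request_errors_total[5m])) / sum(rate(llm_requests_total[5m]))
```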

Integration

  • Add the metrics endpoint as a Prometheus scrape target in the existing prometheus.tf configuration
  • Example scrape config target: llm-prod-lan.default.nomad.:8085
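
For reference, the plain Prometheus equivalent of what prometheus.tf would need to template (job name is a placeholder):

```yaml
scrape_configs:
  - job_name: llm
    metrics_path: /metrics
    static_configs:
      - targets:
          - llm-prod-lan.default.nomad.:8085
```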

Motivation

Observability is essential for production LLM deployments. Token usage drives cost, latency affects user experience, and error rates indicate provider issues. Prometheus metrics integrate with the existing Grafana/Prometheus stack already running in the infrastructure.

