## Summary
Add a Prometheus metrics endpoint to the LLM server for monitoring sessions, token usage, request latency, tool calls, and provider health.
## Requirements

### Metrics Endpoint

- Expose a `/metrics` endpoint on the server in Prometheus exposition format
- Use the standard `prometheus/client_golang` library
- The endpoint should be available alongside the existing API on the same port
### Metrics to Expose

#### Session Metrics

- `llm_sessions_active` (gauge) — number of active sessions
- `llm_sessions_created_total` (counter) — total sessions created
- `llm_sessions_deleted_total` (counter) — total sessions deleted (including pruned)
#### Token Usage

- `llm_tokens_input_total` (counter, labels: `provider`, `model`) — total input tokens consumed
- `llm_tokens_output_total` (counter, labels: `provider`, `model`) — total output tokens generated
- `llm_tokens_thinking_total` (counter, labels: `provider`, `model`) — total thinking tokens (for reasoning models)
- `llm_tokens_cached_total` (counter, labels: `provider`, `model`) — total cached input tokens
Request Metrics
llm_requests_total(counter, labels:provider,model,status) — total chat completion requestsllm_request_duration_seconds(histogram, labels:provider,model) — request latencyllm_request_errors_total(counter, labels:provider,model,error_type) — failed requests
#### Tool Metrics

- `llm_tool_calls_total` (counter, labels: `tool`) — total tool invocations
- `llm_tool_call_duration_seconds` (histogram, labels: `tool`) — tool execution latency
- `llm_tool_call_errors_total` (counter, labels: `tool`) — failed tool calls
#### Provider Metrics

- `llm_provider_up` (gauge, labels: `provider`) — provider health status (1 = up, 0 = down)
- `llm_models_available` (gauge, labels: `provider`) — number of models available per provider
#### Telegram Metrics (when the telegram sidecar is running)

- `llm_telegram_messages_received_total` (counter) — messages received from users
- `llm_telegram_messages_sent_total` (counter) — responses sent to users
- `llm_telegram_active_users` (gauge) — unique active users
### Implementation
- Instrument the existing request pipeline (agent/manager, session, tool execution) with metric collectors
- Metrics should have minimal performance overhead
- Token counts are available from provider responses (`UsageMetadata` / `usage` fields)
- Consider a `--metrics` flag to enable/disable the endpoint (default: enabled)
### Grafana Dashboard
- Provide a sample Grafana dashboard JSON for visualizing the metrics
- Panels: token usage over time, request rate by provider, latency percentiles, active sessions, tool call breakdown, error rate
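The panels above map onto PromQL along these lines (a sketch assuming the metric names defined in this issue; window sizes are a judgment call):

```promql
# Request rate by provider
sum by (provider) (rate(llm_requests_total[5m]))

# p95 request latency
histogram_quantile(0.95, sum by (le) (rate(llm_request_duration_seconds_bucket[5m])))

# Error rate as a fraction of all requests
sum(rate(llm_request_errors_total[5m])) / sum(rate(llm_requests_total[5m]))

# Token usage over time, by model
sum by (model) (rate(llm_tokens_input_total[5m]) + rate(llm_tokens_output_total[5m]))

# Tool call breakdown
sum by (tool) (rate(llm_tool_calls_total[5m]))
```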
### Integration

- Add the metrics endpoint as a Prometheus scrape target in the existing `prometheus.tf` configuration
- Example scrape config target: `llm-prod-lan.default.nomad.:8085`
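The resulting Prometheus scrape config would resemble the following (the job name and scrape interval are assumptions; only the target comes from this issue):

```yaml
scrape_configs:
  - job_name: llm            # assumed job name
    metrics_path: /metrics
    scrape_interval: 15s     # assumed interval
    static_configs:
      - targets:
          - llm-prod-lan.default.nomad.:8085
```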
## Motivation
Observability is essential for production LLM deployments. Token usage drives cost, latency affects user experience, and error rates indicate provider issues. Prometheus metrics integrate with the existing Grafana/Prometheus stack already running in the infrastructure.