This project integrates Prometheus metrics for monitoring chat completion requests.

- Automatically collects metrics during log processing
- Reports metrics after `uploadToLoki`, based on data from `model.ChatLog`
- `chat_rag_requests_total`: Total number of chat completion requests
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`, `category`
- `chat_rag_original_tokens_total`: Total number of original tokens processed
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`, `token_scope` (system/user/all)
- `chat_rag_compressed_tokens_total`: Total number of compressed tokens processed
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`, `token_scope` (system/user/all)
- `chat_rag_compression_ratio`: Distribution of compression ratios (buckets: 0.1, 0.2, ..., 1.0)
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_user_prompt_compressed_total`: Total number of requests where the user prompt was compressed
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_semantic_latency_ms`: Semantic processing latency in milliseconds (buckets: 10, 50, 100, 200, 500, 1000, 2000, 5000)
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_summary_latency_ms`: Summary processing latency in milliseconds (buckets: 10, 50, 100, 200, 500, 1000, 2000, 5000)
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_main_model_latency_ms`: Main model processing latency in milliseconds (buckets: 100, 500, 1000, 2000, 5000, 10000, 20000)
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_total_latency_ms`: Total processing latency in milliseconds (buckets: 100, 500, 1000, 2000, 5000, 10000, 20000, 30000)
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_response_tokens_total`: Total number of response tokens generated
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`
- `chat_rag_errors_total`: Total number of errors encountered
  - Labels: `client_id`, `client_ide`, `model`, `user`, `login_from`, `error_type` (from the `log.Error` field)
After starting the service, Prometheus metrics can be accessed via:

```
GET http://localhost:8080/metrics
```
Add the following job to your Prometheus configuration file:
```yaml
scrape_configs:
  - job_name: "chat-rag"
    static_configs:
      - targets: ["localhost:8080"]
    metrics_path: "/metrics"
    scrape_interval: 15s
```

Example PromQL queries:

```
# Total chat completion requests
chat_rag_requests_total

# 95th percentile compression ratio
histogram_quantile(0.95, chat_rag_compression_ratio_bucket)

# Average total latency over the last 5 minutes
rate(chat_rag_total_latency_ms_sum[5m]) / rate(chat_rag_total_latency_ms_count[5m])

# Request rate per client
sum(rate(chat_rag_requests_total[5m])) by (client_id)
```
- `MetricsService`: Defines and records Prometheus metrics
- `LoggerService`: Integrates with `MetricsService`, automatically reporting metrics during log processing
- `MetricsHandler`: Provides the `/metrics` HTTP endpoint

Integration steps:

- Initialize `MetricsService` in `ServiceContext`
- Inject `MetricsService` into `LoggerService`
- Call `metricsService.RecordChatLog()` in `LoggerService.processLogs()` after a successful `uploadToLoki`
- Expose metrics to Prometheus via the `/metrics` endpoint
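The wiring described above can be sketched as follows. The struct shapes, field names, and the stub upload function are assumptions invented for illustration; only the names `processLogs`, `RecordChatLog`, and the "record only after a successful Loki upload" ordering come from this document:

```go
package main

import "fmt"

// ChatLog stands in for model.ChatLog (fields are assumed).
type ChatLog struct {
	ClientID, Model, User string
}

// MetricsService stands in for the real metrics recorder.
type MetricsService struct{ recorded int }

func (m *MetricsService) RecordChatLog(l ChatLog) { m.recorded++ }

// LoggerService holds the injected MetricsService and an upload
// function standing in for uploadToLoki.
type LoggerService struct {
	metrics *MetricsService
	upload  func([]ChatLog) error
}

// processLogs reports metrics only after the Loki upload succeeds.
func (s *LoggerService) processLogs(logs []ChatLog) error {
	if err := s.upload(logs); err != nil {
		return err // no metrics recorded for failed uploads
	}
	for _, l := range logs {
		s.metrics.RecordChatLog(l)
	}
	return nil
}

func main() {
	svc := &LoggerService{
		metrics: &MetricsService{},
		upload:  func([]ChatLog) error { return nil },
	}
	_ = svc.processLogs([]ChatLog{{ClientID: "c1"}, {ClientID: "c2"}})
	fmt.Println(svc.metrics.recorded) // prints 2
}
```

Recording after the upload keeps request counts consistent with what actually reached Loki; failed batches are retried (or dropped) without inflating the counters.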
- Performance Impact: Metrics collection has minimal overhead, but monitor memory usage in high-concurrency scenarios
- Label Cardinality: Avoid high-cardinality labels (e.g., `request_id`) to prevent unbounded memory growth
- Data Retention: Prometheus defaults to 15-day retention (configurable)
- Security: Implement access control for the `/metrics` endpoint in production
- **Metrics Not Updating**
  - Verify LoggerService is running
  - Check that log files are being processed correctly
  - Confirm the Loki upload succeeds
- **High Memory Usage**
  - Check for high label cardinality
  - Consider reducing the number of histogram buckets
- **Prometheus Scrape Failure**
  - Verify the service port
  - Check firewall settings
  - Confirm the `/metrics` endpoint is accessible