This document explains how to use the monitoring system in the KernelCI Dashboard project.
docker compose -f docker-compose.monitoring.yml up -d
# Add monitoring configuration to .env.backend
echo "PROMETHEUS_METRICS_ENABLED=true" >> .env.backend
echo "PROMETHEUS_METRICS_PORT=8001" >> .env.backend
echo "PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_multiproc_dir" >> .env.backend
# Start the backend with monitoring enabled
docker compose up -d backend
# The backend will automatically expose port 8001 for metrics when PROMETHEUS_METRICS_ENABLED=true
Note: For detailed backend setup instructions, see backend/README.md
# Enable dedicated metrics server (default: False)
export PROMETHEUS_METRICS_ENABLED=True
# Set custom metrics port (default: 8001)
# IMPORTANT: This port must match the port in monitoring/prometheus.yml
export PROMETHEUS_METRICS_PORT=8001
# The metrics server will automatically start on the specified port
# Access metrics at: http://localhost:8001/metrics/
# Use 0.0.0.0:8000 to allow connections from Docker containers (Prometheus)
# This binds to all IPv4 addresses, enabling Prometheus to scrape metrics
# --noreload is required because Django's auto-reloader conflicts with the dedicated metrics thread
poetry run python manage.py runserver 0.0.0.0:8000 --noreload
- Go to http://localhost:3000
- Login: admin / admin
- Add data source
- Select "Prometheus". URL: http://prometheus:9090
- Import Dashboard by JSON File
- Select: monitoring/dashboard.json for API metrics
- Select: monitoring/aggregation_process.json for Aggregation Process metrics
- Prometheus: http://localhost:9090 (show targets)
- Grafana: http://localhost:3000 (show your dashboard)
- Metrics: http://localhost:8001/metrics/ (show raw metrics)
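When inspecting the raw metrics endpoint, it can help to turn the text output into a dictionary. The following is a minimal sketch that assumes the endpoint returns the standard Prometheus text exposition format without timestamps. The metric and endpoint names in the sample are hypothetical and are not taken from the project.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Assumes each sample line is "<series> <value>" with no trailing timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token; labels may contain spaces.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical sample; real metric names depend on the backend.
SAMPLE = """\
# HELP http_requests_total Total requests (hypothetical metric name)
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/api/tree/"} 42.0
http_requests_total{method="POST",endpoint="/api/search/"} 7.0
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

To use it against a running backend, the response body of http://localhost:8001/metrics/ could be passed to parse_metrics in place of SAMPLE.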
After importing the dashboard, you'll have:
- Average Response Time by Endpoint - Shows response time per endpoint
- Total Calls by Endpoint - Shows total requests per endpoint
- Endpoint Performance Summary - Table with:
  - Method (GET, POST, etc.)
  - Endpoint name
  - Total Calls
  - Average Response Time
  - Total Time (cumulative time per endpoint)
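The columns of the summary table relate to each other in a simple way: the average response time is the total time divided by the total calls. The sketch below shows that arithmetic on a list of request records. It is an illustration only, assuming the dashboard derives its averages this way; the function name and record layout are hypothetical.

```python
from collections import defaultdict


def summarize(requests):
    """Aggregate (method, endpoint, duration_seconds) records per endpoint."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [call count, cumulative time]
    for method, endpoint, duration in requests:
        entry = totals[(method, endpoint)]
        entry[0] += 1
        entry[1] += duration
    return {
        key: {
            "total_calls": calls,
            "total_time": total_time,
            "avg_response_time": total_time / calls,
        }
        for key, (calls, total_time) in totals.items()
    }


if __name__ == "__main__":
    # Hypothetical request log.
    log = [("GET", "/api/tree/", 0.25), ("GET", "/api/tree/", 0.75)]
    print(summarize(log))
```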
This dashboard provides visibility into the process_pending_aggregations command:
- Records Written Rate: Rate of records written to the tree_listing, hardware_status, and processed_items tables.
- Health Status: Time since the last successful batch processing (alerts if > 5 minutes).
- Batch Duration Percentiles: p50, p95, and p99 duration of batch processing.
- Error Rate: Rate of errors encountered during processing.
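To make the Health Status and Batch Duration Percentiles panels concrete, here is a small sketch that computes the same quantities from raw samples. Note the assumption: Prometheus normally estimates percentiles from histogram buckets, so the dashboard values are approximations, whereas this sketch computes them exactly from a list of durations. The function names are hypothetical.

```python
import statistics

STALE_AFTER_SECONDS = 5 * 60  # the 5-minute alert threshold from the list above


def duration_percentiles(durations):
    """Return p50, p95 and p99 of batch durations (needs at least two samples)."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(durations, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def is_stale(last_success_ts: float, now_ts: float) -> bool:
    """True if the last successful batch is more than 5 minutes old."""
    return (now_ts - last_success_ts) > STALE_AFTER_SECONDS


if __name__ == "__main__":
    import time

    print(duration_percentiles([0.8, 1.1, 1.3, 2.0, 9.5]))
    print(is_stale(last_success_ts=time.time() - 600, now_ts=time.time()))
```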
The monitoring system supports multi-worker Gunicorn deployments using Prometheus' multiprocess mode.
- Each Gunicorn worker writes metrics to shared files in a designated directory (PROMETHEUS_MULTIPROC_DIR).
- A separate process (utils/prometheus_aggregator.py) reads these files and exposes aggregated metrics via HTTP.
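The two steps above can be illustrated with a standard-library toy model: each worker writes its own counters to a file in a shared directory, and an aggregator sums the values across all files. This is only a conceptual sketch. It does not use the Prometheus client's real on-disk format and is not the code in utils/prometheus_aggregator.py; the file names, JSON layout, and function names are all made up for illustration.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_worker_counts(directory: Path, pid: int, counts: dict) -> None:
    """Simulate one worker persisting its counters to its own file."""
    (directory / f"worker_{pid}.json").write_text(json.dumps(counts))


def aggregate(directory: Path) -> dict:
    """Sum counters from every worker file, as an aggregator process would."""
    total = Counter()
    for path in sorted(directory.glob("worker_*.json")):
        total.update(json.loads(path.read_text()))  # adds counts per key
    return dict(total)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        write_worker_counts(shared, 101, {"requests": 3, "errors": 1})
        write_worker_counts(shared, 102, {"requests": 5})
        print(aggregate(shared))
```

Keeping one file per worker means workers never contend for the same file, which is why a separate reader is needed to present a single combined view.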
- PROMETHEUS_METRICS_ENABLED: Set to true to enable Prometheus metrics (default: false)
- PROMETHEUS_METRICS_PORT: Port for the metrics aggregator (default: 8001)
- PROMETHEUS_MULTIPROC_DIR: Directory for multiprocess metric files (default: /tmp/prometheus_multiproc_dir)
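As a rough sketch of how these variables and their defaults might be read, the helper below loads all three from the environment. The function name and the returned dictionary keys are hypothetical, and the backend's actual settings code may differ. The boolean check is case-insensitive because this document uses both true and True.

```python
import os


def load_monitoring_settings(environ=os.environ) -> dict:
    """Read the monitoring variables, falling back to the documented defaults."""
    enabled_raw = environ.get("PROMETHEUS_METRICS_ENABLED", "false")
    return {
        "enabled": enabled_raw.strip().lower() == "true",
        "port": int(environ.get("PROMETHEUS_METRICS_PORT", "8001")),
        "multiproc_dir": environ.get(
            "PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc_dir"
        ),
    }


if __name__ == "__main__":
    print(load_monitoring_settings())
```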
- Target: host.docker.internal:8001 (backend running locally)
- Metrics Path: /metrics/
- Scrape Interval: 15 seconds
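Because the scrape target's port must match PROMETHEUS_METRICS_PORT, a quick consistency check can catch a mismatch before Prometheus reports the target as down. The sketch below is a hypothetical helper, not part of the project. It takes the target string as an argument and does not parse monitoring/prometheus.yml.

```python
def scrape_url(target: str, metrics_path: str = "/metrics/") -> str:
    """Build the URL Prometheus would scrape for a host:port target."""
    return f"http://{target}{metrics_path}"


def target_port(target: str) -> int:
    """Extract the port from a host:port target string."""
    _, _, port = target.rpartition(":")
    return int(port)


def ports_match(target: str, metrics_port: int) -> bool:
    """True if the scrape target uses the same port as the metrics server."""
    return target_port(target) == metrics_port


if __name__ == "__main__":
    target = "host.docker.internal:8001"
    print(scrape_url(target))
    print(ports_match(target, 8001))
```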