Monitoring and Metrics

This document explains how to use the monitoring system in the KernelCI Dashboard project.

Quick Start

1. Start Monitoring Services

docker compose -f docker-compose.monitoring.yml up -d

2. Start Your Backend

Option A: Using Docker Compose (Recommended)

# Add monitoring configuration to .env.backend
echo "PROMETHEUS_METRICS_ENABLED=true" >> .env.backend
echo "PROMETHEUS_METRICS_PORT=8001" >> .env.backend
echo "PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_multiproc_dir" >> .env.backend

# Start the backend with monitoring enabled
docker compose up -d backend

# The backend will automatically expose port 8001 for metrics when PROMETHEUS_METRICS_ENABLED=true

Note: For detailed backend setup instructions, see backend/README.md

Option B: Local Development

# Enable dedicated metrics server (default: False)
export PROMETHEUS_METRICS_ENABLED=True

# Set custom metrics port (default: 8001)
# IMPORTANT: This port must match the port in monitoring/prometheus.yml
export PROMETHEUS_METRICS_PORT=8001

# The metrics server will automatically start on the specified port
# Access metrics at: http://localhost:8001/metrics/
# Bind to 0.0.0.0:8000 (all IPv4 addresses) so that Docker containers
# such as Prometheus can connect and scrape metrics
# --noreload is required because Django's auto-reloader conflicts with the dedicated metrics thread
poetry run python manage.py runserver 0.0.0.0:8000 --noreload

3. Import Dashboard in Grafana

- Go to http://localhost:3000 and log in with admin / admin.
- Add a data source: select "Prometheus" and set the URL to http://prometheus:9090.
- Import a dashboard from a JSON file:
  - monitoring/dashboard.json for API metrics
  - monitoring/aggregation_process.json for Aggregation Process metrics

4. Verify Everything Works

Prometheus: http://localhost:9090 (shows the scrape targets)
Grafana: http://localhost:3000 (shows your dashboards)
Metrics: http://localhost:8001/metrics/ (shows the raw metrics)
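
To check the raw metrics endpoint from a script rather than a browser, something like the following sketch may help. It parses the Prometheus text exposition format using only the standard library. The metric names in SAMPLE are made up for illustration and are not taken from the backend, and the parser is deliberately simple: it skips comment lines and ignores optional timestamps.

```python
import urllib.request


def parse_metrics(text):
    """Map each series (metric name plus labels) to its sample value."""
    samples = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labels may contain spaces, so split after the closing brace.
            end = line.rfind("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            series, *rest = line.split()
        samples[series] = float(rest[0])
    return samples


def fetch_metrics(url="http://localhost:8001/metrics/", timeout=5):
    """Fetch and parse the metrics endpoint (requires the backend to be running)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


# Hypothetical payload, used here so the sketch runs without a backend.
SAMPLE = """\
# HELP example_requests_total Hypothetical counter.
# TYPE example_requests_total counter
example_requests_total{method="GET",endpoint="/api/tree/"} 42.0
example_up 1
"""

if __name__ == "__main__":
    for series, value in parse_metrics(SAMPLE).items():
        print(series, value)
```

With the backend running, calling fetch_metrics() instead of parsing SAMPLE would return the live series.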

Dashboard Features

API Dashboard

After importing the dashboard, you'll have:

Average Response Time by Endpoint - Shows response time per endpoint
Total Calls by Endpoint - Shows total requests per endpoint
Endpoint Performance Summary - Table with:
- Method (GET, POST, etc.)
- Endpoint name
- Total Calls
- Average Response Time
- Total Time (cumulative time per endpoint)
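
The columns of the summary table are related by simple arithmetic: the average response time is the cumulative time divided by the number of calls. A minimal sketch of that relationship, assuming request timings are available as (method, endpoint, seconds) tuples. The tuple layout, function name, and endpoint paths are illustrative and are not the dashboard's actual queries.

```python
from collections import defaultdict


def summarize(requests):
    """Group timings by (method, endpoint) and derive the table columns."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [calls, total_seconds]
    for method, endpoint, seconds in requests:
        entry = totals[(method, endpoint)]
        entry[0] += 1
        entry[1] += seconds
    return [
        {
            "method": method,
            "endpoint": endpoint,
            "total_calls": calls,
            "avg_seconds": total / calls,
            "total_seconds": total,
        }
        for (method, endpoint), (calls, total) in sorted(totals.items())
    ]


# Hypothetical timings for two endpoints.
ROWS = summarize([
    ("GET", "/api/a/", 0.25),
    ("GET", "/api/a/", 0.75),
    ("POST", "/api/b/", 0.5),
])

if __name__ == "__main__":
    for row in ROWS:
        print(row)
```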

Aggregation Process Dashboard

This dashboard provides visibility into the process_pending_aggregations command:

Records Written Rate: Rate of records written to tree_listing, hardware_status, and processed_items tables.
Health Status: Time since the last successful batch processing (alerts if > 5 minutes).
Batch Duration Percentiles: p50, p95, and p99 duration of batch processing.
Error Rate: Rate of errors encountered during processing.
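
The Health Status and percentile panels can be reasoned about with a small sketch. It assumes timestamps are Unix seconds and batch durations are plain numbers. The function names are hypothetical, and the nearest-rank percentile used here is only one of several definitions; Prometheus usually estimates quantiles from histogram buckets, so the dashboard's numbers may differ slightly.

```python
import math

STALE_AFTER_SECONDS = 5 * 60  # the 5-minute threshold from the Health Status panel


def is_stale(last_success_ts, now_ts, threshold=STALE_AFTER_SECONDS):
    """Return True when the last successful batch is older than the threshold."""
    return (now_ts - last_success_ts) > threshold


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical batch durations of 1.0, 2.0, ..., 100.0 seconds.
DURATIONS = [float(n) for n in range(1, 101)]

if __name__ == "__main__":
    for pct in (50, 95, 99):
        print(f"p{pct}:", percentile(DURATIONS, pct))
    print("stale:", is_stale(last_success_ts=0, now_ts=301))
```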

Implementation Details

Multi-Worker Gunicorn Support

The monitoring system supports multi-worker Gunicorn deployments using Prometheus' multiprocess mode.

Each Gunicorn worker writes metrics to shared files in a designated directory (PROMETHEUS_MULTIPROC_DIR).
A separate process (utils/prometheus_aggregator.py) reads these files and exposes aggregated metrics via HTTP.
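
The aggregation idea can be illustrated without the Prometheus client library: each worker owns a file in the shared directory, and a reader sums matching counters across all files. The sketch below is a conceptual model only. The JSON layout, file names, and function names are invented for illustration and do not reflect the real on-disk format or the contents of utils/prometheus_aggregator.py.

```python
import json
import os
import tempfile
from collections import Counter


def write_worker_counters(directory, pid, counters):
    """Simulate one worker persisting its counters to its own file."""
    path = os.path.join(directory, f"counter_{pid}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(counters, fh)


def aggregate(directory):
    """Sum counters with the same name across every worker file."""
    total = Counter()
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as fh:
            total.update(json.load(fh))
    return dict(total)


# Two hypothetical workers writing into a temporary "multiproc" directory.
with tempfile.TemporaryDirectory() as multiproc_dir:
    write_worker_counters(multiproc_dir, 101, {"requests_total": 3, "errors_total": 1})
    write_worker_counters(multiproc_dir, 102, {"requests_total": 4})
    AGGREGATED = aggregate(multiproc_dir)

if __name__ == "__main__":
    print(AGGREGATED)
```

Keeping one file per worker avoids write contention between processes; only the reader needs to see all files.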

Configuration

Environment Variables

PROMETHEUS_METRICS_ENABLED: Set to true to enable Prometheus metrics (default: false)
PROMETHEUS_METRICS_PORT: Port for the metrics aggregator (default: 8001)
PROMETHEUS_MULTIPROC_DIR: Directory for multiprocess metric files (default: /tmp/prometheus_multiproc_dir)
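
As a rough illustration of how these variables and their defaults fit together, the sketch below reads them into a small config object. The MetricsConfig class and load_metrics_config function are hypothetical and are not the backend's actual settings code. Treating the enabled flag as case-insensitive is also an assumption, made because this document uses both true and True.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsConfig:
    enabled: bool
    port: int
    multiproc_dir: str


def load_metrics_config(env=None):
    """Read the three variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: "true" in any letter case enables metrics.
    enabled = env.get("PROMETHEUS_METRICS_ENABLED", "false").strip().lower() == "true"
    port = int(env.get("PROMETHEUS_METRICS_PORT", "8001"))
    multiproc_dir = env.get("PROMETHEUS_MULTIPROC_DIR", "/tmp/prometheus_multiproc_dir")
    return MetricsConfig(enabled=enabled, port=port, multiproc_dir=multiproc_dir)


if __name__ == "__main__":
    print(load_metrics_config())
```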

`prometheus.yml`

Target: host.docker.internal:8001 (backend running locally)
Metrics Path: /metrics/
Scrape Interval: 15 seconds
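
Because the target port must match PROMETHEUS_METRICS_PORT, a quick consistency check can catch a mismatch before Prometheus reports the target as down. The helper names below are hypothetical, and the sketch assumes a plain host:port target scraped over HTTP.

```python
def scrape_url(target, metrics_path="/metrics/", scheme="http"):
    """Build the URL Prometheus would scrape for a host:port target."""
    return f"{scheme}://{target}{metrics_path}"


def target_port(target):
    """Extract the port from a host:port target string."""
    host, _, port = target.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {target!r}")
    return int(port)


def port_matches(target, metrics_port):
    """Return True when the target port equals the configured metrics port."""
    return target_port(target) == int(metrics_port)


if __name__ == "__main__":
    target = "host.docker.internal:8001"
    print(scrape_url(target))
    print(port_matches(target, "8001"))
```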
