This script periodically checks all Prometheus exporters to ensure they are:
- Responding to requests
- Returning metrics
- Returning relevant/fresh data
- Single Check: Run once and exit
- Continuous Monitoring: Run at specified intervals
- Prometheus Exporter Mode: Expose health check results as Prometheus metrics
- Detailed Logging: See exactly what's happening with each exporter
- Smart Validation:
- Verifies expected metrics are present
- Checks that metrics contain actual data (not just empty responses)
- Validates metric freshness based on exporter type
python3 check_exporters_health.pypython3 check_exporters_health.py --interval 60python3 check_exporters_health.py --port 9600Then add to Prometheus scrape config:
scrape_configs:
- job_name: 'exporter-health'
static_configs:
- targets: ['exporter-health-checker:9600']python3 check_exporters_health.py --debugYou can run this script inside an existing container (e.g., Prometheus):
# Copy script to Prometheus container
docker cp check_exporters_health.py prometheus:/tmp/
# Execute inside container
docker exec prometheus python3 /tmp/check_exporters_health.pyOr add as a separate service in docker-compose.yml:
exporter-health-checker:
build:
context: .
dockerfile: Dockerfile.health-checker
container_name: exporter-health-checker
ports:
- "9600:9600"
restart: unless-stopped
networks:
- streaming-net
command: ["--port", "9600"]================================================================================
EXPORTER HEALTH SUMMARY
================================================================================
Exporter Status Metrics Samples Notes
--------------------------------------------------------------------------------
nginx-rtmp-exporter ✓ OK 5 12
rapl-exporter ✓ OK 2 8
docker-stats-exporter ✓ OK 4 15
node-exporter ✓ OK 850 2143
cadvisor ✓ OK 120 456
results-exporter ⚠ FAIL 0 0 No metrics found
qoe-exporter ✓ OK 3 9
cost-exporter ✓ OK 4 12
--------------------------------------------------------------------------------
Total: 7/8 healthy
================================================================================
When running in exporter mode, the following metrics are exposed:
# Health status (1=healthy, 0=unhealthy)
exporter_health_status{exporter="nginx-rtmp-exporter"} 1
# Reachability (1=reachable, 0=unreachable)
exporter_reachable{exporter="nginx-rtmp-exporter"} 1
# Number of unique metrics
exporter_metric_count{exporter="nginx-rtmp-exporter"} 5
# Number of samples
exporter_sample_count{exporter="nginx-rtmp-exporter"} 12
# Has data (1=yes, 0=no)
exporter_has_data{exporter="nginx-rtmp-exporter"} 1
The script checks the following exporters:
| Exporter | Port | Expected Metrics |
|---|---|---|
| nginx-rtmp-exporter | 9728 | nginx_rtmp_connections, nginx_rtmp_streams |
| rapl-exporter | 9500 | rapl_power_watts, rapl_energy_joules_total |
| docker-stats-exporter | 9501 | docker_engine_cpu_percent, docker_container_cpu_percent |
| node-exporter | 9100 | node_cpu_seconds_total, node_memory_MemAvailable_bytes |
| cadvisor | 8080 | container_cpu_usage_seconds_total, container_memory_usage_bytes |
| dcgm-exporter | 9400 | DCGM_FI_DEV_GPU_UTIL, DCGM_FI_DEV_POWER_USAGE |
| results-exporter | 9502 | scenario_, scenario_duration_seconds |
| qoe-exporter | 9503 | qoe_, quality_ |
| cost-exporter | 9504 | cost_exporter_alive, cost_total_load_aware, cost_energy_load_aware |
To add or modify exporters, edit the EXPORTERS list in check_exporters_health.py:
EXPORTERS = [
ExporterConfig(
name="my-custom-exporter",
url="http://my-exporter:9999/metrics",
job_name="my-exporter",
expected_metrics=["my_metric_1", "my_metric_2"],
data_freshness_seconds=30
),
# ... other exporters
]You can set up Prometheus alerts based on the health check metrics:
groups:
- name: exporter_health
rules:
- alert: ExporterUnhealthy
expr: exporter_health_status == 0
for: 5m
labels:
severity: warning
annotations:
summary: "Exporter {{ $labels.exporter }} is unhealthy"
description: "The exporter {{ $labels.exporter }} has been unhealthy for more than 5 minutes."
- alert: ExporterNoData
expr: exporter_has_data == 0
for: 10m
labels:
severity: warning
annotations:
summary: "Exporter {{ $labels.exporter }} has no data"
description: "The exporter {{ $labels.exporter }} is not returning any data."If an exporter shows as "Not reachable":
- Check if the container is running:
docker ps - Check container logs:
docker logs <container-name> - Verify network connectivity:
docker exec prometheus curl http://exporter:port/metrics
If an exporter shows "No metrics found":
- The exporter may still be initializing
- The exporter may need configuration or data to generate metrics
- Check exporter logs for errors
If expected metrics are missing:
- The exporter configuration may have changed
- The feature producing those metrics may be disabled
- Update the expected metrics list in the script