This guide covers common issues and their solutions.
Symptoms: Containers exit immediately or show "Exited (1)" status
Check:
# Check container status
make ps
# Check logs for specific service
make logs SERVICE=prometheus
make logs SERVICE=rapl-exporter
Common causes:
- Port already in use
- Configuration file syntax error
- Missing dependencies
- Permission issues
Solutions:
# Restart specific service
docker compose restart <service>
# Rebuild and restart everything
make down
make up-build
Symptoms: Error like "bind: address already in use"
Check:
# Find what's using the port
sudo lsof -i :9090 # Replace with your port
Solutions:
- Stop the conflicting service
- Change the port in docker-compose.yml
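When the error appears in a long log line, it can help to pull the port number out before running the `lsof` check. The following is a minimal sketch, not part of the project: `port_from_bind_error` is a hypothetical helper, and it assumes the classic Docker wording `listen tcp4 0.0.0.0:9090: bind: address already in use`. Other Docker versions may phrase the error differently, in which case the function prints nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# port_from_bind_error "ERROR LINE"
# Prints the port from a line such as
#   "... listen tcp4 0.0.0.0:9090: bind: address already in use"
# and prints nothing if the line does not match that assumed wording.
port_from_bind_error() {
    printf '%s\n' "$1" |
        sed -n 's/.*listen tcp[46]\{0,1\} [^ ]*:\([0-9]\{1,5\}\): bind: address already in use.*/\1/p'
}

# Example (not run here): feed the result to the lsof check.
#   port=$(port_from_bind_error "$error_line")
#   [ -n "$port" ] && sudo lsof -i ":$port"
```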
Symptoms: Build failures, containers crashing
Check:
# Check disk usage
df -h
du -sh test_results/
docker system df
Solutions:
# Clean old test results
rm test_results/test_results_2023*.json
# Clean Docker cache
docker system prune -a
# Clean unused volumes
docker volume prune
Cause: RAPL interface not available or not accessible
Verify RAPL availability:
# Check if RAPL interface exists
ls -la /sys/class/powercap/intel-rapl:0/
# Check energy counter
cat /sys/class/powercap/intel-rapl:0/energy_uj
Solutions:
- Intel CPU check:
  cat /proc/cpuinfo | grep "model name" # RAPL requires Intel Sandy Bridge (2011) or newer
- Grant permissions:
  sudo chmod -R a+r /sys/class/powercap/
- Load kernel module:
  sudo modprobe intel_rapl_msr
  lsmod | grep rapl
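Once the counter is readable, a quick sanity check is to read `energy_uj` twice and convert the difference to watts. The sketch below is an illustration, not the exporter's implementation: `rapl_watts` is a hypothetical helper, and the wraparound handling assumes the counter restarts from zero after reaching the value in `max_energy_range_uj`, which makes the result approximate across a wrap.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# rapl_watts E1_UJ E2_UJ INTERVAL_S [MAX_UJ]
# Average power in watts between two energy_uj readings (microjoules)
# taken INTERVAL_S seconds apart. If the counter wrapped (E2 < E1),
# MAX_UJ is added back to the difference.
rapl_watts() {
    awk -v e1="$1" -v e2="$2" -v dt="$3" -v max="${4:-0}" 'BEGIN {
        delta = e2 - e1
        if (delta < 0) delta += max      # counter wrapped around
        printf "%.3f\n", delta / 1000000 / dt
    }'
}

# Example (not run here; needs read access to the RAPL files):
#   d=/sys/class/powercap/intel-rapl:0
#   e1=$(cat "$d/energy_uj"); sleep 1; e2=$(cat "$d/energy_uj")
#   rapl_watts "$e1" "$e2" 1 "$(cat "$d/max_energy_range_uj")"
```

A result of 0.000 over several seconds suggests the counter is not advancing, which points back to the permission and kernel module checks.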
Check:
# Test exporter directly
curl http://localhost:9500/metrics | grep rapl_power_watts
# Check VictoriaMetrics targets
# Open http://localhost:8428/targets
Solutions:
- Verify container has privileged access
- Check /sys/class/powercap is mounted correctly
- Restart rapl-exporter:
  docker compose restart rapl-exporter
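A plain `grep rapl_power_watts` also matches the `# HELP` and `# TYPE` comment lines, so it can succeed even when the exporter is publishing no samples. The sketch below checks for an actual sample line instead. `has_metric` is a hypothetical helper, and it assumes the exporter serves the standard Prometheus text format.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# has_metric NAME
# Reads Prometheus text-format metrics on stdin and succeeds only if at
# least one sample line exists for NAME. Comment lines ("# HELP", "# TYPE")
# do not count, because they start with "#".
has_metric() {
    grep -Eq "^$1(\{| )"
}

# Example (not run here):
#   curl -s http://localhost:9500/metrics | has_metric rapl_power_watts \
#       && echo "samples present" || echo "no samples"
```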
Check VictoriaMetrics targets: http://localhost:8428/targets
Common causes:
- Exporter container not running
- Network connectivity issue
- Wrong port/URL in victoriametrics.yml
Solutions:
# Check if exporter is running
docker compose ps | grep exporter
# Test exporter directly
curl http://localhost:9500/metrics
# Check container network
docker compose exec prometheus ping rapl-exporter
# Restart Prometheus
docker compose restart prometheus
Check:
- VictoriaMetrics UI: http://localhost:8428
  - Query: rapl_power_watts (should return data)
- Grafana datasource:
  - Settings → Data Sources → VictoriaMetrics
  - Click "Test"; it should show "Data source is working"
Solutions:
# Restart Grafana
docker compose restart grafana
# Check Grafana logs
make logs SERVICE=grafana
Symptoms: "out of disk space", "too many samples"
Check:
docker exec prometheus du -sh /prometheus
Solutions:
- Reduce retention time in docker-compose.yml
- Increase disk space
- Delete old data:
docker compose down && docker volume rm ffmpeg-rtmp_prometheus-data && docker compose up -d
Symptoms:
Connection refused
Failed to open rtmp://localhost:1935/live
Check:
# Check nginx-rtmp is running
docker compose ps nginx-rtmp
# Check nginx status endpoint
curl http://localhost:8080/stat
# Check RTMP port
nc -zv localhost 1935
Solutions:
# Restart nginx-rtmp
docker compose restart nginx-rtmp
# Check nginx logs
make logs SERVICE=nginx-rtmp
# Wait for health check to pass
docker compose ps nginx-rtmp # Should show "healthy"
Symptoms: Grafana dashboards show "No data"
Check:
# Verify test results exist
ls -la test_results/
# Check results-exporter
curl http://localhost:9502/metrics | grep results_
# Check results-exporter logs
make logs SERVICE=results-exporter
Solutions:
# Ensure directory exists and has proper permissions
mkdir -p test_results
chmod 755 test_results
# Restart results-exporter
docker compose restart results-exporter
# Wait 30 seconds for next scrape, then check Grafana
"FFmpeg not found":
# Install FFmpeg
# Ubuntu/Debian:
sudo apt-get update && sudo apt-get install ffmpeg
# macOS:
brew install ffmpeg
"Permission denied" on scripts:
chmod +x scripts/*.py scripts/*.sh
Python import errors:
# Install dependencies
pip install -r requirements.txt
Symptoms: dcgm-exporter shows "Exited (1)"
Requirements check:
# Check NVIDIA driver
nvidia-smi
# Check nvidia-container-toolkit
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Solutions:
- Install nvidia-container-toolkit:
  # Ubuntu/Debian
  distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
  curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
  curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
  sudo apt-get update && sudo apt-get install -y nvidia-docker2
  sudo systemctl restart docker
- Start with NVIDIA profile:
  make nvidia-up-build
Symptoms: analyze_results.py reports no results
Check:
ls -la test_results/
Solutions:
- Run a test first:
  python3 scripts/run_tests.py single --name test --bitrate 1000k --duration 60
- Specify results file explicitly:
  python3 scripts/analyze_results.py test_results/test_results_*.json
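If several results files have accumulated, the glob passes all of them to the script. To analyze only the most recent run, the newest file can be selected first. This is a minimal sketch: `latest_results` is a hypothetical helper, and it assumes "most recent" can be judged by file modification time and that the file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# latest_results [DIR]
# Prints the most recently modified test_results_*.json in DIR
# (default: test_results). Prints nothing and returns non-zero if
# no such file exists.
latest_results() {
    dir="${1:-test_results}"
    # ls -t sorts newest first; the error for an unmatched glob is silenced.
    latest=$(ls -t "$dir"/test_results_*.json 2>/dev/null | head -n 1)
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example (not run here):
#   python3 scripts/analyze_results.py "$(latest_results)"
```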
Symptoms: Prediction errors, "not enough data"
Requirements:
- At least 3 test scenarios for basic model
- At least 10 scenarios recommended for multivariate model
Solutions:
# Run batch test to generate more data
python3 scripts/run_tests.py batch --file batch_stress_matrix.json
# Retrain models
python3 scripts/retrain_models.py --results-dir ./test_results
Check resource usage:
docker stats
Solutions:
- Increase Docker resources (Docker Desktop → Settings → Resources)
  - Recommended: 4+ CPU cores, 8+ GB RAM
- Reduce concurrent services:
  # Stop GPU monitoring if not needed
  docker compose stop dcgm-exporter
Check which container:
docker stats --no-stream | sort -k3 -h
Common culprits:
- cAdvisor (monitoring all containers)
- Prometheus (during large queries)
- FFmpeg tests (expected during tests)
Solutions:
- Reduce Prometheus scrape frequency in prometheus.yml
- Reduce cAdvisor collection interval
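Sorting the default `docker stats` table also sorts its header row into the output. A two-column format avoids that and makes the ranking easier to read. The sketch below is one way to do it: `top_cpu` is a hypothetical helper, and it assumes the input is `name<TAB>cpu%` lines as produced by the `--format` string in the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# top_cpu
# Reads "name<TAB>cpu%" lines on stdin and prints them sorted by the
# CPU column, highest first. The trailing "%" is ignored by the numeric sort.
top_cpu() {
    tab=$(printf '\t')
    sort -t "$tab" -k2,2 -rn
}

# Example (not run here):
#   docker stats --no-stream --format '{{.Name}}\t{{.CPUPerc}}' | top_cpu | head -n 3
```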
Check:
docker stats --format "table {{.Name}}\t{{.MemUsage}}"
Solutions:
- Reduce Prometheus retention time
- Limit memory in docker-compose.yml:
  services:
    prometheus:
      deploy:
        resources:
          limits:
            memory: 2G
Check:
# Test each service
curl -I http://localhost:3000 # Grafana
curl -I http://localhost:8428 # VictoriaMetrics
curl -I http://localhost:9093 # Alertmanager
Solutions:
- Check firewall:
  sudo ufw status
- Check if ports are exposed:
  docker compose ps
- Try localhost vs 127.0.0.1 vs machine IP
Check network:
docker compose exec prometheus ping nginx-rtmp
Solutions:
# Recreate network
docker compose down
docker network prune
docker compose up -d
# System info
uname -a
docker --version
docker compose version
# Container status
docker compose ps
# Logs (save to file)
docker compose logs > logs.txt
# Check disk space
df -h
docker system df
Add to docker-compose.yml:
services:
  prometheus:
    command:
      - "--log.level=debug"
- Check the Architecture documentation to understand the system
- Review Getting Started Guide for setup steps
- Search existing issues: https://github.com/psantana5/ffmpeg-rtmp/issues
- Open a new issue with:
  - Description of the problem
  - Steps to reproduce
  - Relevant logs
  - System information
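The system information for an issue can be gathered into one file with a small script. The sketch below is an illustration rather than a project tool: `collect_diag` and the default file name `diagnostics.txt` are hypothetical, and the Docker sections are skipped when the `docker` CLI is not installed. Container logs are left out to keep the file small; attach them separately.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository).
# collect_diag [OUTFILE]
# Writes system and container information to OUTFILE
# (default: diagnostics.txt). Docker command failures are recorded
# in the file rather than aborting the collection.
collect_diag() {
    out="${1:-diagnostics.txt}"
    {
        echo "== uname -a =="
        uname -a
        echo "== df -h =="
        df -h
        if command -v docker >/dev/null 2>&1; then
            echo "== docker --version =="
            docker --version 2>&1 || true
            echo "== docker compose ps =="
            docker compose ps 2>&1 || true
            echo "== docker system df =="
            docker system df 2>&1 || true
        else
            echo "== docker not found =="
        fi
    } > "$out"
}

# Example (not run here), from the directory containing docker-compose.yml:
#   collect_diag diagnostics.txt
```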