This Python script monitors Docker containers and sends status reports to Slack with real-time alerting capabilities.
- Comprehensive Container Monitoring: Track all containers with detailed metrics
- Real-time Monitoring: Instant alerts when containers go down, restart, or change status
- Performance Analytics: CPU, memory, network, and disk I/O statistics
- Rich Slack Integration: Beautifully formatted notifications with status indicators
- Advanced Restart Detection: Detects both manual and automatic container restarts
- Scheduled Reports: Daily summary reports at configured times
- Flexible Configuration: Environment-based configuration with sensible defaults
- Multiple Execution Modes: One-time, scheduled, continuous, or real-time monitoring
- Built-in Testing: Connection testing and validation tools
- Container Filtering: Regex-based container name filtering
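Regex-based name filtering, as listed above, can be sketched in a few lines. This is an illustration only: the function name `filter_container_names` is hypothetical, and the choice of `re.search` (match anywhere in the name) is an assumption, not necessarily how the project applies its filter.

```python
import re
from typing import Iterable, List, Optional


def filter_container_names(names: Iterable[str], pattern: Optional[str]) -> List[str]:
    """Return the names matching `pattern`; an empty or missing pattern keeps all names."""
    if not pattern:
        return list(names)
    compiled = re.compile(pattern)
    # re.search matches anywhere in the name; anchor with ^ and $ for exact matches.
    return [name for name in names if compiled.search(name)]


containers = ["nginx-web", "api-service", "api-worker", "postgres"]
print(filter_container_names(containers, r"^api-"))
```

Anchoring the pattern (for example `^api-`) avoids accidentally matching names that merely contain the substring.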
This project follows clean architecture principles with proper separation of concerns:
docker-services-monitoring/
├── docker_monitor/                        # Main package
│   ├── core/                              # Core business logic
│   │   ├── docker_client.py               # Thread-safe Docker daemon interaction
│   │   ├── docker_monitor.py              # Main orchestrator for scheduled monitoring
│   │   ├── realtime_monitor.py            # Real-time monitoring orchestrator
│   │   ├── state_tracker.py               # Container state persistence and retrieval
│   │   ├── change_detector.py             # State difference analysis and change classification
│   │   ├── notification_formatter.py      # Message creation and formatting
│   │   ├── notification_manager.py        # Notification coordination and delivery
│   │   ├── cooldown_manager.py            # Notification timing and rate limiting
│   │   └── monitoring_thread.py           # Background monitoring loop management
│   ├── integrations/                      # External service integrations
│   │   └── slack.py                       # Slack notifications
│   ├── utils/                             # Utilities and helpers
│   │   ├── config.py                      # Configuration management
│   │   ├── formatters.py                  # Data formatting utilities
│   │   └── logging_config.py              # Logging setup
│   ├── cli/                               # Command-line interface
│   │   └── main.py                        # CLI entry point
│   ├── exceptions.py                      # Custom exception hierarchy
│   └── docker_monitor.py                  # Legacy compatibility module
├── scripts/                               # Executable scripts
│   └── run_monitor.py                     # Main execution script
├── config/                                # Configuration templates
│   └── env.example                        # Environment configuration template
├── tests/                                 # Test suite
│   ├── test_config.py                     # Configuration tests
│   ├── test_restart_detection.py          # Restart detection tests
│   ├── test_slack_integration.py          # Slack integration tests
│   └── test_threading.py                  # Threading safety tests
├── docker-compose.yml                     # Docker Compose configuration
├── Dockerfile                             # Docker image definition
├── setup.sh                               # Universal setup script
└── requirements.txt                       # Python dependencies
State Management:
- StateTracker: Manages container state persistence, retrieval, and historical tracking
- ChangeDetector: Analyzes state differences and classifies change types (start/stop/restart)
Notification System:
- NotificationFormatter: Creates and formats notification messages for different event types
- NotificationManager: Coordinates notification delivery and handles business logic
- CooldownManager: Manages notification timing and rate limiting to prevent spam
Monitoring Engine:
- MonitoringThread: Handles background monitoring loops with proper thread management
- RealTimeMonitor: Orchestrates real-time monitoring components
- DockerMonitor: Orchestrates scheduled monitoring workflows
Docker Integration:
- DockerClient: Thread-safe Docker daemon interaction with connection pooling
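To make the rate-limiting role of a component like CooldownManager concrete, here is a minimal sketch. The class name `CooldownSketch`, the per-key cooldown policy, and the injectable clock are all assumptions for illustration; the project's actual implementation may differ.

```python
import time
from typing import Callable, Dict


class CooldownSketch:
    """Suppress repeat notifications for the same key within `cooldown_seconds`."""

    def __init__(self, cooldown_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the behaviour can be tested without sleeping
        self._last_sent: Dict[str, float] = {}

    def should_notify(self, key: str) -> bool:
        """Return True and record the time if `key` is outside its cooldown window."""
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False
        self._last_sent[key] = now
        return True


cooldown = CooldownSketch(cooldown_seconds=60)
print(cooldown.should_notify("nginx-web:exited"))  # first alert for this key
print(cooldown.should_notify("nginx-web:exited"))  # repeat inside the window
```

Keying the cooldown on container name plus event type (a hypothetical choice) lets a different event for the same container still get through.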
- Continuous container monitoring every 10 seconds (configurable)
- Instant Slack alerts for container status changes
- Smart restart detection distinguishing manual vs automatic restarts
- Thread-safe operations with proper locking and resource cleanup
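A background loop with a configurable interval and clean shutdown, as described in the list above, might look like the following sketch. `MonitoringLoopSketch` and its method names are hypothetical; the pattern shown (a `threading.Event` for stopping, a lock guarding thread creation) is one common approach, not necessarily the one the project uses.

```python
import threading
from typing import Callable, Optional


class MonitoringLoopSketch:
    """Run `check` every `interval_seconds` in a background thread until stopped."""

    def __init__(self, check: Callable[[], None], interval_seconds: float = 10.0):
        self._check = check
        self._interval = interval_seconds
        self._stop_event = threading.Event()
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        with self._lock:  # prevents two threads being started concurrently
            if self._thread is not None and self._thread.is_alive():
                return
            self._stop_event.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        self._stop_event.set()
        with self._lock:
            thread = self._thread
        if thread is not None:
            thread.join(timeout)

    def _run(self) -> None:
        while not self._stop_event.is_set():
            try:
                self._check()
            except Exception as exc:  # keep the loop alive if one check fails
                print(f"check failed: {exc}")
            # wait() returns early as soon as stop() sets the event
            self._stop_event.wait(self._interval)
```

Using `Event.wait` instead of `time.sleep` means a stop request does not have to wait out the full interval.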
🚨 Critical Alerts:
- Container failures (running → exited/stopped/dead)
- Unexpected container removal
- Health check failures
- Container restart events
- Status transitions
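Detecting the events listed above comes down to comparing two snapshots of container state. The sketch below assumes each snapshot is a plain mapping of container name to status string; the function name `diff_snapshots` and the tuple layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (kind, container name, old status, new status); None marks "not present"
Change = Tuple[str, str, Optional[str], Optional[str]]


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> List[Change]:
    """Return added, removed, and status-changed containers between two snapshots."""
    changes: List[Change] = []
    for name in sorted(previous.keys() | current.keys()):
        old = previous.get(name)
        new = current.get(name)
        if old is None:
            changes.append(("added", name, None, new))
        elif new is None:
            changes.append(("removed", name, old, None))
        elif old != new:
            changes.append(("status_changed", name, old, new))
    return changes


before = {"nginx-web": "running", "api-service": "running"}
after = {"nginx-web": "exited", "api-service": "running"}
print(diff_snapshots(before, after))
```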
# Real-time monitoring with immediate alerts
docker compose --profile realtime up -d docker-monitor-realtime
# Custom check interval (seconds)
python3 scripts/run_monitor.py --realtime 15
# Combined with daily reports
docker compose --profile realtime up -d # Runs both services
The system automatically detects:
- Manual restarts: docker restart <container> commands
- Automatic restarts: Docker policy-based restarts (on-failure, unless-stopped)
- Failed restarts: When containers don't come back up
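One way to tell these three cases apart is to compare fields reported by `docker inspect` before and after the event. The sketch below is a heuristic under stated assumptions: it assumes `RestartCount` increases only for policy-driven restarts and that `State.StartedAt` changes whenever the container is started again. Both assumptions should be verified against your Docker version, and the names `ContainerSnapshot` and `classify_restart` are hypothetical rather than the project's own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSnapshot:
    status: str          # e.g. "running", "exited"
    started_at: str      # State.StartedAt as reported by `docker inspect`
    restart_count: int   # RestartCount as reported by `docker inspect`


def classify_restart(before: ContainerSnapshot, after: ContainerSnapshot) -> str:
    """Return "none", "manual", "automatic", or "failed" for a pair of snapshots."""
    policy_restart = after.restart_count > before.restart_count
    restarted = policy_restart or after.started_at != before.started_at
    if not restarted:
        return "none"
    if after.status != "running":
        return "failed"      # a restart was attempted but the container is not up
    if policy_restart:
        return "automatic"   # assumption: only restart policies raise the count
    return "manual"


before = ContainerSnapshot("running", "2024-01-15T14:00:00Z", 0)
after = ContainerSnapshot("running", "2024-01-15T14:25:15Z", 1)
print(classify_restart(before, after))
```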
🚨 Container Status Alert - CRITICAL
Container: nginx-web
Status Change: running → exited
Time: 2024-01-15 14:23:45
Container Restart Detected
Container: api-service
Type: Automatic restart
Status: running
Get monitoring running in minutes:
# 1. Clone the project
git clone <repo> docker-services-monitoring
cd docker-services-monitoring
# 2. Run the universal setup
./setup.sh
The setup script automatically handles:
- Environment Detection: Works on local dev machines, cloud VMs, and production servers
- Docker Installation: Installs Docker if missing
- Configuration Setup: Guides you through Slack webhook setup with validation
- Container Deployment: Builds and deploys with restart policies
- Testing: Verifies that monitoring and Slack integration work
To set up manually instead:
# 1. Install dependencies
pip install -r requirements.txt
# 2. Configure environment
cp config/env.example .env
# Edit .env with your Slack webhook URL
# 3. Choose your monitoring approach:
# Real-time monitoring (recommended for production)
docker compose --profile realtime up -d
# Daily reports only
docker compose up -d docker-monitor
# Test the setup
python3 scripts/run_monitor.py --test
Before running the monitoring system, you need a Slack webhook URL to receive notifications.
Step 1: Create a Slack App
- Go to https://api.slack.com/apps
- Click the big green "Create New App" button
- Select "From scratch"
- Enter app name: Docker Monitor
- Choose your Slack workspace from the dropdown
- Click "Create App"
Step 2: Enable Incoming Webhooks
- In your new app's settings, find "Incoming Webhooks" in the left menu
- Click the toggle switch to turn it ON (it should turn green)
- Click the "Add New Webhook to Workspace" button
Step 3: Choose Channel & Authorize
- Select the channel where you want alerts (create #docker-alerts if needed)
- Click "Allow" to give the app permission
Step 4: Copy Your Webhook URL
- You'll see a webhook URL that looks like this:
https://hooks.slack.com/services/T1234567890/B1234567890/abcdefghijklmnopqrstuvwx
- Copy this entire URL - you'll need it for configuration
Once you have the webhook URL, test it:
# Test with curl
curl -X POST -H 'Content-type: application/json' \
--data '{"text":"🧪 Docker Monitor Test - Webhook is working!"}' \
YOUR_WEBHOOK_URL
# Or use the built-in test
python3 scripts/run_monitor.py --test
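The curl test above can also be expressed in Python using only the standard library. This is a sketch, not the project's Slack integration: the function names `build_slack_request` and `send_slack_message` are hypothetical, and the payload is the same minimal `{"text": ...}` body used in the curl command.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request carrying a plain-text Slack message."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str, timeout: float = 10.0) -> bool:
    """Send the message; urlopen raises urllib.error.HTTPError on 4xx/5xx responses."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200
```

Calling `send_slack_message(YOUR_WEBHOOK_URL, "Docker Monitor Test")` with a real webhook URL should post the message to the channel chosen in Step 3.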
Add your webhook URL to the .env file:
# In your .env file
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
💡 Security Tip: Never commit webhook URLs to version control. Always use environment variables or .env files (which should be listed in .gitignore).
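The project includes different types of tests: pytest-based test files, standalone test scripts, and end-to-end checks of the monitoring pipeline.
The pytest-based test files require pytest to run: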
# Install pytest if not already installed
pip install pytest
# Run pytest-based test files
python3 -m pytest tests/test_config.py -v # Configuration tests
python3 -m pytest tests/test_slack_integration.py -v # Slack integration tests
The standalone test scripts can be run directly with Python:
# Test restart detection functionality
python3 tests/test_restart_detection.py
# Test threading safety improvements
python3 tests/test_threading.py
Test the complete monitoring pipeline:
# Test Docker connection and basic monitoring
python3 scripts/run_monitor.py --test
# Test Slack webhook integration
python3 scripts/run_monitor.py --test-notification
# Test inside Docker container
docker-compose exec docker-monitor python3 scripts/run_monitor.py --test
❯ python3 -m pytest tests/test_config.py -v
========================================= test session starts ==========================================
collected 5 items
tests/test_config.py::TestConfig::test_config_initialization_with_required_env PASSED [ 20%]
tests/test_config.py::TestConfig::test_config_missing_required_env_raises_error PASSED [ 40%]
tests/test_config.py::TestConfig::test_default_values PASSED [ 60%]
tests/test_config.py::TestConfig::test_custom_values PASSED [ 80%]
tests/test_config.py::TestConfig::test_get_all_returns_dict PASSED [100%]
========================================== 5 passed in 0.09s ===========================================
Note: The pytest-based test files (tests/test_config.py and tests/test_slack_integration.py) must be run with python3 -m pytest rather than direct Python execution.
The setup script offers three monitoring modes to suit different needs:
Daily reports:
- Best for: Most users, development environments, regular health checks
- Frequency: Daily reports at a specified time (default: 9:00 AM)
- Notifications: Comprehensive daily status reports
- Resource Usage: Minimal, since it only runs once per day
# Runs daily at 9 AM
docker compose up -d docker-monitor
Real-time monitoring:
- Best for: Production environments, critical services, immediate alerts
- Frequency: Continuous monitoring every 10 seconds
- Notifications: Immediate alerts when containers go down, restart, or fail
- Resource Usage: Low, thanks to efficient state change detection
# Real-time monitoring with immediate alerts
docker compose --profile realtime up -d docker-monitor-realtime
Both (daily reports and real-time monitoring):
- Best for: Comprehensive monitoring
- Combines: Daily reports + immediate failure alerts
- Coverage: Complete monitoring solution
- Resource Usage: Moderate, since it runs both services
# Run both scheduled and real-time monitoring
docker compose --profile realtime up -d
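For the daily report mode, the scheduler has to work out when the configured time next occurs. A minimal sketch of that calculation follows, assuming the HH:MM format used for the daily check time; the function name `next_daily_run` is hypothetical, and the example uses UTC to sidestep daylight-saving transitions.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, check_time: str = "09:00") -> datetime:
    """Return the next occurrence of `check_time` (HH:MM) strictly after `now`."""
    hour, minute = (int(part) for part in check_time.split(":"))
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


now = datetime(2024, 1, 15, 14, 23, 45, tzinfo=timezone.utc)
print(next_daily_run(now))  # the 09:00 slot on the following day
```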
When using real-time monitoring, you'll receive immediate Slack notifications for:
Critical Alerts (🚨):
- Container goes from running → exited
- Container goes from running → stopped
- Container goes from running → dead
- Container is unexpectedly removed
- Container restart fails (container doesn't come back up)
Warning Alerts (⚠️):
- Container status becomes restarting
- Container goes from healthy → unhealthy
- Container restarts successfully (manual or automatic)
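The critical and warning rules above can be captured in a small classification function. This is a sketch of the listed rules only (restart success and failure are left out); the function name `classify_transition`, the `"info"` fallback, and the use of `None` for a removed container are assumptions made for illustration.

```python
from typing import Optional

CRITICAL_TARGETS = {"exited", "stopped", "dead"}


def classify_transition(old_status: str, new_status: Optional[str]) -> str:
    """Map a status transition to "critical", "warning", or "info"."""
    if new_status is None:
        return "critical"   # container was removed unexpectedly
    if old_status == "running" and new_status in CRITICAL_TARGETS:
        return "critical"
    if new_status in {"restarting", "unhealthy"}:
        return "warning"
    return "info"


print(classify_transition("running", "exited"))
print(classify_transition("healthy", "unhealthy"))
```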
Restart Detection: The system automatically detects and notifies about:
- Manual Restarts: When someone runs docker restart <container>
- Automatic Restarts: When Docker restarts a container due to restart policies
- Failed Restarts: When restart attempts fail and the container doesn't recover
Sample Real-time Alerts:
Container Failure:
🚨 Container Status Alert - CRITICAL
Container: nginx-web
Status Change: running → exited
Image: nginx:latest
Time: 2024-01-15 14:23:45
Ports: 80→80/tcp, 443→443/tcp
Container Restart:
🚨 Container Removed - CRITICAL
Container: nginx-web
Previous Status: running
Time: 2024-01-15 14:25:10
ℹ️ Container Added
Container: nginx-web
Status: running
Image: nginx:latest
Time: 2024-01-15 14:25:15
Health Check Failure:
⚠️ Container Status Alert - WARNING
Container: api-server
Status Change: running → unhealthy
Image: myapp:latest
Time: 2024-01-15 14:30:22
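Messages shaped like the samples above could be assembled as in the following sketch. The function name `format_status_alert` and the icon mapping are hypothetical; only the line layout is taken from the sample alerts.

```python
from datetime import datetime

SEVERITY_ICONS = {"critical": "🚨", "warning": "⚠️"}


def format_status_alert(name: str, old_status: str, new_status: str,
                        image: str, when: datetime, severity: str) -> str:
    """Build the plain-text body of a status-change alert."""
    icon = SEVERITY_ICONS.get(severity, "ℹ️")  # fall back to an info icon
    lines = [
        f"{icon} Container Status Alert - {severity.upper()}",
        f"Container: {name}",
        f"Status Change: {old_status} → {new_status}",
        f"Image: {image}",
        f"Time: {when:%Y-%m-%d %H:%M:%S}",
    ]
    return "\n".join(lines)


print(format_status_alert("nginx-web", "running", "exited", "nginx:latest",
                          datetime(2024, 1, 15, 14, 23, 45), "critical"))
```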
Variable | Default | Description |
---|---|---|
SLACK_WEBHOOK_URL | Required | Slack incoming webhook URL |
DAILY_CHECK_TIME | 09:00 | Daily check time (HH:MM format) |
REALTIME_CHECK_INTERVAL | 10 | Real-time monitoring interval (seconds) |
LOG_LEVEL | INFO | Logging level (DEBUG, INFO, WARNING, ERROR) |
DOCKER_SOCKET | unix://var/run/docker.sock | Docker daemon socket |
NOTIFICATION_ENABLED | true | Enable/disable Slack notifications |
INCLUDE_STOPPED_CONTAINERS | true | Include stopped containers in reports |
CONTAINER_NAME_FILTER | - | Regex pattern to filter container names |
TIMEZONE | UTC | Timezone for scheduling |
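The table above maps naturally onto a small loader that reads environment variables and applies the documented defaults. This is a sketch: `MonitorConfig`, `load_config`, and the accepted boolean spellings are assumptions, while the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MonitorConfig:
    slack_webhook_url: str
    daily_check_time: str
    realtime_check_interval: int
    log_level: str
    docker_socket: str
    notification_enabled: bool
    include_stopped_containers: bool
    container_name_filter: Optional[str]
    timezone: str


def load_config(env: Mapping[str, str] = os.environ) -> MonitorConfig:
    """Build a config from `env`, failing fast when the required webhook is missing."""
    webhook = env.get("SLACK_WEBHOOK_URL")
    if not webhook:
        raise ValueError("SLACK_WEBHOOK_URL is required")
    return MonitorConfig(
        slack_webhook_url=webhook,
        daily_check_time=env.get("DAILY_CHECK_TIME", "09:00"),
        realtime_check_interval=int(env.get("REALTIME_CHECK_INTERVAL", "10")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        docker_socket=env.get("DOCKER_SOCKET", "unix://var/run/docker.sock"),
        notification_enabled=_as_bool(env.get("NOTIFICATION_ENABLED", "true")),
        include_stopped_containers=_as_bool(env.get("INCLUDE_STOPPED_CONTAINERS", "true")),
        container_name_filter=env.get("CONTAINER_NAME_FILTER") or None,
        timezone=env.get("TIMEZONE", "UTC"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests.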
# Edit crontab
crontab -e
# Add this line for daily 9 AM reports:
0 9 * * * cd $HOME/docker-services-monitoring && python3 scripts/run_monitor.py --once
Benefits of this approach:
- Works on any system with any username
- Uses the $HOME environment variable
- Easy to deploy across different servers
# Every day at 8:30 AM
30 8 * * * cd $HOME/docker-services-monitoring && python3 scripts/run_monitor.py --once
# Every Monday at 9 AM
0 9 * * 1 cd $HOME/docker-services-monitoring && python3 scripts/run_monitor.py --once
# Every 6 hours
0 */6 * * * cd $HOME/docker-services-monitoring && python3 scripts/run_monitor.py --once
# Twice daily: 9 AM and 6 PM
0 9,18 * * * cd $HOME/docker-services-monitoring && python3 scripts/run_monitor.py --once
The easiest way to deploy in production is using Docker Compose with automatic restarts:
# Ensure you have your .env file configured
cp config/env.example .env
nano .env # Add your Slack webhook URL
# Create logs directory
mkdir -p logs
# Build and start the service
docker-compose up -d
# View logs
docker-compose logs -f docker-monitor
# Check status
docker-compose ps
# Stop the service
docker-compose down
# Restart the service
docker-compose restart docker-monitor
# Rebuild after code changes
docker-compose up -d --build
# View real-time logs
docker-compose logs -f docker-monitor
# Run one-time check
docker-compose exec docker-monitor python3 scripts/run_monitor.py --once
# Test notifications
docker-compose exec docker-monitor python3 scripts/run_monitor.py --test-notification
The Docker Compose setup includes:
- Automatic restarts with restart: unless-stopped
- Health checks to ensure the service is running properly
- Docker socket mounting for container monitoring
- Persistent logs in the ./logs directory
- Environment variable support from the .env file
- Isolated network for security
You can customize the deployment by editing docker-compose.yml:
# Change the schedule or run mode
services:
  docker-monitor:
    # ... other config ...
    command: ["python3", "scripts/run_monitor.py", "--continuous", "30"]  # Every 30 minutes
    # OR (keep only one command entry; duplicate keys are invalid YAML)
    # command: ["python3", "scripts/run_monitor.py", "--once"]  # Run once and exit