The health monitoring system provides automated monitoring of the Discord bot and sends alerts when issues are detected.
- Automated Health Checks: Periodic monitoring of bot status, database connectivity, memory usage, and websocket connection
- Discord Webhook Alerts: Sends alerts to Discord when health checks fail
- Log Attachments: Includes the last 100 log lines with alerts to help diagnose issues
- Alert Cooldown: Prevents spam by limiting alert frequency
- Recovery Notifications: Notifies when the bot recovers from health issues
- Docker Health Checks: Integrated with Docker for container orchestration
Add the following to your dev.env and prod.env files:
# Health Monitoring Configuration
HEALTH_WEBHOOK_URL=https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_WEBHOOK_TOKEN
HEALTH_CHECK_INTERVAL=60 # Check interval in seconds (default: 60)
HEALTH_ALERT_COOLDOWN=1800 # Alert cooldown in seconds (default: 1800 = 30 minutes)
HEALTH_LOG_LINES=100 # Number of log lines to include in alerts (default: 100)
HEALTH_MAX_FAILURES=3 # Number of failures before sending alert (default: 3)
- Go to your Discord server settings
- Navigate to Integrations → Webhooks
- Click Create Webhook
- Configure the webhook:
  - Name: Drawbridge Health Monitor
  - Channel: Choose a channel for health alerts (e.g., #bot-status or #alerts)
  - Avatar: Optional bot avatar
- Copy the webhook URL
- Add the URL to your environment file as HEALTH_WEBHOOK_URL
Run the test script to verify your webhook configuration:
python test_health_webhook.py
This sends a test message to your Discord channel to confirm the webhook is working.
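For readers who want to see what such a test involves, here is a minimal sketch of a webhook test. It is not the contents of test_health_webhook.py, which is not shown here; the function names and the User-Agent string are assumptions made for illustration. It posts a plain-text message to the webhook URL using only the standard library.

```python
import json
import urllib.request


def build_test_payload(message: str = "Health webhook test") -> dict:
    """Build the JSON body for a plain-text Discord webhook message."""
    return {"content": message, "username": "Drawbridge Health Monitor"}


def send_test_message(webhook_url: str, timeout: float = 10.0) -> int:
    """POST the test payload and return the HTTP status code.

    Raises urllib.error.HTTPError if Discord rejects the request
    (for example, when the webhook URL is wrong or was deleted).
    """
    body = json.dumps(build_test_payload()).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Hypothetical User-Agent; an explicit one avoids relying on
            # the default urllib identifier.
            "User-Agent": "drawbridge-health-test",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `send_test_message(os.environ["HEALTH_WEBHOOK_URL"])` should return a 2xx status when the webhook is valid (Discord normally answers a webhook post with 204 No Content).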
Access the health status via HTTP:
GET /api/health
Response (200 = healthy, 503 = unhealthy):
{
  "status": "healthy",
  "timestamp": "2025-10-16T12:00:00",
  "uptime_seconds": 3600,
  "consecutive_failures": 0,
  "metrics": {
    "bot_ready": true,
    "database_connected": true,
    "websocket_connected": true,
    "memory_usage_mb": 45.2,
    "uptime_seconds": 3600,
    "last_command_time": "2025-10-16T11:59:30"
  }
}
The Docker container includes an automated health check that runs every 30 seconds:
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
    CMD python healthcheck.py || exit 1
The health monitoring system tracks:
- Bot Ready State: Whether the Discord bot is logged in and ready
- WebSocket Connection: Status of the Discord gateway connection
- Database Connectivity: Ability to query the database
- Memory Usage: Current memory consumption (requires psutil)
- Command Activity: Timestamp of last command execution
- Heartbeat: Regular heartbeat updates from bot events
Alerts are triggered when:
- Bot is not ready or websocket is disconnected
- Database queries fail
- Memory usage exceeds 1 GB
- No heartbeat received for 2× the check interval
- Consecutive failures exceed the configured threshold
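The trigger conditions above could be expressed roughly as follows. This is a sketch, not the bot's actual code: `HealthSnapshot`, `find_issues`, and `should_alert` are hypothetical names, and it assumes an alert fires once the failure count reaches HEALTH_MAX_FAILURES.

```python
from dataclasses import dataclass
from typing import List

MEMORY_LIMIT_MB = 1024  # 1 GB threshold from the list above


@dataclass
class HealthSnapshot:
    """Hypothetical container for one round of health check results."""
    bot_ready: bool
    websocket_connected: bool
    database_connected: bool
    memory_usage_mb: float
    seconds_since_heartbeat: float


def find_issues(snapshot: HealthSnapshot, check_interval: int = 60) -> List[str]:
    """Return a human-readable description of every failing check."""
    issues = []
    if not snapshot.bot_ready:
        issues.append("Bot is not ready")
    if not snapshot.websocket_connected:
        issues.append("WebSocket is disconnected")
    if not snapshot.database_connected:
        issues.append("Database query failed")
    if snapshot.memory_usage_mb > MEMORY_LIMIT_MB:
        issues.append(f"Memory usage is {snapshot.memory_usage_mb:.0f} MB")
    # A heartbeat is considered stale after twice the check interval.
    if snapshot.seconds_since_heartbeat > 2 * check_interval:
        issues.append("No recent heartbeat")
    return issues


def should_alert(consecutive_failures: int, max_failures: int = 3) -> bool:
    """Assumption: reaching the threshold is enough to alert.

    Change >= to > if the bot requires strictly exceeding it.
    """
    return consecutive_failures >= max_failures
```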
Health alerts include:
- Severity: Color-coded embed (red for alerts, green for recovery)
- Issue Details: List of specific problems detected
- System Metrics: Current bot status and resource usage
- Log Attachment: Recent log entries to help diagnose issues
- Developer Ping: Mentions the developer role for immediate attention
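As an illustration of how such an alert could be assembled, the sketch below builds a Discord webhook payload with a color-coded embed and an optional role mention. The function name, titles, and color values are assumptions, and the log attachment is left out because it requires a multipart upload.

```python
from typing import Dict, List, Optional

COLOR_ALERT = 0xE74C3C     # red (illustrative value)
COLOR_RECOVERY = 0x2ECC71  # green (illustrative value)


def build_alert_payload(
    issues: List[str],
    metrics: Dict[str, object],
    developer_role_id: Optional[str] = None,
    recovered: bool = False,
) -> dict:
    """Build a webhook JSON body for an alert or a recovery notice."""
    embed = {
        "title": "Bot recovered" if recovered else "Health check failed",
        # One line per detected issue; fall back to a fixed message if none.
        "description": "\n".join(f"- {issue}" for issue in issues)
        or "All checks passing.",
        "color": COLOR_RECOVERY if recovered else COLOR_ALERT,
        "fields": [
            {"name": name, "value": str(value), "inline": True}
            for name, value in metrics.items()
        ],
    }
    payload = {"embeds": [embed]}
    if developer_role_id and not recovered:
        # <@&ID> is Discord's role mention syntax; allowed_mentions
        # restricts the ping to that one role.
        payload["content"] = f"<@&{developer_role_id}>"
        payload["allowed_mentions"] = {"roles": [developer_role_id]}
    return payload
```

In this sketch the role is pinged only for failures, so recovery notices stay quiet; whether the bot does the same is not stated here.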
Modify HEALTH_CHECK_INTERVAL to change how often health checks run:
- Lower values (30-60s) provide faster detection but use more resources
- Higher values (120-300s) reduce overhead but delay problem detection
Adjust HEALTH_ALERT_COOLDOWN to control alert frequency:
- Shorter cooldowns provide more frequent updates during outages
- Longer cooldowns reduce notification spam but may delay important updates
Set HEALTH_MAX_FAILURES to control alert sensitivity:
- Lower values (1-2) trigger alerts quickly but may cause false alarms
- Higher values (3-5) reduce false alarms but delay legitimate alerts
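To make the interaction between these settings concrete, here is a minimal sketch that reads them from the environment with the documented defaults and decides when an alert may be sent. `HealthConfig`, `load_config`, and `AlertGate` are hypothetical names, not the bot's real classes, and the sketch assumes an alert fires once the failure count reaches HEALTH_MAX_FAILURES.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class HealthConfig:
    webhook_url: Optional[str]
    check_interval: int
    alert_cooldown: int
    log_lines: int
    max_failures: int


def load_config(env: Mapping[str, str] = os.environ) -> HealthConfig:
    """Read the HEALTH_* variables, falling back to the documented defaults."""
    return HealthConfig(
        webhook_url=env.get("HEALTH_WEBHOOK_URL") or None,
        check_interval=int(env.get("HEALTH_CHECK_INTERVAL", "60")),
        alert_cooldown=int(env.get("HEALTH_ALERT_COOLDOWN", "1800")),
        log_lines=int(env.get("HEALTH_LOG_LINES", "100")),
        max_failures=int(env.get("HEALTH_MAX_FAILURES", "3")),
    )


class AlertGate:
    """Combine the failure threshold and the cooldown into one decision."""

    def __init__(self, config: HealthConfig,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._config = config
        self._clock = clock
        self._failures = 0
        self._last_alert: Optional[float] = None

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True if an alert should be sent."""
        if healthy:
            self._failures = 0
            return False
        self._failures += 1
        if self._failures < self._config.max_failures:
            return False  # not enough consecutive failures yet
        now = self._clock()
        if (self._last_alert is not None
                and now - self._last_alert < self._config.alert_cooldown):
            return False  # still inside the cooldown window
        self._last_alert = now
        return True
```

With the defaults, the first alert would go out after three consecutive failed checks (about three minutes at a 60-second interval), and repeat alerts would be at least 30 minutes apart.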
- Verify the webhook URL is correct and accessible
- Check Discord server permissions for the webhook
- Run python test_health_webhook.py to test connectivity
- Check the logs for webhook-related errors

- Increase HEALTH_MAX_FAILURES to require more consecutive failures
- Adjust HEALTH_CHECK_INTERVAL if checks are too frequent
- Review system resources if memory alerts are frequent

- Check that HEALTH_WEBHOOK_URL is configured
- Verify the webhook channel is accessible
- Check that the alert cooldown hasn't suppressed recent alerts
- Review logs for health monitoring errors
Example docker-compose.yml health check configuration:
version: '3.8'
services:
  drawbridge:
    build: .
    environment:
      - HEALTH_WEBHOOK_URL=https://discord.com/api/webhooks/...
    healthcheck:
      test: ["CMD", "python", "healthcheck.py"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
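The healthcheck.py script itself is not shown in this document. One plausible shape for it, sketched below, is to request /api/health and translate the result into an exit code, since Docker treats exit code 0 as healthy and 1 as unhealthy. The host and port in `HEALTH_URL` and both function names are assumptions.

```python
import json
import urllib.error
import urllib.request

# Hypothetical address; use whatever host and port the bot's HTTP API binds to.
HEALTH_URL = "http://localhost:8080/api/health"


def exit_code_for(status_code: int, body: bytes) -> int:
    """Map an /api/health response to a Docker health check exit code."""
    if status_code != 200:
        return 1
    try:
        payload = json.loads(body)
    except ValueError:
        return 1  # body was not valid JSON
    if isinstance(payload, dict) and payload.get("status") == "healthy":
        return 0
    return 1


def check(url: str = HEALTH_URL, timeout: float = 5.0) -> int:
    """Query the health endpoint; any connection problem counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return exit_code_for(response.status, response.read())
    except urllib.error.HTTPError as error:
        # A 503 arrives here because urllib raises on non-2xx responses.
        return exit_code_for(error.code, error.read())
    except (urllib.error.URLError, OSError):
        return 1
```

A script built this way would end with `sys.exit(check())`, and its timeout should stay below the 10-second Docker timeout configured above.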