The backup scripts include comprehensive safety checks and error handling to prevent common failure scenarios.
- Prevents multiple backup jobs from running simultaneously
- Uses system-level file locks (`flock`)
- Automatically releases lock on exit/crash
- Automated backup: `/var/run/docker-stack-backup.lock`
- Manual backup: `/var/run/docker-stack-backup-manual.lock`
First instance: Acquires lock → Runs backup
Second instance: Detects lock → Exits with error message
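The two-instance behavior above can be sketched with `flock` roughly as follows. This is a minimal illustration, not the script's actual code: the function name `acquire_lock` and the use of file descriptor 9 are assumptions.

```shell
# Hypothetical helper (not taken from the backup script): take an exclusive,
# non-blocking lock on the given file, or report that another run holds it.
acquire_lock() {
    local lock_file="$1"
    # Open (and create if needed) the lock file on file descriptor 9.
    # The descriptor stays open for the life of the shell, so the kernel
    # drops the lock automatically when the script exits or crashes.
    exec 9>"$lock_file"
    # -n: fail immediately instead of waiting for the current holder.
    if ! flock -n 9; then
        echo "[ERROR] Another backup is already running (lock: $lock_file)" >&2
        return 1
    fi
}
```

A script would call it once near the top, for example `acquire_lock /var/run/docker-stack-backup.lock || exit 1`.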
If a backup crashes and leaves a stale lock:
# Check if backup is actually running
ps aux | grep '[d]ocker-stack-backup'   # brackets keep grep from matching itself
# If not running, remove lock file
sudo rm /var/run/docker-stack-backup.lock

Prevents:
- Corrupted backups from simultaneous writes
- Multiple stacks stopped at once
- Race conditions on shared resources
- Backup destination conflicts
- Verifies Docker service is running
- Tests Docker API responsiveness
- Prevents operations if Docker is down
Failure Example:
[ERROR] Docker daemon is not running
[ERROR] Start with: systemctl start docker
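A Docker check along these lines could look like the sketch below. It assumes that `docker info` exits non-zero when it cannot reach the daemon; the function name `check_docker` and its optional argument are hypothetical, not taken from the script.

```shell
# Hypothetical sketch of the Docker pre-flight check.
# The optional argument names the Docker CLI binary (defaults to "docker").
check_docker() {
    local docker_bin="${1:-docker}"
    # "docker info" has to reach the daemon over its API, so a zero exit
    # status means the service is both running and responsive.
    if ! "$docker_bin" info >/dev/null 2>&1; then
        echo "[ERROR] Docker daemon is not running" >&2
        echo "[ERROR] Start with: systemctl start docker" >&2
        return 1
    fi
    echo "✓ Docker daemon is running"
}
```

Calling `check_docker || exit 1` before any stack is stopped keeps a dead daemon from leaving containers down.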
- Dockhand directory: `/path/to/dockhand/hostname/`
- Appdata directory: `/mnt/datastor/appdata/`
Failure Example:
[ERROR] Dockhand directory not found: /opt/dockhand/debian-docker
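The path check can be sketched as a small helper. The name `check_dir` and its two arguments (a label and a directory) are assumptions made for illustration.

```shell
# Hypothetical helper: confirm a required directory exists before the backup starts.
check_dir() {
    local label="$1" dir="$2"
    if [ ! -d "$dir" ]; then
        echo "[ERROR] $label directory not found: $dir" >&2
        return 1
    fi
    echo "✓ $label directory exists: $dir"
}
```

For example, `check_dir "Appdata" /mnt/datastor/appdata || exit 1`.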
- Directory exists or can be created
- Write permissions verified
- Mount point check (if applicable)
Failure Example:
[ERROR] Cannot write to /mnt/backup/docker-backups
[ERROR] Check permissions and mount status
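One way to implement the destination check is to create the directory if needed and then write a throwaway probe file. The sketch below assumes a helper named `check_destination` (hypothetical) and leaves out the mount point test, which could be added with `mountpoint -q` on systems that provide it.

```shell
# Hypothetical helper: make sure the backup destination exists and is writable.
check_destination() {
    local dest="$1" probe
    # Create the directory if it is missing; fail if that is not possible.
    if ! mkdir -p "$dest" 2>/dev/null; then
        echo "[ERROR] Cannot create $dest" >&2
        return 1
    fi
    # Prove write access by creating and then removing a temporary probe file.
    if ! probe="$(mktemp "$dest/.write-test.XXXXXX" 2>/dev/null)"; then
        echo "[ERROR] Cannot write to $dest" >&2
        echo "[ERROR] Check permissions and mount status" >&2
        return 1
    fi
    rm -f "$probe"
    echo "✓ $dest is accessible"
}
```

Writing a real file catches read-only mounts and permission problems that a plain `-w` test can miss.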
- Requires at least 5GB free by default
- Checks actual available space
- Prevents mid-backup disk full errors
Failure Example:
[ERROR] Insufficient disk space on /mnt/backup/docker-backups
[ERROR] Available: 2GB, Required: 5GB
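A free-space check with this behavior might be written as follows. This is a sketch under assumptions, not the script's real `check_disk_space`: it keeps the same call shape (path, then minimum in GB) and reads available space from `df -Pk`.

```shell
# Hypothetical sketch of a free-space check; the real check_disk_space may differ.
# Usage: check_disk_space <path> [minimum free space in GB, default 5]
check_disk_space() {
    local path="$1" required_gb="${2:-5}" avail_kb avail_gb
    # -P forces one line per filesystem; -k reports 1 KiB blocks.
    # Column 4 of the second line is the space available to the caller.
    avail_kb="$(df -Pk "$path" 2>/dev/null | awk 'NR == 2 { print $4 }')"
    if [ -z "$avail_kb" ]; then
        echo "[ERROR] Cannot determine free space on $path" >&2
        return 1
    fi
    avail_gb=$(( avail_kb / 1024 / 1024 ))
    if [ "$avail_gb" -lt "$required_gb" ]; then
        echo "[ERROR] Insufficient disk space on $path" >&2
        echo "[ERROR] Available: ${avail_gb}GB, Required: ${required_gb}GB" >&2
        return 1
    fi
    echo "✓ Disk space: ${avail_gb}GB available on $path"
}
```

The integer division rounds down, so the check errs on the side of reporting less space than is actually free.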
Automated Backup:
=========================================
Running pre-flight checks...
=========================================
✓ Docker daemon is running
✓ Dockhand directory exists: /opt/dockhand/debian-docker
✓ Appdata directory exists: /mnt/datastor/appdata
✓ /mnt/backup/docker-backups is accessible (local filesystem)
✓ Disk space: 250GB available on /mnt/backup/docker-backups
=========================================
✓ All pre-flight checks passed
=========================================
Manual Backup:
Running pre-flight checks...
✓ Docker daemon running
✓ Dockhand directory found
✓ Appdata directory found
✓ Backup destination writable
✓ Disk space: 250GB available
✓ All checks passed
Edit the script to change minimum required space:
# In check_disk_space function call
check_disk_space "$BACKUP_DEST" 10 # Require 10GB instead of 5GB

When containers fail to start after backup:
- Attempt 1: Try to start containers
- Wait 5 seconds (configurable)
- Attempt 2: Retry
- Wait 5 seconds
- Attempt 3: Final attempt
- Verify: Check containers actually started
MAX_RESTART_ATTEMPTS=3 # Number of retry attempts
RESTART_RETRY_DELAY=5 # Seconds between retries

- Containers start successfully (not just command succeeds)
- Expected number of containers are running
- Status checked 2 seconds after start
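The retry sequence can be sketched as a loop around whatever command starts the stack. The function name `restart_with_retry` is hypothetical. In the real script the command passed in would presumably start the containers and then verify that they are running; here it is just an argument.

```shell
# Same variable names as the settings above; defaults apply if unset.
MAX_RESTART_ATTEMPTS="${MAX_RESTART_ATTEMPTS:-3}"
RESTART_RETRY_DELAY="${RESTART_RETRY_DELAY:-5}"

# Hypothetical helper: run the given start command until it succeeds
# or the configured number of attempts is used up.
restart_with_retry() {
    local attempt=1
    while [ "$attempt" -le "$MAX_RESTART_ATTEMPTS" ]; do
        echo "Starting containers (attempt ${attempt}/${MAX_RESTART_ATTEMPTS})..."
        if "$@"; then
            echo "All containers started"
            return 0
        fi
        echo "Container failed to start"
        # Only wait if another attempt is still to come.
        if [ "$attempt" -lt "$MAX_RESTART_ATTEMPTS" ]; then
            echo "Waiting ${RESTART_RETRY_DELAY}s before retry..."
            sleep "$RESTART_RETRY_DELAY"
        fi
        attempt=$(( attempt + 1 ))
    done
    echo "[ERROR] Failed to restart stack after ${MAX_RESTART_ATTEMPTS} attempts" >&2
    return 1
}
```

For example, `restart_with_retry docker compose up -d plex` retries the start command itself; a wrapper function that also checks container status could be passed instead.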
If restart fails after all retries:
- Logs detailed error:
[ERROR] Failed to restart stack after 3 attempts
[ERROR] Stack: plex
[ERROR] Containers that should be running: plex
[ERROR] Manual intervention required!
[ERROR] To restart manually: cd /path/to/stack && docker compose up -d plex
- Sends CRITICAL notification (regardless of normal notification settings):
CRITICAL: Stack Failed to Restart - debian-docker
⚠️ IMMEDIATE ACTION REQUIRED ⚠️
Stack: plex
Host: debian-docker
Containers: plex
Manual restart command:
cd /opt/dockhand/debian-docker/plex && docker compose up -d plex
Check logs:
/var/log/docker-backup.log
Notification Priority:
- Ntfy: `urgent` priority (with warning emoji)
- Pushover: Priority `1` (high)
- Email: Subject line marked `[CRITICAL]`
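As an illustration of the ntfy case only, an urgent alert can be published with `curl` by setting ntfy's `Priority`, `Title` and `Tags` headers. The function name `send_critical_ntfy` and the `NTFY_URL` variable are assumptions; the script's own notification code may be organized differently.

```shell
# Hypothetical sketch: send a critical alert through ntfy.
# NTFY_URL is an assumed variable name; point it at your own topic.
NTFY_URL="${NTFY_URL:-https://ntfy.sh/your-topic}"

send_critical_ntfy() {
    local title="$1" message="$2"
    # ntfy reads priority, title and tags from HTTP headers;
    # "urgent" is its highest priority and "warning" adds the warning emoji.
    curl -fsS \
        -H "Priority: urgent" \
        -H "Title: $title" \
        -H "Tags: warning" \
        -d "$message" \
        "$NTFY_URL"
}
```

For example: `send_critical_ntfy "CRITICAL: Stack Failed to Restart - debian-docker" "Stack: plex"`.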
Interactive feedback during retries:
└─ Starting containers (attempt 1/3)...
⚠️ Container failed to start
└─ Waiting 5s before retry...
└─ Starting containers (attempt 2/3)...
✓ All containers started
Port Conflicts:
- Another service claimed the port while stack was down
- Solution: Stop conflicting service, restart stack
Dependency Issues:
- Dependent stack/service not available
- Solution: Start dependencies first
Resource Exhaustion:
- Out of memory/CPU
- Solution: Free resources, restart stack
Configuration Errors:
- Compose file or env file issues
- Solution: Check Docker logs, fix config
If restart fails, manually restart the stack:
# Navigate to stack directory
cd /opt/dockhand/debian-docker/plex
# Check what's wrong
docker compose logs
# Try starting again
docker compose up -d
# Or restart specific services
docker compose up -d plex

| Check | Action on Failure |
|---|---|
| Another backup running | Exit immediately |
| Docker not running | Exit with error |
| Paths missing | Exit with error |
| No disk space | Exit with error |
| Cannot write to destination | Exit with error |

| Issue | Action |
|---|---|
| Stack has no appdata | Skip, log warning |
| Appdata directory missing | Skip, log warning |
| Failed to stop stack | Abort backup, log error |
| Failed to create backup | Abort, try to restart stack, log error |

| Issue | Action |
|---|---|
| Restart fails (attempt 1) | Wait, retry |
| Restart fails (attempt 2) | Wait, retry |
| Restart fails (attempt 3) | Log critical error, send urgent notification |
- Review logs regularly: `tail -100 /var/log/docker-backup.log`
- Check for failed restarts: `grep "Failed to restart" /var/log/docker-backup.log`
- Verify disk space trends: `df -h /mnt/backup/docker-backups`
- Test notifications - Ensure critical alerts actually reach you
- Check lock files - Remove stale locks if needed
- Monitor disk usage - Clean old backups before space runs out
- Test restarts - Verify stacks can actually restart
- Schedule backups during low-usage hours - Minimize impact of downtime
- Stagger backups - Don't back up all hosts at 2 AM
- Increase disk space threshold - For large backup volumes
- Document manual recovery - So you know what to do at 3 AM
# Check if process is actually running
ps aux | grep '[d]ocker-stack-backup'   # brackets keep grep from matching itself
# If not, manually remove lock
sudo rm /var/run/docker-stack-backup.lock

# Check Docker
sudo systemctl status docker
sudo docker info
# Check paths
ls -la /opt/dockhand/$(hostname)
ls -la /mnt/datastor/appdata
# Check disk space
df -h /mnt/backup/docker-backups
# Check write permissions
sudo touch /mnt/backup/docker-backups/test
sudo rm /mnt/backup/docker-backups/test

# Check Docker logs
cd /path/to/stack
docker compose logs
# Check for port conflicts
sudo netstat -tlnp | grep <port>
# Check resource usage
docker stats
# Try starting with verbose output
docker compose up -d --verbose

# Check notification settings
grep "ENABLED" /usr/local/bin/docker-stack-backup.sh
# Test notification manually
curl -d "Test" https://ntfy.sh/your-topic
# Check logs for notification errors
grep -i "notification" /var/log/docker-backup.log

# Require lots of free space
check_disk_space "$BACKUP_DEST" 50 # 50GB minimum
# More restart attempts
MAX_RESTART_ATTEMPTS=5
RESTART_RETRY_DELAY=10 # Wait longer between attempts

# Less free space required
check_disk_space "$BACKUP_DEST" 2 # 2GB minimum
# Fewer attempts, faster retries
MAX_RESTART_ATTEMPTS=2
RESTART_RETRY_DELAY=2

# Default settings are good
MAX_RESTART_ATTEMPTS=3
RESTART_RETRY_DELAY=5
check_disk_space "$BACKUP_DEST" 5

Three layers of protection:
- Pre-Flight Checks - Verify environment before starting
- File Locking - Prevent concurrent operations
- Retry Logic - Recover from transient failures
Result: Robust, reliable backups with minimal chance of leaving systems in a broken state.