# Clone the repository
git clone <repo-url>
cd ffmpeg-rtmp
# Run the interactive deployment wizard
./deployment/deployment-wizard.sh
The wizard will guide you through:
- Deployment type - Master, worker, or both
- Environment - Development, staging, or production
- Pre-flight checks - System validation
- Configuration - TLS, database, resources
- Deployment - Automated installation
- Verification - Health checks
That's it! The wizard handles everything automatically.
# 1. Clone and build
git clone <repo-url>
cd ffmpeg-rtmp
make build-master build-agent build-cli
# 2. Run pre-flight checks
./deployment/checks/preflight-check.sh --master
# 3. Deploy master node
sudo ./deploy.sh --master --non-interactive
# 4. Verify deployment
./deployment/checks/health-check.sh --master
# 5. Test it
./bin/ffrtmp jobs submit --scenario test
./bin/ffrtmp jobs list
Once deployed, the system will be running with:
- Master node on port 8080 (HTTPS)
- Worker agent registered and polling
- API authentication enabled
- Metrics on ports 9090-9091
- SQLite or PostgreSQL database
- Health checks passed
- Systemd services configured
Best for: First-time deployments, manual setup
./deployment/deployment-wizard.sh
Features:
- Step-by-step guided deployment
- Automatic system validation
- Configuration generation
- Health checks
- User-friendly prompts
Best for: Automated deployments, scripts, CI/CD
# Deploy master node
sudo ./deploy.sh --master --non-interactive
# Deploy worker node
sudo ./deploy.sh --worker \
--master-url https://master.example.com:8080 \
--api-key YOUR_API_KEY \
--worker-id worker-01
# Deploy both on same server
sudo ./deploy.sh --both --non-interactive
# With TLS certificate generation
sudo ./deploy.sh --master \
--generate-certs \
--master-ip 10.0.1.10 \
--master-host master.example.com
Options:
- --master - Deploy master node
- --worker - Deploy worker node
- --both - Deploy both on single server
- --master-url URL - Master server URL (for workers)
- --api-key KEY - Master API key (for workers)
- --worker-id ID - Worker identifier
- --generate-certs - Generate self-signed TLS certificates
- --master-ip IP - Master server IP for certificates
- --master-host HOST - Master server hostname for certificates
- --non-interactive - Skip prompts (for automation)
- --skip-build - Use existing binaries
Best for: Production master updates with no downtime
# 1. Deploy new version to inactive environment
sudo ./deployment/orchestration/blue-green-deploy.sh \
--deploy --master --version v2.0.0
# 2. Test new version (it's not active yet)
# Health checks run automatically
# 3. Switch traffic to new version
sudo ./deployment/orchestration/blue-green-deploy.sh \
--switch --master
# 4. If something goes wrong, instant rollback
sudo ./deployment/orchestration/blue-green-deploy.sh \
--rollback --master
How it works:
- Maintains two parallel environments (blue and green)
- Deploys to inactive environment
- Tests before switching
- Symlink switch for instant activation
- Previous version ready for immediate rollback
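The symlink switch described in the list above can be sketched as follows. This is a minimal illustration, not the contents of blue-green-deploy.sh: the function name `switch_active`, the temporary link name, and the relative link target are assumptions, and `mv -T` requires GNU coreutils.

```shell
# Atomically repoint <root>/ffrtmp at <root>/ffrtmp-<color>.
# A temporary link is created first and then renamed over the old one,
# so the active path never disappears during the switch.
switch_active() {
  local root="$1" color="$2"
  if [ ! -d "$root/ffrtmp-$color" ]; then
    echo "no such environment: $color" >&2
    return 1
  fi
  ln -s "ffrtmp-$color" "$root/ffrtmp.tmp.$$" &&
    mv -T "$root/ffrtmp.tmp.$$" "$root/ffrtmp"
}
```

Under these assumptions, calling `switch_active /opt green` activates the green environment, and calling it again with `blue` is the rollback.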
Best for: Updating multiple workers safely
./deployment/orchestration/rolling-update.sh \
--workers worker1,worker2,worker3 \
--version v2.0.0 \
--master-url https://master.example.com:8080 \
--api-key YOUR_API_KEY \
--max-parallel 2 \
--ssh-user root
Process:
- Drains each worker (stops accepting new jobs)
- Waits for running jobs to complete
- Creates backup of current installation
- Deploys new version
- Runs health checks
- Activates worker
- Moves to next worker
Options:
- --workers - Comma-separated list of worker hosts
- --version - Version tag
- --max-parallel - Update multiple workers simultaneously
- --drain-timeout - Seconds to wait for jobs (default: 300)
- --ssh-user - SSH user (default: root)
- --ssh-key - Path to SSH private key
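To see what a given `--max-parallel` value means for a worker list, the grouping can be sketched in a few lines of shell. How rolling-update.sh actually batches hosts is not documented here, so treat this as an assumption; `batch_workers` is a hypothetical helper.

```shell
# Split a comma-separated worker list into batches of at most N hosts.
# Each output line is one batch that would be updated together.
batch_workers() {
  local list="$1" max_parallel="${2:-1}"
  printf '%s\n' "$list" | tr ',' '\n' | awk -v n="$max_parallel" '
    NF {
      line = (count % n == 0) ? $0 : line " " $0
      count++
      if (count % n == 0) { print line; line = "" }
    }
    END { if (line != "") print line }
  '
}
```

With `batch_workers worker1,worker2,worker3 2`, the first batch holds worker1 and worker2 and the second holds worker3, so at least one worker keeps serving jobs at any time.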
Best for: Large-scale deployments, infrastructure as code
# Configure inventory
cd ansible/
cp inventory/production.ini.example inventory/production.ini
vim inventory/production.ini
# Deploy everything
ansible-playbook -i inventory/production.ini playbooks/site.yml
# Deploy only master
ansible-playbook -i inventory/production.ini playbooks/master.yml
# Deploy only workers
ansible-playbook -i inventory/production.ini playbooks/workers.yml
See ansible/ANSIBLE_GUIDE.md for detailed instructions.
Always run pre-flight checks before deployment:
# Check master node requirements
./deployment/checks/preflight-check.sh --master
# Check worker node requirements
./deployment/checks/preflight-check.sh --worker \
--master-url https://master.example.com:8080
Validates:
- Operating system compatibility (Ubuntu, Debian, Rocky, AlmaLinux)
- CPU cores (2+ for master, 2+ for worker)
- Memory (4GB+ for master, 8GB+ for worker)
- Disk space (20GB+ root, 10GB+ /var for master, 100GB+ for worker)
- Port availability (8080 for master, 1935 optional)
- Required commands (curl, wget, git, tar, gzip)
- Go version (1.24+)
- FFmpeg installation and codecs (workers)
- Cgroups v2 support
- Network connectivity and DNS
- Firewall configuration
- SELinux status (RHEL-based systems)
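A version requirement such as "Go 1.24+" can be checked by hand with a small comparison helper. This is a sketch of one way to do it, not the logic inside preflight-check.sh; `version_ge` is a hypothetical name, and `sort -V` requires GNU coreutils.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# sort -V orders version strings; -C only checks whether the input is
# already sorted, so "required, actual" in order means actual >= required.
version_ge() {
  printf '%s\n%s\n' "$2" "$1" | sort -V -C
}
```

For example, `version_ge "$(go version | awk '{print $3}' | sed 's/^go//')" 1.24 || echo "Go is too old"` reports an outdated toolchain.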
Validate configuration files before deployment:
# Validate master config
./deployment/checks/config-validator.sh /etc/ffrtmp-master/config.yaml
# Validate worker config
./deployment/checks/config-validator.sh /etc/ffrtmp/worker.env
# Validate watch daemon config
./deployment/checks/config-validator.sh /etc/ffrtmp/watch-config.yaml
Checks:
- YAML/ENV syntax correctness
- Required fields present
- Sensitive data configured
- File permissions secure (600/640)
- Type-specific validation
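The permission check in the list above is easy to reproduce manually. A minimal sketch, assuming GNU `stat`; `has_secure_mode` is a hypothetical helper and not part of config-validator.sh.

```shell
# Succeeds only if the file mode is exactly 600 or 640.
has_secure_mode() {
  local mode
  mode="$(stat -c '%a' "$1")" || return 1
  case "$mode" in
    600|640) return 0 ;;
    *) echo "$1: mode $mode, expected 600 or 640" >&2; return 1 ;;
  esac
}
```

Run it against each sensitive file, for example `has_secure_mode /etc/ffrtmp/worker.env`, before starting the service.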
Verify deployment success with comprehensive health checks:
# Check master node health
./deployment/checks/health-check.sh --master
# Check worker node health
./deployment/checks/health-check.sh --worker \
--url https://master.example.com:8080 \
--api-key YOUR_API_KEY
Verifies:
- Service status (systemd)
- Port listening (8080 for master)
- HTTP endpoints responding
- API authentication working
- File and directory structure
- Configuration files present
- Disk space available
- Log files accessible
- No critical errors in logs
- Database connectivity (master)
- FFmpeg installation (workers)
- Cgroups v2 enabled (workers)
- Master connectivity (workers)
- Worker registration successful
Output:
═══════════════════════════════════════
Health Check Summary
═══════════════════════════════════════
Passed: 25
Warnings: 2
Failed: 0
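For automation, the summary can gate a pipeline step. A minimal sketch, assuming the output contains a `Failed: N` line exactly as in the sample above; `health_summary_ok` is a hypothetical helper. If health-check.sh already exits nonzero on failure, prefer its exit status.

```shell
# Reads health-check output on stdin and succeeds only when the summary
# reports zero failed checks. A missing summary counts as a failure.
health_summary_ok() {
  awk '
    $1 == "Failed:" { failed = $2; seen = 1 }
    END { if (seen && failed + 0 == 0) exit 0; exit 1 }
  '
}
```

Typical use would be `./deployment/checks/health-check.sh --master | health_summary_ok || exit 1` in a CI job.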
Pre-configured templates for different environments:
deployment/configs/
├── master-dev.yaml # Development master
├── master-prod.yaml # Production master
├── worker-dev.env # Development worker
└── worker-prod.env # Production worker
Development:
- SQLite database
- Debug logging
- Relaxed rate limits
- TLS optional
- Local paths
Production:
- PostgreSQL database with SSL
- JSON structured logging
- Strict rate limits
- TLS required with client verification
- Monitoring enabled
- Backup configured
- Alert webhooks
# Copy and customize for your environment
sudo cp deployment/configs/master-prod.yaml /etc/ffrtmp-master/config.yaml
sudo vim /etc/ffrtmp-master/config.yaml
# Validate before use
./deployment/checks/config-validator.sh /etc/ffrtmp-master/config.yaml
# Deploy with custom config
sudo ./deploy.sh --master --config /etc/ffrtmp-master/config.yaml
Use Case: Development, testing, small deployments (<100 jobs/day)
# Interactive wizard (easiest)
./deployment/deployment-wizard.sh
# Or manual deployment
make build-master build-agent build-cli
sudo ./deploy.sh --both --non-interactive
./deployment/checks/health-check.sh --master
# Access
curl http://localhost:8080/health
./bin/ffrtmp jobs submit --scenario test
Pros: Simple, easy setup, single command
Cons: Single point of failure, limited scale
Use Case: Production workloads, high availability, horizontal scaling
# 1. Pre-flight checks
./deployment/checks/preflight-check.sh --master
# 2. Deploy master
sudo ./deploy.sh --master \
--generate-certs \
--master-ip 10.0.1.10 \
--master-host master.example.com \
--non-interactive
# 3. Health check
./deployment/checks/health-check.sh --master
# 4. Get API key
sudo cat /etc/ffrtmp-master/.api-key
# Deploy to multiple workers with rolling update
./deployment/orchestration/rolling-update.sh \
--workers worker1,worker2,worker3 \
--version v1.0.0 \
--master-url https://10.0.1.10:8080 \
--api-key <API_KEY_FROM_MASTER> \
--ssh-user root
# Or deploy individually
ssh worker1
sudo ./deploy.sh --worker \
--master-url https://10.0.1.10:8080 \
--api-key <API_KEY> \
--worker-id worker1
# Verify
./deployment/checks/health-check.sh --worker \
--url https://10.0.1.10:8080 \
--api-key <API_KEY>
Pros: Scalable, fault-tolerant, independent worker updates
Cons: Requires multiple servers, more complex setup
Use Case: Mission-critical deployments, zero-downtime updates
# Initial deployment
sudo ./deploy.sh --master --non-interactive
# Later: Deploy update with zero downtime
sudo ./deployment/orchestration/blue-green-deploy.sh \
--deploy --master --version v2.0.0
# Test in inactive environment
# (automatic health checks run)
# Switch when ready
sudo ./deployment/orchestration/blue-green-deploy.sh \
--switch --master
# Instant rollback if issues
sudo ./deployment/orchestration/blue-green-deploy.sh \
--rollback --master
Pros: Zero downtime, instant rollback, safe testing
Cons: Requires double disk space for environments
Use Case: Geographic distribution, 10+ servers, infrastructure as code
# 1. Configure inventory
cd ansible/
cp inventory/production.ini.example inventory/production.ini
# Edit with your servers:
# [master]
# master.us-east.example.com
#
# [workers]
# worker1.us-east.example.com
# worker2.us-east.example.com
# worker3.us-west.example.com
# 2. Configure variables
vim group_vars/all.yml # Global settings
vim group_vars/master.yml # Master-specific
vim group_vars/workers.yml # Worker-specific
# 3. Deploy everything
ansible-playbook -i inventory/production.ini playbooks/site.yml
# 4. Or deploy incrementally
ansible-playbook -i inventory/production.ini playbooks/master.yml
ansible-playbook -i inventory/production.ini playbooks/workers.yml --limit us-east
Pros: Repeatable, version controlled, multi-region support
Cons: Ansible knowledge required, initial setup complexity
See ansible/ANSIBLE_GUIDE.md for complete Ansible documentation.
# Generate self-signed certificates (development/testing)
sudo ./deployment/generate-certs.sh \
--type master \
--ip 10.0.1.10 \
--hostname master.example.com \
--output /etc/ffrtmp-master/certs
# Generate CA and client certificates (mTLS)
sudo ./deployment/generate-certs.sh \
--type ca \
--output /etc/ffrtmp-master/certs
sudo ./deployment/generate-certs.sh \
--type worker \
--output /etc/ffrtmp/certs \
--ca-cert /etc/ffrtmp-master/certs/ca.crt \
--ca-key /etc/ffrtmp-master/certs/ca.key
For production, use CA-signed certificates:
# 1. Generate a private key (skip if master.key already exists) and a CSR
openssl genrsa -out master.key 4096
openssl req -new -key master.key -out master.csr
# 2. Get certificate from CA (Let's Encrypt, etc.)
# 3. Install certificates
sudo cp master.crt /etc/ffrtmp-master/certs/
sudo cp master.key /etc/ffrtmp-master/certs/
sudo chmod 600 /etc/ffrtmp-master/certs/master.key
# 4. Configure master to use them
sudo ./deploy.sh --master \
--master-ip $(hostname -I | awk '{print $1}')
See docs/TLS_SETUP_GUIDE.md for complete TLS documentation.
Best for: Development, testing, <1000 jobs/day
# Automatically configured in development mode
./deployment/deployment-wizard.sh
# Select: Development environment
# Or manually
sudo ./deploy.sh --master --non-interactive
# Database location: /var/lib/ffrtmp-master/master.db
Best for: Production, >1000 jobs/day, >10 workers
# 1. Install PostgreSQL
sudo apt-get install postgresql postgresql-contrib
# Or with Docker
docker run -d \
--name ffrtmp-postgres \
-e POSTGRES_DB=ffrtmp \
-e POSTGRES_USER=ffrtmp_user \
-e POSTGRES_PASSWORD=secure_password \
-p 5432:5432 \
postgres:15
# 2. Create database and user
sudo -u postgres psql << EOF
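CREATE USER ffrtmp_user WITH PASSWORD 'secure_password';
-- Make the application user the database owner: on PostgreSQL 15+ a GRANT on
-- the database alone does not allow creating tables in the public schema.
CREATE DATABASE ffrtmp OWNER ffrtmp_user;
GRANT ALL PRIVILEGES ON DATABASE ffrtmp TO ffrtmp_user;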
\q
EOF
# 3. Run migrations
psql -U ffrtmp_user -d ffrtmp -f shared/pkg/store/migrations/001_initial_schema.sql
# 4. Configure master
sudo cp deployment/configs/master-prod.yaml /etc/ffrtmp-master/config.yaml
sudo vim /etc/ffrtmp-master/config.yaml
# Update database section:
# database:
#   type: postgres
#   host: localhost
#   port: 5432
#   database: ffrtmp
#   user: ffrtmp_user
#   password: secure_password
#   ssl_mode: require
# 5. Restart master
sudo systemctl restart ffrtmp-master
See deployment/postgres/README.md for PostgreSQL high-availability setup.
# Master metrics
curl http://localhost:9090/metrics
# Worker metrics
curl http://localhost:9091/metrics
# Available metrics:
# - ffrtmp_jobs_total
# - ffrtmp_jobs_duration_seconds
# - ffrtmp_workers_active
# - ffrtmp_queue_length
# - go_goroutines, go_memstats_*
# - process_cpu_seconds_total
# Master health
curl http://localhost:8080/health
# Returns:
# {"status":"healthy","database":"ok","workers":3,"version":"2.0.0"}
# Detailed status
curl http://localhost:8080/api/v1/status
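In scripts, the `status` field of the health response is usually the only part that matters. The sketch below extracts it with `sed`, assuming the flat JSON shape shown in the sample response; `health_status` is a hypothetical helper, and a real JSON parser such as jq is the safer choice for anything more complex.

```shell
# Print the value of the "status" field from the /health JSON on stdin.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

For example, `[ "$(curl -s http://localhost:8080/health | health_status)" = "healthy" ]` can serve as a readiness test.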
# 1. Start monitoring stack
cd deployment/grafana
docker-compose up -d
# 2. Access Grafana
open http://localhost:3000
# Credentials: admin/admin
# 3. Pre-configured dashboards:
# - FFmpeg-RTMP Overview
# - Job Processing Metrics
# - Worker Node Status
# - System Resources
Dashboards are auto-loaded from deployment/grafana/dashboards/.
# Prometheus config at: deployment/prometheus/prometheus.yml
# Scrapes:
# - Master: localhost:9090
# - Workers: auto-discovery or static config
# - Node Exporter: 9100 (if installed)
# View targets
open http://localhost:9090/targets
# View logs
sudo journalctl -u ffrtmp-master -f
sudo journalctl -u ffrtmp-worker -f
sudo journalctl -u ffrtmp-watch -f
# Or file-based logs
tail -f /var/log/ffrtmp/master.log
tail -f /var/log/ffrtmp/worker.log
# Search for errors
grep -i error /var/log/ffrtmp/*.log
Log rotation is configured automatically:
- Daily rotation
- 14-day retention
- Gzip compression
- Location: /var/log/ffrtmp/
See deployment/logrotate/ for logrotate configs.
Run automated security checks:
# Included in pre-flight checks
./deployment/checks/preflight-check.sh --master
- Strong API Keys
# Generate cryptographically secure key
export MASTER_API_KEY="$(openssl rand -hex 32)"
# Or let deployment script generate one
sudo ./deploy.sh --master --non-interactive
# Key saved to: /etc/ffrtmp-master/.api-key
- TLS Encryption
# Production: Use CA-signed certificates
sudo cp /path/to/ca-signed.crt /etc/ffrtmp-master/certs/server.crt
sudo cp /path/to/ca-signed.key /etc/ffrtmp-master/certs/server.key
# Development: Auto-generate self-signed
sudo ./deploy.sh --master --generate-certs \
--master-ip 10.0.1.10 \
--master-host master.example.com
- Firewall Rules
# UFW (Ubuntu/Debian)
sudo ufw allow 8080/tcp # Master API
sudo ufw allow from 10.0.0.0/8 to any port 9090 # Metrics (internal only)
sudo ufw enable
# Firewalld (RHEL/Rocky)
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port port="9090" protocol="tcp" accept'
sudo firewall-cmd --reload
- File Permissions
# Automatically set by deployment scripts
# Verify:
ls -la /etc/ffrtmp-master/
# Expect: 600 or 640 for sensitive files
# Fix if needed:
sudo chmod 600 /etc/ffrtmp-master/config.yaml
sudo chmod 600 /etc/ffrtmp-master/.api-key
sudo chown ffrtmp-master:ffrtmp-master /etc/ffrtmp-master/*
- Database Security
# PostgreSQL: Require SSL
# In config.yaml:
database:
  ssl_mode: require
# Restrict PostgreSQL access
sudo vim /etc/postgresql/*/main/pg_hba.conf
# Add: hostssl ffrtmp ffrtmp_user 10.0.0.0/8 md5
- Rate Limiting
# Already enabled by default in production config
# Adjust in config.yaml:
api:
  rate_limit: 1000 # requests per minute
  burst: 2000 # burst allowance
- SELinux (RHEL-based)
# If SELinux is enforcing, set contexts
sudo semanage fcontext -a -t bin_t "/opt/ffrtmp(-master)?/bin(/.*)?"
# restorecon takes plain paths, not the regex used above
sudo restorecon -Rv /opt/ffrtmp/bin /opt/ffrtmp-master/bin
# Run security audit
./deployment/checks/security-audit.sh
# Check for exposed secrets
grep -r "password\|secret\|key" /etc/ffrtmp* 2>/dev/null | grep -v "CHANGE_ME"
# Verify TLS
openssl s_client -connect localhost:8080 -showcerts
# Test API authentication (use http:// instead if TLS is disabled; -k accepts self-signed certificates)
curl -k -X GET https://localhost:8080/api/v1/jobs # Should return 401
curl -k -X GET -H "X-API-Key: wrong-key" https://localhost:8080/api/v1/jobs # Should return 403
- CPU: 2 cores
- RAM: 4 GB
- Disk: 10 GB
- OS: Linux (Ubuntu 20.04+, Debian 10+, Rocky Linux 8+)
- Network: Internet connectivity
- CPU: 4-8 cores
- RAM: 8-16 GB
- Disk: 50-100 GB SSD
- OS: Ubuntu 22.04 LTS or Rocky Linux 9
- Network: 1 Gbps, low latency to workers
- CPU: 8-16 cores (more for video encoding)
- RAM: 16-32 GB
- Disk: 200-500 GB SSD (for temporary files)
- GPU: NVIDIA GPU with NVENC (optional, recommended)
- OS: Ubuntu 22.04 LTS or Rocky Linux 9
- Network: 1 Gbps minimum, 10 Gbps for 4K workloads
- Go: 1.24+ (for building from source)
- FFmpeg: 4.4+ with libx264, libx265
- PostgreSQL: 15+ (production master)
- Docker: 20.10+ (optional, for monitoring stack)
- Ansible: 2.15+ (optional, for automated deployment)
- NVIDIA GPU + drivers - For NVENC hardware encoding
- Intel QSV - Intel Quick Sync Video encoding
- VAAPI - Intel/AMD hardware acceleration
- Prometheus - Metrics collection
- Grafana - Monitoring dashboards
- Victoria Metrics - Long-term metrics storage
ffmpeg-rtmp/
├── deployment/ # Deployment scripts and tools
│ ├── checks/ # Validation and health checks
│ │ ├── health-check.sh # Post-deployment verification
│ │ ├── preflight-check.sh # Pre-deployment validation
│ │ ├── config-validator.sh # Configuration validation
│ │ └── security-audit.sh # Security checks
│ ├── configs/ # Configuration templates
│ │ ├── master-dev.yaml # Development master config
│ │ ├── master-prod.yaml # Production master config
│ │ ├── worker-dev.env # Development worker config
│ │ └── worker-prod.env # Production worker config
│ ├── orchestration/ # Deployment orchestration
│ │ ├── blue-green-deploy.sh # Zero-downtime deployment
│ │ └── rolling-update.sh # Worker rolling updates
│ ├── systemd/ # Systemd service files
│ │ ├── ffrtmp-master.service
│ │ ├── ffrtmp-worker.service
│ │ ├── ffrtmp-watch.service
│ │ └── *.env.example
│ ├── logrotate/ # Log rotation configs
│ ├── grafana/ # Grafana dashboards
│ ├── prometheus/ # Prometheus configuration
│ ├── postgres/ # PostgreSQL setup scripts
│ ├── generate-certs.sh # TLS certificate generator
│ ├── validate-and-rollback.sh # Backup and restore
│ ├── deployment-wizard.sh # Interactive deployment
│ ├── test-deployment-scripts.sh # Deployment tests
│ └── simulate-deployment.sh # Deployment simulation
├── ansible/ # Ansible automation
│ ├── playbooks/ # Ansible playbooks
│ ├── roles/ # Ansible roles
│ ├── inventory/ # Inventory files
│ ├── group_vars/ # Variable files
│ └── ANSIBLE_GUIDE.md # Ansible documentation
├── deploy.sh # Unified deployment script
├── bin/ # Compiled binaries
│ ├── master # Master server binary
│ ├── agent # Worker agent binary
│ └── ffrtmp # CLI tool with watch daemon
├── docs/ # Documentation
│ ├── DEPLOYMENT_IMPROVEMENTS.md # Deployment guide
│ ├── TLS_SETUP_GUIDE.md # TLS/SSL guide
│ ├── PRODUCTION_CHECKLIST.md # Production checklist
│ ├── API.md # API documentation
│ ├── ARCHITECTURE.md # System architecture
│ └── ...
├── /etc/ffrtmp-master/ # Master configuration
│ ├── config.yaml # Main configuration
│ ├── .api-key # API key (generated)
│ └── certs/ # TLS certificates
│ ├── server.crt
│ ├── server.key
│ └── ca.crt
├── /etc/ffrtmp/ # Worker configuration
│ ├── worker.env # Environment variables
│ ├── watch-config.yaml # Watch daemon config
│ └── certs/ # TLS certificates
├── /var/lib/ffrtmp-master/ # Master data
│ ├── master.db # SQLite database
│ └── archive/ # Archived jobs
├── /var/lib/ffrtmp/ # Worker data
│ └── results/ # Job output files
├── /var/log/ffrtmp/ # Application logs
│ ├── master.log
│ ├── worker.log
│ ├── watch.log
│ └── *.log.[1-14].gz # Rotated logs
├── /var/backups/ffrtmp/ # Automatic backups
│ ├── master-*.tar.gz
│ └── worker-*.tar.gz
└── /opt/ffrtmp(-master)/ # Installation directory
└── bin/ # Binaries
# Blue-Green Deployment Structure
/opt/ffrtmp-blue/ # Blue environment
/opt/ffrtmp-green/ # Green environment
/opt/ffrtmp -> blue # Current active (symlink)
# Run comprehensive health check
./deployment/checks/health-check.sh --master
# Check service status
sudo systemctl status ffrtmp-master
sudo systemctl status ffrtmp-worker
sudo systemctl status ffrtmp-watch
# View recent logs
sudo journalctl -u ffrtmp-master -n 100 --no-pager
sudo journalctl -u ffrtmp-worker -n 100 --no-pager
# Check connectivity
curl -k https://localhost:8080/health
Symptom: preflight-check.sh reports errors
Solutions:
# Insufficient memory
free -g # Check available memory
# Add swap or upgrade server
# Port in use
sudo ss -tlnp | grep :8080
sudo systemctl stop <conflicting-service>
# Missing dependencies
sudo apt-get install curl wget git # Debian/Ubuntu
sudo yum install curl wget git # RHEL/Rocky
# Go version too old
# Install Go 1.24+ from https://go.dev/dl/
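# Cgroups v2 not enabled (grubby is the RHEL/Rocky tool; on Debian/Ubuntu add the same argument to GRUB_CMDLINE_LINUX in /etc/default/grub and run update-grub)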
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
sudo reboot
Symptom: systemctl status ffrtmp-master shows failed
Solutions:
# Check logs
sudo journalctl -u ffrtmp-master -n 50 --no-pager
# Common causes:
# 1. Port already in use
sudo ss -tlnp | grep :8080
sudo lsof -i:8080
# 2. Database migration failed
ls -la /var/lib/ffrtmp-master/master.db
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db ".schema"
# 3. Permission issues
sudo chown -R ffrtmp-master:ffrtmp-master /var/lib/ffrtmp-master
sudo chmod 755 /var/lib/ffrtmp-master
# 4. Configuration syntax error
./deployment/checks/config-validator.sh /etc/ffrtmp-master/config.yaml
# 5. Certificate issues
ls -la /etc/ffrtmp-master/certs/
openssl x509 -in /etc/ffrtmp-master/certs/server.crt -text -noout
Symptom: Worker logs show "failed to register" or connection errors
Solutions:
# 1. Check master is reachable
curl -k https://master.example.com:8080/health
# 2. Verify API key matches
sudo grep API_KEY /etc/ffrtmp/worker.env
# Compare with master:
sudo cat /etc/ffrtmp-master/.api-key
# 3. TLS certificate verification issues
# Test with curl
curl -k https://master.example.com:8080/health # -k skips verification
curl https://master.example.com:8080/health # Should work if certs valid
# 4. Firewall blocking
sudo ufw status
sudo firewall-cmd --list-all
# 5. Network connectivity
ping master.example.com
telnet master.example.com 8080
Symptom: Jobs stay in "pending" state
Solutions:
# 1. Check workers are registered
./bin/ffrtmp nodes list
# Should show your workers
# 2. Check worker service is running
sudo systemctl status ffrtmp-worker
sudo journalctl -u ffrtmp-worker -f
# 3. Check worker capacity
# In worker logs, look for:
# - "max concurrent jobs reached"
# - "worker is draining"
# 4. Check job queue
./bin/ffrtmp jobs list --status pending
# 5. Check for errors in master logs
sudo grep -i error /var/log/ffrtmp/master.log
Symptom: health-check.sh reports failures
Solutions:
# Re-run with verbose output
./deployment/checks/health-check.sh --master --verbose
# Address each failed check:
# - Service not running: sudo systemctl start ffrtmp-master
# - Port not listening: Check service logs
# - HTTP endpoint error: Check TLS certificates
# - Database error: Check PostgreSQL is running
# - Disk space: Clean up old files
# If multiple checks fail, consider rollback:
sudo ./deployment/orchestration/blue-green-deploy.sh --rollback --master
Symptom: Deployment script stops responding
Solutions:
# 1. Check system resources
top
df -h
# 2. Check for prompts waiting for input
# Use --non-interactive flag:
sudo ./deploy.sh --master --non-interactive
# 3. Check network connectivity (if downloading packages)
ping 8.8.8.8
ping github.com
# 4. Increase timeouts
# Edit script or use:
export DEPLOY_TIMEOUT=600
# Instant rollback to previous version
sudo ./deployment/orchestration/blue-green-deploy.sh --rollback --master
# Manual rollback
sudo systemctl stop ffrtmp-master
sudo rm /opt/ffrtmp
sudo ln -s /opt/ffrtmp-green /opt/ffrtmp # Or blue
sudo systemctl start ffrtmp-master
# SSH to worker
ssh worker1
# Find backup
ls -lt /opt/ffrtmp.backup-*
# Restore
sudo systemctl stop ffrtmp-worker ffrtmp-watch
sudo rm -rf /opt/ffrtmp
sudo mv /opt/ffrtmp.backup-20260107-103000 /opt/ffrtmp
sudo systemctl start ffrtmp-worker ffrtmp-watch
# Verify
./deployment/checks/health-check.sh --worker \
--url https://master.example.com:8080
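Picking the backup to restore can be scripted instead of read off `ls -lt`. A minimal sketch, assuming backups follow the `ffrtmp.backup-YYYYMMDD-HHMMSS` naming seen in the restore example; `latest_backup` is a hypothetical helper.

```shell
# Print the newest backup directory under <parent> (default: /opt).
# The timestamp suffix sorts lexicographically, and glob results are
# returned in sorted order, so the last match is the most recent one.
latest_backup() {
  local parent="${1:-/opt}" dir newest=""
  for dir in "$parent"/ffrtmp.backup-*; do
    [ -d "$dir" ] || continue
    newest="$dir"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

The function returns a nonzero status when no backup exists, so a restore script can stop before removing the current installation.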
- Check logs first:
sudo journalctl -u 'ffrtmp-*' -n 200 --no-pager
- Run diagnostics:
./deployment/checks/health-check.sh --master
./deployment/checks/preflight-check.sh --master
- Enable debug logging:
# Edit config.yaml
logging:
  level: debug
sudo systemctl restart ffrtmp-master
- Check documentation:
  - docs/DEPLOYMENT_IMPROVEMENTS.md - Complete deployment guide
  - docs/TLS_SETUP_GUIDE.md - TLS troubleshooting
  - ansible/ANSIBLE_GUIDE.md - Ansible-specific issues
  - DEPLOY_QUICKREF.md - Quick reference
- Report issues:
  - GitHub Issues: Include logs, config (redacted), system info
  - Provide output from health checks
- Monitor service health
- Check disk space
- Review error logs
# Automated daily health check
./deployment/checks/health-check.sh --master | tee /var/log/ffrtmp/health-$(date +%Y%m%d).log
- Review job success rates
- Check worker capacity
- Database maintenance
# SQLite maintenance
sudo sqlite3 /var/lib/ffrtmp-master/master.db "VACUUM; ANALYZE;"
# PostgreSQL maintenance
sudo -u postgres psql ffrtmp -c "VACUUM ANALYZE;"
# Clean up old job results
find /var/lib/ffrtmp/results -type f -mtime +30 -delete
- Review and rotate API keys
- Update system packages
- Test backup restoration
- Review security logs
# System updates
sudo apt-get update && sudo apt-get upgrade -y # Debian/Ubuntu
sudo yum update -y # RHEL/Rocky
# Test backups
./deployment/validate-and-rollback.sh --validate
Automatic log rotation (configured during deployment):
# Configuration
/etc/logrotate.d/ffrtmp-master
/etc/logrotate.d/ffrtmp-worker
/etc/logrotate.d/ffrtmp-watch
# Settings:
# - Daily rotation
# - 14-day retention
# - Gzip compression
# - Copytruncate mode (no service restart needed)
# Test logrotate
sudo logrotate -d /etc/logrotate.d/ffrtmp-master # Dry run
sudo logrotate -f /etc/logrotate.d/ffrtmp-master # Force rotation
# View rotated logs
ls -lh /var/log/ffrtmp/
zcat /var/log/ffrtmp/master.log.1.gz | less
# Backups created automatically during deployment
ls -lh /var/backups/ffrtmp/
# Backup locations:
# /var/backups/ffrtmp/master-YYYYMMDD-HHMMSS.tar.gz
# /var/backups/ffrtmp/worker-YYYYMMDD-HHMMSS.tar.gz
# Master node backup
sudo tar -czf /var/backups/ffrtmp/master-$(date +%Y%m%d).tar.gz \
/etc/ffrtmp-master \
/var/lib/ffrtmp-master \
/opt/ffrtmp-master/bin
# Worker node backup
sudo tar -czf /var/backups/ffrtmp/worker-$(date +%Y%m%d).tar.gz \
/etc/ffrtmp \
/opt/ffrtmp/bin
# Database-only backup
sudo cp /var/lib/ffrtmp-master/master.db /var/backups/ffrtmp/master-$(date +%Y%m%d).db
# Or PostgreSQL:
pg_dump -U ffrtmp_user ffrtmp > /var/backups/ffrtmp/ffrtmp-$(date +%Y%m%d).sql
# Interactive rollback wizard
./deployment/validate-and-rollback.sh --rollback
# Or manual restore
sudo systemctl stop ffrtmp-master
sudo tar -xzf /var/backups/ffrtmp/master-20260107.tar.gz -C /
sudo systemctl start ffrtmp-master
./deployment/checks/health-check.sh --master
# Compact database
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "VACUUM;"
# Analyze for query optimization
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "ANALYZE;"
# Check integrity
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "PRAGMA integrity_check;"
# View database size
du -sh /var/lib/ffrtmp-master/master.db
# Vacuum and analyze
sudo -u postgres psql ffrtmp << EOF
VACUUM VERBOSE ANALYZE;
REINDEX DATABASE ffrtmp;
EOF
# List the ten largest tables (a starting point for spotting bloat)
sudo -u postgres psql ffrtmp -c "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC LIMIT 10;"
# Backup
pg_dump -U ffrtmp_user -F c ffrtmp > /var/backups/ffrtmp/ffrtmp-$(date +%Y%m%d).dump
# Clean completed jobs older than 30 days
./bin/ffrtmp jobs cleanup --older-than 30d
# Or manually
sudo sqlite3 /var/lib/ffrtmp-master/master.db << EOF
DELETE FROM jobs WHERE status='completed' AND completed_at < datetime('now', '-30 days');
DELETE FROM jobs WHERE status='failed' AND completed_at < datetime('now', '-7 days');
VACUUM;
EOF
# Clean old log files (beyond logrotate retention)
find /var/log/ffrtmp -name "*.log.*" -mtime +30 -delete
# Clean old job output files
find /var/lib/ffrtmp/results -type f -mtime +30 -delete
# 1. Build new version
git pull origin main
make build-master build-agent build-cli
# 2. Deploy to inactive environment
sudo ./deployment/orchestration/blue-green-deploy.sh \
--deploy --master --version v2.1.0
# 3. Health checks run automatically
# Review output
# 4. Switch to new version
sudo ./deployment/orchestration/blue-green-deploy.sh \
--switch --master
# 5. Verify new version
./deployment/checks/health-check.sh --master
./bin/ffrtmp version
# 6. If issues, instant rollback
sudo ./deployment/orchestration/blue-green-deploy.sh \
--rollback --master
# Upgrade all workers with zero downtime
./deployment/orchestration/rolling-update.sh \
--workers worker1,worker2,worker3 \
--version v2.1.0 \
--master-url https://master.example.com:8080 \
--api-key YOUR_API_KEY \
--max-parallel 1
# Workers are updated one at a time:
# 1. Drains worker
# 2. Waits for jobs to complete
# 3. Backs up current version
# 4. Deploys new version
# 5. Verifies health
# 6. Moves to next worker
# 1. Backup everything
sudo ./deployment/validate-and-rollback.sh --validate
# 2. Stop services
sudo systemctl stop ffrtmp-master ffrtmp-worker ffrtmp-watch
# 3. Backup database
sudo cp /var/lib/ffrtmp-master/master.db /var/backups/ffrtmp/master-pre-upgrade-$(date +%Y%m%d).db
# 4. Pull and build
git pull origin main
make clean build-master build-agent build-cli
# 5. Run any database migrations
# Check CHANGELOG.md for migration steps
# 6. Redeploy
sudo ./deploy.sh --master --non-interactive
# 7. Verify
./deployment/checks/health-check.sh --master
./bin/ffrtmp version
Before upgrading:
- Read CHANGELOG.md for breaking changes
- Backup database
- Test upgrade in development environment
- Schedule maintenance window (if not using blue-green)
- Notify users of potential downtime
- Verify disk space for new version
After upgrading:
- Run health checks
- Verify API functionality
- Check worker registration
- Submit test job
- Monitor logs for errors
- Update monitoring dashboards (if needed)
# Production configuration (deployment/configs/master-prod.yaml)
# 1. Use PostgreSQL for high throughput
database:
  type: postgres
  max_connections: 100
  max_idle_connections: 25
  connection_max_lifetime: 3600s
# 2. Increase API rate limits
api:
  rate_limit: 5000 # requests per minute
  burst: 10000 # burst capacity
# 3. Tune scheduler
scheduler:
  interval: 5s # How often to check for jobs
  batch_size: 100 # Jobs to schedule per cycle
# 4. Increase worker heartbeat tolerance
worker:
  heartbeat_timeout: 30s
  max_missed_heartbeats: 5
  cleanup_interval: 60s
# Worker configuration (deployment/configs/worker-prod.env)
# 1. Maximize concurrent jobs (based on CPU cores)
MAX_CONCURRENT_JOBS=16 # Typically cores * 2
# 2. Heartbeat interval (lower it to detect failed workers sooner)
HEARTBEAT_INTERVAL=30s # Default: 30s (3 missed = 90s timeout)
# 3. Enable hardware acceleration
ENABLE_NVENC=true # NVIDIA GPU
ENABLE_VAAPI=true # Intel/AMD
ENABLE_QSV=true # Intel Quick Sync
# 4. FFmpeg threading
FFMPEG_THREADS=0 # Auto-detect (recommended)
# 5. Resource limits (adjust based on hardware)
MAX_MEMORY_MB=32768 # 32GB
MAX_CPU_CORES=16
Critical for 50+ workers: Tune connection pool to prevent bottlenecks.
# config-postgres.yaml
database:
  max_open_conns: 25 # Total connections to database
  max_idle_conns: 5 # Idle connections kept warm
  conn_max_lifetime: 5m # Recycle old connections
  conn_max_idle_time: 1m # Close idle connections
Performance:
- 25 connections @ 2s poll rate = 50 workers max
- Each query takes ~500ms including lock contention
# config-postgres.yaml
database:
max_open_conns: 50 # 2× default
max_idle_conns: 10 # 2× idle pool
conn_max_lifetime: 5m # Keep existing
conn_max_idle_time: 1m # Keep existing
Performance:
- 50 connections @ 2s poll rate = 100 workers max
- Handles burst load during job assignments
# config-postgres.yaml
database:
max_open_conns: 100 # High concurrency
max_idle_conns: 20 # Larger warm pool
conn_max_lifetime: 3m # Faster recycling
conn_max_idle_time: 30s # Aggressive cleanup
Performance:
- 100 connections @ 2s poll rate = 200 workers max
- Requires PostgreSQL max_connections = 150+
Match your connection pool to PostgreSQL limits:
# Edit /etc/postgresql/15/main/postgresql.conf
max_connections = 150 # 100 (app) + 50 (admin/monitoring)
shared_buffers = 4GB # 25% of RAM (16GB server)
effective_cache_size = 12GB # 75% of RAM
work_mem = 64MB # Per sort/hash operation (a query can use several)
maintenance_work_mem = 512MB # For VACUUM/ANALYZE
# Restart PostgreSQL
sudo systemctl restart postgresql
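The percentages in the comments above can be turned into a small calculator for other server sizes. This is a minimal sketch; `pg_settings` is a hypothetical helper, and it assumes the same rules of thumb as the example (25% and 75% of RAM, plus 50 reserved connections).

```shell
# pg_settings RAM_GB APP_CONNS  (hypothetical helper)
# Prints postgresql.conf lines using the rules of thumb from this guide.
pg_settings() {
  local ram_gb=$1 app_conns=$2
  echo "max_connections = $(( app_conns + 50 ))"         # app pool + 50 admin/monitoring
  echo "shared_buffers = $(( ram_gb / 4 ))GB"            # 25% of RAM
  echo "effective_cache_size = $(( ram_gb * 3 / 4 ))GB"  # 75% of RAM
}

# 16 GB server with a 100-connection application pool
pg_settings 16 100
# max_connections = 150
# shared_buffers = 4GB
# effective_cache_size = 12GB
```

Review the output before copying it into postgresql.conf; integer division rounds down on servers whose RAM is not a multiple of 4 GB.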
Required Connections = Workers × Polls per Second per Worker × Avg Query Time (seconds)
Example:
- 100 workers
- Poll every 2 seconds
- Avg query time: 500ms (0.5s)
Connections = (100 workers × 0.5 queries/sec) / (1 / 0.5s)
= 50 queries/sec / 2 queries/conn/sec
= 25 connections minimum
Add 20% overhead: 25 × 1.2 = 30 connections
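The worked example above reduces to a few lines of integer arithmetic. The sketch below assumes whole-second poll intervals and query times in milliseconds; `required_conns` is a hypothetical helper name.

```shell
# required_conns WORKERS POLL_INTERVAL_S AVG_QUERY_MS [OVERHEAD_PCT]
# (hypothetical helper) Connections busy on average = workers / poll interval
# * query time; the result is padded by OVERHEAD_PCT (default 20) and rounded up.
required_conns() {
  local workers=$1 poll_s=$2 query_ms=$3 overhead_pct=${4:-20}
  local num=$(( workers * query_ms * (100 + overhead_pct) ))
  local den=$(( poll_s * 1000 * 100 ))
  echo $(( (num + den - 1) / den ))   # integer ceiling of num / den
}

required_conns 100 2 500     # 30 (25 minimum + 20% overhead)
required_conns 100 2 500 0   # 25 (no overhead)
```

Treat the result as a lower bound and confirm it against the wait-duration metric under real load.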
# 1. Check PostgreSQL active connections
sudo -u postgres psql -c "
SELECT state, COUNT(*)
FROM pg_stat_activity
WHERE datname = 'ffrtmp'
GROUP BY state;
"
# 2. Watch connection pool metrics (Prometheus)
curl localhost:9090/metrics | grep database_connections
# Expected output:
# database_connections_open 18
# database_connections_idle 4
# database_connections_in_use 14
# database_connections_wait_duration_ms 2.5
# 3. Check for connection starvation
# If wait_duration > 100ms, increase max_open_conns
Connection starvation signs:
- Workers report "timeout waiting for connection"
- Increased job assignment latency (>5s)
- Prometheus metric: database_connections_wait_duration_ms > 100
- PostgreSQL logs: "remaining connection slots reserved"
Fix: Increase max_open_conns by 50%
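The 100 ms threshold can be checked automatically, for example from cron or a CI smoke test. This sketch assumes the metric is exported without labels, as in the expected output shown earlier; `check_pool_wait` is a hypothetical helper.

```shell
# check_pool_wait [THRESHOLD_MS]  (hypothetical helper)
# Reads Prometheus text-format metrics on stdin and compares the unlabelled
# database_connections_wait_duration_ms sample against the threshold.
# Exit status: 0 = OK, 1 = above threshold, 2 = metric missing.
check_pool_wait() {
  awk -v limit="${1:-100}" '
    $1 == "database_connections_wait_duration_ms" { ms = $2; found = 1 }
    END {
      if (!found) { print "metric not found"; exit 2 }
      if (ms + 0 > limit + 0) { print "STARVED wait=" ms "ms"; exit 1 }
      print "OK wait=" ms "ms"
    }'
}

# Sample input taken from the expected output earlier in this guide
printf 'database_connections_open 18\ndatabase_connections_wait_duration_ms 2.5\n' \
  | check_pool_wait 100
# Against a live master: curl -s localhost:9090/metrics | check_pool_wait 100
```

For continuous monitoring, a Prometheus alert rule on the same metric is a better fit than polling with curl.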
Connection exhaustion signs:
- PostgreSQL error: "FATAL: sorry, too many clients already"
- Master crashes with "connection refused"
- Zero idle connections in pool
Fix: Increase PostgreSQL max_connections in postgresql.conf
Do:
- Start with defaults (25 conns) and scale up as needed
- Monitor the database_connections_wait_duration_ms metric
- Use conn_max_lifetime to prevent stale connections
- Set PostgreSQL max_connections = 1.5× max_open_conns
- Enable connection pool metrics in production
Don't:
- Set max_open_conns higher than PostgreSQL allows
- Use max_open_conns > 200 (indicates architectural issue)
- Set max_idle_conns = 0 (causes reconnection overhead)
- Ignore connection wait times in metrics
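The sizing rule from the list above (PostgreSQL max_connections at 1.5× max_open_conns) is easy to verify before a deploy. The following is a sketch only; `pg_headroom_ok` is a hypothetical helper and both values are passed in by hand.

```shell
# pg_headroom_ok MAX_OPEN_CONNS PG_MAX_CONNECTIONS  (hypothetical helper)
# Succeeds when PostgreSQL allows at least 1.5x the application pool size.
pg_headroom_ok() {
  local needed=$(( ($1 * 3 + 1) / 2 ))   # ceil(1.5 * pool size)
  if [ "$2" -ge "$needed" ]; then
    echo "ok: $2 >= $needed"
  else
    echo "too low: need $needed, have $2"
    return 1
  fi
}

pg_headroom_ok 100 150   # ok: 150 >= 150
# The live value can be read with:
#   sudo -u postgres psql -tAc 'SHOW max_connections'
```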
If you need >200 workers, consider:
- PgBouncer connection pooler

      # Install PgBouncer between app and PostgreSQL
      # (PgBouncer only treats # as a comment at the start of a line)
      # Application connections
      max_client_conn = 200
      # Database connections
      default_pool_size = 50
      # Emergency connections
      reserve_pool_size = 10

- Redis job queue (architectural change)
  - Offload job assignment to Redis
  - Reduce database polling load
  - See: docs/ARCHITECTURE_IMPROVEMENTS.md
- Read replicas (for read-heavy queries)
  - Route worker polling to read replicas
  - Keep writes on primary
  - Requires query routing logic
# 1. Increase file descriptors (applies to login sessions; systemd services need LimitNOFILE= in the unit file)
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf
# 2. TCP tuning for high throughput
sudo tee -a /etc/sysctl.conf << EOF
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
EOF
sudo sysctl -p
# 3. Cgroups v2 optimization
# Adjust in systemd service files (unit files do not allow trailing comments):
# 16 cores = 1600%
CPUQuota=1600%
MemoryMax=32G
IOWeight=1000
# 4. Disable swap for consistent performance (optional)
sudo swapoff -a
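The cgroup limits in step 3 can be generated from the host's core count and memory budget instead of being edited by hand. This is a minimal sketch; `cgroup_dropin` and the drop-in file name are assumptions, while the service name `ffrtmp-worker` comes from this guide.

```shell
# cgroup_dropin CORES MEMORY_GB  (hypothetical helper)
# Prints a systemd drop-in with the resource limits from step 3.
cgroup_dropin() {
  local cores=$1 mem_gb=$2
  printf '[Service]\nCPUQuota=%s%%\nMemoryMax=%sG\nIOWeight=1000\n' \
    "$(( cores * 100 ))" "$mem_gb"
}

cgroup_dropin 16 32
# [Service]
# CPUQuota=1600%
# MemoryMax=32G
# IOWeight=1000

# To apply (the file name limits.conf is an assumption):
#   sudo mkdir -p /etc/systemd/system/ffrtmp-worker.service.d
#   cgroup_dropin "$(nproc)" 32 | sudo tee /etc/systemd/system/ffrtmp-worker.service.d/limits.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ffrtmp-worker
```

A drop-in keeps the limits separate from the unit file installed by deploy.sh, so a redeploy is less likely to overwrite them.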
For >10,000 jobs/day:
# 1. Horizontal scaling
# Add more workers:
./deployment/orchestration/rolling-update.sh \
--workers worker1,worker2,worker3,worker4,worker5 \
...
# 2. Database optimization
# PostgreSQL connection pooling
database:
max_connections: 200
max_idle_connections: 50
# 3. Dedicated queue for high-priority jobs
# Use LIVE queue for low-latency jobs
./bin/ffrtmp jobs submit --scenario test --queue live
# 4. Redis for job queue (future enhancement)
# Currently uses database, Redis would be faster
# 5. Monitoring and auto-scaling
# Use Prometheus + Alertmanager to trigger scaling
For sub-minute job processing:
# Master: Aggressive scheduling
scheduler:
interval: 2s
batch_size: 50
# Workers: Frequent polling
HEARTBEAT_INTERVAL=5s
JOB_TIMEOUT=300s
# Use LIVE queue
api:
default_queue: live
- docs/DEPLOYMENT_IMPROVEMENTS.md - Complete deployment system guide
- docs/TLS_SETUP_GUIDE.md - TLS/SSL configuration and troubleshooting
- docs/PRODUCTION_CHECKLIST.md - Production readiness checklist
- DEPLOY_QUICKREF.md - Quick reference guide
- ansible/ANSIBLE_GUIDE.md - Ansible automation guide
- docs/API.md - REST API documentation
- docs/ARCHITECTURE.md - System architecture overview
- docs/SECURITY.md - Security best practices
- CHANGELOG.md - Version history and breaking changes
| Tool | Purpose | Documentation |
|---|---|---|
| deployment-wizard.sh | Interactive guided deployment | Built-in help |
| deploy.sh | Unified deployment script | ./deploy.sh --help |
| preflight-check.sh | Pre-deployment validation | docs/DEPLOYMENT_IMPROVEMENTS.md |
| health-check.sh | Post-deployment verification | docs/DEPLOYMENT_IMPROVEMENTS.md |
| blue-green-deploy.sh | Zero-downtime deployment | docs/DEPLOYMENT_IMPROVEMENTS.md |
| rolling-update.sh | Worker rolling updates | docs/DEPLOYMENT_IMPROVEMENTS.md |
| validate-and-rollback.sh | Backup and restore | docs/DEPLOYMENT_IMPROVEMENTS.md |
| generate-certs.sh | TLS certificate generation | docs/TLS_SETUP_GUIDE.md |
| File | Description | Template |
|---|---|---|
| /etc/ffrtmp-master/config.yaml | Master configuration | deployment/configs/master-prod.yaml |
| /etc/ffrtmp/worker.env | Worker environment | deployment/configs/worker-prod.env |
| /etc/ffrtmp/watch-config.yaml | Watch daemon config | deployment/systemd/watch-config.yaml.example |
# Check service status
sudo systemctl status ffrtmp-master
sudo systemctl status ffrtmp-worker
sudo systemctl status ffrtmp-watch
# View logs
sudo journalctl -u ffrtmp-master -f
sudo journalctl -u ffrtmp-worker -f
sudo journalctl -u ffrtmp-watch -f
# Restart services
sudo systemctl restart ffrtmp-master
sudo systemctl restart ffrtmp-worker
sudo systemctl restart ffrtmp-watch
# Enable/disable auto-start
sudo systemctl enable ffrtmp-master
sudo systemctl disable ffrtmp-worker
# Job management
./bin/ffrtmp jobs submit --scenario test
./bin/ffrtmp jobs list
./bin/ffrtmp jobs get <job-id>
./bin/ffrtmp jobs cancel <job-id>
# Node management
./bin/ffrtmp nodes list
./bin/ffrtmp nodes get <node-id>
./bin/ffrtmp nodes stats
# System info
./bin/ffrtmp version
./bin/ffrtmp health
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Security: See SECURITY.md for reporting vulnerabilities
- Contributing: See CONTRIBUTING.md for contribution guidelines
| Goal | Command/Method |
|---|---|
| Deploy for the first time | ./deployment/deployment-wizard.sh |
| Deploy master only | sudo ./deploy.sh --master --non-interactive |
| Deploy worker only | sudo ./deploy.sh --worker --master-url URL --api-key KEY |
| Check if system is ready | ./deployment/checks/preflight-check.sh --master |
| Verify deployment worked | ./deployment/checks/health-check.sh --master |
| Update master with no downtime | ./deployment/orchestration/blue-green-deploy.sh |
| Update multiple workers | ./deployment/orchestration/rolling-update.sh |
| Rollback to previous version | ./deployment/orchestration/blue-green-deploy.sh --rollback |
| Deploy to 10+ servers | Ansible: ansible-playbook playbooks/site.yml |
| Generate TLS certificates | ./deployment/generate-certs.sh --type master |
| Validate configuration | ./deployment/checks/config-validator.sh config.yaml |
| Backup before upgrade | ./deployment/validate-and-rollback.sh --validate |
| Restore from backup | ./deployment/validate-and-rollback.sh --rollback |
| Test the system | ./bin/ffrtmp jobs submit --scenario test |
| Check logs for errors | sudo journalctl -u ffrtmp-master -n 100 --no-pager |
| Monitor metrics | curl http://localhost:9090/metrics |
- Read docs/DEPLOYMENT_IMPROVEMENTS.md
- Choose deployment method (wizard, manual, Ansible)
- Run pre-flight checks
- Prepare configuration files
- Generate or obtain TLS certificates
- Set up firewall rules
- Configure backups
- Deploy master node first
- Verify master health
- Generate and save API key
- Deploy worker nodes
- Verify worker registration
- Test job submission
- Configure monitoring
- Run health checks
- Verify all services running
- Test API endpoints
- Submit test job and verify completion
- Review logs for errors
- Set up log rotation
- Configure database backups
- Set up monitoring alerts
- Document deployment (IPs, keys, etc.)
- Test rollback procedure
- Review PRODUCTION_CHECKLIST.md
- Enable TLS with proper certificates
- Rotate default API keys
- Configure firewall rules
- Set up monitoring and alerting
- Configure automated backups
- Test disaster recovery
- Document runbooks
- Train operators
- Schedule regular maintenance
Version: 2.0.0+
Status: Production-Ready
Last Updated: 2026-01-07
- Interactive deployment wizard
- Pre-flight system validation
- Post-deployment health checks
- Configuration validation
- Blue-green deployments (zero downtime)
- Rolling worker updates
- Automated backups and rollback
- TLS certificate generation
- Environment-specific configs (dev/prod)
- Comprehensive documentation
- GitHub Actions CI/CD integration
- Ansible automation for multi-server deployments
See CHANGELOG.md for complete version history.
Next Steps:
- Quick Start: Run ./deployment/deployment-wizard.sh
- Learn More: Read docs/DEPLOYMENT_IMPROVEMENTS.md
- Advanced: Explore ansible/ANSIBLE_GUIDE.md
- Troubleshoot: See troubleshooting section above
- Get Help: Open a GitHub issue
**Happy Deploying!**