Production Deployment Guide

Quick Start (5 Minutes)

Option 1: Interactive Wizard (Easiest)

# Clone the repository
git clone <repo-url>
cd ffmpeg-rtmp

# Run the interactive deployment wizard
./deployment/deployment-wizard.sh

The wizard will guide you through:

Deployment type - Master, worker, or both
Environment - Development, staging, or production
Pre-flight checks - System validation
Configuration - TLS, database, resources
Deployment - Automated installation
Verification - Health checks

That's it! The wizard handles everything automatically.

Option 2: Quick Deployment (Advanced Users)

# 1. Clone and build
git clone <repo-url>
cd ffmpeg-rtmp
make build-master build-agent build-cli

# 2. Run pre-flight checks
./deployment/checks/preflight-check.sh --master

# 3. Deploy master node
sudo ./deploy.sh --master --non-interactive

# 4. Verify deployment
./deployment/checks/health-check.sh --master

# 5. Test it
./bin/ffrtmp jobs submit --scenario test
./bin/ffrtmp jobs list

After deployment, the system will be running with:

Master node on port 8080 (HTTPS)
Worker agent registered and polling (when a worker is deployed, for example with --both)
API authentication enabled
Metrics on ports 9090 (master) and 9091 (worker)
SQLite or PostgreSQL database
Health checks passed
Systemd services configured

Deployment Methods

Method 1: Interactive Wizard (Recommended)

Best for: First-time deployments, manual setup

./deployment/deployment-wizard.sh

Features:

Step-by-step guided deployment
Automatic system validation
Configuration generation
Health checks
User-friendly prompts

Method 2: Unified Deployment Script

Best for: Automated deployments, scripts, CI/CD

# Deploy master node
sudo ./deploy.sh --master --non-interactive

# Deploy worker node
sudo ./deploy.sh --worker \
  --master-url https://master.example.com:8080 \
  --api-key YOUR_API_KEY \
  --worker-id worker-01

# Deploy both on same server
sudo ./deploy.sh --both --non-interactive

# With TLS certificate generation
sudo ./deploy.sh --master \
  --generate-certs \
  --master-ip 10.0.1.10 \
  --master-host master.example.com

Options:

--master - Deploy master node
--worker - Deploy worker node
--both - Deploy both on single server
--master-url URL - Master server URL (for workers)
--api-key KEY - Master API key (for workers)
--worker-id ID - Worker identifier
--generate-certs - Generate self-signed TLS certificates
--master-ip IP - Master server IP for certificates
--master-host HOST - Master server hostname for certificates
--non-interactive - Skip prompts (for automation)
--skip-build - Use existing binaries

Method 3: Blue-Green Deployment (Zero Downtime)

Best for: Production master updates with no downtime

# 1. Deploy new version to inactive environment
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --deploy --master --version v2.0.0

# 2. Test new version (it's not active yet)
# Health checks run automatically

# 3. Switch traffic to new version
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --switch --master

# 4. If something goes wrong, instant rollback
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --rollback --master

How it works:

Maintains two parallel environments (blue and green)
Deploys to inactive environment
Tests before switching
Symlink switch for instant activation
Previous version ready for immediate rollback
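The symlink switch described above can be sketched as follows. This is a minimal illustration, not the contents of blue-green-deploy.sh: the function names active_env and switch_env are hypothetical, and the sketch assumes GNU coreutils (mv -T) and environments named <link>-blue and <link>-green.

```shell
# Hypothetical helpers illustrating a blue-green symlink switch.

# active_env LINK: print which environment LINK currently points to.
active_env() {
  case "$(readlink "$1")" in
    *blue)  echo blue ;;
    *green) echo green ;;
    *)      echo unknown ;;
  esac
}

# switch_env LINK: repoint LINK at the inactive environment.
switch_env() {
  local link="$1" next
  if [ "$(active_env "$link")" = blue ]; then next=green; else next=blue; fi
  # Build the new symlink under a temporary name, then rename it over the
  # old one. The rename is atomic, so the link is never missing.
  ln -sfn "${link}-${next}" "${link}.new"
  mv -T "${link}.new" "$link"
}
```

For example, switch_env /opt/ffrtmp would move the active link from /opt/ffrtmp-blue to /opt/ffrtmp-green. Running it again reverses the switch, which is why rollback is immediate.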

Method 4: Rolling Updates (Worker Nodes)

Best for: Updating multiple workers safely

./deployment/orchestration/rolling-update.sh \
  --workers worker1,worker2,worker3 \
  --version v2.0.0 \
  --master-url https://master.example.com:8080 \
  --api-key YOUR_API_KEY \
  --max-parallel 2 \
  --ssh-user root

Process:

Drains each worker (stops accepting new jobs)
Waits for running jobs to complete
Creates backup of current installation
Deploys new version
Runs health checks
Activates worker
Moves to next worker
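The drain-and-wait steps above amount to polling until a worker has no running jobs or a timeout expires. The sketch below shows that loop in isolation. The function name wait_for_drain and the job-count command passed to it are assumptions for illustration; how rolling-update.sh actually queries a worker is not shown in this guide.

```shell
# wait_for_drain TIMEOUT INTERVAL CMD...
# Polls CMD (which must print the number of running jobs) every INTERVAL
# seconds. Returns 0 once CMD prints 0, or 1 after TIMEOUT seconds.
wait_for_drain() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0 running
  while :; do
    running="$("$@")"
    if [ "$running" -eq 0 ]; then return 0; fi
    if [ "$waited" -ge "$timeout" ]; then return 1; fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example (running_jobs is a hypothetical command that prints a job count):
# wait_for_drain 300 5 running_jobs worker1 || echo "drain timed out"
```

The 300-second value in the example matches the documented default of --drain-timeout.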

Options:

--workers - Comma-separated list of worker hosts
--version - Version tag to deploy
--master-url - Master server URL
--api-key - Master API key
--max-parallel - Maximum number of workers to update at the same time
--drain-timeout - Seconds to wait for running jobs to finish (default: 300)
--ssh-user - SSH user (default: root)
--ssh-key - Path to SSH private key

Method 5: Ansible Automation (Multiple Servers)

Best for: Large-scale deployments, infrastructure as code

# Configure inventory
cd ansible/
cp inventory/production.ini.example inventory/production.ini
vim inventory/production.ini

# Deploy everything
ansible-playbook -i inventory/production.ini playbooks/site.yml

# Deploy only master
ansible-playbook -i inventory/production.ini playbooks/master.yml

# Deploy only workers
ansible-playbook -i inventory/production.ini playbooks/workers.yml

See ansible/ANSIBLE_GUIDE.md for detailed instructions.

Pre-Deployment Validation

Pre-flight Checks

Always run pre-flight checks before deployment:

# Check master node requirements
./deployment/checks/preflight-check.sh --master

# Check worker node requirements
./deployment/checks/preflight-check.sh --worker \
  --master-url https://master.example.com:8080

Validates:

Operating system compatibility (Ubuntu, Debian, Rocky, AlmaLinux)
CPU cores (2+ for master, 2+ for worker)
Memory (4GB+ for master, 8GB+ for worker)
Disk space (20GB+ root, 10GB+ /var for master, 100GB+ for worker)
Port availability (8080 for master, 1935 optional)
Required commands (curl, wget, git, tar, gzip)
Go version (1.24+)
FFmpeg installation and codecs (workers)
Cgroups v2 support
Network connectivity and DNS
Firewall configuration
SELinux status (RHEL-based systems)

Configuration Validation

Validate configuration files before deployment:

# Validate master config
./deployment/checks/config-validator.sh /etc/ffrtmp-master/config.yaml

# Validate worker config
./deployment/checks/config-validator.sh /etc/ffrtmp/worker.env

# Validate watch daemon config
./deployment/checks/config-validator.sh /etc/ffrtmp/watch-config.yaml

Checks:

YAML/ENV syntax correctness
Required fields present
Sensitive data configured
File permissions secure (600/640)
Type-specific validation

🏥 Post-Deployment Verification

Health Checks

Verify deployment success with comprehensive health checks:

# Check master node health
./deployment/checks/health-check.sh --master

# Check worker node health
./deployment/checks/health-check.sh --worker \
  --url https://master.example.com:8080 \
  --api-key YOUR_API_KEY

Verifies:

Service status (systemd)
Port listening (8080 for master)
HTTP endpoints responding
API authentication working
File and directory structure
Configuration files present
Disk space available
Log files accessible
No critical errors in logs
Database connectivity (master)
FFmpeg installation (workers)
Cgroups v2 enabled (workers)
Master connectivity (workers)
Worker registration successful

Output:

═══════════════════════════════════════
  Health Check Summary
═══════════════════════════════════════
Passed:   25
Warnings: 2
Failed:   0
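In automation it can be useful to act on the summary counts. The sketch below parses the summary format shown above. It assumes the output is plain text without color codes, and summary_count is a hypothetical helper name. If health-check.sh sets a meaningful exit status, checking that directly is simpler.

```shell
# summary_count LABEL: read health-check output on stdin and print the
# number that follows "LABEL:" in the summary.
summary_count() {
  awk -v label="$1:" '$1 == label { print $2 }'
}

# Example (fail a CI step when any check failed):
# failed="$(./deployment/checks/health-check.sh --master | summary_count Failed)"
# [ "$failed" -eq 0 ] || exit 1
```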

Configuration Management

Environment-Specific Configs

Pre-configured templates for different environments:

deployment/configs/
├── master-dev.yaml       # Development master
├── master-prod.yaml      # Production master
├── worker-dev.env        # Development worker
└── worker-prod.env       # Production worker

Development:

SQLite database
Debug logging
Relaxed rate limits
TLS optional
Local paths

Production:

PostgreSQL database with SSL
JSON structured logging
Strict rate limits
TLS required with client verification
Monitoring enabled
Backup configured
Alert webhooks

Using Configuration Templates

# Copy and customize for your environment
sudo cp deployment/configs/master-prod.yaml /etc/ffrtmp-master/config.yaml
sudo vim /etc/ffrtmp-master/config.yaml

# Validate before use
./deployment/checks/config-validator.sh /etc/ffrtmp-master/config.yaml

# Deploy with custom config
sudo ./deploy.sh --master --config /etc/ffrtmp-master/config.yaml

Deployment Scenarios

Scenario 1: Single Server (Development)

Use Case: Development, testing, small deployments (<100 jobs/day)

# Interactive wizard (easiest)
./deployment/deployment-wizard.sh

# Or manual deployment
make build-master build-agent build-cli
sudo ./deploy.sh --both --non-interactive
./deployment/checks/health-check.sh --master

# Access
curl http://localhost:8080/health
./bin/ffrtmp jobs submit --scenario test

Pros: Simple, easy setup, single command
Cons: Single point of failure, limited scale

Scenario 2: Distributed Setup (Production)

Use Case: Production workloads, high availability, horizontal scaling

Master Server

# 1. Pre-flight checks
./deployment/checks/preflight-check.sh --master

# 2. Deploy master
sudo ./deploy.sh --master \
  --generate-certs \
  --master-ip 10.0.1.10 \
  --master-host master.example.com \
  --non-interactive

# 3. Health check
./deployment/checks/health-check.sh --master

# 4. Get API key
sudo cat /etc/ffrtmp-master/.api-key

Worker Servers (3+ nodes)

# Deploy to multiple workers with rolling update
./deployment/orchestration/rolling-update.sh \
  --workers worker1,worker2,worker3 \
  --version v1.0.0 \
  --master-url https://10.0.1.10:8080 \
  --api-key <API_KEY_FROM_MASTER> \
  --ssh-user root

# Or deploy individually (run deploy.sh on each worker, from the repository checkout)
ssh worker1
sudo ./deploy.sh --worker \
  --master-url https://10.0.1.10:8080 \
  --api-key <API_KEY> \
  --worker-id worker1

# Verify
./deployment/checks/health-check.sh --worker \
  --url https://10.0.1.10:8080 \
  --api-key <API_KEY>

Pros: Scalable, fault-tolerant, independent worker updates
Cons: Requires multiple servers, more complex setup

Scenario 3: High-Availability Master (Blue-Green)

Use Case: Mission-critical deployments, zero-downtime updates

# Initial deployment
sudo ./deploy.sh --master --non-interactive

# Later: Deploy update with zero downtime
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --deploy --master --version v2.0.0

# Test in inactive environment
# (automatic health checks run)

# Switch when ready
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --switch --master

# Instant rollback if issues
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --rollback --master

Pros: Zero downtime, instant rollback, safe testing
Cons: Requires double disk space for environments

Scenario 4: Multi-Region Deployment (Ansible)

Use Case: Geographic distribution, 10+ servers, infrastructure as code

# 1. Configure inventory
cd ansible/
cp inventory/production.ini.example inventory/production.ini

# Edit with your servers:
# [master]
# master.us-east.example.com
# 
# [workers]
# worker1.us-east.example.com
# worker2.us-east.example.com
# worker3.us-west.example.com

# 2. Configure variables
vim group_vars/all.yml        # Global settings
vim group_vars/master.yml     # Master-specific
vim group_vars/workers.yml    # Worker-specific

# 3. Deploy everything
ansible-playbook -i inventory/production.ini playbooks/site.yml

# 4. Or deploy incrementally
ansible-playbook -i inventory/production.ini playbooks/master.yml
ansible-playbook -i inventory/production.ini playbooks/workers.yml --limit '*.us-east.example.com'

Pros: Repeatable, version controlled, multi-region support
Cons: Ansible knowledge required, initial setup complexity

See ansible/ANSIBLE_GUIDE.md for complete Ansible documentation.

TLS/SSL Configuration

Automated Certificate Generation

# Generate self-signed certificates (development/testing)
sudo ./deployment/generate-certs.sh \
  --type master \
  --ip 10.0.1.10 \
  --hostname master.example.com \
  --output /etc/ffrtmp-master/certs

# Generate CA and client certificates (mTLS)
sudo ./deployment/generate-certs.sh \
  --type ca \
  --output /etc/ffrtmp-master/certs

sudo ./deployment/generate-certs.sh \
  --type worker \
  --output /etc/ffrtmp/certs \
  --ca-cert /etc/ffrtmp-master/certs/ca.crt \
  --ca-key /etc/ffrtmp-master/certs/ca.key

Production Certificates

For production, use CA-signed certificates:

# 1. Generate a private key and CSR
openssl req -new -newkey rsa:4096 -nodes -keyout master.key -out master.csr

# 2. Get certificate from CA (Let's Encrypt, etc.)

# 3. Install certificates (file names match the certs/ layout used elsewhere in this guide)
sudo cp master.crt /etc/ffrtmp-master/certs/server.crt
sudo cp master.key /etc/ffrtmp-master/certs/server.key
sudo chmod 600 /etc/ffrtmp-master/certs/server.key

# 4. Configure master to use them
sudo ./deploy.sh --master \
  --master-ip $(hostname -I | awk '{print $1}')

See docs/TLS_SETUP_GUIDE.md for complete TLS documentation.

🗄️ Database Configuration

SQLite (Default - Development)

Best for: Development, testing, <1000 jobs/day

# Automatically configured in development mode
./deployment/deployment-wizard.sh
# Select: Development environment

# Or manually
sudo ./deploy.sh --master --non-interactive

# Database location: /var/lib/ffrtmp-master/master.db

PostgreSQL (Production)

Best for: Production, >1000 jobs/day, >10 workers

# 1. Install PostgreSQL
sudo apt-get install postgresql postgresql-contrib

# Or with Docker
docker run -d \
    --name ffrtmp-postgres \
    -e POSTGRES_DB=ffrtmp \
    -e POSTGRES_USER=ffrtmp_user \
    -e POSTGRES_PASSWORD=secure_password \
    -p 5432:5432 \
    postgres:15

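# 2. Create user and database (skip if you used the Docker container above,
#    which creates both from the POSTGRES_* variables).
#    Making ffrtmp_user the owner lets it create tables; on PostgreSQL 15+,
#    GRANT ALL PRIVILEGES ON DATABASE alone does not allow that in the public schema.
sudo -u postgres psql << 'EOF'
CREATE USER ffrtmp_user WITH PASSWORD 'secure_password';
CREATE DATABASE ffrtmp OWNER ffrtmp_user;
EOF

# 3. Run migrations (-h localhost connects over TCP, which typically uses password authentication)
psql -h localhost -U ffrtmp_user -d ffrtmp -f shared/pkg/store/migrations/001_initial_schema.sql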

# 4. Configure master
sudo cp deployment/configs/master-prod.yaml /etc/ffrtmp-master/config.yaml
sudo vim /etc/ffrtmp-master/config.yaml

# Update database section:
# database:
#   type: postgres
#   host: localhost
#   port: 5432
#   database: ffrtmp
#   user: ffrtmp_user
#   password: secure_password
#   ssl_mode: require

# 5. Restart master
sudo systemctl restart ffrtmp-master

See deployment/postgres/README.md for PostgreSQL high-availability setup.

Monitoring and Observability

Built-in Metrics (Prometheus)

# Master metrics
curl http://localhost:9090/metrics

# Worker metrics  
curl http://localhost:9091/metrics

# Available metrics:
# - ffrtmp_jobs_total
# - ffrtmp_jobs_duration_seconds
# - ffrtmp_workers_active
# - ffrtmp_queue_length
# - go_goroutines, go_memstats_*
# - process_cpu_seconds_total

Health Endpoints

# Master health (when TLS is enabled, use https:// and add -k for self-signed certificates)
curl http://localhost:8080/health

# Returns:
# {"status":"healthy","database":"ok","workers":3,"version":"2.0.0"}

# Detailed status
curl http://localhost:8080/api/v1/status
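For scripted checks, a single field can be pulled out of the health response. The sketch below assumes the flat JSON shape shown above and uses only sed. json_field is a hypothetical helper that handles string values only; a JSON tool such as jq is more robust if it is installed.

```shell
# json_field NAME: read flat JSON on stdin and print the string value of NAME.
# Only handles "name":"value" pairs without spaces or escaped quotes.
json_field() {
  sed -n "s/.*\"$1\":\"\\([^\"]*\\)\".*/\\1/p"
}

# Example:
# status="$(curl -s http://localhost:8080/health | json_field status)"
# [ "$status" = healthy ] || echo "master is not healthy: $status"
```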

Grafana Dashboards

# 1. Start monitoring stack
cd deployment/grafana
docker-compose up -d

# 2. Access Grafana
# Open http://localhost:3000 in a browser
# Default credentials: admin/admin (change them on first login)

# 3. Pre-configured dashboards:
# - FFmpeg-RTMP Overview
# - Job Processing Metrics
# - Worker Node Status
# - System Resources

Dashboards auto-loaded from deployment/grafana/dashboards/

Prometheus Configuration

# Prometheus config at: deployment/prometheus/prometheus.yml
# Scrapes:
# - Master: localhost:9090
# - Workers: auto-discovery or static config
# - Node Exporter: 9100 (if installed)

# View targets
# Open http://localhost:9090/targets in a browser

Log Aggregation

# View logs
sudo journalctl -u ffrtmp-master -f
sudo journalctl -u ffrtmp-worker -f
sudo journalctl -u ffrtmp-watch -f

# Or file-based logs
tail -f /var/log/ffrtmp/master.log
tail -f /var/log/ffrtmp/worker.log

# Search for errors
grep -i error /var/log/ffrtmp/*.log

Log rotation configured automatically:

Daily rotation
14-day retention
Gzip compression
Location: /var/log/ffrtmp/

See deployment/logrotate/ for logrotate configs.

🔒 Security Hardening

Pre-Deployment Security Checklist

Run automated security checks:

# Included in pre-flight checks
./deployment/checks/preflight-check.sh --master

Essential Security Steps

Strong API Keys

# Generate cryptographically secure key
export MASTER_API_KEY="$(openssl rand -hex 32)"

# Or let deployment script generate one
sudo ./deploy.sh --master --non-interactive
# Key saved to: /etc/ffrtmp-master/.api-key

TLS Encryption

# Production: Use CA-signed certificates
sudo cp /path/to/ca-signed.crt /etc/ffrtmp-master/certs/server.crt
sudo cp /path/to/ca-signed.key /etc/ffrtmp-master/certs/server.key

# Development: Auto-generate self-signed
sudo ./deploy.sh --master --generate-certs \
  --master-ip 10.0.1.10 \
  --master-host master.example.com

Firewall Rules

# UFW (Ubuntu/Debian)
sudo ufw allow 22/tcp     # SSH (allow before enabling, or new SSH connections are blocked)
sudo ufw allow 8080/tcp   # Master API
sudo ufw allow from 10.0.0.0/8 to any port 9090 proto tcp  # Metrics (internal only)
sudo ufw enable

# Firewalld (RHEL/Rocky)
sudo firewall-cmd --permanent --add-port=8080/tcp
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/8" port port="9090" protocol="tcp" accept'
sudo firewall-cmd --reload

File Permissions

# Automatically set by deployment scripts
# Verify:
ls -la /etc/ffrtmp-master/
# Expect: 600 or 640 for sensitive files

# Fix if needed:
sudo chmod 600 /etc/ffrtmp-master/config.yaml
sudo chmod 600 /etc/ffrtmp-master/.api-key
sudo chown -R ffrtmp-master:ffrtmp-master /etc/ffrtmp-master  # -R also covers dotfiles such as .api-key, which the * glob skips

Database Security

# PostgreSQL: Require SSL
# In config.yaml:
database:
  ssl_mode: require
  
# Restrict PostgreSQL access
sudo vim /etc/postgresql/*/main/pg_hba.conf
# Add: hostssl ffrtmp ffrtmp_user 10.0.0.0/8 scram-sha-256

Rate Limiting

# Already enabled by default in production config
# Adjust in config.yaml:
api:
  rate_limit: 1000  # requests per minute
  burst: 2000       # burst allowance

SELinux (RHEL-based)

# If SELinux is enforcing, set contexts
sudo semanage fcontext -a -t bin_t "/opt/ffrtmp(-master)?/bin(/.*)?"
sudo restorecon -Rv /opt/ffrtmp/bin /opt/ffrtmp-master/bin

Security Validation

# Run security audit
./deployment/checks/security-audit.sh

# Check for exposed secrets
grep -r "password\|secret\|key" /etc/ffrtmp* 2>/dev/null | grep -v "CHANGE_ME"

# Verify TLS
openssl s_client -connect localhost:8080 -showcerts

# Test API authentication (-k accepts a self-signed certificate)
curl -k https://localhost:8080/api/v1/jobs  # Should return 401
curl -k -H "X-API-Key: wrong-key" https://localhost:8080/api/v1/jobs  # Should return 403

System Requirements

Minimum (Development/Testing)

CPU: 2 cores
RAM: 4 GB
Disk: 10 GB
OS: Linux (Ubuntu 20.04+, Debian 10+, Rocky Linux 8+)
Network: Internet connectivity

Recommended (Production Master)

CPU: 4-8 cores
RAM: 8-16 GB
Disk: 50-100 GB SSD
OS: Ubuntu 22.04 LTS or Rocky Linux 9
Network: 1 Gbps, low latency to workers

Recommended (Production Worker)

CPU: 8-16 cores (more for video encoding)
RAM: 16-32 GB
Disk: 200-500 GB SSD (for temporary files)
GPU: NVIDIA GPU with NVENC (optional, recommended)
OS: Ubuntu 22.04 LTS or Rocky Linux 9
Network: 1 Gbps minimum, 10 Gbps for 4K workloads

Software Requirements

Go: 1.24+ (for building from source)
FFmpeg: 4.4+ with libx264, libx265
PostgreSQL: 15+ (production master)
Docker: 20.10+ (optional, for monitoring stack)
Ansible: 2.15+ (optional, for automated deployment)

Optional Components

NVIDIA GPU + drivers - For NVENC hardware encoding
Intel QSV - Intel Quick Sync Video encoding
VAAPI - Intel/AMD hardware acceleration
Prometheus - Metrics collection
Grafana - Monitoring dashboards
Victoria Metrics - Long-term metrics storage

File and Directory Structure

ffmpeg-rtmp/
├── deployment/                    # Deployment scripts and tools
│   ├── checks/                    # Validation and health checks
│   │   ├── health-check.sh        # Post-deployment verification
│   │   ├── preflight-check.sh     # Pre-deployment validation
│   │   ├── config-validator.sh    # Configuration validation
│   │   └── security-audit.sh      # Security checks
│   ├── configs/                   # Configuration templates
│   │   ├── master-dev.yaml        # Development master config
│   │   ├── master-prod.yaml       # Production master config
│   │   ├── worker-dev.env         # Development worker config
│   │   └── worker-prod.env        # Production worker config
│   ├── orchestration/             # Deployment orchestration
│   │   ├── blue-green-deploy.sh   # Zero-downtime deployment
│   │   └── rolling-update.sh      # Worker rolling updates
│   ├── systemd/                   # Systemd service files
│   │   ├── ffrtmp-master.service
│   │   ├── ffrtmp-worker.service
│   │   ├── ffrtmp-watch.service
│   │   └── *.env.example
│   ├── logrotate/                 # Log rotation configs
│   ├── grafana/                   # Grafana dashboards
│   ├── prometheus/                # Prometheus configuration
│   ├── postgres/                  # PostgreSQL setup scripts
│   ├── generate-certs.sh          # TLS certificate generator
│   ├── validate-and-rollback.sh   # Backup and restore
│   ├── deployment-wizard.sh       # Interactive deployment
│   ├── test-deployment-scripts.sh # Deployment tests
│   └── simulate-deployment.sh     # Deployment simulation
├── ansible/                       # Ansible automation
│   ├── playbooks/                 # Ansible playbooks
│   ├── roles/                     # Ansible roles
│   ├── inventory/                 # Inventory files
│   ├── group_vars/                # Variable files
│   └── ANSIBLE_GUIDE.md           # Ansible documentation
├── deploy.sh                      # Unified deployment script
├── bin/                           # Compiled binaries
│   ├── master                     # Master server binary
│   ├── agent                      # Worker agent binary
│   └── ffrtmp                     # CLI tool with watch daemon
├── docs/                          # Documentation
│   ├── DEPLOYMENT_IMPROVEMENTS.md # Deployment guide
│   ├── TLS_SETUP_GUIDE.md         # TLS/SSL guide
│   ├── PRODUCTION_CHECKLIST.md    # Production checklist
│   ├── API.md                     # API documentation
│   ├── ARCHITECTURE.md            # System architecture
│   └── ...
├── /etc/ffrtmp-master/            # Master configuration
│   ├── config.yaml                # Main configuration
│   ├── .api-key                   # API key (generated)
│   └── certs/                     # TLS certificates
│       ├── server.crt
│       ├── server.key
│       └── ca.crt
├── /etc/ffrtmp/                   # Worker configuration
│   ├── worker.env                 # Environment variables
│   ├── watch-config.yaml          # Watch daemon config
│   └── certs/                     # TLS certificates
├── /var/lib/ffrtmp-master/        # Master data
│   ├── master.db                  # SQLite database
│   └── archive/                   # Archived jobs
├── /var/lib/ffrtmp/               # Worker data
│   └── results/                   # Job output files
├── /var/log/ffrtmp/               # Application logs
│   ├── master.log
│   ├── worker.log
│   ├── watch.log
│   └── *.log.[1-14].gz            # Rotated logs
├── /var/backups/ffrtmp/           # Automatic backups
│   ├── master-*.tar.gz
│   └── worker-*.tar.gz
└── /opt/ffrtmp(-master)/          # Installation directory
    └── bin/                       # Binaries

# Blue-Green Deployment Structure
/opt/ffrtmp-blue/                  # Blue environment
/opt/ffrtmp-green/                 # Green environment
/opt/ffrtmp -> blue                # Current active (symlink)

Troubleshooting

Quick Diagnostics

# Run comprehensive health check
./deployment/checks/health-check.sh --master

# Check service status
sudo systemctl status ffrtmp-master
sudo systemctl status ffrtmp-worker
sudo systemctl status ffrtmp-watch

# View recent logs
sudo journalctl -u ffrtmp-master -n 100 --no-pager
sudo journalctl -u ffrtmp-worker -n 100 --no-pager

# Check connectivity
curl -k https://localhost:8080/health

Common Issues

Issue: Pre-flight checks fail

Symptom: preflight-check.sh reports errors

Solutions:

# Insufficient memory
free -h  # Check available memory (-g rounds down to whole GiB)
# Add swap or upgrade server

# Port in use
sudo ss -tlnp | grep :8080
sudo systemctl stop <conflicting-service>

# Missing dependencies
sudo apt-get install curl wget git  # Debian/Ubuntu
sudo yum install curl wget git      # RHEL/Rocky

# Go version too old
# Install Go 1.24+ from https://go.dev/dl/

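# Cgroups v2 not enabled (RHEL/Rocky; on Debian/Ubuntu add the same argument
# to GRUB_CMDLINE_LINUX in /etc/default/grub and run update-grub)
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
sudo reboot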

Issue: Master service won't start

Symptom: systemctl status ffrtmp-master shows failed

Solutions:

# Check logs
sudo journalctl -u ffrtmp-master -n 50 --no-pager

# Common causes:
# 1. Port already in use
sudo ss -tlnp | grep :8080
sudo lsof -i:8080

# 2. Database migration failed
ls -la /var/lib/ffrtmp-master/master.db
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db ".schema"

# 3. Permission issues
sudo chown -R ffrtmp-master:ffrtmp-master /var/lib/ffrtmp-master
sudo chmod 755 /var/lib/ffrtmp-master

# 4. Configuration syntax error
./deployment/checks/config-validator.sh /etc/ffrtmp-master/config.yaml

# 5. Certificate issues
ls -la /etc/ffrtmp-master/certs/
openssl x509 -in /etc/ffrtmp-master/certs/server.crt -text -noout

Issue: Worker can't register with master

Symptom: Worker logs show "failed to register" or connection errors

Solutions:

# 1. Check master is reachable
curl -k https://master.example.com:8080/health

# 2. Verify API key matches
sudo grep API_KEY /etc/ffrtmp/worker.env
# Compare with master:
sudo cat /etc/ffrtmp-master/.api-key

# 3. TLS certificate verification issues
# Test with curl
curl -k https://master.example.com:8080/health  # -k skips verification
curl https://master.example.com:8080/health     # Should work if certs valid

# 4. Firewall blocking
sudo ufw status
sudo firewall-cmd --list-all

# 5. Network connectivity
ping master.example.com
telnet master.example.com 8080

Issue: Jobs not processing

Symptom: Jobs stay in "pending" state

Solutions:

# 1. Check workers are registered
./bin/ffrtmp nodes list
# Should show your workers

# 2. Check worker service is running
sudo systemctl status ffrtmp-worker
sudo journalctl -u ffrtmp-worker -f

# 3. Check worker capacity
# In worker logs, look for:
# - "max concurrent jobs reached"
# - "worker is draining"

# 4. Check job queue
./bin/ffrtmp jobs list --status pending

# 5. Check for errors in master logs
sudo grep -i error /var/log/ffrtmp/master.log

Issue: Health checks fail after deployment

Symptom: health-check.sh reports failures

Solutions:

# Re-run with verbose output
./deployment/checks/health-check.sh --master --verbose

# Address each failed check:
# - Service not running: sudo systemctl start ffrtmp-master
# - Port not listening: Check service logs
# - HTTP endpoint error: Check TLS certificates
# - Database error: Check PostgreSQL is running
# - Disk space: Clean up old files

# If multiple checks fail, consider rollback:
sudo ./deployment/orchestration/blue-green-deploy.sh --rollback --master

Issue: Deployment hangs or times out

Symptom: Deployment script stops responding

Solutions:

# 1. Check system resources
top
df -h

# 2. Check for prompts waiting for input
# Use --non-interactive flag:
sudo ./deploy.sh --master --non-interactive

# 3. Check network connectivity (if downloading packages)
ping 8.8.8.8
ping github.com

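# 4. Increase timeouts
# Edit the script, or pass the variable on the sudo command line
# (sudo resets the environment by default, so a plain export is not inherited):
sudo DEPLOY_TIMEOUT=600 ./deploy.sh --master --non-interactive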

Rollback Procedures

Rollback Master (Blue-Green)

# Instant rollback to previous version
sudo ./deployment/orchestration/blue-green-deploy.sh --rollback --master

# Manual rollback
sudo systemctl stop ffrtmp-master
sudo ln -sfn /opt/ffrtmp-green /opt/ffrtmp  # Or /opt/ffrtmp-blue, whichever holds the previous version
sudo systemctl start ffrtmp-master

Rollback Worker

# SSH to worker
ssh worker1

# Find backup
ls -lt /opt/ffrtmp.backup-*

# Restore
sudo systemctl stop ffrtmp-worker ffrtmp-watch
sudo rm -rf /opt/ffrtmp
sudo mv /opt/ffrtmp.backup-20260107-103000 /opt/ffrtmp
sudo systemctl start ffrtmp-worker ffrtmp-watch

# Verify
./deployment/checks/health-check.sh --worker \
  --url https://master.example.com:8080 \
  --api-key YOUR_API_KEY
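The restore step above uses a hard-coded backup name. Assuming backups follow the /opt/ffrtmp.backup-YYYYMMDD-HHMMSS pattern shown, the newest one can be selected by sorting on the name. latest_backup is a hypothetical helper, not part of the deployment scripts.

```shell
# latest_backup [PARENT_DIR]: print the newest ffrtmp.backup-* entry.
# Timestamped names sort chronologically, so the last name is the newest.
latest_backup() {
  local parent="${1:-/opt}"
  find "$parent" -maxdepth 1 -name 'ffrtmp.backup-*' | sort | tail -n 1
}

# Example:
# backup="$(latest_backup /opt)"
# [ -n "$backup" ] && sudo mv "$backup" /opt/ffrtmp
```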

Getting Help

Check logs first:

sudo journalctl -u 'ffrtmp-*' -n 200 --no-pager

Run diagnostics:

./deployment/checks/health-check.sh --master
./deployment/checks/preflight-check.sh --master

Enable debug logging:

# Edit /etc/ffrtmp-master/config.yaml and set:
# logging:
#   level: debug

sudo systemctl restart ffrtmp-master

Check documentation:

docs/DEPLOYMENT_IMPROVEMENTS.md - Complete deployment guide
docs/TLS_SETUP_GUIDE.md - TLS troubleshooting
ansible/ANSIBLE_GUIDE.md - Ansible-specific issues
DEPLOY_QUICKREF.md - Quick reference

Report issues:

GitHub Issues: Include logs, config (redacted), system info
Provide output from health checks
Maintenance and Operations

Regular Maintenance Tasks

Daily

Monitor service health
Check disk space
Review error logs

# Daily health check (schedule with cron or a systemd timer to automate it)
./deployment/checks/health-check.sh --master | sudo tee /var/log/ffrtmp/health-$(date +%Y%m%d).log

Weekly

Review job success rates
Check worker capacity
Database maintenance

# SQLite maintenance
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "VACUUM; ANALYZE;"

# PostgreSQL maintenance
sudo -u postgres psql ffrtmp -c "VACUUM ANALYZE;"

# Clean up job result files older than 30 days
find /var/lib/ffrtmp/results -type f -mtime +30 -delete

Monthly

Review and rotate API keys
Update system packages
Test backup restoration
Review security logs

# System updates
sudo apt-get update && sudo apt-get upgrade -y  # Debian/Ubuntu
sudo yum update -y                               # RHEL/Rocky

# Test backups
./deployment/validate-and-rollback.sh --validate

Log Management

Automatic log rotation (configured during deployment):

# Configuration
/etc/logrotate.d/ffrtmp-master
/etc/logrotate.d/ffrtmp-worker
/etc/logrotate.d/ffrtmp-watch

# Settings:
# - Daily rotation
# - 14-day retention
# - Gzip compression
# - Copytruncate mode (no service restart needed)

# Test logrotate
sudo logrotate -d /etc/logrotate.d/ffrtmp-master  # Dry run
sudo logrotate -f /etc/logrotate.d/ffrtmp-master  # Force rotation

# View rotated logs
ls -lh /var/log/ffrtmp/
zcat /var/log/ffrtmp/master.log.1.gz | less

Backup and Restore

Automated Backups

# Backups created automatically during deployment
ls -lh /var/backups/ffrtmp/

# Backup locations:
# /var/backups/ffrtmp/master-YYYYMMDD-HHMMSS.tar.gz
# /var/backups/ffrtmp/worker-YYYYMMDD-HHMMSS.tar.gz

Manual Backup

# Master node backup
sudo tar -czf /var/backups/ffrtmp/master-$(date +%Y%m%d).tar.gz \
  /etc/ffrtmp-master \
  /var/lib/ffrtmp-master \
  /opt/ffrtmp-master/bin

# Worker node backup
sudo tar -czf /var/backups/ffrtmp/worker-$(date +%Y%m%d).tar.gz \
  /etc/ffrtmp \
  /opt/ffrtmp/bin

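# Database-only backup (.backup takes a consistent copy even while the master is running)
sudo sqlite3 /var/lib/ffrtmp-master/master.db ".backup '/var/backups/ffrtmp/master-$(date +%Y%m%d).db'"
# Or PostgreSQL:
pg_dump -U ffrtmp_user ffrtmp | sudo tee /var/backups/ffrtmp/ffrtmp-$(date +%Y%m%d).sql > /dev/null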

Restore from Backup

# Interactive rollback wizard
./deployment/validate-and-rollback.sh --rollback

# Or manual restore
sudo systemctl stop ffrtmp-master
sudo tar -xzf /var/backups/ffrtmp/master-20260107.tar.gz -C /
sudo systemctl start ffrtmp-master
./deployment/checks/health-check.sh --master
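Before extracting an archive over a live installation, it is worth confirming that the archive is readable. The sketch below is one way to do that with tar alone. verify_backup is a hypothetical helper and does not reflect what validate-and-rollback.sh does internally.

```shell
# verify_backup ARCHIVE: succeed only if ARCHIVE is a non-empty,
# listable gzip-compressed tar file.
verify_backup() {
  local archive="$1"
  if [ ! -s "$archive" ]; then
    echo "missing or empty: $archive" >&2
    return 1
  fi
  if ! tar -tzf "$archive" > /dev/null 2>&1; then
    echo "unreadable archive: $archive" >&2
    return 1
  fi
}

# Example:
# verify_backup /var/backups/ffrtmp/master-20260107.tar.gz \
#   && sudo tar -xzf /var/backups/ffrtmp/master-20260107.tar.gz -C /
```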

Database Maintenance

SQLite

# Compact database
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "VACUUM;"

# Analyze for query optimization
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "ANALYZE;"

# Check integrity
sudo -u ffrtmp-master sqlite3 /var/lib/ffrtmp-master/master.db "PRAGMA integrity_check;"

# View database size
du -sh /var/lib/ffrtmp-master/master.db

PostgreSQL

# Vacuum and analyze
sudo -u postgres psql ffrtmp << EOF
VACUUM VERBOSE ANALYZE;
REINDEX DATABASE ffrtmp;
EOF

# List the 10 largest tables (a starting point for spotting bloat)
sudo -u postgres psql ffrtmp -c "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC LIMIT 10;"

# Backup
pg_dump -U ffrtmp_user -F c ffrtmp > /var/backups/ffrtmp/ffrtmp-$(date +%Y%m%d).dump

Cleanup Old Data

# Clean completed jobs older than 30 days
./bin/ffrtmp jobs cleanup --older-than 30d

# Or manually
sudo sqlite3 /var/lib/ffrtmp-master/master.db << EOF
DELETE FROM jobs WHERE status='completed' AND completed_at < datetime('now', '-30 days');
DELETE FROM jobs WHERE status='failed' AND completed_at < datetime('now', '-7 days');
VACUUM;
EOF

# Clean old log files (beyond logrotate retention)
find /var/log/ffrtmp -name "*.log.*" -mtime +30 -delete

# Clean old job output files
find /var/lib/ffrtmp/results -type f -mtime +30 -delete

Upgrading to New Versions

Zero-Downtime Upgrade (Master - Recommended)

# 1. Build new version
git pull origin main
make build-master build-agent build-cli

# 2. Deploy to inactive environment
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --deploy --master --version v2.1.0

# 3. Health checks run automatically
# Review output

# 4. Switch to new version
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --switch --master

# 5. Verify new version
./deployment/checks/health-check.sh --master
./bin/ffrtmp version

# 6. If issues, instant rollback
sudo ./deployment/orchestration/blue-green-deploy.sh \
  --rollback --master

Rolling Upgrade (Workers)

# Upgrade all workers with zero downtime
./deployment/orchestration/rolling-update.sh \
  --workers worker1,worker2,worker3 \
  --version v2.1.0 \
  --master-url https://master.example.com:8080 \
  --api-key YOUR_API_KEY \
  --max-parallel 1

# Workers are updated one at a time:
# 1. Drains worker
# 2. Waits for jobs to complete
# 3. Backs up current version
# 4. Deploys new version
# 5. Verifies health
# 6. Moves to next worker
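
The one-at-a-time behaviour listed above can be pictured as a loop that halts on the first failure. This is only an illustration: `rolling_update` and the `update_one` callback are hypothetical, and the real `rolling-update.sh` may be structured differently.

```bash
# Sketch of a one-worker-at-a-time rollout. update_one is a hypothetical
# callback standing in for drain, backup, deploy and health check.
rolling_update() {
  local workers_csv="$1" update_one="$2" worker
  # Split the comma-separated list; worker names are assumed to contain no spaces.
  for worker in $(printf '%s' "$workers_csv" | tr ',' ' '); do
    echo "updating $worker"
    if ! "$update_one" "$worker"; then
      # Stop at the first failure so the remaining workers keep the old version.
      echo "failed on $worker, halting rollout" >&2
      return 1
    fi
  done
  echo "all workers updated"
}

# Example with a placeholder callback:
#   my_update() { echo "would update $1"; }
#   rolling_update worker1,worker2,worker3 my_update
```

Halting early keeps most of the fleet on the known-good version while the failed worker is investigated.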

Standard Upgrade (With Downtime)

# 1. Backup everything
sudo ./deployment/validate-and-rollback.sh --validate

# 2. Stop services
sudo systemctl stop ffrtmp-master ffrtmp-worker ffrtmp-watch

# 3. Backup database
sudo cp /var/lib/ffrtmp-master/master.db /var/backups/ffrtmp/master-pre-upgrade-$(date +%Y%m%d).db

# 4. Pull and build
git pull origin main
make clean build-master build-agent build-cli

# 5. Run any database migrations
# Check CHANGELOG.md for migration steps

# 6. Redeploy
sudo ./deploy.sh --master --non-interactive

# 7. Verify
./deployment/checks/health-check.sh --master
./bin/ffrtmp version

Upgrade Checklist

Before upgrading:

Read CHANGELOG.md for breaking changes
Backup database
Test upgrade in development environment
Schedule maintenance window (if not using blue-green)
Notify users of potential downtime
Verify disk space for new version
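
The disk space item can be checked with a short script. In this sketch the helper names are hypothetical and the threshold in the example is an assumed placeholder, not a documented requirement.

```bash
# Sketch: check free space before an upgrade.
# free_mb and require_free_mb are hypothetical helper names.
free_mb() {
  # -P forces one line per filesystem; -m reports sizes in 1 MiB blocks.
  df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

require_free_mb() {
  local path="$1" needed="$2" available
  available=$(free_mb "$path")
  if [ "$available" -lt "$needed" ]; then
    echo "only ${available} MB free on $path, need ${needed} MB" >&2
    return 1
  fi
  echo "${available} MB free on $path"
}

# Example (2048 MB is an assumed threshold):
#   require_free_mb /opt/ffrtmp 2048 && require_free_mb /var/backups/ffrtmp 2048
```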

After upgrading:

Performance Tuning

Master Node Optimization

# Production configuration (deployment/configs/master-prod.yaml)

# 1. Use PostgreSQL for high throughput
database:
  type: postgres
  max_connections: 100
  max_idle_connections: 25
  connection_max_lifetime: 3600s

# 2. Increase API rate limits
api:
  rate_limit: 5000      # requests per minute
  burst: 10000          # burst capacity

# 3. Tune scheduler
scheduler:
  interval: 5s          # How often to check for jobs
  batch_size: 100       # Jobs to schedule per cycle

# 4. Increase worker heartbeat tolerance
worker:
  heartbeat_timeout: 30s
  max_missed_heartbeats: 5
  cleanup_interval: 60s

Worker Node Optimization

# Worker configuration (deployment/configs/worker-prod.env)

# 1. Maximize concurrent jobs (based on CPU cores)
MAX_CONCURRENT_JOBS=16     # Typically cores * 2

# 2. Heartbeat interval (lower it for faster failure detection; 30s is the default)
HEARTBEAT_INTERVAL=30s     # Default: 30s (3 missed = 90s timeout)

# 3. Enable hardware acceleration
ENABLE_NVENC=true          # NVIDIA GPU
ENABLE_VAAPI=true          # Intel/AMD
ENABLE_QSV=true            # Intel Quick Sync

# 4. FFmpeg threading
FFMPEG_THREADS=0           # Auto-detect (recommended)

# 5. Resource limits (adjust based on hardware)
MAX_MEMORY_MB=32768        # 32GB
MAX_CPU_CORES=16

PostgreSQL Connection Pool Scaling

Critical for 50+ workers: Tune connection pool to prevent bottlenecks.

Default Configuration (up to 50 workers)

# config-postgres.yaml
database:
  max_open_conns: 25      # Total connections to database
  max_idle_conns: 5       # Idle connections kept warm
  conn_max_lifetime: 5m   # Recycle old connections
  conn_max_idle_time: 1m  # Close idle connections

Performance:

25 connections @ 2s poll rate = 50 workers max
Each query takes ~500ms including lock contention

Production Configuration (50-100 workers)

# config-postgres.yaml
database:
  max_open_conns: 50      # 2× default
  max_idle_conns: 10      # 2× idle pool
  conn_max_lifetime: 5m   # Keep existing
  conn_max_idle_time: 1m  # Keep existing

Performance:

50 connections @ 2s poll rate = 100 workers max
Handles burst load during job assignments

High-Scale Configuration (100-200 workers)

# config-postgres.yaml
database:
  max_open_conns: 100     # High concurrency
  max_idle_conns: 20      # Larger warm pool
  conn_max_lifetime: 3m   # Faster recycling
  conn_max_idle_time: 30s # Aggressive cleanup

Performance:

100 connections @ 2s poll rate = 200 workers max
Requires PostgreSQL max_connections = 150+

PostgreSQL Server Configuration

Match your connection pool to PostgreSQL limits:

# Edit /etc/postgresql/15/main/postgresql.conf
max_connections = 150          # 100 (app) + 50 (admin/monitoring)
shared_buffers = 4GB           # 25% of RAM (16GB server)
effective_cache_size = 12GB    # 75% of RAM
work_mem = 64MB                # Per-query memory
maintenance_work_mem = 512MB   # For VACUUM/ANALYZE

# Restart PostgreSQL
sudo systemctl restart postgresql

Connection Pool Sizing Formula

Required Connections = Workers × Poll Frequency × Avg Query Time

Example:
- 100 workers
- Poll every 2 seconds (0.5 queries/sec per worker)
- Avg query time: 500ms (0.5s)

Connections = (100 workers × 0.5 queries/sec) × 0.5s per query
            = 50 queries/sec × 0.5s
            = 25 connections minimum

Add 20% overhead: 25 × 1.2 = 30 connections
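
The worked example can be reproduced with a small calculator. This sketch treats the pool size as query arrival rate times average query time and uses integer milliseconds to avoid floating point; `required_conns` is a hypothetical helper name.

```bash
# Sketch: pool sizing as arrival rate times query time, rounded up.
# required_conns is a hypothetical helper name.
# Arguments: workers, poll interval (ms), avg query time (ms), overhead % (default 20).
required_conns() {
  local workers="$1" poll_ms="$2" query_ms="$3" overhead_pct="${4:-20}"
  # Minimum = workers * query_ms / poll_ms, rounded up.
  local base=$(( (workers * query_ms + poll_ms - 1) / poll_ms ))
  # Add the overhead percentage, rounded up.
  echo $(( (base * (100 + overhead_pct) + 99) / 100 ))
}

required_conns 100 2000 500   # prints 30, matching the example above
```

Passing `0` as the fourth argument gives the bare minimum without overhead.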

Monitoring Connection Pool Health

# 1. Check PostgreSQL active connections
sudo -u postgres psql -c "
  SELECT state, COUNT(*) 
  FROM pg_stat_activity 
  WHERE datname = 'ffrtmp' 
  GROUP BY state;
"

# 2. Watch connection pool metrics (Prometheus)
curl localhost:9090/metrics | grep database_connections

# Expected output:
# database_connections_open 18
# database_connections_idle 4
# database_connections_in_use 14
# database_connections_wait_duration_ms 2.5

# 3. Check for connection starvation
# If wait_duration > 100ms, increase max_open_conns
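
Step 3 can be automated by parsing the metrics output. A minimal sketch: the metric name and the 100 ms threshold come from the text above, `pool_wait_status` is a hypothetical helper, and it assumes the metric is exported without labels, as in the expected output shown.

```bash
# Sketch: flag connection starvation from Prometheus text output on stdin.
# pool_wait_status is a hypothetical helper name.
# Exit codes: 0 = healthy, 1 = wait above threshold, 2 = metric missing.
pool_wait_status() {
  local threshold_ms="${1:-100}"
  awk -v limit="$threshold_ms" '
    $1 == "database_connections_wait_duration_ms" { wait = $2; found = 1 }
    END {
      if (!found) { print "metric not found"; exit 2 }
      if (wait + 0 > limit + 0) { print "starved: wait " wait " ms"; exit 1 }
      print "healthy: wait " wait " ms"
    }'
}

# Example:
#   curl -s localhost:9090/metrics | pool_wait_status
```

The distinct exit codes make the check usable from cron or a monitoring wrapper.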

Symptoms of Undersized Pool

Connection starvation signs:

Workers report "timeout waiting for connection"
Increased job assignment latency (>5s)
Prometheus metric: database_connections_wait_duration_ms > 100
PostgreSQL logs: "remaining connection slots are reserved"

Fix: Increase max_open_conns by 50%

Connection exhaustion signs:

PostgreSQL error: "FATAL: sorry, too many clients already"
Master crashes with "connection refused"
Zero idle connections in pool

Fix: Increase PostgreSQL max_connections in postgresql.conf

Connection Pool Best Practices

Do:

Start with defaults (25 conns) and scale up as needed
Monitor database_connections_wait_duration_ms metric
Use conn_max_lifetime to prevent stale connections
Set PostgreSQL max_connections = 1.5× max_open_conns
Enable connection pool metrics in production

Don't:

Set max_open_conns higher than PostgreSQL allows
Use max_open_conns > 200 (indicates architectural issue)
Set max_idle_conns = 0 (causes reconnection overhead)
Ignore connection wait times in metrics
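
The 1.5× guideline from the list above is easy to check before changing either setting. A sketch, with `pg_limit_ok` as a hypothetical helper that takes both values as plain integers:

```bash
# Sketch: check that PostgreSQL max_connections is at least 1.5x max_open_conns.
# pg_limit_ok is a hypothetical helper name.
pg_limit_ok() {
  local max_open_conns="$1" pg_max_connections="$2"
  # Integer form of: pg_max_connections >= 1.5 * max_open_conns
  [ $(( pg_max_connections * 2 )) -ge $(( max_open_conns * 3 )) ]
}

# Example:
#   pg_limit_ok 100 150 && echo "limits are consistent"
```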

Scaling Beyond 200 Workers

If you need >200 workers, consider:

PgBouncer connection pooler

# pgbouncer.ini (PgBouncer runs between the app and PostgreSQL)
max_client_conn = 200      # Application connections
default_pool_size = 50     # Database connections
reserve_pool_size = 10     # Emergency connections

Redis job queue (architectural change)
- Offload job assignment to Redis
- Reduce database polling load
- See: docs/ARCHITECTURE_IMPROVEMENTS.md

Read replicas (for read-heavy queries)
- Route worker polling to read replicas
- Keep writes on primary
- Requires query routing logic

Worker Node Optimization (continued)

System-Level Tuning

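# 1. Increase file descriptors (limits.conf applies to PAM login sessions;
#    systemd services need LimitNOFILE=65536 in the unit file instead)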
echo "* soft nofile 65536" | sudo tee -a /etc/security/limits.conf
echo "* hard nofile 65536" | sudo tee -a /etc/security/limits.conf

# 2. TCP tuning for high throughput
sudo tee -a /etc/sysctl.conf << EOF
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
EOF
sudo sysctl -p

# 3. Cgroups v2 optimization
# Adjust in systemd service files:
CPUQuota=1600%             # 16 cores = 1600%
MemoryMax=32G
IOWeight=1000

# 4. Disable swap for consistent performance (optional; lasts until reboot
#    unless swap entries are also removed from /etc/fstab)
sudo swapoff -a
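
The cgroup directives in step 3 are usually applied through a systemd drop-in rather than by editing the unit file directly. The sketch below renders such a drop-in; `render_limits` is a hypothetical helper and the drop-in path in the example is an assumption.

```bash
# Sketch: render a systemd drop-in with the resource directives above.
# render_limits is a hypothetical helper name.
render_limits() {
  local cores="$1" memory="$2"
  # CPUQuota is a percentage of one core, so 16 cores becomes 1600%.
  printf '[Service]\nCPUQuota=%d%%\nMemoryMax=%s\n' "$(( cores * 100 ))" "$memory"
}

# Example (the drop-in directory is an assumed location):
#   sudo mkdir -p /etc/systemd/system/ffrtmp-worker.service.d
#   render_limits 16 32G | sudo tee /etc/systemd/system/ffrtmp-worker.service.d/limits.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ffrtmp-worker
```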

High-Throughput Configuration

For >10,000 jobs/day:

# 1. Horizontal scaling
# Add more workers:
./deployment/orchestration/rolling-update.sh \
  --workers worker1,worker2,worker3,worker4,worker5 \
  ...

# 2. Database optimization
# PostgreSQL connection pooling
database:
  max_connections: 200
  max_idle_connections: 50

# 3. Dedicated queue for high-priority jobs
# Use LIVE queue for low-latency jobs
./bin/ffrtmp jobs submit --scenario test --queue live

# 4. Redis for job queue (future enhancement)
# Currently uses database, Redis would be faster

# 5. Monitoring and auto-scaling
# Use Prometheus + Alertmanager to trigger scaling

Low-Latency Configuration

For sub-minute job processing:

# Master: Aggressive scheduling
scheduler:
  interval: 2s
  batch_size: 50

# Workers: Frequent polling
HEARTBEAT_INTERVAL=5s
JOB_TIMEOUT=300s

# Use LIVE queue
api:
  default_queue: live

Additional Resources

Documentation

docs/DEPLOYMENT_IMPROVEMENTS.md - Complete deployment system guide
docs/TLS_SETUP_GUIDE.md - TLS/SSL configuration and troubleshooting
docs/PRODUCTION_CHECKLIST.md - Production readiness checklist
DEPLOY_QUICKREF.md - Quick reference guide
ansible/ANSIBLE_GUIDE.md - Ansible automation guide
docs/API.md - REST API documentation
docs/ARCHITECTURE.md - System architecture overview
docs/SECURITY.md - Security best practices
CHANGELOG.md - Version history and breaking changes

Deployment Tools

| Tool | Purpose | Documentation |
|------|---------|---------------|
| `deployment-wizard.sh` | Interactive guided deployment | Built-in help |
| `deploy.sh` | Unified deployment script | `./deploy.sh --help` |
| `preflight-check.sh` | Pre-deployment validation | `docs/DEPLOYMENT_IMPROVEMENTS.md` |
| `health-check.sh` | Post-deployment verification | `docs/DEPLOYMENT_IMPROVEMENTS.md` |
| `blue-green-deploy.sh` | Zero-downtime deployment | `docs/DEPLOYMENT_IMPROVEMENTS.md` |
| `rolling-update.sh` | Worker rolling updates | `docs/DEPLOYMENT_IMPROVEMENTS.md` |
| `validate-and-rollback.sh` | Backup and restore | `docs/DEPLOYMENT_IMPROVEMENTS.md` |
| `generate-certs.sh` | TLS certificate generation | `docs/TLS_SETUP_GUIDE.md` |

Configuration Files

| File | Description | Template |
|------|-------------|----------|
| `/etc/ffrtmp-master/config.yaml` | Master configuration | `deployment/configs/master-prod.yaml` |
| `/etc/ffrtmp/worker.env` | Worker environment | `deployment/configs/worker-prod.env` |
| `/etc/ffrtmp/watch-config.yaml` | Watch daemon config | `deployment/systemd/watch-config.yaml.example` |

Service Management

# Check service status
sudo systemctl status ffrtmp-master
sudo systemctl status ffrtmp-worker
sudo systemctl status ffrtmp-watch

# View logs
sudo journalctl -u ffrtmp-master -f
sudo journalctl -u ffrtmp-worker -f
sudo journalctl -u ffrtmp-watch -f

# Restart services
sudo systemctl restart ffrtmp-master
sudo systemctl restart ffrtmp-worker
sudo systemctl restart ffrtmp-watch

# Enable/disable auto-start
sudo systemctl enable ffrtmp-master
sudo systemctl disable ffrtmp-worker
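
To check several services in one pass, the status commands can be wrapped in a loop. A minimal sketch using `systemctl is-active`; `check_services` is a hypothetical helper and the unit names are the ones used above.

```bash
# Sketch: report which services are not active.
# check_services is a hypothetical helper name.
check_services() {
  local unit failed=0
  for unit in "$@"; do
    # is-active --quiet prints nothing and returns 0 only when the unit is active.
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: NOT active"
      failed=1
    fi
  done
  return "$failed"
}

# Example:
#   check_services ffrtmp-master ffrtmp-worker ffrtmp-watch
```

The function returns non-zero if any unit is down, which suits use in scripts.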

CLI Commands

# Job management
./bin/ffrtmp jobs submit --scenario test
./bin/ffrtmp jobs list
./bin/ffrtmp jobs get <job-id>
./bin/ffrtmp jobs cancel <job-id>

# Node management
./bin/ffrtmp nodes list
./bin/ffrtmp nodes get <node-id>
./bin/ffrtmp nodes stats

# System info
./bin/ffrtmp version
./bin/ffrtmp health

Support and Community

Issues: GitHub Issues
Discussions: GitHub Discussions
Security: See SECURITY.md for reporting vulnerabilities
Contributing: See CONTRIBUTING.md for contribution guidelines

Quick Reference by Use Case

I want to... Then use...

| Goal | Command/Method |
|------|----------------|
| Deploy for the first time | `./deployment/deployment-wizard.sh` |
| Deploy master only | `sudo ./deploy.sh --master --non-interactive` |
| Deploy worker only | `sudo ./deploy.sh --worker --master-url URL --api-key KEY` |
| Check if system is ready | `./deployment/checks/preflight-check.sh --master` |
| Verify deployment worked | `./deployment/checks/health-check.sh --master` |
| Update master with no downtime | `./deployment/orchestration/blue-green-deploy.sh` |
| Update multiple workers | `./deployment/orchestration/rolling-update.sh` |
| Rollback to previous version | `./deployment/orchestration/blue-green-deploy.sh --rollback` |
| Deploy to 10+ servers | Ansible: `ansible-playbook playbooks/site.yml` |
| Generate TLS certificates | `./deployment/generate-certs.sh --type master` |
| Validate configuration | `./deployment/checks/config-validator.sh config.yaml` |
| Backup before upgrade | `./deployment/validate-and-rollback.sh --validate` |
| Restore from backup | `./deployment/validate-and-rollback.sh --rollback` |
| Test the system | `./bin/ffrtmp jobs submit --scenario test` |
| Check logs for errors | `sudo journalctl -u ffrtmp-master -n 100 --no-pager` |
| Monitor metrics | `curl http://localhost:9090/metrics` |

Deployment Checklist

Pre-Deployment

Read docs/DEPLOYMENT_IMPROVEMENTS.md
Choose deployment method (wizard, manual, Ansible)
Run pre-flight checks
Prepare configuration files
Generate or obtain TLS certificates
Set up firewall rules
Configure backups

During Deployment

Post-Deployment

Production Readiness

Deployment Status

Version: 2.0.0+
Status: Production-Ready
Last Updated: 2026-01-07

What's New in Deployment System v2.0

Interactive deployment wizard
Pre-flight system validation
Post-deployment health checks
Configuration validation
Blue-green deployments (zero downtime)
Rolling worker updates
Automated backups and rollback
TLS certificate generation
Environment-specific configs (dev/prod)
Comprehensive documentation
GitHub Actions CI/CD integration
Ansible automation for multi-server deployments

Previous Versions

See CHANGELOG.md for complete version history.

Next Steps:

Quick Start: Run ./deployment/deployment-wizard.sh
Learn More: Read docs/DEPLOYMENT_IMPROVEMENTS.md
Advanced: Explore ansible/ANSIBLE_GUIDE.md
Troubleshoot: See troubleshooting section above
Get Help: Open a GitHub issue

Happy Deploying!
