Kamal Deployment Guide

This guide explains how to deploy applications using Kamal with automatic secrets management.

Overview

Kamal orchestrates Docker deployments to your servers:

  • Zero-downtime deploys with rolling restarts
  • Automatic SSL via Let's Encrypt and Traefik proxy
  • Health checks and automatic rollback
  • Multi-service support (Python API, TypeScript web/API, databases)
  • Secrets from 1Password automatically injected

All managed via: make kamal ARGS="<service> <stage> <command>"
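The Makefile target behind this convention is not shown in this guide. As a rough sketch only, assuming the wrapper maps `<service>` to `config/deploy/<service>.yml` and passes `<stage>` as a Kamal destination (both are assumptions, and `kamal_cmd` is a hypothetical helper), the argument handling could look like this:

```shell
#!/bin/sh
# Hypothetical helper: turn "<service> <stage> <command...>" into a kamal command line.
# Assumes one config file per service and the stage passed as a destination (-d).
kamal_cmd() {
  if [ "$#" -lt 3 ]; then
    echo "usage: kamal_cmd <service> <stage> <command> [args...]" >&2
    return 2
  fi
  service="$1"
  stage="$2"
  shift 2
  # Print rather than execute, so the mapping can be inspected safely.
  printf 'kamal %s -c config/deploy/%s.yml -d %s\n' "$*" "$service" "$stage"
}
```

Printing the command instead of running it keeps the sketch safe to try; the real target also has to inject the 1Password secrets, which is not modelled here.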

Prerequisites

Before deploying, ensure:

  1. Infrastructure Deployed - Server and DNS configured
  2. Container Registry - Docker repositories created
  3. 1Password Configured - Stage-specific secrets in vault

Install Kamal

gem install kamal

Verify:

kamal version
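Beyond `kamal version`, a small preflight check can confirm that the other command-line tools are on the PATH before a deploy starts. This is a minimal sketch; `require_tools` is a hypothetical helper, and the tool list in the example comment (including `op`, the 1Password CLI) is an assumption about this project's setup.

```shell
#!/bin/sh
# Report every tool from the argument list that is not on PATH.
# Returns non-zero if at least one is missing.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (tool list is an assumption): require_tools kamal docker op make
```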

Service Configuration

Services are defined in config/deploy/:

config/deploy/
├── py.yml        # Python FastAPI + PostgreSQL
├── ts-web.yml    # TypeScript Vite web app
└── static.yml    # Static file service (optional)
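Because each file name doubles as the `<service>` argument, the valid service names can be derived from the directory itself. A minimal sketch, where `list_services` is a hypothetical helper and not part of the repository:

```shell
#!/bin/sh
# Print the service name for each *.yml file in a config directory,
# i.e. the value that goes in the <service> slot of ARGS.
list_services() {
  dir="${1:-config/deploy}"
  for file in "$dir"/*.yml; do
    [ -e "$file" ] || continue   # no matches: the glob stays literal
    basename "$file" .yml
  done
}
```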

Each service config specifies:

  • Docker image name and registry
  • Server hostnames (from Terraform outputs)
  • Environment variables (from 1Password)
  • Health check endpoints
  • Proxy configuration (Traefik)
  • Accessories (databases, Redis, etc.)

Deployment Workflow

First-Time Server Bootstrap

CRITICAL: Before deploying ANY service to a fresh server, you MUST bootstrap the Kamal infrastructure.

Step 1: Bootstrap Server

The first time you deploy to a server, run setup to install Kamal infrastructure:

# Bootstrap server with Kamal infrastructure
make kamal ARGS="py production setup"

This installs:

  • Traefik reverse proxy - Handles SSL and routing
  • Docker networks - For container communication
  • Required directories - For logs, caches, volumes

You only need to run setup once per server, not once per service.

Step 2: Boot Accessories (If Applicable)

If your service has accessories (databases, Redis, etc.), boot them BEFORE deploying the app:

# For Python app with PostgreSQL
make kamal ARGS="py production accessory boot postgres"

Check which accessories a service has by looking at its config file:

# Check py.yml for accessories
grep -A 10 "accessories:" config/deploy/py.yml
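The `grep` above shows the raw block. If only the accessory names are needed (for example to boot each one in a loop), a small `awk` filter can extract them. This is a sketch under the assumption that accessories are declared as two-space-indented keys directly under a top-level `accessories:` key; `accessory_names` is a hypothetical helper and not a YAML parser.

```shell
#!/bin/sh
# Print the names declared under the top-level "accessories:" key.
# Assumes two-space indentation for the accessory names.
accessory_names() {
  awk '
    /^accessories:/ { in_block = 1; next }
    /^[^ #]/        { in_block = 0 }   # next top-level key ends the block
    in_block && /^  [A-Za-z0-9_-]+:/ {
      name = $1
      sub(/:.*$/, "", name)
      print name
    }
  ' "$1"
}

# Example: accessory_names config/deploy/py.yml
```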

Step 3: Deploy the Application

Now you can deploy the actual application:

# Deploy Python API
make kamal ARGS="py production deploy"

# Deploy TypeScript web app
make kamal ARGS="ts-web production deploy"

Complete First-Time Workflow

For a fresh server with multiple services:

# 1. Bootstrap server (only needed once)
make kamal ARGS="py production setup"

# 2. Boot database for Python app
make kamal ARGS="py production accessory boot postgres"

# 3. Deploy Python app
make kamal ARGS="py production deploy"

# 4. Deploy web app (no setup needed - server already bootstrapped)
make kamal ARGS="ts-web production deploy"

Subsequent deployments only need the deploy command:

make kamal ARGS="py production deploy"
make kamal ARGS="ts-web production deploy"
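The first-time sequence is order-sensitive, so it can help to wrap it in a script that stops at the first failing step. The sketch below is one way to do that; `kamal_step`, `first_time_deploy`, and the `DRY_RUN` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Run one "make kamal" step, or only print it when DRY_RUN=1.
kamal_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'make kamal ARGS="%s"\n' "$1"
  else
    make kamal ARGS="$1"
  fi
}

# First-time rollout for one stage; stops at the first failing step.
first_time_deploy() {
  stage="$1"
  kamal_step "py $stage setup" &&
    kamal_step "py $stage accessory boot postgres" &&
    kamal_step "py $stage deploy" &&
    kamal_step "ts-web $stage deploy"
}
```

Setting `DRY_RUN=1` before calling `first_time_deploy production` prints the four commands without touching the server, which is a cheap way to review the plan.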

Build and Deploy

# Deploy Python API
make kamal ARGS="py production deploy"

# Deploy TypeScript web app
make kamal ARGS="ts-web production deploy"

The deploy process:

  1. Builds Docker image locally
  2. Pushes to container registry
  3. Pulls image on server and starts the new container
  4. Runs health checks
  5. Switches traffic to new version
  6. Removes old containers

Viewing Logs

# View Python API logs
make kamal ARGS="py production logs"

# View web app logs
make kamal ARGS="ts-web production logs"

# Follow logs (live tail)
make kamal ARGS="py production logs --follow"

# View specific number of lines
make kamal ARGS="py production logs --lines 100"

Managing Services

# Stop service
make kamal ARGS="py production stop"

# Start service
make kamal ARGS="py production start"

# Restart service
make kamal ARGS="py production restart"

# Rollback to previous version
make kamal ARGS="py production rollback"

Service Details

Python API Service

Config: config/deploy/py.yml

Includes:

  • FastAPI application (port 8000)
  • PostgreSQL database (accessory)
  • Google OAuth environment variables
  • Health check on /health
  • SSL via Traefik

Environment Variables: Automatically loaded from 1Password vault <project>-production:

  • GOOGLE_O_AUTH_CLIENT_ID
  • GOOGLE_O_AUTH_CLIENT_SECRET
  • POSTGRES_URL (generated by database accessory)
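A missing secret usually only surfaces when the container fails to start, so it can be worth checking the expected variables up front. A minimal sketch, where `missing_vars` is a hypothetical helper that inspects the current shell environment (for example inside the container via `app exec`):

```shell
#!/bin/sh
# Print the name of every listed variable that is unset or empty.
# Returns non-zero if any are missing.
missing_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup in POSIX sh; only pass trusted variable names.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}

# Example:
# missing_vars GOOGLE_O_AUTH_CLIENT_ID GOOGLE_O_AUTH_CLIENT_SECRET POSTGRES_URL
```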

Database Management:

# Start database only
make kamal ARGS="py production accessory boot postgres"

# Stop database
make kamal ARGS="py production accessory stop postgres"

# View database logs
make kamal ARGS="py production accessory logs postgres"

# Execute command in database
make kamal ARGS="py production accessory exec postgres psql -U <dbuser> <dbname>"

Deploying:

# First time (includes database setup)
make kamal ARGS="py production setup"
make kamal ARGS="py production accessory boot postgres"
make kamal ARGS="py production deploy"

# Subsequent deployments
make kamal ARGS="py production deploy"

The deploy automatically runs Alembic migrations via the startup script.
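The startup script itself is not reproduced in this guide. As an illustration of the migrate-then-serve pattern only, a container entrypoint could look like the sketch below. The default commands are assumptions: `./bin/db.sh migrate` is borrowed from the manual migration command in this guide, and the `uvicorn app.main:app` module path is hypothetical.

```shell
#!/bin/sh
# Hypothetical container entrypoint: migrate first, then start the server.
MIGRATE_CMD="${MIGRATE_CMD:-./bin/db.sh migrate}"
SERVER_CMD="${SERVER_CMD:-uv run uvicorn app.main:app --host 0.0.0.0 --port 8000}"

start_app() {
  # Unquoted on purpose so each command string splits into words.
  if ! $MIGRATE_CMD; then
    echo "migration failed; not starting the server" >&2
    return 1
  fi
  # Replace the shell so the server receives container signals directly.
  exec $SERVER_CMD
}

# A real entrypoint would end with: start_app
```

Failing before the server starts means the health check never passes, so a broken migration keeps the old version serving traffic.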

TypeScript Web Service

Config: config/deploy/ts-web.yml

Includes:

  • Vite React application (port 5173)
  • Static asset serving
  • Health check on /
  • SSL via Traefik

Environment Variables: Set at build time (if needed):

  • VITE_API_URL - Backend API URL

Deploying:

# First time (run setup only if the server has not already been bootstrapped by another service)
make kamal ARGS="ts-web production setup"
make kamal ARGS="ts-web production deploy"

# Subsequent deployments
make kamal ARGS="ts-web production deploy"

Common Operations

Executing Commands

Run commands inside containers:

# Python: Run database migration manually
make kamal ARGS="py production app exec ./bin/db.sh migrate"

# Python: Open Python shell
make kamal ARGS="py production app exec uv run python"

# Python: Check environment
make kamal ARGS="py production app exec env"

# Database: Connect to PostgreSQL
make kamal ARGS="py production accessory exec postgres psql -U <dbuser> <dbname>"

Viewing Status

# Show running containers
make kamal ARGS="py production ps"

# Show all details (containers, health, etc.)
make kamal ARGS="py production details"

# Show audit log (recent deployments)
make kamal ARGS="py production audit"

Image Management

# List images on server
make kamal ARGS="py production images"

# Remove old images (free up space)
make kamal ARGS="py production prune all"

Config Validation

# Load and print the combined config before deploying; an invalid file fails here
# Note: upstream Kamal documents this as plain `kamal config`, and its output can include secrets
kamal config -c config/deploy/py.yml

Troubleshooting

"Traefik container not found" or "No proxy running"

This usually means the server has not been bootstrapped. Run setup first:

make kamal ARGS="py production setup"

This must be run once before any deployments on a fresh server.

"Database connection refused" (Python app)

This usually means the PostgreSQL accessory is not running. Boot it before deploying:

# Boot the database
make kamal ARGS="py production accessory boot postgres"

# Then deploy
make kamal ARGS="py production deploy"

"Container won't start"

Check logs:

make kamal ARGS="py production logs"

Common issues:

  • Missing environment variable: Check that the 1Password vault has the required secrets
  • Health check failing: Check that the health endpoint returns 200 OK
  • Port conflict: Ensure no other container is using the port
  • Accessory not running: Boot accessories before deploying the app

"Health check failed"

The deployment will fail if health checks don't pass. To debug:

# SSH to server and check container
bin/ssh production
docker ps -a  # See if container is running
docker logs <container-name>  # View container logs

# Check health endpoint manually
curl http://localhost:8000/health
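A freshly started container may need a few seconds before the endpoint answers, so a single `curl` can give a misleading failure. A small retry loop makes the manual check more reliable. This is a sketch; `wait_until` is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while :; do
    "$@" >/dev/null 2>&1 && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example (run on the server):
# wait_until 10 3 curl -fsS http://localhost:8000/health
```

The `-f` flag makes `curl` exit non-zero on HTTP error responses, so a 500 from the health endpoint counts as a failed attempt.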

"Cannot pull image"

This usually points to an authentication problem with the container registry:

# Check Docker Hub credentials in 1Password
bin/vault read DOCKER_HUB_USERNAME
bin/vault read DOCKER_HUB_PASSWORD

# Re-authenticate on server
bin/ssh production
docker login -u USERNAME  # prompts for the password, keeping it out of shell history

"SSH connection failed"

Check SSH key and server IP:

# Verify infrastructure outputs
make iac production output server_ip
make iac production output -raw ssh_private_key

# Test SSH manually
bin/ssh production

"SSL certificate not working"

Traefik handles SSL via Let's Encrypt. To diagnose certificate problems:

# Check Traefik logs
bin/ssh production
docker logs <traefik-container>

# Verify DNS points to server
dig yourdomain.com  # Should return SERVER_IP

# Check Traefik dashboard (if enabled)
curl http://localhost:8080/dashboard/

Note: Let's Encrypt requires:

  • A domain that resolves to the server IP
  • Ports 80 and 443 reachable from the internet
  • A valid email address in the Kamal config
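The DNS requirement is easy to script: compare the addresses `dig` returns with the expected server IP. A minimal sketch, where `ip_in_list` is a hypothetical helper and `SERVER_IP` is assumed to hold the address from the infrastructure outputs:

```shell
#!/bin/sh
# Succeed if the expected IP appears as a whole line on stdin.
# -F: match literally (dots are not wildcards); -x: match the entire line.
ip_in_list() {
  grep -Fxq -- "$1"
}

# Example: dig +short A yourdomain.com | ip_in_list "$SERVER_IP"
```

The whole-line, fixed-string match matters here: without it, `203.0.113.1` would also match `203.0.113.10`.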

Best Practices

1. Test Locally First

Before deploying:

# Build and test locally
make build
make test
make check

2. Use Staging

Always test on staging before production:

# Deploy to staging
make kamal ARGS="py staging deploy"

# Test thoroughly

# Deploy to production
make kamal ARGS="py production deploy"

3. Monitor Deployments

Watch logs during deployment:

# In one terminal
make kamal ARGS="py production deploy"

# In another terminal
make kamal ARGS="py production logs --follow"

4. Keep Images Small

Optimize Dockerfiles:

  • Use multi-stage builds
  • Minimize layers
  • Remove build dependencies in final image

5. Regular Cleanup

Remove old images to save disk space:

# Weekly or after major deployments
make kamal ARGS="py production prune all"
make kamal ARGS="ts-web production prune all"
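Instead of pruning on a fixed schedule, cleanup can be triggered by disk usage. The sketch below reads the usage percentage from `df`; `disk_usage_pct` and `needs_prune` are hypothetical helpers, and checking `/` assumes Docker's data lives on the root filesystem.

```shell
#!/bin/sh
# Print the used-space percentage (digits only) for a mount point.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if usage is at or above the threshold.
# Usage: needs_prune <threshold-percent> [mount-point]
needs_prune() {
  [ "$(disk_usage_pct "${2:-/}")" -ge "$1" ]
}

# Example (on the server): needs_prune 80 && echo "disk is filling up, prune images"
```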

6. Backup Databases

Before major updates:

# Backup PostgreSQL (the dump is written to backup.sql on the server)
bin/ssh production
docker exec <postgres-container> pg_dump -U <dbuser> <dbname> > backup.sql
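Writing every dump to the same `backup.sql` overwrites the previous one. A timestamped file name avoids that; the sketch below uses a hypothetical `backup_name` helper and a UTC timestamp format chosen for illustration.

```shell
#!/bin/sh
# Build a dump file name such as "<dbname>-20250101T120000Z.sql" (UTC).
backup_name() {
  printf '%s-%s.sql\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Example (on the server, placeholders as in the command above):
# docker exec <postgres-container> pg_dump -U <dbuser> <dbname> > "$(backup_name <dbname>)"
```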

Next Steps

After successful deployment:

  1. Verify services: Visit your domain (https://yourdomain.com)
  2. Check logs: make kamal ARGS="<service> production logs"
  3. Monitor resources: bin/ssh production, then run docker stats on the server
  4. Set up monitoring: Consider Uptime Robot, Sentry, etc.
  5. Configure backups: Set up automated database backups

For local development workflows, see Local Development Guide.