Fly.io autoscaling works at the Machine level — Machines are started and stopped based on demand. Unlike traditional autoscalers that take minutes, Fly Machines can start in ~300ms, making fine-grained autoscaling practical.
Two mechanisms:
- Auto start/stop — automatic based on traffic
- Manual scaling — you control Machine count per region
Configured in fly.toml:
[http_service]
auto_stop_machines = true # Stop when idle
auto_start_machines = true # Start on incoming request
min_machines_running = 0 # Allow full scale-to-zero

- When traffic drops to zero, idle Machines stop (no CPU cost)
- When a new request arrives, Fly starts a Machine (~300ms)
- The request is held briefly while the Machine boots
[http_service]
auto_stop_machines = false
min_machines_running = 1

Fly can automatically start new Machines when load per Machine exceeds a threshold:
[http_service.concurrency]
type = "requests" # or "connections"
soft_limit = 25 # Start new instances at this concurrency
hard_limit = 30 # Reject above this

- `soft_limit`: Fly starts another Machine when concurrency exceeds this
- `hard_limit`: Fly rejects connections above this (returns 503)
- New Machines are started in the same region as the load
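The interaction between the two limits can be sketched as a small decision function. This is a simplified model of the behavior described above, not Fly's actual proxy logic; the name `classify_load` and the returned labels are invented for illustration.

```python
def classify_load(concurrency: int, soft_limit: int = 25, hard_limit: int = 30) -> str:
    """Toy model of the concurrency limits (not Fly's real proxy logic)."""
    if soft_limit > hard_limit:
        raise ValueError("soft_limit must not exceed hard_limit")
    if concurrency > hard_limit:
        return "reject"         # above the hard limit: the request is refused
    if concurrency > soft_limit:
        return "start_machine"  # above the soft limit: bring up another Machine
    return "ok"                 # within limits: running Machines absorb the load


for load in (10, 25, 26, 30, 31):
    print(load, classify_load(load))
```

With the values from the config above, a Machine at 26 concurrent requests falls into the "start another Machine" band, and only a Machine above 30 is in the reject band. The gap between the two limits is the headroom available while the extra Machine boots.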
fly scale count 3 # 3 Machines globally
fly scale count 2 --region ams # 2 in Amsterdam
fly scale count 1 --region lhr # 1 in London

fly scale vm shared-cpu-1x # 1 shared CPU
fly scale vm performance-2x # 2 dedicated CPUs
fly scale memory 1024 # 1 GB RAM

fly status
fly scale show

Fly.io has 18+ regions. Deploy to multiple regions for:
- Lower latency for global users
- Redundancy / high availability
fly platform regions

Common regions:

| Code | City |
|---|---|
| ams | Amsterdam |
| lhr | London |
| iad | Washington DC |
| syd | Sydney |
| sin | Singapore |
| nrt | Tokyo |
| gru | São Paulo |
fly scale count 1 --region lhr
fly scale count 1 --region syd

fly scale count 0 --region syd

The `primary_region` in `fly.toml` is where:
- New Machines are created by default
- Postgres writes are routed (if using `fly-replay`)
- The app is "home"
primary_region = "ams"

Fly automatically routes incoming requests to the nearest healthy Machine using Anycast routing. No configuration is needed.
If a region has no running Machine and auto-start is enabled, the request wakes a Machine in that region (or falls back to another region).
For stateless apps: multi-region is plug-and-play.
For stateful apps (databases), use the fly-replay pattern:
Request arrives in lhr
↓
Machine in lhr handles read (from local replica)
↓
Write request? → Reply with: fly-replay: region=ams
↓
Fly proxy retries the request in ams (primary)
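On the application side, the flow above comes down to one decision: should this request be answered locally or handed back to the proxy with a `fly-replay` header? The sketch below shows that decision under two assumptions that are not from the original text: writes are identified by HTTP method, and the function name `replay_header` is hypothetical.

```python
# Assumption: any of these HTTP methods counts as a write.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def replay_header(method: str, current_region: str, primary_region: str):
    """Return the fly-replay response header for a write that arrived outside
    the primary region, or None if the request can be handled locally."""
    if method.upper() in WRITE_METHODS and current_region != primary_region:
        return {"fly-replay": f"region={primary_region}"}
    return None


print(replay_header("GET", "lhr", "ams"))   # read: served from the local replica
print(replay_header("POST", "lhr", "ams"))  # write outside primary: replay in ams
print(replay_header("POST", "ams", "ams"))  # write in primary: handled locally
```

In a deployed app, the current region would typically come from the `FLY_REGION` environment variable that Fly sets on each Machine. The web framework would attach the returned header to an otherwise empty response.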
- Set `auto_stop_machines = true` and `min_machines_running = 0` in `fly.toml`.
- Deploy.
- Wait ~2 minutes with no traffic, then check `fly status`: the Machine should be stopped.
- Visit your app URL: the Machine should wake up.
- Run `fly scale count 2`.
- Run `fly status` to confirm 2 Machines are running.
- Hit your app multiple times and check the logs: you should see requests on different Machine IDs.
- Run `fly scale count 1 --region lhr` (or another region).
- Run `fly status` and confirm there are Machines in 2 regions.
- Run `fly scale count 0 --region lhr` to remove it.