docs/concepts/scaling.md (new file, 65 additions)
@@ -0,0 +1,65 @@
---
title: Scaling
description: Defang can help you handle irregular service loads.
sidebar_position: 375
---

# Scaling

Scaling is the process of adjusting the number of instances (or replicas) of a service to meet the current demand. Services that receive requests—such as APIs, workers, or background jobs—can be scaled up or down to optimize performance, availability, and cost.

Scaling is a core concept in distributed systems and cloud-native applications. It ensures your system can handle varying workloads without degrading user experience or over-provisioning resources.

## Why Scale?

Scaling enables services to respond effectively under different conditions:

- **High Traffic**: When demand spikes, scaling up ensures your service can process more requests in parallel.
- **Cost Optimization**: Scaling down during periods of low demand helps reduce unnecessary resource usage and cloud costs.
- **Fault Tolerance**: Multiple instances of a service provide redundancy in case of instance failure.
- **Throughput & Latency**: Additional instances can reduce response times and increase the number of operations your service can perform per second.
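As a hypothetical back-of-the-envelope calculation (the request rates and utilization target below are made-up numbers, not measurements): if one instance sustains about 200 requests/s and you want to absorb 900 requests/s while keeping each instance below 75% utilization, you need ⌈900 / (200 × 0.75)⌉ = 6 instances.

```python
import math

def replicas_needed(demand_rps: float, per_instance_rps: float, target_utilization: float) -> int:
    """Smallest replica count that keeps each instance at or below the target utilization."""
    return math.ceil(demand_rps / (per_instance_rps * target_utilization))

# 900 rps of demand, 200 rps per instance, keep instances under 75% busy:
print(replicas_needed(900, 200, 0.75))  # → 6
```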

## Types of Scaling

There are two main ways to scale a service:

- **Horizontal Scaling**: Adds or removes instances of a service. This is the most common approach for stateless services.
- **Vertical Scaling**: Increases or decreases the resources (CPU, memory) available to a single instance.

In most modern deployments, horizontal scaling is preferred because it aligns well with cloud-native principles and is easier to automate and distribute.
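In Compose terms, the two approaches map to different keys under `deploy`: `replicas` scales horizontally, while `resources.reservations` sizes each instance vertically. The service name, image, and values below are illustrative:

```yaml
services:
  api:
    image: myorg/api:latest # illustrative image name
    deploy:
      replicas: 3 # horizontal: run three instances
      resources:
        reservations:
          cpus: "1" # vertical: size each instance
          memory: "512M"
```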

## Auto-Scaling

**Auto-scaling** refers to automatically adjusting the number of service instances based on defined policies or metrics.

Instead of manually adding more instances when traffic increases, an auto-scaling system watches key indicators (like CPU usage) and takes action in real time.

### How It Works

Auto-scaling systems typically rely on:

- **Metrics Collection**: Real-time monitoring of system metrics.
- **Scaling Policies**: Rules that define when to scale up or down. For example:
- If average CPU > 85% for 5 minutes → scale up by 2 instances.
- **Cooldown Periods**: Delays between scaling events to prevent rapid, repeated changes (flapping).
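A minimal control loop combining these three pieces might look like the following sketch. The metric source, thresholds, sustain window, and step size are illustrative assumptions for exposition, not Defang's implementation:

```python
SCALE_UP_THRESHOLD = 0.85  # average CPU fraction that triggers a scale-up (assumed)
SUSTAIN_SECONDS = 300      # breach must last this long (5 minutes) before acting
COOLDOWN_SECONDS = 120     # minimum gap between scaling actions, to prevent flapping

def autoscale_step(avg_cpu: float, now: float, state: dict, step: int = 2) -> int:
    """Return how many instances to add (0 or `step`) for one evaluation tick.

    `state` persists across ticks: it tracks when the current threshold
    breach started and when we last scaled.
    """
    if avg_cpu > SCALE_UP_THRESHOLD:
        state.setdefault("breach_start", now)   # start timing the breach
    else:
        state.pop("breach_start", None)         # breach ended; reset the timer

    breached_long_enough = (
        "breach_start" in state and now - state["breach_start"] >= SUSTAIN_SECONDS
    )
    cooled_down = now - state.get("last_scaled", float("-inf")) >= COOLDOWN_SECONDS

    if breached_long_enough and cooled_down:
        state["last_scaled"] = now
        state.pop("breach_start", None)
        return step
    return 0
```

Feeding it a sustained high-CPU signal produces one scale-up event, after which the cooldown suppresses further changes until the timer expires.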

### Supported Platforms

| Platform | Auto-Scaling Support |
|----------------|:----------------------:|
| Playground | ❌ |
| AWS | ✅ |
| DigitalOcean | ❌ |
| GCP | ✅ |

### Benefits of Auto-Scaling

- **Elasticity**: Automatically adapts to changing workloads.
- **Resilience**: Helps maintain performance during traffic surges or partial outages.
- **Efficiency**: Reduces the need for manual intervention or over-provisioning.

### Considerations

- Ensure services are **stateless** or use **externalized state** (e.g., databases, caches) for smooth scaling. ([12 Factor App](https://12factor.net/processes))
- Test services under load to identify scaling bottlenecks.

docs/tutorials/scaling-your-services.mdx (39 additions, 3 deletions)
@@ -27,14 +27,14 @@
services:
deploy:
resources:
reservations:
-        cpus: '2'
-        memory: '512M'
+        cpus: "2"
+        memory: "512M"
```

The minimum resources that can be reserved:

| Resource | Minimum |
-|----------|---------|
+| -------- | ------- |
| CPUs | 0.5 |
| Memory | 512M |

Expand All @@ -57,3 +57,39 @@ services:
deploy:
replicas: 3
```

## Autoscaling Your Services

Autoscaling allows your services to automatically adjust the number of replicas based on CPU usage — helping you scale up during traffic spikes and scale down during quieter periods.

> **Note:** Autoscaling is only available to **Pro** tier or higher users.

### Enabling Autoscaling

To enable autoscaling for a service, add the `x-defang-autoscaling: true` extension under the service definition in your `compose.yaml` file.

Example:

```yaml
services:
web:
image: myorg/web:latest
ports:
- 80:80
x-defang-autoscaling: true
```

Once deployed, each service's CPU usage is monitored to gauge how much load it is handling; sustained high load causes additional replicas to be started.

### Requirements

- BYOC: you must be deploying to your own cloud provider account.
- You must be on the Pro or higher plan to use autoscaling. ([Defang plans](https://defang.io/#pricing))
- Only the staging and production deployment modes are supported. ([Deployment modes](/docs/concepts/deployment-modes))
- The service must be stateless or able to run in multiple instances. ([Scaling](/docs/concepts/scaling))
- Scaling decisions are based on CPU metrics only.

### Best Practices

- Design your services to be horizontally scalable. ([12 Factor App](https://12factor.net/processes))
- Use shared or external storage if your service writes data (e.g., Postgres or Redis [managed services](/docs/concepts/managed-storage)).