
Commit e49421f

Merge pull request #200 from DefangLabs/eric-auto-scaling-text
Add section about autoscaling
Add concepts page about scaling
2 parents 1c96337 + 73c1c97

File tree

2 files changed, +104 -3 lines changed


docs/concepts/scaling.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
---
title: Scaling
description: Defang can help you handle irregular service loads.
sidebar_position: 375
---

# Scaling

Scaling is the process of adjusting the number of instances (or replicas) of a service to meet the current demand. Services that receive requests—such as APIs, workers, or background jobs—can be scaled up or down to optimize performance, availability, and cost.

Scaling is a core concept in distributed systems and cloud-native applications. It ensures your system can handle varying workloads without degrading user experience or over-provisioning resources.

## Why Scale?

Scaling enables services to respond effectively under different conditions:

- **High Traffic**: When demand spikes, scaling up ensures your service can process more requests in parallel.
- **Cost Optimization**: Scaling down during periods of low demand helps reduce unnecessary resource usage and cloud costs.
- **Fault Tolerance**: Multiple instances of a service provide redundancy in case of instance failure.
- **Throughput & Latency**: Additional instances can reduce response times and increase the number of operations your service can perform per second.

## Types of Scaling

There are two main ways to scale a service:

- **Horizontal Scaling**: Adds or removes instances of a service. This is the most common approach for stateless services.
- **Vertical Scaling**: Increases or decreases the resources (CPU, memory) available to a single instance.

In most modern deployments, horizontal scaling is preferred because it aligns well with cloud-native principles and is easier to automate and distribute.
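
As a rough Compose-style sketch (the `app` service name, image, and numbers are placeholders), vertical scaling adjusts the resources reserved for each replica, while horizontal scaling changes the replica count:

```yaml
services:
  app:
    image: example/app:latest # placeholder image
    deploy:
      replicas: 3             # horizontal scaling: run three copies of the service
      resources:
        reservations:
          cpus: "1"           # vertical scaling: give each copy more CPU...
          memory: "1G"        # ...and more memory
```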

## Auto-Scaling

**Auto-scaling** refers to automatically adjusting the number of service instances based on defined policies or metrics.

Instead of manually adding more instances when traffic increases, an auto-scaling system watches key indicators (like CPU usage) and takes action in real time.

### How It Works

Auto-scaling systems typically rely on:

- **Metrics Collection**: Real-time monitoring of system metrics.
- **Scaling Policies**: Rules that define when to scale up or down (see the sketch after this list). For example:
  - If average CPU > 85% for 5 minutes → scale up by 2 instances.
- **Cooldown Periods**: Delays between scaling events to prevent rapid, repeated changes (flapping).
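
A policy like the one above could be written down roughly as follows. This is an illustrative sketch only; the keys, the scale-down rule, and the cooldown value are hypothetical and not Defang (or any specific platform's) configuration syntax:

```yaml
# Hypothetical policy document, for illustration only.
autoscaling_policy:
  metric: average_cpu_percent   # only CPU is considered here
  scale_up:
    threshold: 85               # average CPU > 85%...
    sustained_for: 5m           # ...for 5 minutes
    add_replicas: 2             # then start 2 more instances
  scale_down:
    threshold: 25               # example value: low sustained CPU
    sustained_for: 10m          # example value
    remove_replicas: 1
  cooldown: 3m                  # wait between scaling actions to avoid flapping
```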

### Supported Platforms

| Platform     | Auto-Scaling Support |
| ------------ | :------------------: |
| Playground   |                      |
| AWS          |                      |
| DigitalOcean |                      |
| GCP          |                      |

### Benefits of Auto-Scaling

- **Elasticity**: Automatically adapts to changing workloads.
- **Resilience**: Helps maintain performance during traffic surges or partial outages.
- **Efficiency**: Reduces the need for manual intervention or over-provisioning.

### Considerations

- Ensure services are **stateless** or use **externalized state** (e.g., databases, caches) for smooth scaling. ([12 Factor App](https://12factor.net/processes))
- Test services under load to identify scaling bottlenecks.

docs/tutorials/scaling-your-services.mdx

Lines changed: 39 additions & 3 deletions
@@ -27,14 +27,14 @@ services:
    deploy:
      resources:
        reservations:
-          cpus: '2'
-          memory: '512M'
+          cpus: "2"
+          memory: "512M"
```

The minimum resources which can be reserved:

| Resource | Minimum |
-|----------|---------|
+| -------- | ------- |
| CPUs | 0.5 |
| Memory | 512M |
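
For reference, a reservation at these minimums might look like the following sketch (the `worker` service name and image are placeholders):

```yaml
services:
  worker:
    image: example/worker:latest # placeholder image
    deploy:
      resources:
        reservations:
          cpus: "0.5"    # minimum CPU reservation
          memory: "512M" # minimum memory reservation
```
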
@@ -57,3 +57,39 @@ services:
    deploy:
      replicas: 3
```

## Autoscaling Your Services

Autoscaling allows your services to automatically adjust the number of replicas based on CPU usage — helping you scale up during traffic spikes and scale down during quieter periods.

> **Note:** Autoscaling is only available to **Pro** tier or higher users.

### Enabling Autoscaling

To enable autoscaling for a service, add the `x-defang-autoscaling: true` extension under the service definition in your `compose.yaml` file.

Example:

```yaml
services:
  web:
    image: myorg/web:latest
    ports:
      - 80:80
    x-defang-autoscaling: true
```

Once deployed, each service's CPU usage is monitored to gauge how much load it is handling; sustained high load will result in more replicas being started.

### Requirements

- BYOC (bring your own cloud): you must deploy to your own cloud provider account.
- You must be on the Pro or higher plan to use autoscaling. ([Defang plans](https://defang.io/#pricing))
- Only the staging and production deployment modes are supported. ([Deployment modes](/docs/concepts/deployment-modes))
- The service must be stateless or able to run in multiple instances. ([Scaling](/docs/concepts/scaling))
- Only CPU metrics are used for scaling decisions.

### Best Practices

- Design your services to be horizontally scalable. ([12 Factor App](https://12factor.net/processes))
- Use shared or external storage if your service writes data, e.g., Postgres or Redis [managed services](/docs/concepts/managed-storage); see the sketch below.
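
As a rough sketch of that last point, the `web` service below stays stateless and reaches external storage through an environment variable; the service name, image, and `DATABASE_URL` value are placeholders rather than a prescribed setup:

```yaml
services:
  web:
    image: example/web:latest # placeholder image
    ports:
      - 80:80
    x-defang-autoscaling: true
    environment:
      # State lives in an external/managed database, not on the replica's disk,
      # so any number of replicas can serve traffic interchangeably.
      DATABASE_URL: "postgres://user:pass@db.example.com:5432/app"
```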

 (0)