diff --git a/docs/concepts/scaling.md b/docs/concepts/scaling.md
new file mode 100644
index 000000000..a94938e19
--- /dev/null
+++ b/docs/concepts/scaling.md
@@ -0,0 +1,65 @@
+---
+title: Scaling
+description: Defang can help you handle irregular service loads.
+sidebar_position: 375
+---
+
+# Scaling
+
+Scaling is the process of adjusting the number of instances (or replicas) of a service to meet current demand. Services such as APIs, workers, and background jobs can be scaled up or down to optimize performance, availability, and cost.
+
+Scaling is a core concept in distributed systems and cloud-native applications. It ensures your system can handle varying workloads without degrading the user experience or over-provisioning resources.
+
+## Why Scale?
+
+Scaling enables services to respond effectively under different conditions:
+
+- **High Traffic**: When demand spikes, scaling up ensures your service can process more requests in parallel.
+- **Cost Optimization**: Scaling down during periods of low demand helps reduce unnecessary resource usage and cloud costs.
+- **Fault Tolerance**: Multiple instances of a service provide redundancy in case of instance failure.
+- **Throughput & Latency**: Additional instances can reduce response times and increase the number of operations your service can perform per second.
+
+## Types of Scaling
+
+There are two main ways to scale a service:
+
+- **Horizontal Scaling**: Adds or removes instances of a service. This is the most common approach for stateless services.
+- **Vertical Scaling**: Increases or decreases the resources (CPU, memory) available to a single instance.
+
+In most modern deployments, horizontal scaling is preferred because it aligns well with cloud-native principles and is easier to automate and distribute.
+
+## Auto-Scaling
+
+**Auto-scaling** refers to automatically adjusting the number of service instances based on defined policies or metrics.
+
+Instead of manually adding more instances when traffic increases, an auto-scaling system watches key indicators (like CPU usage) and takes action in real time.
+
+### How It Works
+
+Auto-scaling systems typically rely on:
+
+- **Metrics Collection**: Real-time monitoring of system metrics.
+- **Scaling Policies**: Rules that define when to scale up or down. For example:
+  - If average CPU > 85% for 5 minutes → scale up by 2 instances.
+- **Cooldown Periods**: Delays between scaling events to prevent rapid, repeated changes (flapping).
+
+### Supported Platforms
+
+| Platform       | Auto-Scaling Support |
+|----------------|:--------------------:|
+| Playground     |          ❌          |
+| AWS            |          ✅          |
+| DigitalOcean   |          ❌          |
+| GCP            |          ✅          |
+
+### Benefits of Auto-Scaling
+
+- **Elasticity**: Automatically adapts to changing workloads.
+- **Resilience**: Helps maintain performance during traffic surges or partial outages.
+- **Efficiency**: Reduces the need for manual intervention or over-provisioning.
+
+### Considerations
+
+- Ensure services are **stateless** or use **externalized state** (e.g., databases, caches) for smooth scaling. ([12 Factor App](https://12factor.net/processes))
+- Test services under load to identify scaling bottlenecks.
+
\ No newline at end of file
diff --git a/docs/tutorials/scaling-your-services.mdx b/docs/tutorials/scaling-your-services.mdx
index 6186ba8e3..5dd7035bb 100644
--- a/docs/tutorials/scaling-your-services.mdx
+++ b/docs/tutorials/scaling-your-services.mdx
@@ -27,14 +27,14 @@ services:
     deploy:
       resources:
         reservations:
-          cpus: '2'
-          memory: '512M'
+          cpus: "2"
+          memory: "512M"
 ```
 
 The minimum resources which can be reserved:
 
 | Resource | Minimum |
-|----------|---------|
+| -------- | ------- |
 | CPUs     | 0.5     |
 | Memory   | 512M    |
 
@@ -57,3 +57,39 @@ services:
     deploy:
       replicas: 3
 ```
+
+## Autoscaling Your Services
+
+Autoscaling allows your services to automatically adjust the number of replicas based on CPU usage, helping you scale up during traffic spikes and scale down during quieter periods.
+
+> **Note:** Autoscaling is only available to **Pro** tier or higher users.
+
+### Enabling Autoscaling
+
+To enable autoscaling for a service, add the `x-defang-autoscaling: true` extension under the service definition in your `compose.yaml` file.
+
+Example:
+
+```yaml
+services:
+  web:
+    image: myorg/web:latest
+    ports:
+      - 80:80
+    x-defang-autoscaling: true
+```
+
+Once deployed, each service's CPU usage is monitored; sustained high load results in additional replicas being started.
+
+### Requirements
+
+- BYOC: you must deploy to your own cloud provider account.
+- You must be on the Pro or higher plan to use autoscaling. ([Defang plans](https://defang.io/#pricing))
+- Only the staging and production deployment modes are supported. ([Deployment modes](/docs/concepts/deployment-modes))
+- The service must be stateless or able to run in multiple instances. ([Scaling](/docs/concepts/scaling))
+- Only CPU metrics are used for scaling decisions.
+
+### Best Practices
+
+- Design your services to be horizontally scalable. ([12 Factor App](https://12factor.net/processes))
+- Use shared or external storage if your service writes data (e.g. Postgres or Redis [managed services](/docs/concepts/managed-storage)).
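
As a sketch of these practices combined, a compose file might pair an autoscaled, stateless web service with a managed database, so no state lives in the web replicas. The service names and image are illustrative, and the `x-defang-postgres` extension and `defang config` workflow are assumptions taken from the managed storage docs; check those docs for the exact configuration:

```yaml
# Sketch only: service names and image are illustrative.
services:
  web:
    image: myorg/web:latest   # stateless; safe to run as multiple replicas
    ports:
      - 80:80
    x-defang-autoscaling: true
    environment:
      DATABASE_URL:           # supplied via Defang config, not baked into the image

  db:
    image: postgres:16
    x-defang-postgres: true   # managed Postgres; state stays outside the web replicas
    environment:
      POSTGRES_PASSWORD:      # supplied via Defang config
```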