Commit 1497258

thomas-tacquet, nerda-codes, and RoRoJ authored
feat(serverless): scaling doc (#5024)
* feat(serverless): scaling doc
* autoscaler
* Apply suggestions from code review

Co-authored-by: Rowena Jones <[email protected]>

---------

Co-authored-by: Néda <[email protected]>
Co-authored-by: Rowena Jones <[email protected]>
1 parent 23b9944 commit 1497258

3 files changed

+15
-25
lines changed

pages/serverless-containers/reference-content/containers-autoscaling.mdx

Lines changed: 7 additions & 10 deletions
@@ -46,19 +46,16 @@ When the maximum scale is reached, new requests are queued for processing. When

 ### Autoscaler behavior

-The autoscaler decides to start new instances when:
+The autoscaler decides to add new instances (scale up) when the number of concurrent requests defined (default is `80`) is reached.

-- the existing instances are no longer able to handle the load because they are busy responding to other ongoing requests. By default, this happens if an instance is already processing 80 requests (max_concurrency = 80).
-
-- our system detects an unusual number of requests. In this case, some instances may be started in anticipation to avoid a potential cold start.
+The same autoscaler decides to remove instances (scale down) down to `1` when no more requests are received for 30 seconds.

-The same autoscaler decides to remove instances when:
-
-- no more requests are being processed. If even a single request is being processed (or detected as being processed), then the autoscaler will not be able to remove this instance. The system also prioritizes instances with the fewest ongoing requests, or if very few requests are being sent, it tries to select a particular instance to shut down the others, and therefore scale down.
-- an instance has not responded to a request for more than 15 minutes of inactivity. The instance is only shut down after this interval, once again to absorb any potential new peaks and thus avoid the cold start. These 15 minutes of inactivity are not configurable.
+Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.

 <Message type="note">
-Redeploying your resource results in the termination of existing instances and a return to the minimum scale.
+Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.
+
+Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
 </Message>

 ## CPU and RAM percentage
@@ -81,4 +78,4 @@ This parameter sets the maximum number of instances of your resource. You should

 The autoscaler decides to start new instances when the existing instances' CPU or RAM usage exceeds the threshold you defined for a certain amount of time.

-The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.
+The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.
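
For reviewers who want to sanity-check the concurrency-based scale-up wording in the Serverless Containers hunk, here is a minimal sketch in Python. It is not the platform's implementation: the function name `desired_instances`, the ceiling-division rule, and the `max_scale` default of `5` are all assumptions made for illustration. Only the default threshold of `80` concurrent requests and the min-scale/max-scale bounds come from the changed text.

```python
import math


def desired_instances(
    concurrent_requests: int,
    max_concurrency: int = 80,  # default threshold quoted in the doc
    min_scale: int = 0,
    max_scale: int = 5,         # hypothetical value, chosen for the example
) -> int:
    """Hypothetical model of concurrency-based scaling.

    Assumes each instance absorbs up to `max_concurrency` requests, so the
    number of instances needed is a ceiling division, clamped to the
    configured min-scale and max-scale.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    if concurrent_requests < 0:
        raise ValueError("concurrent_requests cannot be negative")
    needed = math.ceil(concurrent_requests / max_concurrency)
    return max(min_scale, min(needed, max_scale))


if __name__ == "__main__":
    for load in (0, 80, 81, 1000):
        print(load, "->", desired_instances(load))
```

In this model, requests beyond `max_scale * max_concurrency` are not served by extra instances, which matches the doc's statement that new requests are queued once the maximum scale is reached.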

pages/serverless-functions/concepts.mdx

Lines changed: 1 addition & 6 deletions
@@ -17,11 +17,6 @@ categories:
 Autoscaling refers to the ability of Serverless Functions to automatically adjust the number of instances without manual intervention.
 Scaling mechanisms ensure that resources are provisioned dynamically to handle incoming requests efficiently while minimizing idle capacity and cost.

-Autoscaling parameters are [min-scale](/serverless-functions/concepts/#min-scale) and [max-scale](/serverless-functions/concepts/#max-scale). Available scaling policies are:
-* **Concurrent requests:** requests incoming to the resource at the same time. Default value suitable for most use cases.
-* **CPU usage:** to scale based on CPU percentage, suitable for intensive CPU workloads.
-* **RAM usage** to scale based on RAM percentage, suitable for memory-intensive workloads.
-
 ## Build step

 Before deploying Serverless Functions, they have to be built. This step occurs during deployment.
@@ -215,4 +210,4 @@ Triggers can take many forms, such as HTTP requests, messages from a queue or a

 ## vCPU-s

-Unit used to measure the resource consumption of a container. It reflects the amount of vCPU used over time.
+Unit used to measure the resource consumption of a container. It reflects the amount of vCPU used over time.

pages/serverless-functions/reference-content/functions-autoscaling.mdx

Lines changed: 7 additions & 9 deletions
@@ -38,16 +38,14 @@ When the maximum scale is reached, new requests are queued for processing. When

 ### Autoscaler behavior

-The autoscaler decides to start new instances when:
+The autoscaler decides to add new instances (scale up) for each concurrent request. For example, 5 concurrent requests will generate 5 Serverless Functions instances. This parameter can be customized on Serverless Containers but not on Serverless Functions.

-- the existing instances are no longer able to handle the load because they are busy responding to other ongoing requests. By default, this happens if an instance is already processing 80 requests (max_concurrency = 80).
-- our system detects an unusual number of requests. In this case, some instances may be started in anticipation to avoid a potential cold start.
+The same autoscaler decides to remove instances (scale down) down to `1` when no more requests are received for 30 seconds.

-The same autoscaler decides to remove instances when:
-
-- no more requests are being processed. If even a single request is being processed (or detected as being processed), then the autoscaler will not be able to remove this instance. The system also prioritizes instances with the fewest ongoing requests, or if very few requests are being sent, it tries to select a particular instance to shut down the others, and therefore scale down.
-- an instance has not responded to a request for more than 15 minutes of inactivity. The instance is only shut down after this interval, once again to absorb any potential new peaks and thus avoid the cold start. These 15 minutes of inactivity are not configurable.
+Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.

 <Message type="note">
-Redeploying your resource results in the termination of existing instances and a return to the min scale, which you observe when redeploying.
-</Message>
+Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.
+
+Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
+</Message>
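
The two scale-down delays introduced by this commit (30 seconds to reach `1` instance, 15 minutes to reach zero when min-scale is `0`) can be illustrated with a short Python sketch. The function `instances_after_idle` and its constants are hypothetical names, and the rule that scale-down never drops below min-scale is an assumption drawn from how min-scale is described, not a confirmed implementation detail.

```python
# Delays quoted in the updated documentation.
SCALE_DOWN_TO_ONE_AFTER_S = 30       # no requests received for 30 seconds
SCALE_TO_ZERO_AFTER_S = 15 * 60      # 15 minutes of inactivity


def instances_after_idle(current: int, idle_seconds: float, min_scale: int = 0) -> int:
    """Hypothetical model of idle-based scale-down.

    Returns the instance count expected after `idle_seconds` without any
    request, never scaling up and never going below `min_scale`.
    """
    if idle_seconds >= SCALE_TO_ZERO_AFTER_S:
        target = 0
    elif idle_seconds >= SCALE_DOWN_TO_ONE_AFTER_S:
        target = 1
    else:
        return current  # still inside the 30-second window: nothing removed
    return min(current, max(min_scale, target))


if __name__ == "__main__":
    for idle in (10, 30, 900):
        print(idle, "s idle ->", instances_after_idle(5, idle))
```

With `min_scale=1`, the same 15 minutes of inactivity leaves one instance running in this model, which is why scaling to zero is described as conditional on min-scale being `0`.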
