feat(serverless): scaling doc (#5024)

thomas-tacquet · nerda-codes · RoRoJ · web-flow · commit 149725807181 · 2025-05-27T14:11:15.000+02:00
* feat(serverless): scaling doc

* autoscaler

* Apply suggestions from code review

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>

---------

Co-authored-by: Néda <87707325+nerda-codes@users.noreply.github.com>
Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
diff --git a/pages/serverless-containers/reference-content/containers-autoscaling.mdx b/pages/serverless-containers/reference-content/containers-autoscaling.mdx
@@ -46,19 +46,16 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to start new instances when:
+The autoscaler adds new instances (scales up) when an instance reaches the configured number of concurrent requests (default: `80`).
 
-  - the existing instances are no longer able to handle the load because they are busy responding to other ongoing requests. By default, this happens if an instance is already processing 80 requests (max_concurrency = 80).
-  
-  - our system detects an unusual number of requests. In this case, some instances may be started in anticipation to avoid a potential cold start.
+The same autoscaler removes instances (scales down) until only `1` remains when no requests have been received for 30 seconds.
 
-The same autoscaler decides to remove instances when:
-
-  - no more requests are being processed. If even a single request is being processed (or detected as being processed), then the autoscaler will not be able to remove this instance. The system also prioritizes instances with the fewest ongoing requests, or if very few requests are being sent, it tries to select a particular instance to shut down the others, and therefore scale down.
-  - an instance has not responded to a request for more than 15 minutes of inactivity. The instance is only shut down after this interval, once again to absorb any potential new peaks and thus avoid the cold start. These 15 minutes of inactivity are not configurable.
+Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.
 
 <Message type="note">
-Redeploying your resource results in the termination of existing instances and a return to the minimum scale.
+Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.
+
+Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
 </Message>
 
 ## CPU and RAM percentage
@@ -81,4 +78,4 @@ This parameter sets the maximum number of instances of your resource. You should
 
 The autoscaler decides to start new instances when the existing instances' CPU or RAM usage exceeds the threshold you defined for a certain amount of time.
 
-The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.
+The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.
diff --git a/pages/serverless-functions/concepts.mdx b/pages/serverless-functions/concepts.mdx
@@ -17,11 +17,6 @@ categories:
 Autoscaling refers to the ability of Serverless Functions to automatically adjust the number of instances without manual intervention.
 Scaling mechanisms ensure that resources are provisioned dynamically to handle incoming requests efficiently while minimizing idle capacity and cost.
 
-Autoscaling parameters are [min-scale](/serverless-functions/concepts/#min-scale) and [max-scale](/serverless-functions/concepts/#max-scale). Available scaling policies are:
-* **Concurrent requests:** requests incoming to the resource at the same time. Default value suitable for most use cases.
-* **CPU usage:** to scale based on CPU percentage, suitable for intensive CPU workloads.
-* **RAM usage** to scale based on RAM percentage, suitable for memory-intensive workloads.
-
 ## Build step
 
 Before deploying Serverless Functions, they have to be built. This step occurs during deployment.
@@ -215,4 +210,4 @@ Triggers can take many forms, such as HTTP requests, messages from a queue or a
 
 ## vCPU-s
 
-Unit used to measure the resource consumption of a container. It reflects the amount of vCPU used over time.
+Unit used to measure the resource consumption of a container. It reflects the amount of vCPU used over time.
diff --git a/pages/serverless-functions/reference-content/functions-autoscaling.mdx b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
@@ -38,16 +38,14 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to start new instances when:
+The autoscaler adds a new instance (scales up) for each concurrent request. For example, 5 concurrent requests will generate 5 Serverless Functions instances. The number of concurrent requests per instance can be customized on Serverless Containers but not on Serverless Functions.
 
-  - the existing instances are no longer able to handle the load because they are busy responding to other ongoing requests. By default, this happens if an instance is already processing 80 requests (max_concurrency = 80).
-  - our system detects an unusual number of requests. In this case, some instances may be started in anticipation to avoid a potential cold start.
+The same autoscaler removes instances (scales down) until only `1` remains when no requests have been received for 30 seconds.
 
-The same autoscaler decides to remove instances when:
-
-  - no more requests are being processed. If even a single request is being processed (or detected as being processed), then the autoscaler will not be able to remove this instance. The system also prioritizes instances with the fewest ongoing requests, or if very few requests are being sent, it tries to select a particular instance to shut down the others, and therefore scale down.
-  - an instance has not responded to a request for more than 15 minutes of inactivity. The instance is only shut down after this interval, once again to absorb any potential new peaks and thus avoid the cold start. These 15 minutes of inactivity are not configurable.
+Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.
 
 <Message type="note">
-Redeploying your resource results in the termination of existing instances and a return to the min scale, which you observe when redeploying.
-</Message>
+Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.
+
+Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
+</Message>
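
Reviewer note: the scaling rules introduced in this diff can be read as a small decision function. The sketch below is only an illustrative model of the documented thresholds (80 concurrent requests per container instance, 1 per function instance, 30 seconds before scaling down to `1`, 15 minutes before scaling to zero). The function name `target_instances`, its parameters, and the default `max_scale` are hypothetical, and the model assumes an extra instance is added once requests exceed what the current instances can hold. It is not the platform's actual algorithm.

```python
import math

# Thresholds taken from the documentation text in this diff.
SCALE_DOWN_AFTER_S = 30          # no requests for 30 s: shrink to one instance
SCALE_TO_ZERO_AFTER_S = 15 * 60  # 15 min of inactivity: shrink to zero (min-scale 0 only)


def target_instances(current, concurrent_requests, idle_seconds,
                     max_concurrency=80, min_scale=0, max_scale=20):
    """Hypothetical model of the documented autoscaler rules.

    max_concurrency=80 mirrors the Serverless Containers default;
    use max_concurrency=1 to mirror Serverless Functions.
    max_scale=20 is an arbitrary value chosen for this sketch.
    """
    if concurrent_requests > 0:
        # Assumption: one instance absorbs up to max_concurrency requests.
        needed = math.ceil(concurrent_requests / max_concurrency)
        # Never shrink while traffic is flowing; never exceed max_scale.
        return min(max_scale, max(current, needed, min_scale))
    if min_scale == 0 and idle_seconds >= SCALE_TO_ZERO_AFTER_S:
        return 0
    if idle_seconds >= SCALE_DOWN_AFTER_S:
        # Keep at most one instance, but respect min_scale.
        return max(min_scale, min(current, 1))
    return current
```

For instance, under these assumptions `target_instances(0, 5, 0, max_concurrency=1)` models the Serverless Functions example in the diff, where 5 concurrent requests lead to 5 instances.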