From b5c418ecfe2e7a96520dd3d49e986fcf88d636c5 Mon Sep 17 00:00:00 2001
From: Thomas Tacquet
Date: Mon, 26 May 2025 16:45:09 +0200
Subject: [PATCH 1/3] feat(serverless): scaling doc

---
 .../containers-autoscaling.mdx | 17 +++++++----------
 pages/serverless-functions/concepts.mdx | 7 +------
 .../reference-content/functions-autoscaling.mdx | 16 +++++++---------
 3 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/pages/serverless-containers/reference-content/containers-autoscaling.mdx b/pages/serverless-containers/reference-content/containers-autoscaling.mdx
index 293ceb7e56..42e60105bd 100644
--- a/pages/serverless-containers/reference-content/containers-autoscaling.mdx
+++ b/pages/serverless-containers/reference-content/containers-autoscaling.mdx
@@ -46,19 +46,16 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to start new instances when:
+The autoscaler decides to add new instances (scale-up) when the number of concurrent requests defined (default is `80`) is reached.
 
- - the existing instances are no longer able to handle the load because they are busy responding to other ongoing requests. By default, this happens if an instance is already processing 80 requests (max_concurrency = 80).
-
- - our system detects an unusual number of requests. In this case, some instances may be started in anticipation to avoid a potential cold start.
+The same autoscaler decides to remove instances (scale-down) down to `1` when no more requests are received during 30 seconds.
 
-The same autoscaler decides to remove instances when:
-
- - no more requests are being processed. If even a single request is being processed (or detected as being processed), then the autoscaler will not be able to remove this instance. The system also prioritizes instances with the fewest ongoing requests, or if very few requests are being sent, it tries to select a particular instance to shut down the others, and therefore scale down.
- - an instance has not responded to a request for more than 15 minutes of inactivity. The instance is only shut down after this interval, once again to absorb any potential new peaks and thus avoid the cold start. These 15 minutes of inactivity are not configurable.
+Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.
 
-Redeploying your resource results in the termination of existing instances and a return to the minimum scale.
+Redeploying your resource does not generate downtime. Instances are gradually replaced with new ones.
+
+Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
 
 ## CPU and RAM percentage
@@ -81,4 +78,4 @@ This parameter sets the maximum number of instances of your resource. You should
 
 The autoscaler decides to start new instances when the existing instances' CPU or RAM usage exceeds the threshold you defined for a certain amount of time.
 
-The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.
\ No newline at end of file
+The same autoscaler decides to remove existing instances when the CPU or RAM usage of certain instances is reduced, and the remaining instances' usage does not exceed the threshold.
diff --git a/pages/serverless-functions/concepts.mdx b/pages/serverless-functions/concepts.mdx
index 5f6caa9c84..336c087bb2 100644
--- a/pages/serverless-functions/concepts.mdx
+++ b/pages/serverless-functions/concepts.mdx
@@ -17,11 +17,6 @@ categories:
 
 Autoscaling refers to the ability of Serverless Functions to automatically adjust the number of instances without manual intervention. Scaling mechanisms ensure that resources are provisioned dynamically to handle incoming requests efficiently while minimizing idle capacity and cost.
 
-Autoscaling parameters are [min-scale](/serverless-functions/concepts/#min-scale) and [max-scale](/serverless-functions/concepts/#max-scale). Available scaling policies are:
-* **Concurrent requests:** requests incoming to the resource at the same time. Default value suitable for most use cases.
-* **CPU usage:** to scale based on CPU percentage, suitable for intensive CPU workloads.
-* **RAM usage** to scale based on RAM percentage, suitable for memory-intensive workloads.
-
 ## Build step
 
 Before deploying Serverless Functions, they have to be built. This step occurs during deployment.
@@ -215,4 +210,4 @@ Triggers can take many forms, such as HTTP requests, messages from a queue or a
 
 ## vCPU-s
 
-Unit used to measure the resource consumption of a container. It reflects the amount of vCPU used over time.
\ No newline at end of file
+Unit used to measure the resource consumption of a container. It reflects the amount of vCPU used over time.
diff --git a/pages/serverless-functions/reference-content/functions-autoscaling.mdx b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
index abffd8a5ba..cdcf90144e 100644
--- a/pages/serverless-functions/reference-content/functions-autoscaling.mdx
+++ b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
@@ -38,16 +38,14 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to start new instances when:
+The autoscaler decides to add new instances (scale-up) on concurrent requests to handle incoming load. For example, 5 concurrent requests will generate 5 Serverless Functions instances.
 
- - the existing instances are no longer able to handle the load because they are busy responding to other ongoing requests. By default, this happens if an instance is already processing 80 requests (max_concurrency = 80).
- - our system detects an unusual number of requests. In this case, some instances may be started in anticipation to avoid a potential cold start.
+The same autoscaler decides to remove instances (scale-down) down to `1` when no more requests are received during 30 seconds.
 
-The same autoscaler decides to remove instances when:
-
- - no more requests are being processed. If even a single request is being processed (or detected as being processed), then the autoscaler will not be able to remove this instance. The system also prioritizes instances with the fewest ongoing requests, or if very few requests are being sent, it tries to select a particular instance to shut down the others, and therefore scale down.
- - an instance has not responded to a request for more than 15 minutes of inactivity. The instance is only shut down after this interval, once again to absorb any potential new peaks and thus avoid the cold start. These 15 minutes of inactivity are not configurable.
+Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.
 
-Redeploying your resource results in the termination of existing instances and a return to the min scale, which you observe when redeploying.
-
\ No newline at end of file
+Redeploying your resource does not generate downtime. Instances are gradually replaced with new ones.
+
+Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
+

From 2ba62b6df4ba507c8a5c97812ba635279b4fca25 Mon Sep 17 00:00:00 2001
From: Thomas Tacquet
Date: Mon, 26 May 2025 16:58:31 +0200
Subject: [PATCH 2/3] autoscaler

---
 .../reference-content/functions-autoscaling.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pages/serverless-functions/reference-content/functions-autoscaling.mdx b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
index cdcf90144e..4d284ca314 100644
--- a/pages/serverless-functions/reference-content/functions-autoscaling.mdx
+++ b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
@@ -38,7 +38,7 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to add new instances (scale-up) on concurrent requests to handle incoming load. For example, 5 concurrent requests will generate 5 Serverless Functions instances.
+The autoscaler decides to add new instances (scale-up) for each concurrent request. For example, 5 concurrent requests will generate 5 Serverless Functions instances. This parameter can be customized on Serverless Containers but not on Serverless Functions.
 
 The same autoscaler decides to remove instances (scale-down) down to `1` when no more requests are received during 30 seconds.

From 2f16d4b20f2ef3c365f887e5f7799421168d1aee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?N=C3=A9da?= <87707325+nerda-codes@users.noreply.github.com>
Date: Tue, 27 May 2025 13:42:06 +0200
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 .../reference-content/containers-autoscaling.mdx | 6 +++---
 .../reference-content/functions-autoscaling.mdx | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/pages/serverless-containers/reference-content/containers-autoscaling.mdx b/pages/serverless-containers/reference-content/containers-autoscaling.mdx
index 42e60105bd..c884527384 100644
--- a/pages/serverless-containers/reference-content/containers-autoscaling.mdx
+++ b/pages/serverless-containers/reference-content/containers-autoscaling.mdx
@@ -46,14 +46,14 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to add new instances (scale-up) when the number of concurrent requests defined (default is `80`) is reached.
+The autoscaler decides to add new instances (scale up) when the number of concurrent requests defined (default is `80`) is reached.
 
-The same autoscaler decides to remove instances (scale-down) down to `1` when no more requests are received during 30 seconds.
+The same autoscaler decides to remove instances (scale down) down to `1` when no more requests are received for 30 seconds.
 
 Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.
 
-Redeploying your resource does not generate downtime. Instances are gradually replaced with new ones.
+Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.
 
 Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
diff --git a/pages/serverless-functions/reference-content/functions-autoscaling.mdx b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
index 4d284ca314..7e15a90f59 100644
--- a/pages/serverless-functions/reference-content/functions-autoscaling.mdx
+++ b/pages/serverless-functions/reference-content/functions-autoscaling.mdx
@@ -38,14 +38,14 @@ When the maximum scale is reached, new requests are queued for processing. When
 
 ### Autoscaler behavior
 
-The autoscaler decides to add new instances (scale-up) for each concurrent request. For example, 5 concurrent requests will generate 5 Serverless Functions instances. This parameter can be customized on Serverless Containers but not on Serverless Functions.
+The autoscaler decides to add new instances (scale up) for each concurrent request. For example, 5 concurrent requests will generate 5 Serverless Functions instances. This parameter can be customized on Serverless Containers but not on Serverless Functions.
 
-The same autoscaler decides to remove instances (scale-down) down to `1` when no more requests are received during 30 seconds.
+The same autoscaler decides to remove instances (scale down) down to `1` when no more requests are received for 30 seconds.
 
 Scaling down to zero (if min-scale is set to `0`) happens after 15 minutes of inactivity.
 
-Redeploying your resource does not generate downtime. Instances are gradually replaced with new ones.
+Redeploying your resource does not entail downtime. Instances are gradually replaced with new ones.
 
 Old instances remain running to handle traffic, while new instances are brought up and verified before fully replacing the old ones. This method helps maintain application availability and service continuity throughout the update process.
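For reference, the scale-up rule these patches document (one Serverless Functions instance per concurrent request, and a default of `80` concurrent requests per Serverless Containers instance, bounded by min-scale and max-scale) can be sketched as a small model. This is only an illustration of the documented behavior, not Scaleway's actual autoscaler; the `desired_instances` helper and its parameter names are hypothetical, chosen for the example.

```python
import math

def desired_instances(concurrent_requests: int, min_scale: int,
                      max_scale: int, max_concurrency: int) -> int:
    """Instances needed so no single instance handles more than
    max_concurrency concurrent requests, clamped to [min_scale, max_scale]."""
    needed = math.ceil(concurrent_requests / max_concurrency) if concurrent_requests > 0 else 0
    return max(min_scale, min(max_scale, needed))

# Serverless Functions: effectively one instance per concurrent request.
print(desired_instances(5, min_scale=0, max_scale=20, max_concurrency=1))    # -> 5

# Serverless Containers: default threshold of 80 concurrent requests per instance.
print(desired_instances(200, min_scale=1, max_scale=10, max_concurrency=80)) # -> 3
```

Scale-down follows the same bounds: the instance count drops toward the minimum after 30 seconds without requests, and only reaches `0` after the 15-minute inactivity window when min-scale is set to `0`.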