Commit 2e1c212

Adds information about cool down periods for Trained models autoscaling
1 parent d79d337 commit 2e1c212

3 files changed: +9 -1 lines changed

deploy-manage/autoscaling.md

Lines changed: 4 additions & 0 deletions
@@ -41,6 +41,10 @@ The available resources of self-managed deployments are static, so trained model
 
 Trained model autoscaling automatically adjusts the resources allocated to trained model deployments based on demand. This feature is available on all cloud deployments (ECE, ECK, ECH) and {{serverless-short}}. See [Trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) for details.
 
+::::{note}
+{applies_to}`serverless: ga` In {{serverless-short}}, trained model deployments remain active for 24 hours after the last inference request. After that, they scale down to zero. When scaled up again, they stay active for 5 minutes before they can scale down. These cooldown periods prevent unnecessary scaling and ensure models are available when needed.
+::::
+
 Trained model autoscaling supports:
 * Scaling trained model deployments

deploy-manage/autoscaling/trained-model-autoscaling.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ There are two ways to enable autoscaling:
 * through APIs by enabling adaptive allocations
 * in {{kib}} by enabling adaptive resources
 
-For {{serverless-short}} projects, trained model autoscaling is automatically enabled and cannot be disabled.
+{applies_to}`serverless: ga` For {{serverless-short}} projects, trained model autoscaling is always enabled and cannot be turned off. Trained model deployments remain active for 24 hours after the last inference request before scaling down to zero. When scaled up again, they stay active for 5 minutes before they can scale down. These cooldown periods prevent unnecessary scaling and ensure models are available when needed.
 
 ::::{important}
 To fully leverage model autoscaling in {{ech}}, {{ece}}, and {{eck}}, it is highly recommended to enable [{{es}} deployment autoscaling](../../deploy-manage/autoscaling.md).

deploy-manage/cloud-organization/billing/elasticsearch-billing-dimensions.md

Lines changed: 4 additions & 0 deletions
@@ -47,6 +47,10 @@ You can control costs using the following strategies:
 
 * When starting or updating a trained model deployment, [Enable adaptive resources](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-in-kibana-adaptive-resources) and set the VCU usage level to **Low**.
 * When using the inference API for {{es}} or ELSER, [enable `adaptive_allocations`](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations).
+
+::::{note}
+{applies_to}`serverless: ga` In {{serverless-short}}, trained model deployments scale down to zero only after 24 hours without any inference requests. After scaling up, they remain active for 5 minutes before they can scale down again. During these cooldown periods, you will continue to be billed for the active resources.
+::::
 
 * **Indexing Strategies:** Consider your indexing strategies and how they might impact overall VCU usage and costs:

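Reviewer note: the two cooldown rules added in this commit can be read as two independent time checks. The Python sketch below is only an illustration of that reading, not how {{serverless-short}} implements autoscaling. The function and constant names are hypothetical, and the choice to treat the boundary as inclusive (exactly 24 hours or exactly 5 minutes counts as elapsed) is an assumption the notes do not confirm. The notes also do not say how the two rules interact, so the sketch keeps them separate.

```python
from datetime import datetime, timedelta

# Durations come from the note text in this commit.
# All names below are hypothetical and exist only for illustration.
IDLE_BEFORE_SCALE_TO_ZERO = timedelta(hours=24)
MIN_ACTIVE_AFTER_SCALE_UP = timedelta(minutes=5)


def can_scale_to_zero(now: datetime, last_inference_request: datetime) -> bool:
    """True once 24 hours have passed without an inference request.

    Assumption: the boundary is inclusive.
    """
    return now - last_inference_request >= IDLE_BEFORE_SCALE_TO_ZERO


def can_scale_down(now: datetime, last_scale_up: datetime) -> bool:
    """True once a deployment has been active for 5 minutes after scaling up.

    Assumption: the boundary is inclusive.
    """
    return now - last_scale_up >= MIN_ACTIVE_AFTER_SCALE_UP


if __name__ == "__main__":
    last_request = datetime(2025, 1, 1, 12, 0)
    # 23 hours idle: still inside the cooldown, so the deployment stays active.
    print(can_scale_to_zero(last_request + timedelta(hours=23), last_request))
    # 25 hours idle: the cooldown has elapsed.
    print(can_scale_to_zero(last_request + timedelta(hours=25), last_request))
```

Under this reading, the billing note follows directly: while either check returns `False`, the deployment stays active and its resources continue to be billed.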