Merged
Changes from 3 commits
4 changes: 4 additions & 0 deletions deploy-manage/autoscaling.md
@@ -41,6 +41,10 @@ The available resources of self-managed deployments are static, so trained model

Trained model autoscaling automatically adjusts the resources allocated to trained model deployments based on demand. This feature is available on all cloud deployments (ECE, ECK, ECH) and {{serverless-short}}. See [Trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) for details.

::::{note}
In {{serverless-short}}, trained model deployments remain active for 24 hours after the last inference request. After that, they scale down to zero. When scaled up again, they stay active for 5 minutes before they can scale down. These cooldown periods prevent unnecessary scaling and ensure models are available when needed.
::::

Trained model autoscaling supports:
* Scaling trained model deployments

2 changes: 1 addition & 1 deletion deploy-manage/autoscaling/trained-model-autoscaling.md
@@ -22,7 +22,7 @@ There are two ways to enable autoscaling:
* through APIs by enabling adaptive allocations
* in {{kib}} by enabling adaptive resources

For {{serverless-short}} projects, trained model autoscaling is automatically enabled and cannot be disabled.
For {{serverless-short}} projects, trained model autoscaling is always enabled and cannot be turned off. Trained model deployments remain active for 24 hours after the last inference request before scaling down to zero. When scaled up again, they stay active for 5 minutes before they can scale down. These cooldown periods prevent unnecessary scaling and ensure models are available when needed.

::::{important}
To fully leverage model autoscaling in {{ech}}, {{ece}}, and {{eck}}, it is highly recommended to enable [{{es}} deployment autoscaling](../../deploy-manage/autoscaling.md).
@@ -44,6 +44,14 @@ You can control costs using the following strategies:

* **Search Power setting:** [Search Power](../../deploy/elastic-cloud/project-settings.md#elasticsearch-manage-project-search-power-settings) controls the speed of searches against your data. With Search Power, you can improve search performance by adding more resources for querying, or you can reduce provisioned resources to cut costs.
* **Search boost window**: By limiting the number of days of [time series data](../../../solutions/search/ingest-for-search.md#elasticsearch-ingest-time-series-data) that are available for caching, you can reduce the number of search VCUs required.
* **Machine learning trained model autoscaling:** Configure your trained model deployment to allow it to scale down to zero allocations when there are no active inference requests:

* When starting or updating a trained model deployment, [Enable adaptive resources](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-in-kibana-adaptive-resources) and set the VCU usage level to **Low**.
* When using the inference API for {{es}} or ELSER, [enable `adaptive_allocations`](../../autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations).
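
  The API route in the bullets above can be sketched with the start trained model deployment API; this is a hedged illustration, not part of the diff — the model ID and the allocation bounds shown here are placeholders, not values from the docs being changed:

  ```console
  POST _ml/trained_models/.elser_model_2/deployment/_start
  {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    }
  }
  ```

  With `min_number_of_allocations` set to `0`, the deployment is allowed to scale down to zero allocations when there are no inference requests, which is the cost-saving behavior this bullet describes.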

::::{note}
In {{serverless-short}}, trained model deployments scale down to zero only after 24 hours without any inference requests. After scaling up, they remain active for 5 minutes before they can scale down again. During these cooldown periods, you will continue to be billed for the active resources.
::::

prwhelan (Member) commented:

> This is true outside of serverless as well. All environments will now wait 24 hours before scaling to zero: elastic/elasticsearch#128914
>
> Outside of serverless, this can be modified by setting `xpack.ml.trained_models.adaptive_allocations.scale_to_zero_time`, which has a minimum of one minute.

The PR author (Contributor) replied:

> Hi @prwhelan, thanks a lot for your feedback! I've modified my PR based on it, along with a few other smaller changes:
>
> * **Trained model autoscaling:** I moved the cooldown period information into its own heading. This makes it easier to highlight and also allows other pages to link directly to this specific section.
> * **Autoscaling:** I felt that going into the details of cooldown periods here would be out of scope and would make the page a bit overwhelming. Instead, I added a more concise sentence that links to the new Cooldown periods section on the Trained model autoscaling page.
> * **Elasticsearch billing dimensions:** Realizing that this page is only applicable to Serverless, I updated the description for the Machine learning trained model autoscaling bullet point to reflect the new autoscaling behavior in Serverless.
>
> Please let me know if you think these changes are appropriate or if you'd like me to adjust anything. Thanks again!
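
The setting the reviewer mentions could be applied roughly as follows; this is a hedged sketch, assuming the setting can be configured in `elasticsearch.yml` (it may instead need to be applied as a dynamic cluster setting), and the `1h` value is purely illustrative:

```yaml
# Outside of serverless: shorten the wait before scaling trained model
# deployments to zero allocations. Per the review comment, the minimum
# accepted value is one minute; 1h here is an illustrative choice.
xpack.ml.trained_models.adaptive_allocations.scale_to_zero_time: 1h
```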

* **Indexing Strategies:** Consider your indexing strategies and how they might impact overall VCU usage and costs:
