Commit 956cacc

Updates 'Trained model autoscaling' page for Serverless adaptive resources behavior (elastic#2224)

On Serverless, adaptive resources are now always enabled. This PR updates the Trained model autoscaling page to reflect this behavior.

Related issue: elastic/developer-docs-team#309

This is a follow-up PR for elastic#2184.

Co-authored-by: Vlada Chirmicci <[email protected]>
1 parent e9e6477 commit 956cacc

File tree

3 files changed: +6 / -12 lines


deploy-manage/autoscaling/trained-model-autoscaling.md

Lines changed: 6 additions & 12 deletions
```diff
@@ -22,11 +22,13 @@ There are two ways to enable autoscaling:
 * through APIs by enabling adaptive allocations
 * in {{kib}} by enabling adaptive resources
 
+For {{serverless-short}} projects, trained model autoscaling is automatically enabled and cannot be disabled.
+
 ::::{important}
 To fully leverage model autoscaling in {{ech}}, {{ece}}, and {{eck}}, it is highly recommended to enable [{{es}} deployment autoscaling](../../deploy-manage/autoscaling.md).
 ::::
 
-Trained model autoscaling is available for {{serverless-short}}, {{ech}}, {{ece}}, and {{eck}} deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.
+Trained model autoscaling is available for {{serverless-short}}, {{ech}}, {{ece}}, and {{eck}} deployments. In {{serverless-short}} projects, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.
 
 :::{admonition} Trained model auto-scaling for self-managed deployments
 The available resources of self-managed deployments are static, so trained model autoscaling is not applicable. However, available resources are still segmented based on the settings described in this section.
```
```diff
@@ -54,10 +56,6 @@ You can enable adaptive allocations by using:
 
 If the new allocations fit on the current {{ml}} nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your {{ml}} node will be scaled up if {{ml}} autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference).
 
-:::{note}
-When you create inference endpoints on {{serverless-short}} using {{kib}}, adaptive allocations are automatically turned on, and there is no option to disable them.
-:::
-
 ### Optimizing for typical use cases [optimizing-for-typical-use-cases]
 
 You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of {{infer}} requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.
```
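The adaptive allocations behavior that this hunk documents can be configured when creating an {{infer}} endpoint through the API. A minimal sketch of such a request; the endpoint name `my-elser-endpoint` and the allocation bounds are illustrative, not part of this commit:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    }
  }
}
```

With `adaptive_allocations.enabled` set, the number of allocations scales with load between the given bounds; per the page text, allocations cannot exceed 32 unless a higher maximum is set explicitly.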
```diff
@@ -73,16 +71,16 @@ You can choose from three levels of resource usage for your trained model deploy
 
 Refer to the tables in the [Model deployment resource matrix](#model-deployment-resource-matrix) section to find out the settings for the level you selected.
 
-:::{image} /deploy-manage/images/machine-learning-ml-nlp-deployment-id-elser-v2.png
+The image below shows the process of starting a trained model on an {{ech}} deployment. In {{serverless-short}} projects, the **Adaptive resources** toggle is not available when starting trained model deployments, as adaptive allocations are always enabled and cannot be disabled.
+
+:::{image} /deploy-manage/images/ml-nlp-deployment-id-elser.png
 :alt: ELSER deployment with adaptive resources enabled.
 :screenshot:
 :width: 500px
 :::
 
 In {{serverless-full}}, Search projects are given access to more processing resources, while Security and Observability projects have lower limits. This difference is reflected in the UI configuration: Search projects have higher resource limits compared to Security and Observability projects to accommodate their more complex operations.
 
-On {{serverless-short}}, adaptive allocations are automatically enabled for all project types.
-
 ## Model deployment resource matrix [model-deployment-resource-matrix]
 
 The used resources for trained model deployments depend on three factors:
```
```diff
@@ -100,10 +98,6 @@ If you use a self-managed cluster or ECK, vCPUs level ranges are derived from th
 
 The following tables show you the number of allocations, threads, and vCPUs available in ECE and ECH when adaptive resources are enabled or disabled.
 
-::::{note}
-On {{serverless-short}}, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
-::::
-
 ### Ingest optimized
 
 In case of ingest-optimized deployments, we maximize the number of model allocations.
```
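The resource matrix this hunk touches is driven by two per-deployment settings, since total vCPU usage is the product of allocations and threads per allocation. A sketch of starting a deployment with explicit values via the trained models API; the numbers are illustrative, and on {{serverless-short}} adaptive allocations are applied automatically instead:

```console
POST _ml/trained_models/.elser_model_2/deployment/_start?number_of_allocations=2&threads_per_allocation=4
```

Here the deployment would use 2 × 4 = 8 vCPUs; an ingest-optimized profile favors more allocations, while a search-optimized profile favors more threads per allocation.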
Binary file not shown (185 KB).
