File: `serverless/pages/ml-nlp-auto-scale.mdx`
There are two ways to enable autoscaling:

- through APIs by enabling adaptive allocations
- in Kibana by enabling adaptive resources

Trained model autoscaling is available for both serverless and Cloud deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.

Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.

## Enabling autoscaling through APIs - adaptive allocations
Model allocations are independent units of work for NLP tasks.
If you set a static number of allocations, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources.
Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process.
This can help you to manage performance and cost more easily.
(Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
When the load is high, additional model allocations are automatically created as needed.
When the load is low, a model allocation is automatically removed.
You can explicitly set the minimum and maximum number of allocations; autoscaling will occur within these limits.
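As a sketch of what this looks like through the APIs (the ELSER model ID and the allocation limits below are placeholder values, not recommendations), adaptive allocations can be requested when starting a deployment, and the limits can be adjusted later with the update endpoint:

```console
# Start a deployment with adaptive allocations enabled
POST _ml/trained_models/.elser_model_2/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 4
  }
}

# Later, change the limits on the running deployment
POST _ml/trained_models/.elser_model_2/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 2,
    "max_number_of_allocations": 8
  }
}
```

Within these limits, the number of allocations then rises and falls with the load on the deployment.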
You can enable adaptive resources for your models when starting or updating the model deployment.
Adaptive resources make it possible for Elasticsearch to scale up or down the available resources based on the load on the process.
This can help you to manage performance and cost more easily.
When adaptive resources are enabled, the number of VCUs that the model deployment uses is set automatically based on the current load.
When the load is high, the number of VCUs that the process can use is automatically increased.
When the load is low, the number of VCUs that the process can use is automatically decreased.
You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level's range.
The used resources for trained model deployments depend on three factors:
- the use case you optimize the model deployment for (ingest or search)
- whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
The following tables show you the number of allocations, threads, and VCUs available on Serverless when adaptive resources are enabled or disabled.
### Deployments on serverless optimized for ingest
In case of ingest-optimized deployments, we maximize the number of model allocations.
| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
| High | - 1 to 512 for Search <br /> - 1 to 128 for Security and Observability | 1 | - 8 to 4096 for Search <br /> - 8 to 1024 for Security and Observability |

#### Adaptive Resources Disabled (Search Only)

| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| High | - 512 for Search <br /> - No static allocations for Security and Observability | 1 | - 4096 for Search <br /> - No static allocations for Security and Observability |

### Deployments on Serverless Optimized for Search

In case of search-optimized deployments, we maximize the number of threads.

| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| Low | 0 to 1 dynamically | Always 2 | 0 to 16 dynamically |
| Medium | 1 to 2 (if threads=16), dynamically | Maximum (e.g., 16) | 8 to 256 dynamically |
| High | - 1 to 32 (if threads=16), dynamically <br /> - 1 to 128 for Security and Observability | Maximum (e.g., 16) | - 8 to 4096 for Search <br /> - 8 to 1024 for Security and Observability |

#### Adaptive Resources Disabled (Search Only)

| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| Medium | 2 statically (if threads=16) | Maximum (e.g., 16) | 256 |
| High | - 32 statically (if threads=16) for Search <br /> - No static allocations for Security and Observability | Maximum (e.g., 16) | - 4096 for Search <br /> - No static allocations for Security and Observability |
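Whichever way autoscaling is enabled, you can observe how many allocations a deployment is currently using. As an illustrative sketch (the ELSER model ID is a placeholder), the trained model statistics API returns deployment stats for this purpose:

```console
# Inspect the current scaling state of a deployment
GET _ml/trained_models/.elser_model_2/_stats
```

The `deployment_stats` section of the response reports the deployment's current and target allocation counts, which you can watch rise and fall as load changes.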