**deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md** (+1 −1)
@@ -85,7 +85,7 @@ The following are known limitations and restrictions with autoscaling:
In {{ech}}, the following additional limitations apply:
* Trial deployments cannot be configured to autoscale beyond the normal Trial deployment size limits. The maximum size per zone is increased automatically from the Trial limit when you convert to a paid subscription.
- * ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md).
+ * ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../autoscaling/trained-model-autoscaling.md).
In {{ece}}, the following additional limitations apply:
**deploy-manage/autoscaling/trained-model-autoscaling.md**

@@ -25,6 +25,7 @@ Trained model autoscaling is available for both {{serverless-short}} and Cloud d
Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
## Enabling autoscaling through APIs - adaptive allocations [enabling-autoscaling-through-apis-adaptive-allocations]
+ $$$nlp-model-adaptive-resources$$$
Model allocations are independent units of work for NLP tasks. If you set the number of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations, which adjust the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
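A minimal sketch of enabling adaptive allocations through this API, assuming a model deployed under the illustrative ID `my_model`; the allocation bounds are placeholder values:

```console
POST _ml/trained_models/my_model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 4
  }
}
```

With `min_number_of_allocations` set to `0`, the deployment can scale down to zero allocations when idle, trading cold-start latency for cost.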
**explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md** (+2 −2)
@@ -25,13 +25,13 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/docs/api/doc/kibana/v8/group/endpoint-ml).
::::
- You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
+ You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](../../../deploy-manage/autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
* Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
* Medium: This level limits resources to 32 vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
* High: This level may use the maximum number of vCPUs available for this deployment from the Cloud console. If the maximum is 2 vCPUs or fewer, this level is equivalent to the medium or low level.
- For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md).
+ For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md).
## Request queues and search priority [infer-request-queues]
**explore-analyze/machine-learning/nlp/ml-nlp-e5.md** (+1 −1)
@@ -21,7 +21,7 @@ Refer to the model cards of the [multilingual-e5-small](https://huggingface.co/e
To use E5, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated.
- Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+ Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
**explore-analyze/machine-learning/nlp/ml-nlp-elser.md** (+3 −3)
@@ -33,7 +33,7 @@ To use ELSER, you must have the [appropriate subscription](https://www.elastic.c
The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in {{ech}} if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
::::
- Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+ Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
## ELSER v2 [elser-v2]
@@ -72,7 +72,7 @@ PUT _inference/sparse_embedding/my-elser-model
}
```
- The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+ The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.
Refer to the [ELSER {{infer}} integration documentation](../../../solutions/search/inference-api/elser-inference-integration.md) to learn more about the available settings.
@@ -292,7 +292,7 @@ To gain the biggest value out of ELSER trained models, consider to follow this l
* If quick response time is important for your use case, keep {{ml}} resources available at all times by setting `min_allocations` to `1`.
* Setting `min_allocations` to `0` can save on costs for non-critical use cases or testing environments.
- * Enabling [autoscaling](ml-nlp-auto-scale.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale the available resources of your ELSER deployment up or down based on the load on the process.
+ * Enabling [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale the available resources of your ELSER deployment up or down based on the load on the process.
* Use dedicated, optimized ELSER {{infer}} endpoints for ingest and search use cases.
* When deploying a trained model in {{kib}}, you can select for which case you want to optimize your ELSER deployment.
* If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
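A sketch combining these recommendations through the {{infer}} API (the endpoint names are illustrative; `.elser_model_2` is the assumed ELSER v2 model ID): a dedicated ingest endpoint that can scale to zero allocations when idle, and a dedicated search endpoint that keeps one allocation warm for quick response times:

```console
PUT _inference/sparse_embedding/elser-ingest
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 8
    }
  }
}

PUT _inference/sparse_embedding/elser-search
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 8
    }
  }
}
```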
**explore-analyze/machine-learning/nlp/ml-nlp-rerank.md** (+1 −1)
@@ -73,7 +73,7 @@ PUT _inference/rerank/my-rerank-model
```
::::{note}
- The API request automatically downloads and deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+ The API request automatically downloads and deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.