Commit 41b1ce5: cleanup trained model auto

1 parent: 3f7c05d

14 files changed: 13 additions, 247 deletions

deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ The following are known limitations and restrictions with autoscaling:
 In {{ech}} the following additional limitations apply:
 
 * Trial deployments cannot be configured to autoscale beyond the normal Trial deployment size limits. The maximum size per zone is increased automatically from the Trial limit when you convert to a paid subscription.
-* ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md).
+* ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../autoscaling/trained-model-autoscaling.md).
 
 In {{ece}}, the following additional limitations apply:

deploy-manage/autoscaling/trained-model-autoscaling.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,7 @@
 ---
 mapped_urls:
 - https://www.elastic.co/guide/en/serverless/current/general-ml-nlp-auto-scale.html
-- https://www.elastic.co/guide/en/serverless/current/general-ml-nlp-auto-scale.html
+- https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-auto-scale.html
 applies_to:
 stack: ga
 serverless: ga
@@ -25,6 +25,7 @@ Trained model autoscaling is available for both {{serverless-short}} and Cloud d
 Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
 
 ## Enabling autoscaling through APIs - adaptive allocations [enabling-autoscaling-through-apis-adaptive-allocations]
+$$$nlp-model-adaptive-resources$$$
 
 Model allocations are independent units of work for NLP tasks. If you set the numbers of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
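For context on the adaptive allocations described in the changed paragraph: in recent {{es}} versions, adaptive allocations can be enabled when starting a trained model deployment. A minimal sketch (the model ID, deployment ID, and allocation bounds below are illustrative, not from this commit):

```
POST _ml/trained_models/.elser_model_2/deployment/_start?deployment_id=my-deployment
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 4
  }
}
```

With this body, the number of allocations scales between the configured bounds based on the load on the process, instead of staying fixed.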

explore-analyze/machine-learning/nlp.md

Lines changed: 0 additions & 1 deletion
@@ -12,7 +12,6 @@ You can use {{stack-ml-features}} to analyze natural language data and make pred
 
 * [Overview](nlp/ml-nlp-overview.md)
 * [Deploy trained models](nlp/ml-nlp-deploy-models.md)
-* [Trained model autoscaling](nlp/ml-nlp-auto-scale.md)
 * [Add NLP {{infer}} to ingest pipelines](nlp/ml-nlp-inference.md)
 * [API quick reference](nlp/ml-nlp-apis.md)
 * [ELSER](nlp/ml-nlp-elser.md)

explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md

Lines changed: 0 additions & 115 deletions
This file was deleted.

explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md

Lines changed: 2 additions & 2 deletions
@@ -25,13 +25,13 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
 Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/docs/api/doc/kibana/v8/group/endpoint-ml).
 ::::
 
-You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) being enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments derived from the Cloud console and functions as follows:
+You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](../../../deploy-manage/autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
 
 * Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
 * Medium: This level limits resources to 32 vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
 * High: This level may use the maximum number of vCPUs available for this deployment from the Cloud console. If the maximum is 2 vCPUs or fewer, this level is equivalent to the medium or low level.
 
-For the resource levels when adaptive resources are enabled, refer to <[*Trained model autoscaling*](ml-nlp-auto-scale.md).
+For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md).
 
 ## Request queues and search priority [infer-request-queues]
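As context for the resource-level paragraph above: when adaptive resources are not used, the equivalent manual control through the APIs is to set the allocation and thread counts explicitly when starting the deployment. A sketch (the model ID and values are illustrative):

```
POST _ml/trained_models/.elser_model_2/deployment/_start?number_of_allocations=2&threads_per_allocation=1&priority=normal
```

These values then remain constant regardless of load, which is the behavior the adaptive-resources feature is designed to replace.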

explore-analyze/machine-learning/nlp/ml-nlp-e5.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ Refer to the model cards of the [multilingual-e5-small](https://huggingface.co/e
 
 To use E5, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated.
 
-Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
 
 ## Download and deploy E5 [download-deploy-e5]
2727

explore-analyze/machine-learning/nlp/ml-nlp-elser.md

Lines changed: 3 additions & 3 deletions
@@ -33,7 +33,7 @@ To use ELSER, you must have the [appropriate subscription](https://www.elastic.c
 The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in {{ech}} if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
 ::::
 
-Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
 
 ## ELSER v2 [elser-v2]
 
@@ -72,7 +72,7 @@ PUT _inference/sparse_embedding/my-elser-model
 }
 ```
 
-The API request automatically initiates the model download and then deploy the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.
 
 Refer to the [ELSER {{infer}} integration documentation](../../../solutions/search/inference-api/elser-inference-integration.md) to learn more about the available settings.
 
@@ -292,7 +292,7 @@ To gain the biggest value out of ELSER trained models, consider to follow this l
 
 * If quick response time is important for your use case, keep {{ml}} resources available at all times by setting `min_allocations` to `1`.
 * Setting `min_allocations` to `0` can save on costs for non-critical use cases or testing environments.
-* Enabling [autoscaling](ml-nlp-auto-scale.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale up or down the available resources of your ELSER deployment based on the load on the process.
+* Enabling [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale up or down the available resources of your ELSER deployment based on the load on the process.
 * Use dedicated, optimized ELSER {{infer}} endpoints for ingest and search use cases.
 * When deploying a trained model in {{kib}}, you can select for which case you want to optimize your ELSER deployment.
 * If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
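The ELSER {{infer}} request referenced in the hunk above ("This example uses autoscaling through adaptive allocation") likely looks similar to the following sketch; the endpoint name and allocation bounds here are illustrative, not taken from this commit:

```
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elser",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1
  }
}
```

Setting `"num_threads": 1` matches the ingest-optimization advice in the bullet list above.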

explore-analyze/machine-learning/nlp/ml-nlp-rerank.md

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ PUT _inference/rerank/my-rerank-model
 ```
 
 ::::{note}
-The API request automatically downloads and deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+The API request automatically downloads and deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.
 ::::
 
 ::::{note}

explore-analyze/toc.yml

Lines changed: 0 additions & 1 deletion
@@ -190,7 +190,6 @@ toc:
 - file: machine-learning/nlp/ml-nlp-import-model.md
 - file: machine-learning/nlp/ml-nlp-deploy-model.md
 - file: machine-learning/nlp/ml-nlp-test-inference.md
-- file: machine-learning/nlp/ml-nlp-auto-scale.md
 - file: machine-learning/nlp/ml-nlp-inference.md
 - file: machine-learning/nlp/ml-nlp-apis.md
 - file: machine-learning/nlp/ml-nlp-built-in-models.md
