**deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md** (+1 −1)
@@ -85,7 +85,7 @@ The following are known limitations and restrictions with autoscaling:
In {{ech}}, the following additional limitations apply:
* Trial deployments cannot be configured to autoscale beyond the normal Trial deployment size limits. The maximum size per zone is increased automatically from the Trial limit when you convert to a paid subscription.
- * ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md).
+ * ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../autoscaling/trained-model-autoscaling.md).
In {{ece}}, the following additional limitations apply:
**deploy-manage/autoscaling/trained-model-autoscaling.md**

@@ -25,6 +25,7 @@ Trained model autoscaling is available for both {{serverless-short}} and Cloud d
Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
## Enabling autoscaling through APIs - adaptive allocations [enabling-autoscaling-through-apis-adaptive-allocations]
+ $$$nlp-model-adaptive-resources$$$
Model allocations are independent units of work for NLP tasks. If you set the number of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations, which adjust the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
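A minimal sketch of enabling adaptive allocations through this API, assuming a model deployed under the illustrative ID `my_model`; the allocation bounds are placeholder values:

```console
POST _ml/trained_models/my_model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 4
  }
}
```

With `min_number_of_allocations` set to `0`, the deployment can scale down to zero allocations when idle, trading cold-start latency for cost.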
**explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md** (+2 −2)
@@ -25,13 +25,13 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/docs/api/doc/kibana/v8/group/endpoint-ml).
::::
- You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
+ You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](../../../deploy-manage/autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
* Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
* Medium: This level limits resources to 32 vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
* High: This level may use the maximum number of vCPUs available for this deployment from the Cloud console. If the maximum is 2 vCPUs or fewer, this level is equivalent to the medium or low level.
- For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md).
+ For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md).
## Request queues and search priority [infer-request-queues]
**explore-analyze/machine-learning/nlp/ml-nlp-e5.md** (+1 −1)
@@ -21,7 +21,7 @@ Refer to the model cards of the [multilingual-e5-small](https://huggingface.co/e
To use E5, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated.
- Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+ Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
**explore-analyze/machine-learning/nlp/ml-nlp-elser.md** (+3 −3)
@@ -33,7 +33,7 @@ To use ELSER, you must have the [appropriate subscription](https://www.elastic.c
The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in {{ech}} if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
::::
- Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+ Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
## ELSER v2 [elser-v2]
@@ -72,7 +72,7 @@ PUT _inference/sparse_embedding/my-elser-model
}
```
- The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+ The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.
Refer to the [ELSER {{infer}} integration documentation](../../../solutions/search/inference-api/elser-inference-integration.md) to learn more about the available settings.
@@ -292,7 +292,7 @@ To gain the biggest value out of ELSER trained models, consider to follow this l
* If quick response time is important for your use case, keep {{ml}} resources available at all times by setting `min_allocations` to `1`.
* Setting `min_allocations` to `0` can save on costs for non-critical use cases or testing environments.
- * Enabling [autoscaling](ml-nlp-auto-scale.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale the available resources of your ELSER deployment up or down based on the load on the process.
+ * Enabling [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale the available resources of your ELSER deployment up or down based on the load on the process.
* Use dedicated, optimized ELSER {{infer}} endpoints for ingest and search use cases.
* When deploying a trained model in {{kib}}, you can select for which case you want to optimize your ELSER deployment.
* If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
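A sketch combining these recommendations through the {{infer}} API (the endpoint names are illustrative; `.elser_model_2` is the assumed ELSER v2 model ID): a dedicated ingest endpoint that can scale to zero allocations when idle, and a dedicated search endpoint that keeps one allocation warm for quick response times:

```console
PUT _inference/sparse_embedding/elser-ingest
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 8
    }
  }
}

PUT _inference/sparse_embedding/elser-search
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 8
    }
  }
}
```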
**explore-analyze/machine-learning/nlp/ml-nlp-rerank.md** (+1 −1)
@@ -73,7 +73,7 @@ PUT _inference/rerank/my-rerank-model
```
::::{note}
- The API request automatically downloads and deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+ The API request automatically downloads and deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.