diff --git a/deploy-manage/autoscaling.md b/deploy-manage/autoscaling.md
index 090269c45d..755f38f96a 100644
--- a/deploy-manage/autoscaling.md
+++ b/deploy-manage/autoscaling.md
@@ -1,70 +1,54 @@
---
mapped_urls:
- - https://www.elastic.co/guide/en/cloud-heroku/current/ech-autoscaling.html
- - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling.html
- - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/xpack-autoscaling.html
+applies_to:
+ deployment:
+ ece: ga
+ ess: ga
+ eck: ga
+ serverless: all
---
# Autoscaling
-% What needs to be done: Refine
+Autoscaling adjusts the resources available to a deployment automatically as demand changes, ensuring sufficient capacity to meet workload requirements. In {{ece}}, {{eck}}, and {{ech}} deployments, autoscaling follows predefined policies, while in {{serverless-short}} it is fully managed and automatic.
-% GitHub issue: https://github.com/elastic/docs-projects/issues/344
+:::{tip} Serverless handles autoscaling for you
+By default, {{serverless-full}} automatically scales your {{es}} resources based on your usage. You don't need to enable autoscaling.
+:::
-% Scope notes: Creating a new landing page and subheadings/pages for different deployment types. Merge content when appropriate
+## Cluster autoscaling
-% Use migrated content from existing pages that map to this page:
+::::{admonition} Indirect use only
+This feature is designed for indirect use by {{ech}}, {{ece}}, and {{eck}}. Direct use is not supported.
+::::
-% - [ ] ./raw-migrated-files/cloud/cloud-heroku/ech-autoscaling.md
-% Notes: 1 child
-% - [ ] ./raw-migrated-files/cloud/cloud/ec-autoscaling.md
-% Notes: 2 children
-% - [ ] ./raw-migrated-files/cloud/cloud-enterprise/ece-autoscaling.md
-% Notes: 2 children
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-autoscaling.md
+Cluster autoscaling allows an operator to create tiers of nodes that monitor themselves and determine if scaling is needed based on an operator-defined policy. An Elasticsearch cluster can use the autoscaling API to report when additional resources are required. For example, an operator can define a policy that scales a warm tier based on available disk space. Elasticsearch monitors disk space in the warm tier. If it predicts low disk space for current and future shard copies, the autoscaling API reports that the cluster needs to scale. It remains the responsibility of the operator to add the additional resources that the cluster signals it requires.
-% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+A policy is composed of a list of roles and a list of deciders. The policy governs the nodes matching the roles. The deciders provide independent estimates of the capacity required. See [Autoscaling deciders](../deploy-manage/autoscaling/autoscaling-deciders.md) for details on available deciders.
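+
+For example, the following request is a minimal sketch of such a policy for a warm tier (the policy name is illustrative; the reactive storage decider is enabled by default for policies governing data nodes, so listing it explicitly only documents the intent):
+
+```console
+PUT /_autoscaling/policy/warm_tier_policy
+{
+  "roles": [ "data_warm" ],
+  "deciders": {
+    "reactive_storage": {}
+  }
+}
+```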
-$$$ec-autoscaling-intro$$$
+Cluster autoscaling supports:
+* Scaling machine learning nodes up and down.
+* Scaling data nodes up based on storage.
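+
+Once policies exist, the operator (or orchestrator) can ask the cluster what the deciders currently require through the autoscaling capacity API:
+
+```console
+GET /_autoscaling/capacity
+```
+
+The response lists, for each policy, the current capacity and the capacity the deciders calculate as required.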
-$$$ec-autoscaling-factors$$$
+## Trained model autoscaling
-$$$ec-autoscaling-notifications$$$
+:::{admonition} Trained model autoscaling for self-managed deployments
+The available resources of self-managed deployments are static, so trained model autoscaling is not applicable. However, available resources are still segmented based on the settings described in [Trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md).
+:::
-$$$ec-autoscaling-restrictions$$$
+Trained model autoscaling automatically adjusts the resources allocated to trained model deployments based on demand. This feature is available on all cloud deployments ({{ece}}, {{eck}}, and {{ech}}) and in {{serverless-short}}. See [Trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) for details.
-$$$ec-autoscaling-enable$$$
+Trained model autoscaling supports:
+* Scaling trained model deployments.
-$$$ec-autoscaling-update$$$
+::::{note}
+Autoscaling is not supported on Debian 8.
+::::
-$$$ece-autoscaling-intro$$$
+Find instructions on setting up and managing autoscaling, including supported environments, configuration options, and examples:
-$$$ece-autoscaling-factors$$$
-
-$$$ece-autoscaling-notifications$$$
-
-$$$ece-autoscaling-restrictions$$$
-
-$$$ece-autoscaling-enable$$$
-
-$$$ece-autoscaling-update$$$
-
-$$$ech-autoscaling-intro$$$
-
-$$$ech-autoscaling-factors$$$
-
-$$$ech-autoscaling-notifications$$$
-
-$$$ech-autoscaling-restrictions$$$
-
-$$$ech-autoscaling-enable$$$
-
-$$$ech-autoscaling-update$$$
-
-**This page is a work in progress.** The documentation team is working to combine content pulled from the following pages:
-
-* [/raw-migrated-files/cloud/cloud-heroku/ech-autoscaling.md](/raw-migrated-files/cloud/cloud-heroku/ech-autoscaling.md)
-* [/raw-migrated-files/cloud/cloud/ec-autoscaling.md](/raw-migrated-files/cloud/cloud/ec-autoscaling.md)
-* [/raw-migrated-files/cloud/cloud-enterprise/ece-autoscaling.md](/raw-migrated-files/cloud/cloud-enterprise/ece-autoscaling.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-autoscaling.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/xpack-autoscaling.md)
\ No newline at end of file
+* [Autoscaling in {{ece}} and {{ech}}](/deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md)
+* [Autoscaling in {{eck}}](/deploy-manage/autoscaling/autoscaling-in-eck.md)
+* [Autoscaling deciders](/deploy-manage/autoscaling/autoscaling-deciders.md)
+* [Trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md)
diff --git a/deploy-manage/autoscaling/autoscaling-deciders.md b/deploy-manage/autoscaling/autoscaling-deciders.md
index 1289aa1657..90f94fd949 100644
--- a/deploy-manage/autoscaling/autoscaling-deciders.md
+++ b/deploy-manage/autoscaling/autoscaling-deciders.md
@@ -8,36 +8,216 @@ mapped_urls:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-frozen-existence-decider.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-machine-learning-decider.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/autoscaling-fixed-decider.html
+applies_to:
+ ece:
+ eck:
+ ess:
---
-# Autoscaling deciders
+# Autoscaling deciders [autoscaling-deciders]
-% What needs to be done: Refine
+[Autoscaling](/deploy-manage/autoscaling.md) in Elasticsearch enables dynamic resource allocation based on predefined policies. A key component of this mechanism is autoscaling deciders, which independently assess resource requirements and determine when scaling actions are necessary. Deciders analyze various factors, such as storage usage, indexing rates, and machine learning workloads, to ensure clusters maintain optimal performance without manual intervention.
-% GitHub issue: https://github.com/elastic/docs-projects/issues/344
+::::{admonition} Indirect use only
+This feature is designed for indirect use by {{ech}}, {{ece}}, and {{eck}}. Direct use is not supported.
+::::
-% Scope notes: Collapse to a single page, explain what deciders are
+[Reactive storage decider](#autoscaling-reactive-storage-decider)
+: Estimates required storage capacity of current data set. Available for policies governing data nodes.
-% Use migrated content from existing pages that map to this page:
+[Proactive storage decider](#autoscaling-proactive-storage-decider)
+: Estimates required storage capacity based on current ingestion into hot nodes. Available for policies governing hot data nodes.
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-deciders.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-reactive-storage-decider.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-proactive-storage-decider.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-shards-decider.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-storage-decider.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-existence-decider.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-machine-learning-decider.md
-% - [ ] ./raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-fixed-decider.md
+[Frozen shards decider](#autoscaling-frozen-shards-decider)
+: Estimates required memory capacity based on the number of partially mounted shards. Available for policies governing frozen data nodes.
-⚠️ **This page is a work in progress.** ⚠️
+[Frozen storage decider](#autoscaling-frozen-storage-decider)
+: Estimates required storage capacity as a percentage of the total data set of partially mounted indices. Available for policies governing frozen data nodes.
-The documentation team is working to combine content pulled from the following pages:
+[Frozen existence decider](#autoscaling-frozen-existence-decider)
+: Estimates a minimum required memory and storage capacity when any index is in the frozen [ILM](../../manage-data/lifecycle/index-lifecycle-management.md) phase.
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-deciders.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-deciders.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-reactive-storage-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-reactive-storage-decider.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-proactive-storage-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-proactive-storage-decider.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-shards-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-shards-decider.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-storage-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-storage-decider.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-existence-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-frozen-existence-decider.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-machine-learning-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-machine-learning-decider.md)
-* [/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-fixed-decider.md](/raw-migrated-files/elasticsearch/elasticsearch-reference/autoscaling-fixed-decider.md)
\ No newline at end of file
+[Machine learning decider](#autoscaling-machine-learning-decider)
+: Estimates required memory capacity based on machine learning jobs. Available for policies governing machine learning nodes.
+
+[Fixed decider](#autoscaling-fixed-decider)
+: Responds with a fixed required capacity. This decider is intended for testing only.
+
+## Reactive storage decider [autoscaling-reactive-storage-decider]
+
+The [autoscaling](../../deploy-manage/autoscaling.md) reactive storage decider (`reactive_storage`) calculates the storage required to contain the current data set. It signals that additional storage capacity is necessary when existing capacity has been exceeded (reactively).
+
+The reactive storage decider is enabled for all policies governing data nodes and has no configuration options.
+
+The decider relies partially on using [data tier preference](../../manage-data/lifecycle/data-tiers.md#data-tier-allocation) allocation rather than node attributes. In particular, scaling a data tier into existence (starting the first node in a tier) will result in starting a node in any data tier that is empty if not using allocation based on data tier preference. Using the [ILM migrate](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/index-lifecycle-actions/ilm-migrate.md) action to migrate between tiers is the preferred way of allocating to tiers and fully supports scaling a tier into existence.
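+
+As a minimal sketch (the policy name and `min_age` are illustrative), an ILM policy can rely on the `migrate` action to move indices to the warm tier, which the reactive storage decider can then scale into existence:
+
+```console
+PUT _ilm/policy/my_lifecycle_policy
+{
+  "policy": {
+    "phases": {
+      "warm": {
+        "min_age": "30d",
+        "actions": {
+          "migrate": {}
+        }
+      }
+    }
+  }
+}
+```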
+
+## Proactive storage decider [autoscaling-proactive-storage-decider]
+
+The [autoscaling](../../deploy-manage/autoscaling.md) proactive storage decider (`proactive_storage`) calculates the storage required to contain the current data set plus an estimated amount of expected additional data.
+
+The proactive storage decider is enabled for all policies governing nodes with the `data_hot` role.
+
+The estimation of expected additional data is based on past indexing that occurred within the `forecast_window`. Only indexing into data streams contributes to the estimate.
+
+### Configuration settings [autoscaling-proactive-storage-decider-settings]
+
+`forecast_window`
+: (Optional, [time value](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/rest-apis/api-conventions.md#time-units)) The window of time to use for forecasting. Defaults to 30 minutes.
+
+
+### {{api-examples-title}} [autoscaling-proactive-storage-decider-examples]
+
+This example puts an autoscaling policy named `my_autoscaling_policy`, overriding the proactive decider’s `forecast_window` to be 10 minutes.
+
+```console
+PUT /_autoscaling/policy/my_autoscaling_policy
+{
+ "roles" : [ "data_hot" ],
+ "deciders": {
+ "proactive_storage": {
+ "forecast_window": "10m"
+ }
+ }
+}
+```
+
+The API returns the following result:
+
+```console-result
+{
+ "acknowledged": true
+}
+```
+
+## Frozen shards decider [autoscaling-frozen-shards-decider]
+
+The [autoscaling](../../deploy-manage/autoscaling.md) frozen shards decider (`frozen_shards`) calculates the memory required to search the current set of partially mounted indices in the frozen tier. Based on a required memory amount per shard, it calculates the necessary memory in the frozen tier.
+
+### Configuration settings [autoscaling-frozen-shards-decider-settings]
+
+`memory_per_shard`
+: (Optional, [byte value](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/rest-apis/api-conventions.md#byte-units)) The memory needed per shard, in bytes. Defaults to 2000 shards per 64 GB node (roughly 32 MB per shard). Notice that this is total memory, not heap, assuming that the Elasticsearch default heap sizing mechanism is used and that nodes are not bigger than 64 GB.
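+
+### {{api-examples-title}} [autoscaling-frozen-shards-decider-examples]
+
+This sketch puts an autoscaling policy named `frozen_tier_policy` that overrides the default `memory_per_shard` (the policy name and value are illustrative):
+
+```console
+PUT /_autoscaling/policy/frozen_tier_policy
+{
+  "roles": [ "data_frozen" ],
+  "deciders": {
+    "frozen_shards": {
+      "memory_per_shard": "64mb"
+    }
+  }
+}
+```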
+
+## Frozen storage decider [autoscaling-frozen-storage-decider]
+
+The [autoscaling](../../deploy-manage/autoscaling.md) frozen storage decider (`frozen_storage`) calculates the local storage required to search the current set of partially mounted indices based on a percentage of the total data set size of such indices. It signals that additional storage capacity is necessary when existing capacity is less than the percentage multiplied by total data set size.
+
+The frozen storage decider is enabled for all policies governing frozen data nodes.
+
+### Configuration settings [autoscaling-frozen-storage-decider-settings]
+
+`percentage`
+: (Optional, number value) Percentage of local storage relative to the data set size. Defaults to 5. For example, with the default value, a 10 TB total data set requires 500 GB of local storage in the frozen tier.
+
+## Frozen existence decider [autoscaling-frozen-existence-decider]
+
+The [autoscaling](../../deploy-manage/autoscaling.md) frozen existence decider (`frozen_existence`) ensures that once the first index enters the frozen ILM phase, the frozen tier is scaled into existence.
+
+The frozen existence decider is enabled for all policies governing frozen data nodes and has no configuration options.
+
+## Machine learning decider [autoscaling-machine-learning-decider]
+
+The [autoscaling](../../deploy-manage/autoscaling.md) {{ml}} decider (`ml`) calculates the memory and CPU requirements to run {{ml}} jobs and trained models.
+
+The {{ml}} decider is enabled for policies governing `ml` nodes.
+
+::::{note}
+For {{ml}} jobs to open when the cluster is not appropriately scaled, set `xpack.ml.max_lazy_ml_nodes` to the largest number of possible {{ml}} nodes (refer to [Advanced machine learning settings](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/configuration-reference/machine-learning-settings.md#advanced-ml-settings) for more information). In {{ess}}, this is automatically set.
+::::
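+
+Because `xpack.ml.max_lazy_ml_nodes` is a dynamic cluster setting, one way to set it is through the cluster settings API (the value `3` is illustrative):
+
+```console
+PUT /_cluster/settings
+{
+  "persistent": {
+    "xpack.ml.max_lazy_ml_nodes": 3
+  }
+}
+```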
+
+
+### Configuration settings [autoscaling-machine-learning-decider-settings]
+
+Both `num_anomaly_jobs_in_queue` and `num_analytics_jobs_in_queue` are designed to delay a scale-up event. If the cluster is too small, these settings indicate how many jobs of each type can be unassigned from a node. Both settings are only considered for jobs that can be opened given the current scale. If a job is too large for any node size or if a job can’t be assigned without user intervention (for example, a user calling `_stop` against a real-time {{anomaly-job}}), the numbers are ignored for that particular job.
+
+`num_anomaly_jobs_in_queue`
+: (Optional, integer) Specifies the number of queued {{anomaly-jobs}} to allow. Defaults to `0`.
+
+`num_analytics_jobs_in_queue`
+: (Optional, integer) Specifies the number of queued {{dfanalytics-jobs}} to allow. Defaults to `0`.
+
+`down_scale_delay`
+: (Optional, [time value](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/rest-apis/api-conventions.md#time-units)) Specifies the time to delay before scaling down. Defaults to 1 hour. If a scale down is possible for the entire time window, then a scale down is requested. If the cluster requires a scale up during the window, the window is reset.
+
+
+### {{api-examples-title}} [autoscaling-machine-learning-decider-examples]
+
+This example creates an autoscaling policy named `my_autoscaling_policy` that overrides the default configuration of the {{ml}} decider.
+
+```console
+PUT /_autoscaling/policy/my_autoscaling_policy
+{
+ "roles" : [ "ml" ],
+ "deciders": {
+ "ml": {
+ "num_anomaly_jobs_in_queue": 5,
+ "num_analytics_jobs_in_queue": 3,
+ "down_scale_delay": "30m"
+ }
+ }
+}
+```
+
+The API returns the following result:
+
+```console-result
+{
+ "acknowledged": true
+}
+```
+
+## Fixed decider [autoscaling-fixed-decider]
+
+::::{warning}
+This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
+::::
+
+
+::::{warning}
+The fixed decider is intended for testing only. Do not use this decider in production.
+::::
+
+
+The [autoscaling](../../deploy-manage/autoscaling.md) `fixed` decider responds with a fixed required capacity. It is not enabled by default but can be enabled for any policy by explicitly configuring it.
+
+### Configuration settings [_configuration_settings]
+
+`storage`
+: (Optional, [byte value](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/rest-apis/api-conventions.md#byte-units)) Required amount of node-level storage. Defaults to `-1` (disabled).
+
+`memory`
+: (Optional, [byte value](asciidocalypse://docs/elasticsearch/docs/reference/elasticsearch/rest-apis/api-conventions.md#byte-units)) Required amount of node-level memory. Defaults to `-1` (disabled).
+
+`processors`
+: (Optional, float) Required number of processors. Defaults to disabled.
+
+`nodes`
+: (Optional, integer) Number of nodes to use when calculating capacity. Defaults to `1`.
+
+
+### {{api-examples-title}} [autoscaling-fixed-decider-examples]
+
+This example puts an autoscaling policy named `my_autoscaling_policy`, enabling and configuring the fixed decider.
+
+```console
+PUT /_autoscaling/policy/my_autoscaling_policy
+{
+ "roles" : [ "data_hot" ],
+ "deciders": {
+ "fixed": {
+ "storage": "1tb",
+ "memory": "32gb",
+ "processors": 2.3,
+ "nodes": 8
+ }
+ }
+}
+```
+
+The API returns the following result:
+
+```console-result
+{
+ "acknowledged": true
+}
+```
\ No newline at end of file
diff --git a/deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md b/deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md
new file mode 100644
index 0000000000..ce5621a8d8
--- /dev/null
+++ b/deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md
@@ -0,0 +1,660 @@
+---
+mapped_urls:
+ - https://www.elastic.co/guide/en/cloud-heroku/current/ech-autoscaling.html
+ - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling.html
+ - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling.html
+ - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling-example.html
+ - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling-example.html
+ - https://www.elastic.co/guide/en/cloud-heroku/current/ech-autoscaling-example.html
+ - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling-api-example.html
+ - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling-api-example.html
+applies_to:
+ deployment:
+ ece: ga
+ ess: ga
+navigation_title: "In ECE and ECH"
+---
+
+# Autoscaling in {{ece}} and {{ech}}
+
+## Overview [ec-autoscaling-intro]
+
+When you first create a deployment, it can be challenging to determine the amount of storage your data nodes will require. The same applies to the amount of memory and CPU to allocate to your machine learning nodes. It can become even more challenging to predict these requirements weeks or months into the future. In an ideal scenario, these resources should be sized both to ensure efficient performance and resiliency, and to avoid excess costs. Autoscaling can help with this balance by adjusting the resources available to a deployment automatically as loads change over time, reducing the need for monitoring and manual intervention.
+
+To learn more about configuring and managing autoscaling, check the following sections:
+
+* [Overview](#ec-autoscaling-intro)
+* [When does autoscaling occur?](#ec-autoscaling-factors)
+* [Notifications](#ec-autoscaling-notifications)
+* [Restrictions and limitations](#ec-autoscaling-restrictions)
+* [Enable or disable autoscaling](#ec-autoscaling-enable)
+* [Update your autoscaling settings](#ec-autoscaling-update)
+
+You can also have a look at our [autoscaling example](#ec-autoscaling-example), as well as a sample request to [create an autoscaled deployment through the API](#ec-autoscaling-api-example).
+
+::::{note}
+Autoscaling is enabled for the Machine Learning tier by default for new deployments.
+::::
+
+Currently, autoscaling behavior is as follows:
+
+* **Data tiers**
+
+    * Each Elasticsearch [data tier](../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When more storage is needed, autoscaling scales up each data tier independently to ensure you can continue to ingest data into your hot and content tier, or move data to the warm, cold, or frozen data tiers.
+ * In addition to scaling up existing data tiers, a new data tier will be automatically added when necessary, based on your [index lifecycle management policies](https://www.elastic.co/guide/en/cloud-enterprise/current/ece-configure-index-management.html).
+ * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
+    * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. If you want to adjust the size of a data tier to add more memory or CPU, or if you deleted data and want to scale a tier down, you can set the current size per zone of each data tier manually.
+
+* **Machine learning nodes**
+
+ * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
+ * When a machine learning job is opened, or a machine learning trained model is deployed, if there are no machine learning nodes in your deployment, the autoscaling mechanism will automatically add machine learning nodes. Similarly, after a period of no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
+ * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
+ * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
+ * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
+
+::::{note}
+For any Elasticsearch component, the number of availability zones is not affected by autoscaling. You can always set the number of availability zones manually, and the autoscaling mechanism will add or remove capacity per availability zone.
+::::
+
+## When does autoscaling occur? [ec-autoscaling-factors]
+
+Several factors determine when data tiers or machine learning nodes are scaled.
+
+For a data tier, an autoscaling event can be triggered in the following cases:
+
+* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
+
+* When past behavior on a hot tier indicates that the influx of data can increase significantly in the near future. Refer to [Reactive storage decider](autoscaling-deciders.md#autoscaling-reactive-storage-decider) and [Proactive storage decider](autoscaling-deciders.md#autoscaling-proactive-storage-decider) for more detail.
+
+* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, it automatically creates warm or cold nodes if an ILM policy is trying to move data from hot to warm or cold nodes.
+
+On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease, or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](autoscaling-deciders.md#autoscaling-machine-learning-decider) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](../../explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md#ml-ad-create-job).
+
+On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
+
+## Notifications [ec-autoscaling-notifications]
+
+If a data tier or machine learning node scales up to its maximum possible size, you’ll receive an email, and a notice also appears on the deployment overview page prompting you to adjust your autoscaling settings to ensure optimal performance.
+
+In {{ece}} deployments, a warning is also issued in the ECE `service-constructor` logs with the field `labels.autoscaling_notification_type` and a value of `data-tier-at-limit` (for a fully scaled data tier) or `ml-tier-at-limit` (for a fully scaled machine learning node). The warning is indexed in the `logging-and-metrics` deployment, so you can use that event to [configure an email notification](../../explore-analyze/alerts-cases/watcher.md).
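+
+As a sketch, assuming those logs are searchable in the `logging-and-metrics` deployment under an index pattern such as `logs-*` (the pattern is an assumption; adjust it to your environment), you could find the warnings with:
+
+```console
+GET logs-*/_search
+{
+  "query": {
+    "terms": {
+      "labels.autoscaling_notification_type": [
+        "data-tier-at-limit",
+        "ml-tier-at-limit"
+      ]
+    }
+  }
+}
+```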
+
+## Restrictions and limitations [ec-autoscaling-restrictions]
+
+The following are known limitations and restrictions with autoscaling:
+
+* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
+
+In {{ech}}, the following additional limitations apply:
+
+* Trial deployments cannot be configured to autoscale beyond the normal Trial deployment size limits. The maximum size per zone is increased automatically from the Trial limit when you convert to a paid subscription.
+* ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../autoscaling/trained-model-autoscaling.md).
+
+In {{ece}}, the following additional limitations apply:
+
+* If an override is set for the instance size or disk quota multiplier of an instance by means of the [Instance Overrides API](https://www.elastic.co/docs/api/doc/cloud-enterprise/operation/operation-set-all-instances-settings-overrides), autoscaling is effectively disabled. Avoid adjusting the instance size or disk quota multiplier for an instance that uses autoscaling, since the override prevents autoscaling.
+
+## Enable or disable autoscaling [ec-autoscaling-enable]
+
+To enable or disable autoscaling on a deployment:
+
+1. Log in to the ECE [Cloud UI](../deploy/cloud-enterprise/log-into-cloud-ui.md) or [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
+
+2. On the **Deployments** page, select your deployment.
+
+    Narrow your deployments by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
+
+
+3. In your deployment menu, select **Edit**.
+4. Select the desired autoscaling configuration for this deployment using the **Enable Autoscaling for:** dropdown menu.
+5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
+
+When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](#ec-autoscaling-update). Current sizes are shown on the deployment overview page.
+
+When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
+
+## Update your autoscaling settings [ec-autoscaling-update]
+
+Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
+
+1. Log in to the ECE [Cloud UI](../deploy/cloud-enterprise/log-into-cloud-ui.md) or [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
+
+2. On the **Deployments** page, select your deployment.
+
+    Narrow your deployments by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
+
+3. In your deployment menu, select **Edit**.
+4. To update a data tier:
+
+ 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
+ 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
+ 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
+ 4. Select **Save** to apply the changes to your deployment.
+
+5. To update machine learning nodes:
+
+ 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
+ 2. Select **Save** to apply the changes to your deployment.
+
+::::{note}
+On {{ece}}, system-owned deployment templates include the default values for all deployment autoscaling settings.
+::::
+
+## Autoscaling example [ec-autoscaling-example]
+
+To help you better understand the available autoscaling settings, this example describes a typical autoscaling workflow on a sample {{ece}} or {{ech}} deployment.
+
+1. Enable autoscaling:
+
+ * On an **existing deployment**, open the deployment **Edit** page to find the option to turn on autoscaling.
+ * When you create a new deployment, you can find the autoscaling option under **Advanced settings**.
+
+ Once you confirm your changes or create a new deployment, autoscaling is activated with system default settings that you can adjust as needed (though for most use cases the default settings will likely suffice).
+
+2. View and adjust autoscaling settings on data tiers:
+
+ 1. Open the **Edit** page for your deployment to get the current and maximum size per zone of each Elasticsearch data tier. In this example, the hot data and content tier has the following settings:
+
+    | **Current size per zone** | **Maximum size per zone** |
+    | --- | --- |
+    | 45GB storage | 1.41TB storage |
+    | 1GB RAM | 32GB RAM |
+    | Up to 2.5 vCPU | 5 vCPU |
+
+ The fault tolerance for the data tier is set to 2 availability zones.
+
+ :::{image} ../../images/cloud-enterprise-ec-ce-autoscaling-data-summary2.png
+ :alt: A screenshot showing sizing information for the autoscaled data tier
+ :::
+
+    2. Use the dropdown boxes to adjust the current and/or the maximum size of the data tier. Capacity will be added to the hot data and content tier when required, based on its past and present storage usage, until it reaches the maximum size per zone. Any scaling events are applied simultaneously across availability zones. In this example, the tier has plenty of room to scale relative to its current size, and it will not scale above the maximum size setting. There is no minimum size setting since downward scaling is currently not supported on data tiers.
+
+3. View and adjust autoscaling settings on a machine learning instance:
+
+ 1. From the deployment **Edit** page you can check the minimum and maximum size of your deployment’s machine learning instances. In this example, the machine learning instance has the following settings:
+
+    | **Minimum size per zone** | **Maximum size per zone** |
+    | --- | --- |
+    | 1GB RAM | 64GB RAM |
+    | 0.5 vCPU up to 8 vCPU | 32 vCPU |
+
+ The fault tolerance for the machine learning instance is set to 1 availability zone.
+
+ :::{image} ../../images/cloud-enterprise-ec-ce-autoscaling-ml-summary2.png
+ :alt: A screenshot showing sizing information for the autoscaled machine learning node
+ :::
+
+    2. Use the dropdown boxes to adjust the minimum and/or the maximum size of the machine learning instances. Capacity will be added to or removed from the machine learning instances as needed. The need for a scaling event is determined by the expected memory and vCPU requirements for the currently configured machine learning jobs. Any scaling events are applied simultaneously across availability zones. Note that unlike data tiers, machine learning nodes do not have a **Current size per zone** setting. That setting is not needed since machine learning nodes support both upward and downward scaling.
+
+4. Over time, the volume of data and the size of any machine learning jobs in your deployment are likely to grow. Let’s assume that to meet storage requirements your hot data tier has scaled up to its maximum allowed size. At this point, a notification appears on the deployment overview page indicating that the tier has scaled to capacity.
+5. If you expect a continued increase in either storage, memory, or vCPU requirements, you can use the **Maximum size per zone** dropdown box to adjust the maximum capacity settings for your data tiers and machine learning instances, as appropriate. And, you can always re-adjust these levels downward if the requirements change.
+
+As you can see, autoscaling greatly reduces the manual work involved to manage a deployment. The deployment capacity adjusts automatically as demands change, within the boundaries that you define. Check our main [Deployment autoscaling](../autoscaling.md) page for more information.
+
+## Autoscaling through the API [ec-autoscaling-api-example]
+
+This example demonstrates how to use the {{ecloud}} RESTful API to create a deployment with autoscaling enabled.
+
+The example deployment has a hot data and content tier, warm data tier, cold data tier, and a machine learning node, all of which will scale within the defined parameters. To learn about the autoscaling settings, check [Deployment autoscaling](../autoscaling.md) and [Autoscaling example](#ec-autoscaling-example).
+
+To learn more about the {{ece}} API, see the [RESTful API](asciidocalypse://docs/cloud/docs/reference/cloud-enterprise/restful-api.md) documentation. For details on the {{ech}} API, check [RESTful API](asciidocalypse://docs/cloud/docs/reference/cloud-hosted/ec-api-restful.md).
+
+### Requirements [ec_requirements]
+
+Note the following requirements when you run this API request:
+
+* All Elasticsearch components must be included in the request, even if they are not enabled (that is, if they have a zero size). All components are included in this example.
+* The request requires a format that supports data tiers. Specifically, all Elasticsearch components must contain the following properties:
+
+ * `id`
+ * `node_attributes`
+ * `node_roles`
+
+* The `size`, `autoscaling_min`, and `autoscaling_max` properties must be specified according to the rules in the following table. This is because:
+
+ * On data tiers only upward scaling is currently supported.
+ * On machine learning nodes both upward and downward scaling is supported.
+ * On all other components autoscaling is not currently supported.
+* On {{ece}}, autoscaling is supported for custom deployment templates on version 2.12 and above. To learn more, refer to [Updating custom templates to support `node_roles` and autoscaling](../deploy/cloud-enterprise/ce-add-support-for-node-roles-autoscaling.md).
+
+$$$ece-autoscaling-api-example-requirements-table$$$
+
+|  | `size` | `autoscaling_min` | `autoscaling_max` |
+| --- | --- | --- | --- |
+| data tier | ✓ | ✕ | ✓ |
+| machine learning node | ✕ | ✓ | ✓ |
+| coordinating and master nodes | ✓ | ✕ | ✕ |
+| Kibana | ✓ | ✕ | ✕ |
+| APM | ✓ | ✕ | ✕ |
+
+* ✓ = Include the property.
+* ✕ = Do not include the property.
+
+These rules match the behavior of the {{ech}} and {{ece}} user consoles.
+
+* The `elasticsearch` object must contain the property `"autoscaling_enabled": true`.
+
+### API request example [ec_api_request_example]
+
+::::{note}
+Although autoscaling can scale some tiers by CPU, the primary measurement of tier size is memory. Limits on tier size are in terms of memory.
+::::
+
+Run this example API request to create a deployment with autoscaling:
+
+::::{tab-set}
+
+:::{tab-item} {{ece}}
+
+```sh
+curl -k -X POST -H "Authorization: ApiKey $ECE_API_KEY" https://$COORDINATOR_HOST:12443/api/v1/deployments -H 'content-type: application/json' -d '
+{
+ "name": "my-first-autoscaling-deployment",
+ "resources": {
+ "elasticsearch": [
+ {
+ "ref_id": "main-elasticsearch",
+ "region": "ece-region",
+ "plan": {
+ "autoscaling_enabled": true,
+ "cluster_topology": [
+ {
+ "id": "hot_content",
+ "node_roles": [
+ "master",
+ "ingest",
+ "remote_cluster_client",
+ "data_hot",
+ "transform",
+ "data_content"
+ ],
+ "zone_count": 1,
+ "elasticsearch": {
+ "node_attributes": {
+ "data": "hot"
+ },
+ "enabled_built_in_plugins": []
+ },
+ "instance_configuration_id": "data.default",
+ "size": {
+ "value": 4096,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 2097152,
+ "resource": "memory"
+ }
+ },
+ {
+ "id": "warm",
+ "node_roles": [
+ "data_warm",
+ "remote_cluster_client"
+ ],
+ "zone_count": 1,
+ "elasticsearch": {
+ "node_attributes": {
+ "data": "warm"
+ },
+ "enabled_built_in_plugins": []
+ },
+ "instance_configuration_id": "data.highstorage",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 2097152,
+ "resource": "memory"
+ }
+ },
+ {
+ "id": "cold",
+ "node_roles": [
+ "data_cold",
+ "remote_cluster_client"
+ ],
+ "zone_count": 1,
+ "elasticsearch": {
+ "node_attributes": {
+ "data": "cold"
+ },
+ "enabled_built_in_plugins": []
+ },
+ "instance_configuration_id": "data.highstorage",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 2097152,
+ "resource": "memory"
+ }
+ },
+ {
+ "id": "coordinating",
+ "node_roles": [
+ "ingest",
+ "remote_cluster_client"
+ ],
+ "zone_count": 1,
+ "instance_configuration_id": "coordinating",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "elasticsearch": {
+ "enabled_built_in_plugins": []
+ }
+ },
+ {
+ "id": "master",
+ "node_roles": [
+ "master"
+ ],
+ "zone_count": 1,
+ "instance_configuration_id": "master",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "elasticsearch": {
+ "enabled_built_in_plugins": []
+ }
+ },
+ {
+ "id": "ml",
+ "node_roles": [
+ "ml",
+ "remote_cluster_client"
+ ],
+ "zone_count": 1,
+ "instance_configuration_id": "ml",
+ "autoscaling_min": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 2097152,
+ "resource": "memory"
+ },
+ "elasticsearch": {
+ "enabled_built_in_plugins": []
+ }
+ }
+ ],
+ "elasticsearch": {
+ "version": "8.13.1"
+ },
+ "deployment_template": {
+ "id": "default"
+ }
+ },
+ "settings": {
+ "dedicated_masters_threshold": 6
+ }
+ }
+ ],
+ "kibana": [
+ {
+ "ref_id": "main-kibana",
+ "elasticsearch_cluster_ref_id": "main-elasticsearch",
+ "region": "ece-region",
+ "plan": {
+ "zone_count": 1,
+ "cluster_topology": [
+ {
+ "instance_configuration_id": "kibana",
+ "size": {
+ "value": 1024,
+ "resource": "memory"
+ },
+ "zone_count": 1
+ }
+ ],
+ "kibana": {
+ "version": "8.13.1"
+ }
+ }
+ }
+ ],
+ "apm": [
+ {
+ "ref_id": "main-apm",
+ "elasticsearch_cluster_ref_id": "main-elasticsearch",
+ "region": "ece-region",
+ "plan": {
+ "cluster_topology": [
+ {
+ "instance_configuration_id": "apm",
+ "size": {
+ "value": 512,
+ "resource": "memory"
+ },
+ "zone_count": 1
+ }
+ ],
+ "apm": {
+ "version": "8.13.1"
+ }
+ }
+ }
+ ],
+ "enterprise_search": []
+ }
+}
+'
+```
+
+:::
+
+:::{tab-item} {{ech}}
+
+```sh
+curl -XPOST \
+-H 'Content-Type: application/json' \
+-H "Authorization: ApiKey $EC_API_KEY" \
+"https://api.elastic-cloud.com/api/v1/deployments" \
+-d '
+{
+ "name": "my-first-autoscaling-deployment",
+ "resources": {
+ "elasticsearch": [
+ {
+ "ref_id": "main-elasticsearch",
+ "region": "us-east-1",
+ "plan": {
+ "autoscaling_enabled": true,
+ "cluster_topology": [
+ {
+ "id": "hot_content",
+ "node_roles": [
+ "remote_cluster_client",
+ "data_hot",
+ "transform",
+ "data_content",
+ "master",
+ "ingest"
+ ],
+ "zone_count": 2,
+ "elasticsearch": {
+ "node_attributes": {
+ "data": "hot"
+ },
+ "enabled_built_in_plugins": []
+ },
+ "instance_configuration_id": "aws.data.highio.i3",
+ "size": {
+ "resource": "memory",
+ "value": 8192
+ },
+ "autoscaling_max": {
+ "value": 118784,
+ "resource": "memory"
+ }
+ },
+ {
+ "id": "warm",
+ "node_roles": [
+ "data_warm",
+ "remote_cluster_client"
+ ],
+ "zone_count": 2,
+ "elasticsearch": {
+ "node_attributes": {
+ "data": "warm"
+ },
+ "enabled_built_in_plugins": []
+ },
+ "instance_configuration_id": "aws.data.highstorage.d3",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 118784,
+ "resource": "memory"
+ }
+ },
+ {
+ "id": "cold",
+ "node_roles": [
+ "data_cold",
+ "remote_cluster_client"
+ ],
+ "zone_count": 1,
+ "elasticsearch": {
+ "node_attributes": {
+ "data": "cold"
+ },
+ "enabled_built_in_plugins": []
+ },
+ "instance_configuration_id": "aws.data.highstorage.d3",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 59392,
+ "resource": "memory"
+ }
+ },
+ {
+ "id": "coordinating",
+ "zone_count": 2,
+ "node_roles": [
+ "ingest",
+ "remote_cluster_client"
+ ],
+ "instance_configuration_id": "aws.coordinating.m5d",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "elasticsearch": {
+ "enabled_built_in_plugins": []
+ }
+ },
+ {
+ "id": "master",
+ "node_roles": [
+ "master"
+ ],
+ "zone_count": 3,
+ "instance_configuration_id": "aws.master.r5d",
+ "size": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "elasticsearch": {
+ "enabled_built_in_plugins": []
+ }
+ },
+ {
+ "id": "ml",
+ "node_roles": [
+ "ml",
+ "remote_cluster_client"
+ ],
+ "zone_count": 1,
+ "instance_configuration_id": "aws.ml.m5d",
+ "autoscaling_min": {
+ "value": 0,
+ "resource": "memory"
+ },
+ "autoscaling_max": {
+ "value": 61440,
+ "resource": "memory"
+ },
+ "elasticsearch": {
+ "enabled_built_in_plugins": []
+ }
+ }
+ ],
+ "elasticsearch": {
+ "version": "7.11.0"
+ },
+ "deployment_template": {
+ "id": "aws-io-optimized-v2"
+ }
+ },
+ "settings": {
+ "dedicated_masters_threshold": 6
+ }
+ }
+ ],
+ "kibana": [
+ {
+ "elasticsearch_cluster_ref_id": "main-elasticsearch",
+ "region": "us-east-1",
+ "plan": {
+ "cluster_topology": [
+ {
+ "instance_configuration_id": "aws.kibana.r5d",
+ "zone_count": 1,
+ "size": {
+ "resource": "memory",
+ "value": 1024
+ }
+ }
+ ],
+ "kibana": {
+ "version": "7.11.0"
+ }
+ },
+ "ref_id": "main-kibana"
+ }
+ ],
+ "apm": [
+ {
+ "elasticsearch_cluster_ref_id": "main-elasticsearch",
+ "region": "us-east-1",
+ "plan": {
+ "cluster_topology": [
+ {
+ "instance_configuration_id": "aws.apm.r5d",
+ "zone_count": 1,
+ "size": {
+ "resource": "memory",
+ "value": 512
+ }
+ }
+ ],
+ "apm": {
+ "version": "7.11.0"
+ }
+ },
+ "ref_id": "main-apm"
+ }
+ ],
+ "enterprise_search": []
+ }
+}
+'
+```
+
+:::
+
+::::
\ No newline at end of file
diff --git a/deploy-manage/autoscaling/deployments-autoscaling-on-eck.md b/deploy-manage/autoscaling/autoscaling-in-eck.md
similarity index 79%
rename from deploy-manage/autoscaling/deployments-autoscaling-on-eck.md
rename to deploy-manage/autoscaling/autoscaling-in-eck.md
index 421d109b7a..cd27a422a1 100644
--- a/deploy-manage/autoscaling/deployments-autoscaling-on-eck.md
+++ b/deploy-manage/autoscaling/autoscaling-in-eck.md
@@ -1,9 +1,17 @@
---
mapped_pages:
- https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-autoscaling.html
+ - https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stateless-autoscaling.html
+applies_to:
+ deployment:
+ eck: ga
+navigation_title: "In ECK"
---
+# Autoscaling in {{eck}}
-# Deployments autoscaling on ECK [k8s-autoscaling]
+This page explains how to configure autoscaling for Elasticsearch deployments in {{eck}}: enabling autoscaling, defining policies, managing resource limits, and monitoring scaling activity. It also covers autoscaling stateless applications such as Kibana, APM Server, and Elastic Maps Server.
+
+## Deployments autoscaling on ECK [k8s-autoscaling]
::::{note}
Elasticsearch autoscaling requires a valid Enterprise license or Enterprise trial license. Check [the license documentation](../license/manage-your-license-in-eck.md) for more details about managing licenses.
@@ -13,12 +21,12 @@ Elasticsearch autoscaling requires a valid Enterprise license or Enterprise tria
ECK can leverage the [autoscaling API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-autoscaling) introduced in Elasticsearch 7.11 to adjust automatically the number of Pods and the allocated resources in a tier. Currently, autoscaling is supported for Elasticsearch [data tiers](/manage-data/lifecycle/data-tiers.md) and machine learning nodes.
-## Enable autoscaling [k8s-enable]
+### Enable autoscaling [k8s-enable]
To enable autoscaling on an Elasticsearch cluster, you need to define one or more autoscaling policies. Each autoscaling policy applies to one or more NodeSets which share the same set of roles specified in the `node.roles` setting in the Elasticsearch configuration.
-### Define autoscaling policies [k8s-autoscaling-policies]
+#### Define autoscaling policies [k8s-autoscaling-policies]
Autoscaling policies can be defined in an `ElasticsearchAutoscaler` resource. Each autoscaling policy must have the following fields:
@@ -90,7 +98,7 @@ In the case of storage the following restrictions apply:
* Scaling up (vertically) is only supported if the available capacity in a PersistentVolume matches the capacity claimed in the PersistentVolumeClaim. Refer to the next section for more information.
-### Scale Up and Scale Out [k8s-autoscaling-algorithm]
+#### Scale Up and Scale Out [k8s-autoscaling-algorithm]
In order to adapt the resources to the workload, the operator first attempts to scale up the resources (cpu, memory, and storage) allocated to each node in the NodeSets. The operator always ensures that the requested resources are within the limits specified in the autoscaling policy. If each individual node has reached the limits specified in the autoscaling policy, but more resources are required to handle the load, then the operator adds some nodes to the NodeSets. Nodes are added up to the `max` value specified in the `nodeCount` of the policy.
@@ -126,7 +134,7 @@ spec:
```
-### Set the limits [k8s-autoscaling-resources]
+#### Set the limits [k8s-autoscaling-resources]
The value set for memory and CPU limits are computed by applying a ratio to the calculated resource request. The default ratio between the request and the limit for both CPU and memory is 1. This means that request and limit have the same value. You can change the default ratio between the request and the limit for both the CPU and memory ranges by using the `requestsToLimitsRatio` field.
@@ -162,7 +170,7 @@ spec:
You can find [a complete example in the ECK GitHub repository](https://github.com/elastic/cloud-on-k8s/blob/2.16/config/recipes/autoscaling/elasticsearch.yaml) which will also show you how to fine-tune the [autoscaling deciders](/deploy-manage/autoscaling/autoscaling-deciders.md).
-### Change the polling interval [k8s-autoscaling-polling-interval]
+#### Change the polling interval [k8s-autoscaling-polling-interval]
The Elasticsearch autoscaling capacity endpoint is polled every minute by the operator. This interval duration can be controlled using the `pollingPeriod` field in the autoscaling specification:
@@ -194,10 +202,10 @@ spec:
```
-## Monitoring [k8s-monitoring]
+### Monitoring [k8s-monitoring]
-### Autoscaling status [k8s-autoscaling-status]
+#### Autoscaling status [k8s-autoscaling-status]
In addition to the logs generated by the operator, an autoscaling status is maintained in the `ElasticsearchAutoscaler` resource. This status holds several `Conditions` to summarize the health and the status of the autoscaling mechanism. For example, dedicated `Conditions` may report if the controller cannot connect to the Elasticsearch cluster, or if a resource limit has been reached:
@@ -234,7 +242,7 @@ kubectl get elasticsearchautoscaler autoscaling-sample \
```
-### Expected resources [k8s-autoscaling-expected-resources]
+#### Expected resources [k8s-autoscaling-expected-resources]
The autoscaler status also contains a `policies` section which describes the expected resources for each NodeSet managed by an autoscaling policy.
@@ -270,7 +278,7 @@ kubectl get elasticsearchautoscaler.autoscaling.k8s.elastic.co/autoscaling-sampl
```
-### Events [k8s-events]
+#### Events [k8s-events]
Important events are also reported through Kubernetes events, for example when the maximum autoscaling size limit is reached:
@@ -281,7 +289,7 @@ Important events are also reported through Kubernetes events, for example when t
```
-## Disable autoscaling [k8s-disable]
+### Disable autoscaling [k8s-disable]
You can disable autoscaling at any time by deleting the `ElasticsearchAutoscaler` resource. For machine learning the following settings are not automatically reset:
@@ -291,3 +299,50 @@ You can disable autoscaling at any time by deleting the `ElasticsearchAutoscaler
You should adjust those settings manually to match the size of your deployment when you disable autoscaling.
+## Autoscaling stateless applications on ECK [k8s-stateless-autoscaling]
+
+::::{note}
+This section only applies to stateless applications. Check [Elasticsearch autoscaling](#k8s-autoscaling) for more details about automatically scaling Elasticsearch.
+::::
+
+
+The [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) can be used to automatically scale the deployments of the following resources:
+
+* Kibana
+* APM Server
+* Elastic Maps Server
+
+These resources expose the `scale` subresource which can be used by the Horizontal Pod Autoscaler controller to automatically adjust the number of replicas according to the CPU load or any other custom or external metric. This example shows how to create an `HorizontalPodAutoscaler` resource to adjust the replicas of a Kibana deployment according to the CPU load:
+
+```yaml
+apiVersion: elasticsearch.k8s.elastic.co/v1
+kind: Elasticsearch
+metadata:
+ name: elasticsearch-sample
+spec:
+ version: 8.16.1
+ nodeSets:
+ - name: default
+ count: 1
+ config:
+ node.store.allow_mmap: false
+---
+apiVersion: autoscaling/v2beta2
+kind: HorizontalPodAutoscaler
+metadata:
+ name: kb
+spec:
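+  # Targets the Kibana resource to scale; a Kibana named kibana-sample is assumed to exist in the same namespace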
+ scaleTargetRef:
+ apiVersion: kibana.k8s.elastic.co/v1
+ kind: Kibana
+ name: kibana-sample
+ minReplicas: 1
+ maxReplicas: 4
+ metrics:
+ - type: Resource
+ resource:
+ name: cpu
+ target:
+ type: Utilization
+ averageUtilization: 50
+```
\ No newline at end of file
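+
+Assuming the manifests above are saved to a file such as `kibana-hpa.yaml` (the file name is illustrative), you can apply them and then watch the autoscaler with standard kubectl commands:
+
+```sh
+kubectl apply -f kibana-hpa.yaml
+kubectl get hpa kb --watch
+```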
diff --git a/deploy-manage/autoscaling/autoscaling-stateless-applications-on-eck.md b/deploy-manage/autoscaling/autoscaling-stateless-applications-on-eck.md
deleted file mode 100644
index 1a618acb0f..0000000000
--- a/deploy-manage/autoscaling/autoscaling-stateless-applications-on-eck.md
+++ /dev/null
@@ -1,53 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-stateless-autoscaling.html
----
-
-# Autoscaling stateless applications on ECK [k8s-stateless-autoscaling]
-
-::::{note}
-This section only applies to stateless applications. Check [Elasticsearch autoscaling](deployments-autoscaling-on-eck.md) for more details about scaling automatically Elasticsearch.
-::::
-
-
-The [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale) can be used to automatically scale the deployments of the following resources:
-
-* Kibana
-* APM Server
-* Elastic Maps Server
-
-These resources expose the `scale` subresource which can be used by the Horizontal Pod Autoscaler controller to automatically adjust the number of replicas according to the CPU load or any other custom or external metric. This example shows how to create an `HorizontalPodAutoscaler` resource to adjust the replicas of a Kibana deployment according to the CPU load:
-
-```yaml
-apiVersion: elasticsearch.k8s.elastic.co/v1
-kind: Elasticsearch
-metadata:
- name: elasticsearch-sample
-spec:
- version: 8.16.1
- nodeSets:
- - name: default
- count: 1
- config:
- node.store.allow_mmap: false
-
-apiVersion: autoscaling/v2beta2
-kind: HorizontalPodAutoscaler
-metadata:
- name: kb
-spec:
- scaleTargetRef:
- apiVersion: kibana.k8s.elastic.co/v1
- kind: Kibana
- name: kibana-sample
- minReplicas: 1
- maxReplicas: 4
- metrics:
- - type: Resource
- resource:
- name: cpu
- target:
- type: Utilization
- averageUtilization: 50
-```
-
diff --git a/deploy-manage/autoscaling/ec-autoscaling-api-example.md b/deploy-manage/autoscaling/ec-autoscaling-api-example.md
deleted file mode 100644
index 92e391558a..0000000000
--- a/deploy-manage/autoscaling/ec-autoscaling-api-example.md
+++ /dev/null
@@ -1,266 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling-api-example.html
----
-
-# Autoscaling through the API [ec-autoscaling-api-example]
-
-This example demonstrates how to use the {{ecloud}} RESTful API to create a deployment with autoscaling enabled.
-
-The example deployment has a hot data and content tier, warm data tier, cold data tier, and a machine learning node, all of which will scale within the defined parameters. To learn about the autoscaling settings, check [Deployment autoscaling](../autoscaling.md) and [Autoscaling example](ec-autoscaling-example.md). For more information about using the {{ecloud}} API in general, check [RESTful API](asciidocalypse://docs/cloud/docs/reference/cloud-hosted/ec-api-restful.md).
-
-
-## Requirements [ec_requirements]
-
-Note the following requirements when you run this API request:
-
-* All Elasticsearch components must be included in the request, even if they are not enabled (that is, if they have a zero size). All components are included in this example.
-* The request requires a format that supports data tiers. Specifically, all Elasticsearch components must contain the following properties:
-
- * `id`
- * `node_attributes`
- * `node_roles`
-
-* The `size`, `autoscaling_min`, and `autoscaling_max` properties must be specified according to the following rules. This is because:
-
- * On data tiers only upward scaling is currently supported.
- * On machine learning nodes both upward and downward scaling is supported.
- * On all other components autoscaling is not currently supported.
-
-
-$$$ec-autoscaling-api-example-requirements-table$$$
-
-|  | `size` | `autoscaling_min` | `autoscaling_max` |
-| --- | --- | --- | --- |
-| data tier | ✓ | ✕ | ✓ |
-| machine learning node | ✕ | ✓ | ✓ |
-| coordinating and master nodes | ✓ | ✕ | ✕ |
-| Kibana | ✓ | ✕ | ✕ |
-| APM | ✓ | ✕ | ✕ |
-
-✓ = Include the property.
-
-✕ = Do not include the property.
-
-These rules match the behavior of the {{ecloud}} Console.
-
-* The `elasticsearch` object must contain the property `"autoscaling_enabled": true`.
-
-
-## API request example [ec_api_request_example]
-
-Run this example API request to create a deployment with autoscaling:
-
-
-```sh
-curl -XPOST \
-  -H 'Content-Type: application/json' \
-  -H "Authorization: ApiKey $EC_API_KEY" \
-  "https://api.elastic-cloud.com/api/v1/deployments" \
-  -d '
-{
- "name": "my-first-autoscaling-deployment",
- "resources": {
- "elasticsearch": [
- {
- "ref_id": "main-elasticsearch",
- "region": "us-east-1",
- "plan": {
- "autoscaling_enabled": true,
- "cluster_topology": [
- {
- "id": "hot_content",
- "node_roles": [
- "remote_cluster_client",
- "data_hot",
- "transform",
- "data_content",
- "master",
- "ingest"
- ],
- "zone_count": 2,
- "elasticsearch": {
- "node_attributes": {
- "data": "hot"
- },
- "enabled_built_in_plugins": []
- },
- "instance_configuration_id": "aws.data.highio.i3",
- "size": {
- "resource": "memory",
- "value": 8192
- },
- "autoscaling_max": {
- "value": 118784,
- "resource": "memory"
- }
- },
- {
- "id": "warm",
- "node_roles": [
- "data_warm",
- "remote_cluster_client"
- ],
- "zone_count": 2,
- "elasticsearch": {
- "node_attributes": {
- "data": "warm"
- },
- "enabled_built_in_plugins": []
- },
- "instance_configuration_id": "aws.data.highstorage.d3",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 118784,
- "resource": "memory"
- }
- },
- {
- "id": "cold",
- "node_roles": [
- "data_cold",
- "remote_cluster_client"
- ],
- "zone_count": 1,
- "elasticsearch": {
- "node_attributes": {
- "data": "cold"
- },
- "enabled_built_in_plugins": []
- },
- "instance_configuration_id": "aws.data.highstorage.d3",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 59392,
- "resource": "memory"
- }
- },
- {
- "id": "coordinating",
- "zone_count": 2,
- "node_roles": [
- "ingest",
- "remote_cluster_client"
- ],
- "instance_configuration_id": "aws.coordinating.m5d",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "elasticsearch": {
- "enabled_built_in_plugins": []
- }
- },
- {
- "id": "master",
- "node_roles": [
- "master"
- ],
- "zone_count": 3,
- "instance_configuration_id": "aws.master.r5d",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "elasticsearch": {
- "enabled_built_in_plugins": []
- }
- },
- {
- "id": "ml",
- "node_roles": [
- "ml",
- "remote_cluster_client"
- ],
- "zone_count": 1,
- "instance_configuration_id": "aws.ml.m5d",
- "autoscaling_min": {
- "value": 0,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 61440,
- "resource": "memory"
- },
- "elasticsearch": {
- "enabled_built_in_plugins": []
- }
- }
- ],
- "elasticsearch": {
- "version": "7.11.0"
- },
- "deployment_template": {
- "id": "aws-io-optimized-v2"
- }
- },
- "settings": {
- "dedicated_masters_threshold": 6
- }
- }
- ],
- "kibana": [
- {
- "elasticsearch_cluster_ref_id": "main-elasticsearch",
- "region": "us-east-1",
- "plan": {
- "cluster_topology": [
- {
- "instance_configuration_id": "aws.kibana.r5d",
- "zone_count": 1,
- "size": {
- "resource": "memory",
- "value": 1024
- }
- }
- ],
- "kibana": {
- "version": "7.11.0"
- }
- },
- "ref_id": "main-kibana"
- }
- ],
- "apm": [
- {
- "elasticsearch_cluster_ref_id": "main-elasticsearch",
- "region": "us-east-1",
- "plan": {
- "cluster_topology": [
- {
- "instance_configuration_id": "aws.apm.r5d",
- "zone_count": 1,
- "size": {
- "resource": "memory",
- "value": 512
- }
- }
- ],
- "apm": {
- "version": "7.11.0"
- }
- },
- "ref_id": "main-apm"
- }
- ],
- "enterprise_search": []
- }
-}
-'
-```
-
-::::{note}
-Although autoscaling can scale some tiers by CPU, the primary measurement of tier size is memory. Limits on tier size are in terms of memory.
-::::
-
-
diff --git a/deploy-manage/autoscaling/ec-autoscaling-example.md b/deploy-manage/autoscaling/ec-autoscaling-example.md
deleted file mode 100644
index 7ca8f25a50..0000000000
--- a/deploy-manage/autoscaling/ec-autoscaling-example.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling-example.html
----
-
-# Autoscaling example [ec-autoscaling-example]
-
-To help you better understand the available autoscaling settings, this example describes a typical autoscaling workflow on a sample {{ech}} deployment.
-
-1. Enable autoscaling:
-
- * On an **existing deployment**, open the deployment **Edit** page to find the option to turn on autoscaling.
- * When you create a new deployment, you can find the autoscaling option under **Advanced settings**.
-
- Once you confirm your changes or create a new deployment, autoscaling is activated with system default settings that you can adjust as needed (though for most use cases the default settings will likely suffice).
-
-2. View and adjust autoscaling settings on data tiers:
-
- 1. Open the **Edit** page for your deployment to get the current and maximum size per zone of each Elasticsearch data tier. In this example, the hot data and content tier has the following settings:
-
-       | **Current size per zone** | **Maximum size per zone** |
-       | --- | --- |
-       | 45GB storage | 1.41TB storage |
-       | 1GB RAM | 32GB RAM |
-       | Up to 2.5 vCPU | 5 vCPU |
-
- The fault tolerance for the data tier is set to 2 availability zones.
-
- :::{image} ../../images/cloud-ec-ce-autoscaling-data-summary2.png
- :alt: A screenshot showing sizing information for the autoscaled data tier
- :::
-
- 2. Use the dropdown boxes to adjust the current and/or the maximum size of the data tier. Capacity will be added to the hot content and data tier when required, based on its past and present storage usage, until it reaches the maximum size per zone. Any scaling events are applied simultaneously across availability zones. In this example, the tier has plenty of room to scale relative to its current size, and it will not scale above the maximum size setting. There is no minimum size setting since downward scaling is currently not supported on data tiers.
-
-3. View and adjust autoscaling settings on a machine learning instance:
-
- 1. From the deployment **Edit** page you can check the minimum and maximum size of your deployment’s machine learning instances. In this example, the machine learning instance has the following settings:
-
-       | **Minimum size per zone** | **Maximum size per zone** |
-       | --- | --- |
-       | 1GB RAM | 64GB RAM |
-       | 0.5 vCPU up to 8 vCPU | 32 vCPU |
-
- The fault tolerance for the machine learning instance is set to 1 availability zone.
-
- :::{image} ../../images/cloud-ec-ce-autoscaling-ml-summary2.png
- :alt: A screenshot showing sizing information for the autoscaled machine learning node
- :::
-
-    2. Use the dropdown boxes to adjust the minimum and/or the maximum size of the machine learning instances. Capacity will be added to or removed from the machine learning instances as needed. The need for a scaling event is determined by the expected memory and vCPU requirements for the currently configured machine learning job. Any scaling events are applied simultaneously across availability zones. Note that unlike data tiers, machine learning nodes do not have a **Current size per zone** setting. That setting is not needed since machine learning nodes support both upward and downward scaling.
-
-4. Over time, the volume of data and the size of any machine learning jobs in your deployment are likely to grow. Let’s assume that to meet storage requirements your hot data tier has scaled up to its maximum allowed size of 32GB RAM and 5 vCPU per zone. At this point, a notification appears on the deployment overview page letting you know that the tier has scaled to capacity. You’ll also receive an alert by email.
-5. If you expect a continued increase in either storage, memory, or vCPU requirements, you can use the **Maximum size per zone** dropdown box to adjust the maximum capacity settings for your data tiers and machine learning instances, as appropriate. And, you can always re-adjust these levels downward if the requirements change.
-
-As you can see, autoscaling greatly reduces the manual work involved in managing a deployment. The deployment capacity adjusts automatically as demands change, within the boundaries that you define. Check our main [Deployment autoscaling](../autoscaling.md) page for more information.
-
diff --git a/deploy-manage/autoscaling/ec-autoscaling.md b/deploy-manage/autoscaling/ec-autoscaling.md
deleted file mode 100644
index ace5653308..0000000000
--- a/deploy-manage/autoscaling/ec-autoscaling.md
+++ /dev/null
@@ -1,125 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud/current/ec-autoscaling.html
----
-
-# Deployment autoscaling [ec-autoscaling]
-
-Autoscaling helps you to more easily manage your deployments by adjusting their available resources automatically. It currently supports scaling for both data and machine learning nodes, or for machine learning nodes only. Check the following sections to learn more:
-
-* [Overview](../autoscaling.md#ec-autoscaling-intro)
-* [When does autoscaling occur?](../autoscaling.md#ec-autoscaling-factors)
-* [Notifications](../autoscaling.md#ec-autoscaling-notifications)
-* [Restrictions and limitations](../autoscaling.md#ec-autoscaling-restrictions)
-* [Enable or disable autoscaling](../autoscaling.md#ec-autoscaling-enable)
-* [Update your autoscaling settings](../autoscaling.md#ec-autoscaling-update)
-
-You can also have a look at our [autoscaling example](ec-autoscaling-example.md), as well as a sample request to [create an autoscaled deployment through the API](ec-autoscaling-api-example.md).
-
-
-## Overview [ec-autoscaling-intro]
-
-When you first create a deployment it can be challenging to determine the amount of storage your data nodes will require. The same is relevant for the amount of memory and CPU that you want to allocate to your machine learning nodes. It can become even more challenging to predict these requirements for weeks or months into the future. In an ideal scenario, these resources should be sized to both ensure efficient performance and resiliency, and to avoid excess costs. Autoscaling can help with this balance by adjusting the resources available to a deployment automatically as loads change over time, reducing the need for monitoring and manual intervention.
-
-::::{note}
-Autoscaling is enabled for the Machine Learning tier by default for new deployments.
-::::
-
-
-Currently, autoscaling behavior is as follows:
-
-* **Data tiers**
-
-    * Each Elasticsearch [data tier](../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When we detect more storage is needed, autoscaling will scale up each data tier independently to ensure you can continue to ingest more data into your hot and content tier, or move data to the warm, cold, or frozen data tiers.
- * In addition to scaling up existing data tiers, a new data tier will be automatically added when necessary, based on your [index lifecycle management policies](../../manage-data/lifecycle/index-lifecycle-management.md).
- * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
- * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. In case you want to adjust the size of your data tier to add more memory or CPU, or in case you deleted data and want to scale it down, you can set the current size per zone of each data tier manually.
-
-* **Machine learning nodes**
-
- * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
- * When a machine learning job is opened, or a machine learning trained model is deployed, if there are no machine learning nodes in your deployment, the autoscaling mechanism will automatically add machine learning nodes. Similarly, after a period of no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
- * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
- * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
- * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
-
-
-::::{note}
-The number of availability zones for each component of your {{ech}} deployments is not affected by autoscaling. You can always set the number of availability zones manually and the autoscaling mechanism will add or remove capacity per availability zone.
-::::
-
-
-
-## When does autoscaling occur? [ec-autoscaling-factors]
-
-Several factors determine when data tiers or machine learning nodes are scaled.
-
-For a data tier, an autoscaling event can be triggered in the following cases:
-
-* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
-* When past behavior on a hot tier indicates that the influx of data can increase significantly in the near future. Refer to [Reactive storage decider](autoscaling-deciders.md) and [Proactive storage decider](autoscaling-deciders.md) for more detail.
-* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, it automatically creates warm or cold nodes if an ILM policy is trying to move data from hot to warm or cold nodes.
-
-On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](autoscaling-deciders.md) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md).
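-
-For illustration, the queueing limit and `down_scale_delay` mentioned above are settings on the autoscaling [machine learning decider](autoscaling-deciders.md). On {{ech}} these deciders are managed for you, so treat the following only as a sketch of what those settings look like when configured directly on a self-managed cluster. The policy name and the `$ES_URL`/`$ES_API_KEY` variables are hypothetical:
-
-```sh
-# Illustrative only: configure the ML decider through the autoscaling policy API.
-# "my_ml_policy" is a hypothetical policy name.
-curl -X PUT "$ES_URL/_autoscaling/policy/my_ml_policy" \
-  -H "Authorization: ApiKey $ES_API_KEY" \
-  -H 'Content-Type: application/json' -d '
-{
-  "roles": [ "ml" ],
-  "deciders": {
-    "ml": {
-      "num_anomaly_jobs_in_queue": 5,
-      "num_analytics_jobs_in_queue": 5,
-      "down_scale_delay": "1h"
-    }
-  }
-}'
-```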
-
-On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
-
-
-## Notifications [ec-autoscaling-notifications]
-
-In the event that a data tier or machine learning node scales up to its maximum possible size, you’ll receive an email, and a notice also appears on the deployment overview page prompting you to adjust your autoscaling settings to ensure optimal performance.
-
-
-## Restrictions and limitations [ec-autoscaling-restrictions]
-
-The following are known limitations and restrictions with autoscaling:
-
-* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
-* Trial deployments cannot be configured to autoscale beyond the normal Trial deployment size limits. The maximum size per zone is increased automatically from the Trial limit when you convert to a paid subscription.
-* ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md).
-
-
-## Enable or disable autoscaling [ec-autoscaling-enable]
-
-To enable or disable autoscaling on a deployment:
-
-1. Log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    On the **Deployments** page you can narrow the list by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. Select the desired autoscaling configuration for this deployment using the **Enable Autoscaling for:** dropdown menu.
-5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
-
-When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](../autoscaling.md#ec-autoscaling-update). Current sizes are shown on the deployment overview page.
-
-When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
-
-
-## Update your autoscaling settings [ec-autoscaling-update]
-
-Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
-
-1. Log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    On the **Deployments** page you can narrow the list by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. To update a data tier:
-
- 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
- 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
- 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
- 4. Select **Save** to apply the changes to your deployment.
-
-5. To update machine learning nodes:
-
- 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
- 2. Select **Save** to apply the changes to your deployment.
-
-
-You can also view our [example](ec-autoscaling-example.md) of how the autoscaling settings work.
diff --git a/deploy-manage/autoscaling/ece-autoscaling-api-example.md b/deploy-manage/autoscaling/ece-autoscaling-api-example.md
deleted file mode 100644
index 881ac18ff8..0000000000
--- a/deploy-manage/autoscaling/ece-autoscaling-api-example.md
+++ /dev/null
@@ -1,264 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling-api-example.html
----
-
-# Autoscaling through the API [ece-autoscaling-api-example]
-
-This example demonstrates how to use the Elastic Cloud Enterprise RESTful API to create a deployment with autoscaling enabled.
-
-The example deployment has a hot data and content tier, warm data tier, cold data tier, and a machine learning node, all of which will scale within the defined parameters. To learn about the autoscaling settings, check [Deployment autoscaling](../autoscaling.md) and [Autoscaling example](ece-autoscaling-example.md). For more information about using the Elastic Cloud Enterprise API in general, check [RESTful API](asciidocalypse://docs/cloud/docs/reference/cloud-enterprise/restful-api.md).
-
-
-## Requirements [ece_requirements_3]
-
-Note the following requirements when you run this API request:
-
-* On Elastic Cloud Enterprise, autoscaling is supported for custom deployment templates on version 2.12 and above. To learn more, refer to [Updating custom templates to support `node_roles` and autoscaling](../deploy/cloud-enterprise/ce-add-support-for-node-roles-autoscaling.md).
-* All Elasticsearch components must be included in the request, even if they are not enabled (that is, if they have a zero size). All components are included in this example.
-* The request requires a format that supports data tiers. Specifically, all Elasticsearch components must contain the following properties:
-
- * `id`
- * `node_attributes`
- * `node_roles`
-
-* The `size`, `autoscaling_min`, and `autoscaling_max` properties must be specified according to the rules summarized in the following table. These rules reflect that:
-
- * On data tiers only upward scaling is currently supported.
- * On machine learning nodes both upward and downward scaling is supported.
- * On all other components autoscaling is not currently supported.
-
-
-$$$ece-autoscaling-api-example-requirements-table$$$
-
-|  | `size` | `autoscaling_min` | `autoscaling_max` |
-| --- | --- | --- | --- |
-| data tier | ✓ | ✕ | ✓ |
-| machine learning node | ✕ | ✓ | ✓ |
-| coordinating and master nodes | ✓ | ✕ | ✕ |
-| Kibana | ✓ | ✕ | ✕ |
-| APM | ✓ | ✕ | ✕ |
-
-✓ = Include the property.
-
-✕ = Do not include the property.
-
-These rules match the behavior of the Elastic Cloud Enterprise user console.
-
-* The `elasticsearch` object must contain the property `"autoscaling_enabled": true`.
-
-
-## API request example [ece_api_request_example]
-
-Run this example API request to create a deployment with autoscaling:
-
-```sh
-curl -k -X POST -H "Authorization: ApiKey $ECE_API_KEY" https://$COORDINATOR_HOST:12443/api/v1/deployments -H 'content-type: application/json' -d '
-{
- "name": "my-first-autoscaling-deployment",
- "resources": {
- "elasticsearch": [
- {
- "ref_id": "main-elasticsearch",
- "region": "ece-region",
- "plan": {
- "autoscaling_enabled": true,
- "cluster_topology": [
- {
- "id": "hot_content",
- "node_roles": [
- "master",
- "ingest",
- "remote_cluster_client",
- "data_hot",
- "transform",
- "data_content"
- ],
- "zone_count": 1,
- "elasticsearch": {
- "node_attributes": {
- "data": "hot"
- },
- "enabled_built_in_plugins": []
- },
- "instance_configuration_id": "data.default",
- "size": {
- "value": 4096,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 2097152,
- "resource": "memory"
- }
- },
- {
- "id": "warm",
- "node_roles": [
- "data_warm",
- "remote_cluster_client"
- ],
- "zone_count": 1,
- "elasticsearch": {
- "node_attributes": {
- "data": "warm"
- },
- "enabled_built_in_plugins": []
- },
- "instance_configuration_id": "data.highstorage",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 2097152,
- "resource": "memory"
- }
- },
- {
- "id": "cold",
- "node_roles": [
- "data_cold",
- "remote_cluster_client"
- ],
- "zone_count": 1,
- "elasticsearch": {
- "node_attributes": {
- "data": "cold"
- },
- "enabled_built_in_plugins": []
- },
- "instance_configuration_id": "data.highstorage",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 2097152,
- "resource": "memory"
- }
- },
- {
- "id": "coordinating",
- "node_roles": [
- "ingest",
- "remote_cluster_client"
- ],
- "zone_count": 1,
- "instance_configuration_id": "coordinating",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "elasticsearch": {
- "enabled_built_in_plugins": []
- }
- },
- {
- "id": "master",
- "node_roles": [
- "master"
- ],
- "zone_count": 1,
- "instance_configuration_id": "master",
- "size": {
- "value": 0,
- "resource": "memory"
- },
- "elasticsearch": {
- "enabled_built_in_plugins": []
- }
- },
- {
- "id": "ml",
- "node_roles": [
- "ml",
- "remote_cluster_client"
- ],
- "zone_count": 1,
- "instance_configuration_id": "ml",
- "autoscaling_min": {
- "value": 0,
- "resource": "memory"
- },
- "autoscaling_max": {
- "value": 2097152,
- "resource": "memory"
- },
- "elasticsearch": {
- "enabled_built_in_plugins": []
- }
- }
- ],
- "elasticsearch": {
- "version": "8.13.1"
- },
- "deployment_template": {
- "id": "default"
- }
- },
- "settings": {
- "dedicated_masters_threshold": 6
- }
- }
- ],
- "kibana": [
- {
- "ref_id": "main-kibana",
- "elasticsearch_cluster_ref_id": "main-elasticsearch",
- "region": "ece-region",
- "plan": {
- "zone_count": 1,
- "cluster_topology": [
- {
- "instance_configuration_id": "kibana",
- "size": {
- "value": 1024,
- "resource": "memory"
- },
- "zone_count": 1
- }
- ],
- "kibana": {
- "version": "8.13.1"
- }
- }
- }
- ],
- "apm": [
- {
- "ref_id": "main-apm",
- "elasticsearch_cluster_ref_id": "main-elasticsearch",
- "region": "ece-region",
- "plan": {
- "cluster_topology": [
- {
- "instance_configuration_id": "apm",
- "size": {
- "value": 512,
- "resource": "memory"
- },
- "zone_count": 1
- }
- ],
- "apm": {
- "version": "8.13.1"
- }
- }
- }
- ],
- "enterprise_search": []
- }
-}
-'
-```
-
-
-::::{note}
-Although autoscaling can scale some tiers by CPU, the primary measurement of tier size is memory. Limits on tier size are in terms of memory.
-::::
-
-
diff --git a/deploy-manage/autoscaling/ece-autoscaling-example.md b/deploy-manage/autoscaling/ece-autoscaling-example.md
deleted file mode 100644
index b0a9053598..0000000000
--- a/deploy-manage/autoscaling/ece-autoscaling-example.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling-example.html
----
-
-# Autoscaling example [ece-autoscaling-example]
-
-To help you better understand the available autoscaling settings, this example describes a typical autoscaling workflow on a sample Elastic Cloud Enterprise deployment.
-
-1. Enable autoscaling:
-
- * On an **existing deployment**, open the deployment **Edit** page to find the option to turn on autoscaling.
- * When you create a new deployment, you can find the autoscaling option under **Advanced settings**.
-
- Once you confirm your changes or create a new deployment, autoscaling is activated with system default settings that you can adjust as needed (though for most use cases the default settings will likely suffice).
-
-2. View and adjust autoscaling settings on data tiers:
-
- 1. Open the **Edit** page for your deployment to get the current and maximum size per zone of each Elasticsearch data tier. In this example, the hot data and content tier has the following settings:
-
-       | **Current size per zone** | **Maximum size per zone** |
-       | --- | --- |
-       | 45GB storage | 1.41TB storage |
-       | 1GB RAM | 32GB RAM |
-       | Up to 2.5 vCPU | 5 vCPU |
-
- The fault tolerance for the data tier is set to 2 availability zones.
-
- :::{image} ../../images/cloud-enterprise-ec-ce-autoscaling-data-summary2.png
- :alt: A screenshot showing sizing information for the autoscaled data tier
- :::
-
- 2. Use the dropdown boxes to adjust the current and/or the maximum size of the data tier. Capacity will be added to the hot content and data tier when required, based on its past and present storage usage, until it reaches the maximum size per zone. Any scaling events are applied simultaneously across availability zones. In this example, the tier has plenty of room to scale relative to its current size, and it will not scale above the maximum size setting. There is no minimum size setting since downward scaling is currently not supported on data tiers.
-
-3. View and adjust autoscaling settings on a machine learning instance:
-
- 1. From the deployment **Edit** page you can check the minimum and maximum size of your deployment’s machine learning instances. In this example, the machine learning instance has the following settings:
-
-       | **Minimum size per zone** | **Maximum size per zone** |
-       | --- | --- |
-       | 1GB RAM | 64GB RAM |
-       | 0.5 vCPU up to 8 vCPU | 32 vCPU |
-
- The fault tolerance for the machine learning instance is set to 1 availability zone.
-
- :::{image} ../../images/cloud-enterprise-ec-ce-autoscaling-ml-summary2.png
- :alt: A screenshot showing sizing information for the autoscaled machine learning node
- :::
-
-    2. Use the dropdown boxes to adjust the minimum and/or the maximum size of the machine learning instances. Capacity will be added to or removed from the machine learning instances as needed. The need for a scaling event is determined by the expected memory and vCPU requirements for the currently configured machine learning job. Any scaling events are applied simultaneously across availability zones. Note that unlike data tiers, machine learning nodes do not have a **Current size per zone** setting. That setting is not needed since machine learning nodes support both upward and downward scaling.
-
-4. Over time, the volume of data and the size of any machine learning jobs in your deployment are likely to grow. Let’s assume that to meet storage requirements your hot data tier has scaled up to its maximum allowed size of 32GB RAM and 5 vCPU per zone. At this point, a notification appears on the deployment overview page indicating that the tier has scaled to capacity.
-5. If you expect a continued increase in either storage, memory, or vCPU requirements, you can use the **Maximum size per zone** dropdown box to adjust the maximum capacity settings for your data tiers and machine learning instances, as appropriate. And, you can always re-adjust these levels downward if the requirements change.
-
-As you can see, autoscaling greatly reduces the manual work involved in managing a deployment. The deployment capacity adjusts automatically as demands change, within the boundaries that you define. Check our main [Deployment autoscaling](../autoscaling.md) page for more information.
-
diff --git a/deploy-manage/autoscaling/ece-autoscaling.md b/deploy-manage/autoscaling/ece-autoscaling.md
deleted file mode 100644
index 682e76e054..0000000000
--- a/deploy-manage/autoscaling/ece-autoscaling.md
+++ /dev/null
@@ -1,130 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud-enterprise/current/ece-autoscaling.html
----
-
-# Deployment autoscaling [ece-autoscaling]
-
-Autoscaling helps you to more easily manage your deployments by adjusting their available resources automatically. It currently supports scaling for both data and machine learning nodes, or for machine learning nodes only. Check the following sections to learn more:
-
-* [Overview](../autoscaling.md#ece-autoscaling-intro)
-* [When does autoscaling occur?](../autoscaling.md#ece-autoscaling-factors)
-* [Notifications](../autoscaling.md#ece-autoscaling-notifications)
-* [Restrictions and limitations](../autoscaling.md#ece-autoscaling-restrictions)
-* [Enable or disable autoscaling](../autoscaling.md#ece-autoscaling-enable)
-* [Update your autoscaling settings](../autoscaling.md#ece-autoscaling-update)
-
-You can also have a look at our [autoscaling example](ece-autoscaling-example.md), as well as a sample request to [create an autoscaled deployment through the API](ece-autoscaling-api-example.md).
-
-
-## Overview [ece-autoscaling-intro]
-
-When you first create a deployment it can be challenging to determine the amount of storage your data nodes will require. The same is relevant for the amount of memory and CPU that you want to allocate to your machine learning nodes. It can become even more challenging to predict these requirements for weeks or months into the future. In an ideal scenario, these resources should be sized to both ensure efficient performance and resiliency, and to avoid excess costs. Autoscaling can help with this balance by adjusting the resources available to a deployment automatically as loads change over time, reducing the need for monitoring and manual intervention.
-
-::::{note}
-Autoscaling is enabled for the Machine Learning tier by default for new deployments.
-::::
-
-
-Currently, autoscaling behavior is as follows:
-
-* **Data tiers**
-
-    * Each Elasticsearch [data tier](../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When we detect more storage is needed, autoscaling will scale up each data tier independently to ensure you can continue to ingest more data into your hot and content tier, or move data to the warm, cold, or frozen data tiers.
- * In addition to scaling up existing data tiers, a new data tier will be automatically added when necessary, based on your [index lifecycle management policies](https://www.elastic.co/guide/en/cloud-enterprise/current/ece-configure-index-management.html).
- * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
- * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. In case you want to adjust the size of your data tier to add more memory or CPU, or in case you deleted data and want to scale it down, you can set the current size per zone of each data tier manually.
-
-* **Machine learning nodes**
-
- * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
- * When a machine learning job is opened, or a machine learning trained model is deployed, if there are no machine learning nodes in your deployment, the autoscaling mechanism will automatically add machine learning nodes. Similarly, after a period of no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
- * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
- * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
- * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
-
-
-::::{note}
-For any Elasticsearch component in Elastic Cloud Enterprise, the number of availability zones is not affected by autoscaling. You can always set the number of availability zones manually and the autoscaling mechanism will add or remove capacity per availability zone.
-::::
-
-
-
-## When does autoscaling occur? [ece-autoscaling-factors]
-
-Several factors determine when data tiers or machine learning nodes are scaled.
-
-For a data tier, an autoscaling event can be triggered in the following cases:
-
-* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
-* When past behavior on a hot tier indicates that the influx of data can increase significantly in the near future. Refer to [Reactive storage decider](autoscaling-deciders.md) and [Proactive storage decider](autoscaling-deciders.md) for more detail.
-* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, it automatically creates warm or cold nodes if an ILM policy is trying to move data from hot to warm or cold nodes.
-
-On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](autoscaling-deciders.md) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md).
-
-On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
-
-
-## Notifications [ece-autoscaling-notifications]
-
-In the event that a data tier or machine learning node scales up to its maximum possible size, a notice appears on the deployment overview page prompting you to adjust your autoscaling settings in order to ensure optimal performance.
-
-A warning is also issued in the ECE `service-constructor` logs with the field `labels.autoscaling_notification_type` and a value of `data-tier-at-limit` (for a fully scaled data tier) or `ml-tier-at-limit` (for a fully scaled machine learning node). The warning is indexed in the `logging-and-metrics` deployment, so you can use that event to [configure an email notification](../../explore-analyze/alerts-cases/watcher/actions-email.md).
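-
-As a sketch of how such an alert could be driven, the following query matches both at-limit notification types in the `logging-and-metrics` deployment. The `logs-*` index pattern and the `$LOGGING_AND_METRICS_URL`/`$API_KEY` variables are illustrative; adjust them to your environment:
-
-```sh
-# Illustrative: find autoscaling at-limit warnings emitted by service-constructor.
-curl -s -H "Authorization: ApiKey $API_KEY" \
-  "$LOGGING_AND_METRICS_URL/logs-*/_search" \
-  -H 'Content-Type: application/json' -d '
-{
-  "query": {
-    "terms": {
-      "labels.autoscaling_notification_type": [ "data-tier-at-limit", "ml-tier-at-limit" ]
-    }
-  }
-}'
-```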
-
-
-## Restrictions and limitations [ece-autoscaling-restrictions]
-
-The following are known limitations and restrictions with autoscaling:
-
-* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
-* If an override is set for the instance size or disk quota multiplier of an instance by means of the [Instance Overrides API](https://www.elastic.co/docs/api/doc/cloud-enterprise/operation/operation-set-all-instances-settings-overrides), autoscaling is effectively disabled for that instance. Avoid adjusting the instance size or disk quota multiplier on instances that use autoscaling, since the override prevents autoscaling from running.
-
-
-## Enable or disable autoscaling [ece-autoscaling-enable]
-
-To enable or disable autoscaling on a deployment:
-
-1. [Log into the Cloud UI](../deploy/cloud-enterprise/log-into-cloud-ui.md).
-2. On the **Deployments** page, select your deployment.
-
-    Narrow the list by name or ID, or use a combination of several other filters.
-
-3. In your deployment menu, select **Edit**.
-4. Select the desired autoscaling configuration for this deployment using the **Enable Autoscaling for:** dropdown menu.
-5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
-
-When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](../autoscaling.md#ece-autoscaling-update). Current sizes are shown on the deployment overview page.
-
-When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
-
-
-## Update your autoscaling settings [ece-autoscaling-update]
-
-Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
-
-1. [Log into the Cloud UI](../deploy/cloud-enterprise/log-into-cloud-ui.md).
-2. On the **Deployments** page, select your deployment.
-
-    Narrow the list by name or ID, or use a combination of several other filters.
-
-3. In your deployment menu, select **Edit**.
-4. To update a data tier:
-
- 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
- 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
- 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
- 4. Select **Save** to apply the changes to your deployment.
-
-5. To update machine learning nodes:
-
- 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
- 2. Select **Save** to apply the changes to your deployment.
-
-
-You can also view our [example](ece-autoscaling-example.md) of how the autoscaling settings work.
-
-::::{note}
-On Elastic Cloud Enterprise, system-owned deployment templates include the default values for all deployment autoscaling settings.
-::::
diff --git a/deploy-manage/autoscaling/ech-autoscaling-example.md b/deploy-manage/autoscaling/ech-autoscaling-example.md
deleted file mode 100644
index 556d0f1a3b..0000000000
--- a/deploy-manage/autoscaling/ech-autoscaling-example.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud-heroku/current/ech-autoscaling-example.html
----
-
-# Autoscaling example [ech-autoscaling-example]
-
-To help you better understand the available autoscaling settings, this example describes a typical autoscaling workflow on a sample Elasticsearch Add-On for Heroku deployment.
-
-1. Enable autoscaling:
-
- * On an **existing deployment**, open the deployment **Edit** page to find the option to turn on autoscaling.
- * When you create a new deployment, you can find the autoscaling option under **Advanced settings**.
-
- Once you confirm your changes or create a new deployment, autoscaling is activated with system default settings that you can adjust as needed (though for most use cases the default settings will likely suffice).
-
-2. View and adjust autoscaling settings on data tiers:
-
- 1. Open the **Edit** page for your deployment to get the current and maximum size per zone of each Elasticsearch data tier. In this example, the hot data and content tier has the following settings:
-
-       | **Current size per zone** | **Maximum size per zone** |
-       | --- | --- |
-       | 45GB storage | 1.41TB storage |
-       | 1GB RAM | 32GB RAM |
-       | Up to 2.5 vCPU | 5 vCPU |
-
- The fault tolerance for the data tier is set to 2 availability zones.
-
- :::{image} ../../images/cloud-heroku-ec-ce-autoscaling-data-summary2.png
- :alt: A screenshot showing sizing information for the autoscaled data tier
- :::
-
- 2. Use the dropdown boxes to adjust the current and/or the maximum size of the data tier. Capacity will be added to the hot content and data tier when required, based on its past and present storage usage, until it reaches the maximum size per zone. Any scaling events are applied simultaneously across availability zones. In this example, the tier has plenty of room to scale relative to its current size, and it will not scale above the maximum size setting. There is no minimum size setting since downward scaling is currently not supported on data tiers.
-
-3. View and adjust autoscaling settings on a machine learning instance:
-
- 1. From the deployment **Edit** page you can check the minimum and maximum size of your deployment’s machine learning instances. In this example, the machine learning instance has the following settings:
-
-       | **Minimum size per zone** | **Maximum size per zone** |
-       | --- | --- |
-       | 1GB RAM | 64GB RAM |
-       | 0.5 vCPU up to 8 vCPU | 32 vCPU |
-
- The fault tolerance for the machine learning instance is set to 1 availability zone.
-
- :::{image} ../../images/cloud-heroku-ec-ce-autoscaling-ml-summary2.png
- :alt: A screenshot showing sizing information for the autoscaled machine learning node
- :::
-
-    2. Use the dropdown boxes to adjust the minimum and/or the maximum size of the machine learning instances. Capacity will be added to or removed from the machine learning instances as needed. The need for a scaling event is determined by the expected memory and vCPU requirements for the currently configured machine learning job. Any scaling events are applied simultaneously across availability zones. Note that unlike data tiers, machine learning nodes do not have a **Current size per zone** setting. That setting is not needed since machine learning nodes support both upward and downward scaling.
-
-4. Over time, the volume of data and the size of any machine learning jobs in your deployment are likely to grow. Let’s assume that to meet storage requirements your hot data tier has scaled up to its maximum allowed size of 32GB RAM and 5 vCPU per zone. At this point, a notification appears on the deployment overview page letting you know that the tier has scaled to capacity. You’ll also receive an alert by email.
-5. If you expect a continued increase in either storage, memory, or vCPU requirements, you can use the **Maximum size per zone** dropdown box to adjust the maximum capacity settings for your data tiers and machine learning instances, as appropriate. And, you can always re-adjust these levels downward if the requirements change.
-
-As you can see, autoscaling greatly reduces the manual work involved in managing a deployment. The deployment capacity adjusts automatically as demands change, within the boundaries that you define. Check our main [Deployment autoscaling](../autoscaling.md) page for more information.
-
diff --git a/deploy-manage/autoscaling/ech-autoscaling.md b/deploy-manage/autoscaling/ech-autoscaling.md
deleted file mode 100644
index eeef25ab3c..0000000000
--- a/deploy-manage/autoscaling/ech-autoscaling.md
+++ /dev/null
@@ -1,123 +0,0 @@
----
-mapped_pages:
- - https://www.elastic.co/guide/en/cloud-heroku/current/ech-autoscaling.html
----
-
-# Deployment autoscaling [ech-autoscaling]
-
-Autoscaling helps you to more easily manage your deployments by adjusting their available resources automatically. It currently supports scaling for both data and machine learning nodes, or for machine learning nodes only. Check the following sections to learn more:
-
-* [Overview](../autoscaling.md#ech-autoscaling-intro)
-* [When does autoscaling occur?](../autoscaling.md#ech-autoscaling-factors)
-* [Notifications](../autoscaling.md#ech-autoscaling-notifications)
-* [Restrictions and limitations](../autoscaling.md#ech-autoscaling-restrictions)
-* [Enable or disable autoscaling](../autoscaling.md#ech-autoscaling-enable)
-* [Update your autoscaling settings](../autoscaling.md#ech-autoscaling-update)
-
-You can also have a look at our [autoscaling example](ech-autoscaling-example.md).
-
-
-## Overview [ech-autoscaling-intro]
-
-When you first create a deployment it can be challenging to determine the amount of storage your data nodes will require. The same is relevant for the amount of memory and CPU that you want to allocate to your machine learning nodes. It can become even more challenging to predict these requirements for weeks or months into the future. In an ideal scenario, these resources should be sized to both ensure efficient performance and resiliency, and to avoid excess costs. Autoscaling can help with this balance by adjusting the resources available to a deployment automatically as loads change over time, reducing the need for monitoring and manual intervention.
-
-::::{note}
-Autoscaling is enabled for the Machine Learning tier by default for new deployments.
-::::
-
-
-Currently, autoscaling behavior is as follows:
-
-* **Data tiers**
-
-    * Each Elasticsearch [data tier](../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When we detect more storage is needed, autoscaling will scale up each data tier independently to ensure you can continue to ingest more data into your hot and content tier, or move data to the warm, cold, or frozen data tiers.
- * In addition to scaling up existing data tiers, a new data tier will be automatically added when necessary, based on your index lifecycle management policies.
- * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
- * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. In case you want to adjust the size of your data tier to add more memory or CPU, or in case you deleted data and want to scale it down, you can set the current size per zone of each data tier manually.
-
-* **Machine learning nodes**
-
- * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
- * When a machine learning job is opened, or a machine learning trained model is deployed, if there are no machine learning nodes in your deployment, the autoscaling mechanism will automatically add machine learning nodes. Similarly, after a period of no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
- * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
- * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
- * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
-
-
-::::{note}
-For any Elasticsearch component in Elasticsearch Add-On for Heroku, the number of availability zones is not affected by autoscaling. You can always set the number of availability zones manually and the autoscaling mechanism will add or remove capacity per availability zone.
-::::
-
-
-
-## When does autoscaling occur? [ech-autoscaling-factors]
-
-Several factors determine when data tiers or machine learning nodes are scaled.
-
-For a data tier, an autoscaling event can be triggered in the following cases:
-
-* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
-* When past behavior on a hot tier indicates that the influx of data can increase significantly in the near future. Refer to [Reactive storage decider](autoscaling-deciders.md) and [Proactive storage decider](autoscaling-deciders.md) for more detail.
-* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, it automatically creates warm or cold nodes if an ILM policy is trying to move data from hot to warm or cold nodes.
-
-On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](autoscaling-deciders.md) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md).
-
-On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
-
-
-## Notifications [ech-autoscaling-notifications]
-
-In the event that a data tier or machine learning node scales up to its maximum possible size, you’ll receive an email, and a notice also appears on the deployment overview page prompting you to adjust your autoscaling settings to ensure optimal performance.
-
-
-## Restrictions and limitations [ech-autoscaling-restrictions]
-
-The following are known limitations and restrictions with autoscaling:
-
-* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
-
-
-## Enable or disable autoscaling [ech-autoscaling-enable]
-
-To enable or disable autoscaling on a deployment:
-
-1. Log in to the [Elasticsearch Add-On for Heroku console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    Narrow the list by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. Select the desired autoscaling configuration for this deployment using the **Enable Autoscaling for:** dropdown menu.
-5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
-
-When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](../autoscaling.md#ech-autoscaling-update). Current sizes are shown on the deployment overview page.
-
-When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
-
-
-## Update your autoscaling settings [ech-autoscaling-update]
-
-Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
-
-1. Log in to the [Elasticsearch Add-On for Heroku console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    Narrow the list by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. To update a data tier:
-
- 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
- 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
- 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
- 4. Select **Save** to apply the changes to your deployment.
-
-5. To update machine learning nodes:
-
- 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
- 2. Select **Save** to apply the changes to your deployment.
-
-
-You can also view our [example](ech-autoscaling-example.md) of how the autoscaling settings work.
diff --git a/deploy-manage/autoscaling/trained-model-autoscaling.md b/deploy-manage/autoscaling/trained-model-autoscaling.md
index ce46bd974f..3871aef7a6 100644
--- a/deploy-manage/autoscaling/trained-model-autoscaling.md
+++ b/deploy-manage/autoscaling/trained-model-autoscaling.md
@@ -1,29 +1,215 @@
---
mapped_urls:
- https://www.elastic.co/guide/en/serverless/current/general-ml-nlp-auto-scale.html
- - https://www.elastic.co/guide/en/serverless/current/general-ml-nlp-auto-scale.html
+ - https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-auto-scale.html
+applies_to:
+ deployment:
+ ess:
+ eck:
+ ece:
+ serverless:
---
# Trained model autoscaling
-% What needs to be done: Align serverless/stateful
+You can enable autoscaling for each of your trained model deployments. Autoscaling allows {{es}} to automatically adjust the resources the model deployment can use based on the workload demand.
+
+There are two ways to enable autoscaling:
+
+* through APIs by enabling adaptive allocations
+* in {{kib}} by enabling adaptive resources
+
+::::{important}
+To fully leverage model autoscaling in {{ech}}, {{ece}}, and {{eck}}, it is highly recommended to enable [{{es}} deployment autoscaling](../../deploy-manage/autoscaling.md).
+::::
+
+Trained model autoscaling is available for {{serverless-short}}, {{ech}}, {{ece}}, and {{eck}} deployments. In {{serverless-short}}, processing power is managed differently across Search, Observability, and Security projects, which affects their costs and resource limits.
+
+:::{admonition} Trained model auto-scaling for self-managed deployments
+The available resources of self-managed deployments are static, so trained model autoscaling is not applicable. However, available resources are still segmented based on the settings described in this section.
+:::
+
+{{serverless-full}} Security and Observability projects are charged only for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations such as running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
+
+## Enabling autoscaling through APIs - adaptive allocations [enabling-autoscaling-through-apis-adaptive-allocations]
+$$$nlp-model-adaptive-resources$$$
+
+Model allocations are independent units of work for NLP tasks. If you set the number of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations so that the number of allocations adjusts based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
+
+When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load. When the load is high, a new model allocation is automatically created. When the load is low, a model allocation is automatically removed. You can explicitly set the minimum and maximum number of allocations; autoscaling will occur within these limits.
+
+::::{note}
+If you set the minimum number of allocations to 1, you will be charged even if the system is not using those resources.
+
+::::
+
+You can enable adaptive allocations by using:
+
+* the create inference endpoint API for [ELSER](../../explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md), [E5 and models uploaded through Eland](../../explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md) that are used as inference services.
+* the [start trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment) or [update trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-update-trained-model-deployment) APIs for trained models that are deployed on {{ml}} nodes.
+
+If the new allocations fit on the current {{ml}} nodes, they are started immediately. If creating new model allocations requires more resource capacity and {{ml}} autoscaling is enabled, your {{ml}} node is scaled up to provide enough resources for the new allocations. The number of model allocations can be scaled down to 0, but cannot be scaled up to more than 32 allocations unless you explicitly set a higher maximum. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference).
+
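+The following is a minimal sketch of enabling adaptive allocations when creating an {{infer}} endpoint. The endpoint name and the allocation bounds are illustrative assumptions, not recommendations:
+
+```console
+PUT _inference/sparse_embedding/my-elser-endpoint
+{
+  "service": "elasticsearch",
+  "service_settings": {
+    "model_id": ".elser_model_2",
+    "num_threads": 1,
+    "adaptive_allocations": {
+      "enabled": true,
+      "min_number_of_allocations": 1,
+      "max_number_of_allocations": 4
+    }
+  }
+}
+```
+
+With this configuration, autoscaling occurs between 1 and 4 allocations. The [start trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment) and [update trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-update-trained-model-deployment) APIs accept the same `adaptive_allocations` object in the request body.
+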
+:::{note}
+When you create inference endpoints on {{serverless-short}} using {{kib}}, adaptive allocations are automatically turned on, and there is no option to disable them.
+:::
+
+### Optimizing for typical use cases [optimizing-for-typical-use-cases]
+You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of {{infer}} requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.
+
+* If you want to optimize for ingest, set the number of threads to `1` (`"threads_per_allocation": 1`).
+* If you want to optimize for search, set the number of threads to greater than `1`, as in the sketch below. Increasing the number of threads makes the search processes more performant.
+
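+As a sketch, assuming a trained model uploaded under the hypothetical ID `my-model`, a search-optimized deployment could be started with multiple threads per allocation; the parameter values here are illustrative only:
+
+```console
+POST _ml/trained_models/my-model/deployment/_start?deployment_id=my-model-for-search&threads_per_allocation=4&number_of_allocations=1
+```
+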
+## Enabling autoscaling in {{kib}} - adaptive resources [enabling-autoscaling-in-kibana-adaptive-resources]
+
+You can enable adaptive resources for your models when starting or updating the model deployment. Adaptive resources make it possible for {{es}} to scale up or down the available resources based on the load on the process. This can help you to manage performance and cost more easily. When adaptive resources are enabled, the number of vCPUs that the model deployment uses is set automatically based on the current load. When the load is high, the number of vCPUs that the process can use is automatically increased. When the load is low, the number of vCPUs that the process can use is automatically decreased.
+
+You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level’s range.
+
+Refer to the tables in the [Model deployment resource matrix](#model-deployment-resource-matrix) section to find out the settings for the level you selected.
+
+:::{image} ../../images/machine-learning-ml-nlp-deployment-id-elser-v2.png
+:alt: ELSER deployment with adaptive resources enabled.
+:class: screenshot
+:width: 500px
+:::
+
+In {{serverless-full}}, Search projects are given access to more processing resources than Security and Observability projects, to accommodate their more complex operations. This difference is reflected in the resource limits shown in the UI.
+
+## Model deployment resource matrix [model-deployment-resource-matrix]
+
+The used resources for trained model deployments depend on three factors:
+
+* your cluster environment ({{serverless-short}}, Cloud (ECE, ECH), ECK, or self-managed)
+* the use case you optimize the model deployment for (ingest or search)
+* whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
+
+If you use a self-managed cluster or ECK, vCPU level ranges are derived from the `total_ml_processors` and `max_single_ml_node_processors` values. Use the [get {{ml}} info API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-info) to check these values.
+
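+As a quick sketch, you can retrieve these limits with the following request; the exact values depend on your cluster:
+
+```console
+GET _ml/info
+```
+
+The response contains a `limits` object that includes, among other fields, `total_ml_processors` and `max_single_ml_node_processors`.
+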
+The following tables show the number of allocations, threads, and vCPUs (VCUs on {{serverless-short}}) available when adaptive resources are enabled or disabled.
+
+::::{note}
+On {{serverless-short}}, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
+::::
+
+### Ingest optimized
+
+For ingest-optimized deployments, the number of model allocations is maximized.
+
+#### Adaptive resources enabled
+
+::::{tab-set}
+
+:::{tab-item} ECH, ECE
+
+| Level | Allocations | Threads | vCPUs |
+| --- | --- | --- | --- |
+| Low | 0 to 2 if available, dynamically | 1 | 0 to 2 if available, dynamically |
+| Medium | 1 to 32 dynamically | 1 | 1 to the smaller of 32 or the limit set in the Cloud console, dynamically |
+| High | 1 to limit set in the Cloud console *, dynamically | 1 | 1 to limit set in the Cloud console, dynamically |
+
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads. For example, a vCPU limit of 16 with 1 thread per allocation yields up to 16 allocations.
+
+:::
+
+:::{tab-item} {{serverless-short}}
+
+| Level | Allocations | Threads | VCUs |
+| --- | --- | --- | --- |
+| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
+| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
+| High | 1 to 512 for Search<br>1 to 128 for Security and Observability | 1 | 8 to 4096 for Search<br>8 to 1024 for Security and Observability |
+
+:::
+
+::::
+
+#### Adaptive resources disabled
+
+::::{tab-set}
+
+:::{tab-item} ECH, ECE
+
+| Level | Allocations | Threads | vCPUs |
+| --- | --- | --- | --- |
+| Low | 2 if available, otherwise 1, statically | 1 | 2 if available |
+| Medium | the smaller of 32 or the limit set in the Cloud console, statically | 1 | 32 if available |
+| High | Maximum available set in the Cloud console *, statically | 1 | Maximum available set in the Cloud console, statically |
+
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
+
+:::
+
+:::{tab-item} {{serverless-short}}
+
+| Level | Allocations | Threads | VCUs |
+| --- | --- | --- | --- |
+| Low | Exactly 2 | 1 | 16 |
+| Medium | Exactly 32 | 1 | 256 |
+| High | 512 for Search<br>No static allocations for Security and Observability | 1 | 4096 for Search<br>No static allocations for Security and Observability |
+
+:::
+
+::::
+
+### Search optimized
+
+For search-optimized deployments, the number of threads is maximized. The maximum number of threads that can be claimed depends on the available hardware.
+
+#### Adaptive resources enabled
+
+::::{tab-set}
+
+:::{tab-item} ECH, ECE
+
+| Level | Allocations | Threads | vCPUs |
+| --- | --- | --- | --- |
+| Low | 1 | 2 | 2 |
+| Medium | 1 to 2 (if threads=16) dynamically | maximum that the hardware allows (for example, 16) | 1 to 32 dynamically |
+| High | 1 to limit set in the Cloud console *, dynamically | maximum that the hardware allows (for example, 16) | 1 to limit set in the Cloud console, dynamically |
+
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
+
+:::
+
+:::{tab-item} {{serverless-short}}
+
+| Level | Allocations | Threads | VCUs |
+| --- | --- | --- | --- |
+| Low | 0 to 1 dynamically | Always 2 | 0 to 16 dynamically |
+| Medium | 1 to 2 (if threads=16), dynamically | Maximum (for example, 16) | 8 to 256 dynamically |
+| High | 1 to 32 (if threads=16), dynamically<br>1 to 128 for Security and Observability | Maximum (for example, 16) | 8 to 4096 for Search<br>8 to 1024 for Security and Observability |
+
+:::
+
+::::
+
+#### Adaptive resources disabled
-% GitHub issue: https://github.com/elastic/docs-projects/issues/344
+::::{tab-set}
-% Scope notes: Serverless and stateful pages are very similar, might need to merge them together or create subpages
+:::{tab-item} ECH, ECE
-% Use migrated content from existing pages that map to this page:
+| Level | Allocations | Threads | vCPUs |
+| --- | --- | --- | --- |
+| Low | 1 if available, statically | 2 | 2 if available |
+| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
+| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |
-% - [ ] ./raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md
-% - [ ] ./raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
-% Internal links rely on the following IDs being on this page (e.g. as a heading ID, paragraph ID, etc):
+:::
-$$$enabling-autoscaling-in-kibana-adaptive-resources$$$
+:::{tab-item} {{serverless-short}}
-$$$enabling-autoscaling-through-apis-adaptive-allocations$$$
+| Level | Allocations | Threads | VCUs |
+| --- | --- | --- | --- |
+| Low | 1 statically | Always 2 | 16 |
+| Medium | 2 statically (if threads=16) | Maximum (for example, 16) | 256 |
+| High | 32 statically (if threads=16) for Search<br>No static allocations for Security and Observability | Maximum (for example, 16) | 4096 for Search<br>No static allocations for Security and Observability |
-**This page is a work in progress.** The documentation team is working to combine content pulled from the following pages:
+:::
-* [/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md](/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md)
-* [/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md](/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md)
\ No newline at end of file
+::::
diff --git a/deploy-manage/deploy/cloud-enterprise/ce-add-support-for-node-roles-autoscaling.md b/deploy-manage/deploy/cloud-enterprise/ce-add-support-for-node-roles-autoscaling.md
index bdb668f1a9..b6ac2f6004 100644
--- a/deploy-manage/deploy/cloud-enterprise/ce-add-support-for-node-roles-autoscaling.md
+++ b/deploy-manage/deploy/cloud-enterprise/ce-add-support-for-node-roles-autoscaling.md
@@ -1084,7 +1084,7 @@ Similar to the `node_roles` example, the following one is also based on the `def
To add support for autoscaling, the deployment template has to meet the following requirements:
1. Already has support for `node_roles`.
-2. Contains the `size`, `autoscaling_min`, and `autoscaling_max` fields, according to the rules specified in the [autoscaling requirements table](../../autoscaling/ece-autoscaling-api-example.md#ece-autoscaling-api-example-requirements-table).
+2. Contains the `size`, `autoscaling_min`, and `autoscaling_max` fields, according to the rules specified in the [autoscaling requirements table](../../autoscaling/autoscaling-in-ece-and-ech.md#ece-autoscaling-api-example-requirements-table).
3. Contains the `autoscaling_enabled` fields on the `elasticsearch` resource.
If necessary, the values chosen for each field can be based on the reference example.
@@ -1094,7 +1094,7 @@ If necessary, the values chosen for each field can be based on the reference exa
To update a custom deployment template:
-1. Add the `autoscaling_min` and `autoscaling_max` fields to the Elasticsearch topology elements (check [Autoscaling through the API](../../autoscaling/ece-autoscaling-api-example.md)).
+1. Add the `autoscaling_min` and `autoscaling_max` fields to the Elasticsearch topology elements (check [Autoscaling through the API](../../autoscaling/autoscaling-in-ece-and-ech.md#ec-autoscaling-api-example)).
2. Add the `autoscaling_enabled` fields to the `elasticsearch` resource. Set this field to `true` in case you want autoscaling enabled by default, and to `false` otherwise.
diff --git a/deploy-manage/deploy/cloud-enterprise/ece-configuring-ece-create-templates.md b/deploy-manage/deploy/cloud-enterprise/ece-configuring-ece-create-templates.md
index 7c60a6059f..f1ff134bf5 100644
--- a/deploy-manage/deploy/cloud-enterprise/ece-configuring-ece-create-templates.md
+++ b/deploy-manage/deploy/cloud-enterprise/ece-configuring-ece-create-templates.md
@@ -55,7 +55,7 @@ Before you start creating your own deployment templates, you should have: [tagge
* For data nodes, autoscaling up is supported based on the amount of available storage. You can set the default initial size of the node and the default maximum size that the node can be autoscaled up to.
* For machine learning nodes, autoscaling is supported based on the expected memory requirements for machine learning jobs. You can set the default minimum size that the node can be scaled down to and the default maximum size that the node can be scaled up to. If autoscaling is not enabled for the deployment, the "minimum" value will instead be the default initial size of the machine learning node.
- The default values provided by the deployment template can be adjusted at any time. Check our [Autoscaling example](../../autoscaling/ece-autoscaling-example.md) for details about these settings. Nodes and components that currently support autoscaling are indicated by a `supports autoscaling` badge on the **Configure instances** page.
+ The default values provided by the deployment template can be adjusted at any time. Check our [Autoscaling example](../../autoscaling/autoscaling-in-ece-and-ech.md#ec-autoscaling-example) for details about these settings. Nodes and components that currently support autoscaling are indicated by a `supports autoscaling` badge on the **Configure instances** page.
* Add [fault tolerance](ece-ha.md) (high availability) by using more than one availability zone.
diff --git a/deploy-manage/deploy/cloud-on-k8s/elasticsearch-configuration.md b/deploy-manage/deploy/cloud-on-k8s/elasticsearch-configuration.md
index 0ba4474c62..b06359d6ba 100644
--- a/deploy-manage/deploy/cloud-on-k8s/elasticsearch-configuration.md
+++ b/deploy-manage/deploy/cloud-on-k8s/elasticsearch-configuration.md
@@ -56,7 +56,7 @@ Other sections of the documentation also include relevant configuration options
* [Remote clusters](/deploy-manage/remote-clusters/eck-remote-clusters.md)
-* [Autoscaling](../../autoscaling/deployments-autoscaling-on-eck.md)
+* [Autoscaling](../../autoscaling/autoscaling-in-eck.md#k8s-autoscaling)
* [Stack monitoring](/deploy-manage/monitor/stack-monitoring/eck-stack-monitoring.md): Monitor your {{es}} cluster smoothly with the help of ECK.
diff --git a/deploy-manage/deploy/cloud-on-k8s/kibana-configuration.md b/deploy-manage/deploy/cloud-on-k8s/kibana-configuration.md
index 569de652b2..3bab8364b1 100644
--- a/deploy-manage/deploy/cloud-on-k8s/kibana-configuration.md
+++ b/deploy-manage/deploy/cloud-on-k8s/kibana-configuration.md
@@ -29,6 +29,6 @@ The following sections describe how to customize a {{kib}} deployment to suit yo
* [Disable TLS](k8s-kibana-http-configuration.md#k8s-kibana-http-disable-tls)
* [Install {{kib}} plugins](k8s-kibana-plugins.md)
-* [Autoscaling stateless applications](../../autoscaling/autoscaling-stateless-applications-on-eck.md): Use [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) for {{kib}} or other stateless applications.
+* [Autoscaling stateless applications](../../autoscaling/autoscaling-in-eck.md#k8s-stateless-autoscaling): Use [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/) for {{kib}} or other stateless applications.
diff --git a/deploy-manage/deploy/cloud-on-k8s/orchestrate-other-elastic-applications.md b/deploy-manage/deploy/cloud-on-k8s/orchestrate-other-elastic-applications.md
index 2376f6ff08..4ac0a66688 100644
--- a/deploy-manage/deploy/cloud-on-k8s/orchestrate-other-elastic-applications.md
+++ b/deploy-manage/deploy/cloud-on-k8s/orchestrate-other-elastic-applications.md
@@ -22,7 +22,7 @@ When orchestrating any of these applications, also consider the following topics
* [Access Elastic Stack services](accessing-services.md)
* [Customize Pods](customize-pods.md)
* [Manage compute resources](manage-compute-resources.md)
-* [Autoscaling stateless applications](../../autoscaling/autoscaling-stateless-applications-on-eck.md)
+* [Autoscaling stateless applications](../../autoscaling/autoscaling-in-eck.md#k8s-stateless-autoscaling)
* [Elastic Stack configuration policies](elastic-stack-configuration-policies.md)
* [Upgrade the Elastic Stack version](../../upgrade/deployment-or-cluster.md)
* [Connect to external Elastic resources](connect-to-external-elastic-resources.md)
\ No newline at end of file
diff --git a/deploy-manage/toc.yml b/deploy-manage/toc.yml
index 809b55934c..28b1c78f55 100644
--- a/deploy-manage/toc.yml
+++ b/deploy-manage/toc.yml
@@ -469,19 +469,8 @@ toc:
- file: tools/cross-cluster-replication/_perform_update_or_delete_by_query.md
- file: autoscaling.md
children:
- - file: autoscaling/ech-autoscaling.md
- children:
- - file: autoscaling/ech-autoscaling-example.md
- - file: autoscaling/ec-autoscaling.md
- children:
- - file: autoscaling/ec-autoscaling-example.md
- - file: autoscaling/ec-autoscaling-api-example.md
- - file: autoscaling/ece-autoscaling.md
- children:
- - file: autoscaling/ece-autoscaling-example.md
- - file: autoscaling/ece-autoscaling-api-example.md
- - file: autoscaling/autoscaling-stateless-applications-on-eck.md
- - file: autoscaling/deployments-autoscaling-on-eck.md
+ - file: autoscaling/autoscaling-in-ece-and-ech.md
+ - file: autoscaling/autoscaling-in-eck.md
- file: autoscaling/autoscaling-deciders.md
- file: autoscaling/trained-model-autoscaling.md
- file: remote-clusters.md
diff --git a/explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md b/explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md
index 7feb809b32..f899600fed 100644
--- a/explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md
+++ b/explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md
@@ -163,7 +163,7 @@ PUT _inference/rerank/my-elastic-rerank
```
1. The `model_id` must be the ID of the built-in Elastic Rerank model: `.rerank-v1`.
-2. [Adaptive allocations](../../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md#nlp-model-adaptive-allocations) will be enabled with the minimum of 1 and the maximum of 10 allocations.
+2. [Adaptive allocations](../../../deploy-manage/autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations) will be enabled with the minimum of 1 and the maximum of 10 allocations.
diff --git a/explore-analyze/elastic-inference/inference-api/elser-inference-integration.md b/explore-analyze/elastic-inference/inference-api/elser-inference-integration.md
index 5e2a828f91..dac64a6b4d 100644
--- a/explore-analyze/elastic-inference/inference-api/elser-inference-integration.md
+++ b/explore-analyze/elastic-inference/inference-api/elser-inference-integration.md
@@ -102,7 +102,7 @@ The `elser` service is deprecated and will be removed in a future release. Use t
When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
::::{note}
-For more information on how to optimize your ELSER endpoints, refer to [the ELSER recommendations](../../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md#elser-recommendations) section in the model documentation. To learn more about model autoscaling, refer to the [trained model autoscaling](../../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md) page.
+For more information on how to optimize your ELSER endpoints, refer to [the ELSER recommendations](../../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md#elser-recommendations) section in the model documentation. To learn more about model autoscaling, refer to the [trained model autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) page.
::::
diff --git a/explore-analyze/machine-learning/nlp.md b/explore-analyze/machine-learning/nlp.md
index db8ef09785..ac11d70812 100644
--- a/explore-analyze/machine-learning/nlp.md
+++ b/explore-analyze/machine-learning/nlp.md
@@ -12,7 +12,6 @@ You can use {{stack-ml-features}} to analyze natural language data and make pred
* [Overview](nlp/ml-nlp-overview.md)
* [Deploy trained models](nlp/ml-nlp-deploy-models.md)
-* [Trained model autoscaling](nlp/ml-nlp-auto-scale.md)
* [Add NLP {{infer}} to ingest pipelines](nlp/ml-nlp-inference.md)
* [API quick reference](nlp/ml-nlp-apis.md)
* [ELSER](nlp/ml-nlp-elser.md)
diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md b/explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md
deleted file mode 100644
index 1f13c45567..0000000000
--- a/explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md
+++ /dev/null
@@ -1,115 +0,0 @@
----
-applies_to:
- stack: ga
- serverless: ga
-mapped_pages:
- - https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-auto-scale.html
----
-
-# Trained model autoscaling [ml-nlp-auto-scale]
-
-You can enable autoscaling for each of your trained model deployments. Autoscaling allows {{es}} to automatically adjust the resources the model deployment can use based on the workload demand.
-
-There are two ways to enable autoscaling:
-
-* through APIs by enabling adaptive allocations
-* in {{kib}} by enabling adaptive resources
-
-::::{important}
-To fully leverage model autoscaling, it is highly recommended to enable [{{es}} deployment autoscaling](../../../deploy-manage/autoscaling.md).
-::::
-
-## Enabling autoscaling through APIs - adaptive allocations [nlp-model-adaptive-allocations]
-
-Model allocations are independent units of work for NLP tasks. If you set the numbers of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
-
-When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load. When the load is high, a new model allocation is automatically created. When the load is low, a model allocation is automatically removed. You can explicitely set the minimum and maximum number of allocations; autoscaling will occur within these limits.
-
-You can enable adaptive allocations by using:
-
-* the create inference endpoint API for [ELSER](../../elastic-inference/inference-api/elser-inference-integration.md), [E5 and models uploaded through Eland](../../elastic-inference/inference-api/elasticsearch-inference-integration.md) that are used as {{infer}} services.
-* the [start trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment) or [update trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-update-trained-model-deployment) APIs for trained models that are deployed on {{ml}} nodes.
-
-If the new allocations fit on the current {{ml}} nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your {{ml}} node will be scaled up if {{ml}} autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference).
-
-### Optimizing for typical use cases [optimize-use-case]
-
-You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of {{infer}} requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.
-
-* If you want to optimize for ingest, set the number of threads to `1` (`"threads_per_allocation": 1`).
-* If you want to optimize for search, set the number of threads to greater than `1`. Increasing the number of threads will make the search processes more performant.
-
-## Enabling autoscaling in {{kib}} - adaptive resources [nlp-model-adaptive-resources]
-
-You can enable adaptive resources for your models when starting or updating the model deployment. Adaptive resources make it possible for {{es}} to scale up or down the available resources based on the load on the process. This can help you to manage performance and cost more easily. When adaptive resources are enabled, the number of vCPUs that the model deployment uses is set automatically based on the current load. When the load is high, the number of vCPUs that the process can use is automatically increased. When the load is low, the number of vCPUs that the process can use is automatically decreased.
-
-You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level’s range.
-
-Refer to the tables in the [Model deployment resource matrix](#auto-scaling-matrix) section to find out the setings for the level you selected.
-
-:::{image} ../../../images/machine-learning-ml-nlp-deployment-id-elser-v2.png
-:alt: ELSER deployment with adaptive resources enabled.
-:class: screenshot
-:::
-
-## Model deployment resource matrix [auto-scaling-matrix]
-
-The used resources for trained model deployments depend on three factors:
-
-* your cluster environment (Serverless, Cloud, or on-premises)
-* the use case you optimize the model deployment for (ingest or search)
-* whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
-
-If you use {{es}} on-premises, vCPUs level ranges are derived from the `total_ml_processors` and `max_single_ml_node_processors` values. Use the [get {{ml}} info API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-info) to check these values. The following tables show you the number of allocations, threads, and vCPUs available in Cloud when adaptive resources are enabled or disabled.
-
-::::{note}
-On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
-::::
-
-### Deployments in Cloud optimized for ingest [_deployments_in_cloud_optimized_for_ingest]
-
-In case of ingest-optimized deployments, we maximize the number of model allocations.
-
-#### Adaptive resources enabled [_adaptive_resources_enabled]
-
-| Level | Allocations | Threads | vCPUs |
-| --- | --- | --- | --- |
-| Low | 0 to 2 if available, dynamically | 1 | 0 to 2 if available, dynamically |
-| Medium | 1 to 32 dynamically | 1 | 1 to the smaller of 32 or the limit set in the Cloud console, dynamically |
-| High | 1 to limit set in the Cloud console *, dynamically | 1 | 1 to limit set in the Cloud console, dynamically |
-
-* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
-
-#### Adaptive resources disabled [_adaptive_resources_disabled]
-
-| Level | Allocations | Threads | vCPUs |
-| --- | --- | --- | --- |
-| Low | 2 if available, otherwise 1, statically | 1 | 2 if available |
-| Medium | the smaller of 32 or the limit set in the Cloud console, statically | 1 | 32 if available |
-| High | Maximum available set in the Cloud console *, statically | 1 | Maximum available set in the Cloud console, statically |
-
-* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
-
-### Deployments in Cloud optimized for search [_deployments_in_cloud_optimized_for_search]
-
-In case of search-optimized deployments, we maximize the number of threads. The maximum number of threads that can be claimed depends on the hardware your architecture has.
-
-#### Adaptive resources enabled [_adaptive_resources_enabled_2]
-
-| Level | Allocations | Threads | vCPUs |
-| --- | --- | --- | --- |
-| Low | 1 | 2 | 2 |
-| Medium | 1 to 2 (if threads=16) dynamically | maximum that the hardware allows (for example, 16) | 1 to 32 dynamically |
-| High | 1 to limit set in the Cloud console *, dynamically | maximum that the hardware allows (for example, 16) | 1 to limit set in the Cloud console, dynamically |
-
-* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
-
-#### Adaptive resources disabled [_adaptive_resources_disabled_2]
-
-| Level | Allocations | Threads | vCPUs |
-| --- | --- | --- | --- |
-| Low | 1 if available, statically | 2 | 2 if available |
-| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
-| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |
-
-\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md b/explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md
index c3736bb546..25b74169d8 100644
--- a/explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md
+++ b/explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md
@@ -25,13 +25,13 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/docs/api/doc/kibana/v8/group/endpoint-ml).
::::
-You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) being enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments derived from the Cloud console and functions as follows:
+You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](../../../deploy-manage/autoscaling/trained-model-autoscaling.md#enabling-autoscaling-in-kibana-adaptive-resources) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, the vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
* Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use
* Medium: This level limits resources to 32 vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
* High: This level may use the maximum number of vCPUs available for this deployment from the Cloud console. If the maximum is 2 vCPUs or fewer, this level is equivalent to the medium or low level.
-For the resource levels when adaptive resources are enabled, refer to <[*Trained model autoscaling*](ml-nlp-auto-scale.md).
+For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md).
## Request queues and search priority [infer-request-queues]
diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-e5.md b/explore-analyze/machine-learning/nlp/ml-nlp-e5.md
index 98a4d28d73..5c1fae55a9 100644
--- a/explore-analyze/machine-learning/nlp/ml-nlp-e5.md
+++ b/explore-analyze/machine-learning/nlp/ml-nlp-e5.md
@@ -21,7 +21,7 @@ Refer to the model cards of the [multilingual-e5-small](https://huggingface.co/e
To use E5, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated.
-Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
## Download and deploy E5 [download-deploy-e5]
diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-elser.md b/explore-analyze/machine-learning/nlp/ml-nlp-elser.md
index 08874b9bdc..c985ac620f 100644
--- a/explore-analyze/machine-learning/nlp/ml-nlp-elser.md
+++ b/explore-analyze/machine-learning/nlp/ml-nlp-elser.md
@@ -33,7 +33,7 @@ To use ELSER, you must have the [appropriate subscription](https://www.elastic.c
The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in {{ech}} if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
::::
-Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more.
+Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.
## ELSER v2 [elser-v2]
@@ -72,7 +72,7 @@ PUT _inference/sparse_embedding/my-elser-model
}
```
-The API request automatically initiates the model download and then deploy the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.
Refer to the [ELSER {{infer}} integration documentation](../../elastic-inference/inference-api/elser-inference-integration.md) to learn more about the available settings.
@@ -292,7 +292,7 @@ To gain the biggest value out of ELSER trained models, consider to follow this l
* If quick response time is important for your use case, keep {{ml}} resources available at all times by setting `min_allocations` to `1`.
* Setting `min_allocations` to `0` can save on costs for non-critical use cases or testing environments.
-* Enabling [autoscaling](ml-nlp-auto-scale.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale up or down the available resources of your ELSER deployment based on the load on the process.
+* Enabling [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale up or down the available resources of your ELSER deployment based on the load on the process.
* Use dedicated, optimized ELSER {{infer}} endpoints for ingest and search use cases.
* When deploying a trained model in {{kib}}, you can select for which case you want to optimize your ELSER deployment.
* If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`).
diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md b/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md
index c2e51af5a3..70af12766d 100644
--- a/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md
+++ b/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md
@@ -73,7 +73,7 @@ PUT _inference/rerank/my-rerank-model
```
::::{note}
-The API request automatically downloads and deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+The API request automatically downloads and deploys the model. This example uses [autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) through adaptive allocation.
::::
::::{note}
diff --git a/explore-analyze/toc.yml b/explore-analyze/toc.yml
index 038d3480de..41abc68da3 100644
--- a/explore-analyze/toc.yml
+++ b/explore-analyze/toc.yml
@@ -211,7 +211,6 @@ toc:
- file: machine-learning/nlp/ml-nlp-import-model.md
- file: machine-learning/nlp/ml-nlp-deploy-model.md
- file: machine-learning/nlp/ml-nlp-test-inference.md
- - file: machine-learning/nlp/ml-nlp-auto-scale.md
- file: machine-learning/nlp/ml-nlp-inference.md
- file: machine-learning/nlp/ml-nlp-apis.md
- file: machine-learning/nlp/ml-nlp-built-in-models.md
diff --git a/raw-migrated-files/cloud/cloud-enterprise/ece-autoscaling.md b/raw-migrated-files/cloud/cloud-enterprise/ece-autoscaling.md
deleted file mode 100644
index 87742db84c..0000000000
--- a/raw-migrated-files/cloud/cloud-enterprise/ece-autoscaling.md
+++ /dev/null
@@ -1,125 +0,0 @@
-# Deployment autoscaling [ece-autoscaling]
-
-Autoscaling helps you to more easily manage your deployments by adjusting their available resources automatically, and currently supports scaling for both data and machine learning nodes, or machine learning nodes only. Check the following sections to learn more:
-
-* [Overview](../../../deploy-manage/autoscaling.md#ece-autoscaling-intro)
-* [When does autoscaling occur?](../../../deploy-manage/autoscaling.md#ece-autoscaling-factors)
-* [Notifications](../../../deploy-manage/autoscaling.md#ece-autoscaling-notifications)
-* [Restrictions and limitations](../../../deploy-manage/autoscaling.md#ece-autoscaling-restrictions)
-* [Enable or disable autoscaling](../../../deploy-manage/autoscaling.md#ece-autoscaling-enable)
-* [Update your autoscaling settings](../../../deploy-manage/autoscaling.md#ece-autoscaling-update)
-
-You can also have a look at our [autoscaling example](../../../deploy-manage/autoscaling/ece-autoscaling-example.md), as well as a sample request to [create an autoscaled deployment through the API](../../../deploy-manage/autoscaling/ece-autoscaling-api-example.md).
-
-
-## Overview [ece-autoscaling-intro]
-
-When you first create a deployment it can be challenging to determine the amount of storage your data nodes will require. The same is relevant for the amount of memory and CPU that you want to allocate to your machine learning nodes. It can become even more challenging to predict these requirements for weeks or months into the future. In an ideal scenario, these resources should be sized to both ensure efficient performance and resiliency, and to avoid excess costs. Autoscaling can help with this balance by adjusting the resources available to a deployment automatically as loads change over time, reducing the need for monitoring and manual intervention.
-
-::::{note}
-Autoscaling is enabled for the Machine Learning tier by default for new deployments.
-::::
-
-
-Currently, autoscaling behavior is as follows:
-
-* **Data tiers**
-
- * Each Elasticsearch [data tier](../../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When we detect more storage is needed, autoscaling will scale up each data tier independently to ensure you can continue and ingest more data to your hot and content tier, or move data to the warm, cold, or frozen data tiers.
- * In addition to scaling up existing data tiers, a new data tier will be automatically added when necessary, based on your [index lifecycle management policies](https://www.elastic.co/guide/en/cloud-enterprise/current/ece-configure-index-management.html).
- * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
- * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. In case you want to adjust the size of your data tier to add more memory or CPU, or in case you deleted data and want to scale it down, you can set the current size per zone of each data tier manually.
-
-* **Machine learning nodes**
-
- * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
- * When a machine learning job is opened, or a machine learning trained model is deployed, if there are no machine learning nodes in your deployment, the autoscaling mechanism will automatically add machine learning nodes. Similarly, after a period of no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
- * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
- * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
- * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
-
-
-::::{note}
-For any Elastic Cloud Enterprise Elasticsearch component the number of availability zones is not affected by autoscaling. You can always set the number of availability zones manually and the autoscaling mechanism will add or remove capacity per availability zone.
-::::
-
-
-
-## When does autoscaling occur? [ece-autoscaling-factors]
-
-Several factors determine when data tiers or machine learning nodes are scaled.
-
-For a data tier, an autoscaling event can be triggered in the following cases:
-
-* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
-
-When past behavior on a hot tier indicates that the influx of data can increase significantly in the near future. Refer to [Reactive storage decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) and [Proactive storage decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) for more detail.
-
-* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, it automatically creates warm or cold nodes, if an ILM policy is trying to move data from hot to warm or cold nodes.
-
-On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md#ml-ad-create-job).
-
-On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
-
-
-## Notifications [ece-autoscaling-notifications]
-
-In the event that a data tier or machine learning node scales up to its maximum possible size, a notice appears on the deployment overview page prompting you to adjust your autoscaling settings in order to ensure optimal performance.
-
-A warning is also issued in the ECE `service-constructor` logs with the field `labels.autoscaling_notification_type` and a value of `data-tier-at-limit` (for a fully scaled data tier) or `ml-tier-at-limit` (for a fully scaled machine learning node). The warning is indexed in the `logging-and-metrics` deployment, so you can use that event to [configure an email notification](../../../explore-analyze/alerts-cases/watcher/actions-email.md).
-
-
-## Restrictions and limitations [ece-autoscaling-restrictions]
-
-The following are known limitations and restrictions with autoscaling:
-
-* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
-* In the event that an override is set for the instance size or disk quota multiplier for an instance by means of the [Instance Overrides API](https://www.elastic.co/docs/api/doc/cloud-enterprise/operation/operation-set-all-instances-settings-overrides), autoscaling will be effectively disabled. It’s recommended to avoid adjusting the instance size or disk quota multiplier for an instance that uses autoscaling, since the setting prevents autoscaling.
-
-
-## Enable or disable autoscaling [ece-autoscaling-enable]
-
-To enable or disable autoscaling on a deployment:
-
-1. [Log into the Cloud UI](../../../deploy-manage/deploy/cloud-enterprise/log-into-cloud-ui.md).
-2. On the **Deployments** page, select your deployment.
-
- Narrow the list by name, ID, or choose from several other filters. To further define the list, use a combination of filters.
-
-3. In your deployment menu, select **Edit**.
-4. Select desired autoscaling configuration for this deployment using **Enable Autoscaling for:** dropdown menu.
-5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
-
-When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](../../../deploy-manage/autoscaling.md#ece-autoscaling-update). Current sizes are shown on the deployment overview page.
-
-When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
-
-
-## Update your autoscaling settings [ece-autoscaling-update]
-
-Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
-
-1. [Log into the Cloud UI](../../../deploy-manage/deploy/cloud-enterprise/log-into-cloud-ui.md).
-2. On the **Deployments** page, select your deployment.
-
- Narrow the list by name, ID, or choose from several other filters. To further define the list, use a combination of filters.
-
-3. In your deployment menu, select **Edit**.
-4. To update a data tier:
-
- 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
- 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
- 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
- 4. Select **Save** to apply the changes to your deployment.
-
-5. To update machine learning nodes:
-
- 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
- 2. Select **Save** to apply the changes to your deployment.
-
-
-You can also view our [example](../../../deploy-manage/autoscaling/ece-autoscaling-example.md) of how the autoscaling settings work.
-
-::::{note}
-On Elastic Cloud Enterprise, system-owned deployment templates include the default values for all deployment autoscaling settings.
-::::
diff --git a/raw-migrated-files/cloud/cloud-heroku/ech-autoscaling.md b/raw-migrated-files/cloud/cloud-heroku/ech-autoscaling.md
deleted file mode 100644
index b42cda57ca..0000000000
--- a/raw-migrated-files/cloud/cloud-heroku/ech-autoscaling.md
+++ /dev/null
@@ -1,118 +0,0 @@
-# Deployment autoscaling [ech-autoscaling]
-
-Autoscaling helps you to more easily manage your deployments by adjusting their available resources automatically, and currently supports scaling for both data and machine learning nodes, or machine learning nodes only. Check the following sections to learn more:
-
-* [Overview](../../../deploy-manage/autoscaling.md#ech-autoscaling-intro)
-* [When does autoscaling occur?](../../../deploy-manage/autoscaling.md#ech-autoscaling-factors)
-* [Notifications](../../../deploy-manage/autoscaling.md#ech-autoscaling-notifications)
-* [Restrictions and limitations](../../../deploy-manage/autoscaling.md#ech-autoscaling-restrictions)
-* [Enable or disable autoscaling](../../../deploy-manage/autoscaling.md#ech-autoscaling-enable)
-* [Update your autoscaling settings](../../../deploy-manage/autoscaling.md#ech-autoscaling-update)
-
-You can also have a look at our [autoscaling example](../../../deploy-manage/autoscaling/ech-autoscaling-example.md).
-
-
-## Overview [ech-autoscaling-intro]
-
-When you first create a deployment it can be challenging to determine the amount of storage your data nodes will require. The same is relevant for the amount of memory and CPU that you want to allocate to your machine learning nodes. It can become even more challenging to predict these requirements for weeks or months into the future. In an ideal scenario, these resources should be sized to both ensure efficient performance and resiliency, and to avoid excess costs. Autoscaling can help with this balance by adjusting the resources available to a deployment automatically as loads change over time, reducing the need for monitoring and manual intervention.
-
-::::{note}
-Autoscaling is enabled for the Machine Learning tier by default for new deployments.
-::::
-
-
-Currently, autoscaling behavior is as follows:
-
-* **Data tiers**
-
- * Each Elasticsearch [data tier](../../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When we detect more storage is needed, autoscaling will scale up each data tier independently to ensure you can continue and ingest more data to your hot and content tier, or move data to the warm, cold, or frozen data tiers.
- * In addition to scaling up existing data tiers, a new data tier will be automatically added when necessary, based on your index lifecycle management policies.
- * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
- * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. In case you want to adjust the size of your data tier to add more memory or CPU, or in case you deleted data and want to scale it down, you can set the current size per zone of each data tier manually.
-
-* **Machine learning nodes**
-
- * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
- * When a machine learning job is opened, or a machine learning trained model is deployed, if there are no machine learning nodes in your deployment, the autoscaling mechanism will automatically add machine learning nodes. Similarly, after a period of no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
- * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
- * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
- * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
-
-
-::::{note}
-For any Elasticsearch Add-On for Heroku Elasticsearch component the number of availability zones is not affected by autoscaling. You can always set the number of availability zones manually and the autoscaling mechanism will add or remove capacity per availability zone.
-::::
-
-
-
-## When does autoscaling occur? [ech-autoscaling-factors]
-
-Several factors determine when data tiers or machine learning nodes are scaled.
-
-For a data tier, an autoscaling event can be triggered in the following cases:
-
-* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
-
-* When past behavior on a hot tier indicates that the influx of data is likely to increase significantly in the near future. Refer to [Reactive storage decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) and [Proactive storage decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) for more detail, and see the policy sketch after this list.
-
-* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, warm or cold nodes are added automatically when an ILM policy attempts to move data from the hot tier to them.
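-
-These storage-based triggers correspond to the [autoscaling deciders](../../../deploy-manage/autoscaling/autoscaling-deciders.md) that run inside Elasticsearch. As a minimal sketch (the policy name and `forecast_window` value are hypothetical, and on Elasticsearch Add-On for Heroku such policies are managed for you rather than configured directly), a hot-tier policy that scales on predicted storage needs could look like this:
-
-```console
-PUT _autoscaling/policy/hot-tier-storage
-{
-  "roles": [ "data_hot" ],
-  "deciders": {
-    "proactive_storage": {
-      "forecast_window": "30m"
-    }
-  }
-}
-```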
-
-On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md#ml-ad-create-job).
-
-On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
-
-
-## Notifications [ech-autoscaling-notifications]
-
-If a data tier or machine learning node scales up to its maximum possible size, you receive an email, and a notice appears on the deployment overview page prompting you to adjust your autoscaling settings to ensure optimal performance.
-
-
-## Restrictions and limitations [ech-autoscaling-restrictions]
-
-The following are known limitations and restrictions with autoscaling:
-
-* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
-
-
-## Enable or disable autoscaling [ech-autoscaling-enable]
-
-To enable or disable autoscaling on a deployment:
-
-1. Log in to the [Elasticsearch Add-On for Heroku console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    Narrow your deployments by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. Select the desired autoscaling configuration for this deployment using the **Enable Autoscaling for:** dropdown menu.
-5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
-
-When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](../../../deploy-manage/autoscaling.md#ech-autoscaling-update). Current sizes are shown on the deployment overview page.
-
-When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
-
-
-## Update your autoscaling settings [ech-autoscaling-update]
-
-Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
-
-1. Log in to the [Elasticsearch Add-On for Heroku console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    Narrow your deployments by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. To update a data tier:
-
- 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
- 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
- 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
- 4. Select **Save** to apply the changes to your deployment.
-
-5. To update machine learning nodes:
-
- 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
- 2. Select **Save** to apply the changes to your deployment.
-
-
-You can also view our [example](../../../deploy-manage/autoscaling/ech-autoscaling-example.md) of how the autoscaling settings work.
diff --git a/raw-migrated-files/cloud/cloud/ec-autoscaling.md b/raw-migrated-files/cloud/cloud/ec-autoscaling.md
deleted file mode 100644
index a1399649a8..0000000000
--- a/raw-migrated-files/cloud/cloud/ec-autoscaling.md
+++ /dev/null
@@ -1,120 +0,0 @@
-# Deployment autoscaling [ec-autoscaling]
-
-Autoscaling makes it easier to manage your deployments by automatically adjusting their available resources, and currently supports scaling for both data and machine learning nodes, or machine learning nodes only. Check the following sections to learn more:
-
-* [Overview](../../../deploy-manage/autoscaling.md#ec-autoscaling-intro)
-* [When does autoscaling occur?](../../../deploy-manage/autoscaling.md#ec-autoscaling-factors)
-* [Notifications](../../../deploy-manage/autoscaling.md#ec-autoscaling-notifications)
-* [Restrictions and limitations](../../../deploy-manage/autoscaling.md#ec-autoscaling-restrictions)
-* [Enable or disable autoscaling](../../../deploy-manage/autoscaling.md#ec-autoscaling-enable)
-* [Update your autoscaling settings](../../../deploy-manage/autoscaling.md#ec-autoscaling-update)
-
-You can also have a look at our [autoscaling example](../../../deploy-manage/autoscaling/ec-autoscaling-example.md), as well as a sample request to [create an autoscaled deployment through the API](../../../deploy-manage/autoscaling/ec-autoscaling-api-example.md).
-
-
-## Overview [ec-autoscaling-intro]
-
-When you first create a deployment, it can be challenging to determine the amount of storage your data nodes will require. The same applies to the amount of memory and CPU that you want to allocate to your machine learning nodes. It can become even more challenging to predict these requirements weeks or months into the future. Ideally, these resources should be sized to ensure both efficient performance and resiliency while avoiding excess costs. Autoscaling helps strike this balance by automatically adjusting the resources available to a deployment as loads change over time, reducing the need for monitoring and manual intervention.
-
-::::{note}
-Autoscaling is enabled for the Machine Learning tier by default for new deployments.
-::::
-
-
-Currently, autoscaling behavior is as follows:
-
-* **Data tiers**
-
-    * Each Elasticsearch [data tier](../../../manage-data/lifecycle/data-tiers.md) scales upward based on the amount of available storage. When more storage is needed, autoscaling scales up each data tier independently to ensure that you can continue to ingest data into your hot and content tiers, or move data to the warm, cold, or frozen data tiers.
-    * In addition to scaling up existing data tiers, a new data tier is added automatically when necessary, based on your [index lifecycle management policies](../../../manage-data/lifecycle/index-lifecycle-management.md).
-    * To control the maximum size of each data tier and ensure it will not scale above a certain size, you can use the maximum size per zone field.
-    * Autoscaling based on memory or CPU, as well as autoscaling downward, is not currently supported. If you want to adjust the size of your data tier to add more memory or CPU, or if you deleted data and want to scale it down, you can set the current size per zone of each data tier manually.
-
-* **Machine learning nodes**
-
- * Machine learning nodes can scale upward and downward based on the configured machine learning jobs.
-    * If there are no machine learning nodes in your deployment when a machine learning job is opened or a trained model is deployed, the autoscaling mechanism automatically adds machine learning nodes. Similarly, after a period with no active machine learning jobs, any enabled machine learning nodes are disabled automatically.
- * To control the maximum size of your machine learning nodes and ensure they will not scale above a certain size, you can use the maximum size per zone field.
- * To control the minimum size of your machine learning nodes and ensure the autoscaling mechanism will not scale machine learning below a certain size, you can use the minimum size per zone field.
- * The determination of when to scale is based on the expected memory and CPU requirements for the currently configured machine learning jobs and trained models.
-
-
-::::{note}
-The number of availability zones for each component of your {{ech}} deployments is not affected by autoscaling. You can always set the number of availability zones manually and the autoscaling mechanism will add or remove capacity per availability zone.
-::::
-
-
-
-## When does autoscaling occur? [ec-autoscaling-factors]
-
-Several factors determine when data tiers or machine learning nodes are scaled.
-
-For a data tier, an autoscaling event can be triggered in the following cases:
-
-* Based on an assessment of how shards are currently allocated, and the amount of storage and buffer space currently available.
-
-* When past behavior on a hot tier indicates that the influx of data is likely to increase significantly in the near future. Refer to [Reactive storage decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) and [Proactive storage decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) for more detail.
-
-* Through ILM policies. For example, if a deployment has only hot nodes and autoscaling is enabled, warm or cold nodes are added automatically when an ILM policy attempts to move data from the hot tier to them.
-
-On machine learning nodes, scaling is determined by an estimate of the memory and CPU requirements for the currently configured jobs and trained models. When a new machine learning job tries to start, it looks for a node with adequate native memory and CPU capacity. If one cannot be found, it stays in an `opening` state. If this waiting job exceeds the queueing limit set in the machine learning decider, a scale up is requested. Conversely, as machine learning jobs run, their memory and CPU usage might decrease or other running jobs might finish or close. In this case, if the duration of decreased resource usage exceeds the set value for `down_scale_delay`, a scale down is requested. Check [Machine learning decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md) for more detail. To learn more about machine learning jobs in general, check [Create anomaly detection jobs](/explore-analyze/machine-learning/anomaly-detection/ml-ad-run-jobs.md#ml-ad-create-job).
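-
-The queueing limit and `down_scale_delay` mentioned above are settings of the [machine learning decider](../../../deploy-manage/autoscaling/autoscaling-deciders.md). As a minimal sketch (the policy name and values are hypothetical, and on {{ecloud}} this policy is managed for you rather than configured directly), an autoscaling policy for machine learning nodes could look like this:
-
-```console
-PUT _autoscaling/policy/ml-nodes
-{
-  "roles": [ "ml" ],
-  "deciders": {
-    "ml": {
-      "num_anomaly_jobs_in_queue": 5,
-      "num_analytics_jobs_in_queue": 5,
-      "down_scale_delay": "1h"
-    }
-  }
-}
-```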
-
-On a highly available deployment, autoscaling events are always applied to instances in each availability zone simultaneously, to ensure consistency.
-
-
-## Notifications [ec-autoscaling-notifications]
-
-If a data tier or machine learning node scales up to its maximum possible size, you receive an email, and a notice appears on the deployment overview page prompting you to adjust your autoscaling settings to ensure optimal performance.
-
-
-## Restrictions and limitations [ec-autoscaling-restrictions]
-
-The following are known limitations and restrictions with autoscaling:
-
-* Autoscaling will not run if the cluster is unhealthy or if the last Elasticsearch plan failed.
-* Trial deployments cannot be configured to autoscale beyond the normal Trial deployment size limits. The maximum size per zone is increased automatically from the Trial limit when you convert to a paid subscription.
-* ELSER deployments do not scale automatically. For more information, refer to [ELSER](../../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [Trained model autoscaling](../../../explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md).
-
-
-## Enable or disable autoscaling [ec-autoscaling-enable]
-
-To enable or disable autoscaling on a deployment:
-
-1. Log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    On the **Deployments** page, you can narrow your deployments by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. Select the desired autoscaling configuration for this deployment using the **Enable Autoscaling for:** dropdown menu.
-5. Select **Confirm** to have the autoscaling change and any other settings take effect. All plan changes are shown on the Deployment **Activity** page.
-
-When autoscaling has been enabled, the autoscaled nodes resize according to the [autoscaling settings](../../../deploy-manage/autoscaling.md#ec-autoscaling-update). Current sizes are shown on the deployment overview page.
-
-When autoscaling has been disabled, you need to adjust the size of data tiers and machine learning nodes manually.
-
-
-## Update your autoscaling settings [ec-autoscaling-update]
-
-Each autoscaling setting is configured with a default value. You can adjust these if necessary, as follows:
-
-1. Log in to the [{{ecloud}} Console](https://cloud.elastic.co?page=docs&placement=docs-body).
-2. On the **Deployments** page, select your deployment.
-
-    On the **Deployments** page, you can narrow your deployments by name or ID, or choose from several other filters. To customize your view, use a combination of filters, or change the format from a grid to a list.
-
-3. In your deployment menu, select **Edit**.
-4. To update a data tier:
-
- 1. Use the dropdown box to set the **Maximum size per zone** to the largest amount of resources that should be allocated to the data tier automatically. The resources will not scale above this value.
- 2. You can also update the **Current size per zone**. If you update this setting to match the **Maximum size per zone**, the data tier will remain fixed at that size.
- 3. For a hot data tier you can also adjust the **Forecast window**. This is the duration of time, up to the present, for which past storage usage is assessed in order to predict when additional storage is needed.
- 4. Select **Save** to apply the changes to your deployment.
-
-5. To update machine learning nodes:
-
- 1. Use the dropdown box to set the **Minimum size per zone** and **Maximum size per zone** to the smallest and largest amount of resources, respectively, that should be allocated to the nodes automatically. The resources allocated to machine learning will not exceed these values. If you set these two settings to the same value, the machine learning node will remain fixed at that size.
- 2. Select **Save** to apply the changes to your deployment.
-
-
-You can also view our [example](../../../deploy-manage/autoscaling/ec-autoscaling-example.md) of how the autoscaling settings work.
diff --git a/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md b/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md
deleted file mode 100644
index f8e4143bee..0000000000
--- a/raw-migrated-files/docs-content/serverless/general-ml-nlp-auto-scale.md
+++ /dev/null
@@ -1,117 +0,0 @@
-# Trained model autoscaling [general-ml-nlp-auto-scale]
-
-This content applies to: [](../../../solutions/search.md) [](../../../solutions/observability.md) [](../../../solutions/security/elastic-security-serverless.md)
-
-You can enable autoscaling for each of your trained model deployments. Autoscaling allows {{es}} to automatically adjust the resources the model deployment can use based on the workload demand.
-
-There are two ways to enable autoscaling:
-
-* through APIs by enabling adaptive allocations
-* in Kibana by enabling adaptive resources
-
-Trained model autoscaling is available for both serverless and Cloud deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.
-
-Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
-
-
-## Enabling autoscaling through APIs - adaptive allocations [enabling-autoscaling-through-apis-adaptive-allocations]
-
-Model allocations are independent units of work for NLP tasks. If you set a static number of allocations, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
-
-When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load. When the load is high, additional model allocations are automatically created as needed. When the load is low, a model allocation is automatically removed. You can explicitly set the minimum and maximum number of allocations; autoscaling will occur within these limits.
-
-::::{note}
-If you set the minimum number of allocations to 1, you will be charged even if the system is not using those resources.
-
-::::
-
-
-You can enable adaptive allocations by using:
-
-* the create inference endpoint API for [ELSER](../../../explore-analyze/elastic-inference/inference-api/elser-inference-integration.md) and [E5 and models uploaded through Eland](../../../explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md) that are used as inference services.
-* the [start trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment) or [update trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-update-trained-model-deployment) APIs for trained models that are deployed on machine learning nodes.
-
-If the new allocations fit on the current machine learning nodes, they are started immediately. If more resource capacity is needed to create new model allocations and machine learning autoscaling is enabled, your machine learning nodes are scaled up to provide the required resources. The number of model allocations can be scaled down to 0, but cannot be scaled above 32 allocations unless you explicitly set a higher maximum. Adaptive allocations must be set up independently for each deployment and [inference endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference).
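-
-As a minimal sketch (the endpoint name and allocation limits are hypothetical), enabling adaptive allocations when creating an ELSER inference endpoint could look like this:
-
-```console
-PUT _inference/sparse_embedding/my-elser-endpoint
-{
-  "service": "elasticsearch",
-  "service_settings": {
-    "model_id": ".elser_model_2",
-    "num_threads": 1,
-    "adaptive_allocations": {
-      "enabled": true,
-      "min_number_of_allocations": 1,
-      "max_number_of_allocations": 4
-    }
-  }
-}
-```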
-
-When you create inference endpoints on Serverless using Kibana, adaptive allocations are automatically turned on, and there is no option to disable them.
-
-
-### Optimizing for typical use cases [optimizing-for-typical-use-cases]
-
-You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of inference requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.
-
-* If you want to optimize for ingest, set the number of threads to `1` (`"threads_per_allocation": 1`).
-* If you want to optimize for search, set the number of threads to a value greater than `1`. Increasing the number of threads makes the search processes more performant (see the sketch after this list).
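-
-As an illustration (the model ID and allocation count are hypothetical), starting an ingest-optimized deployment with one thread per allocation could look like this:
-
-```console
-POST _ml/trained_models/my-model/deployment/_start?threads_per_allocation=1&number_of_allocations=2
-```
-
-For a search-optimized deployment, you would instead set `threads_per_allocation` to a higher value, for example `8`.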
-
-
-## Enabling autoscaling in {{kib}} - adaptive resources [enabling-autoscaling-in-kibana-adaptive-resources]
-
-You can enable adaptive resources for your models when starting or updating the model deployment. Adaptive resources make it possible for {{es}} to scale up or down the available resources based on the load on the process. This can help you to manage performance and cost more easily. When adaptive resources are enabled, the number of VCUs that the model deployment uses is set automatically based on the current load. When the load is high, the number of VCUs that the process can use is automatically increased. When the load is low, the number of VCUs that the process can use is automatically decreased.
-
-You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level’s range.
-
-Refer to the tables in the [Model deployment resource matrix](#model-deployment-resource-matrix) section to find the settings for the level you selected.
-
-:::{image} ../../../images/serverless-ml-nlp-deployment.png
-:alt: ML model deployment with adaptive resources enabled.
-:::
-
-Search projects are given access to more processing resources, while Security and Observability projects have lower limits. This difference is reflected in the UI: Search projects show higher resource limits than Security and Observability projects to accommodate their more complex operations.
-
-On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in Kibana for Observability and Security projects.
-
-
-## Model deployment resource matrix [model-deployment-resource-matrix]
-
-The used resources for trained model deployments depend on three factors:
-
-* your cluster environment (Serverless, Cloud, or on-premises)
-* the use case you optimize the model deployment for (ingest or search)
-* whether model autoscaling is enabled with adaptive allocations or adaptive resources (dynamic resources), or disabled (static resources)
-
-The following tables show you the number of allocations, threads, and VCUs available on Serverless when adaptive resources are enabled or disabled.
-
-
-### Deployments on serverless optimized for ingest [deployments-on-serverless-optimized-for-ingest]
-
-For ingest-optimized deployments, the number of model allocations is maximized.
-
-
-#### Adaptive resources enabled [adaptive-resources-enabled]
-
-| Level | Allocations | Threads | VCUs |
-| --- | --- | --- | --- |
-| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
-| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
-| High | 1 to 512 for Search<br>1 to 128 for Security and Observability | 1 | 8 to 4096 for Search<br>8 to 1024 for Security and Observability |
-
-
-#### Adaptive resources disabled (Search only) [adaptive-resources-disabled-search-only]
-
-| Level | Allocations | Threads | VCUs |
-| --- | --- | --- | --- |
-| Low | Exactly 2 | 1 | 16 |
-| Medium | Exactly 32 | 1 | 256 |
-| High | 512 for Search<br>No static allocations for Security and Observability | 1 | 4096 for Search<br>No static allocations for Security and Observability |
-
-
-### Deployments on serverless optimized for search [deployments-on-serverless-optimized-for-search]
-
-
-#### Adaptive resources enabled [adaptive-resources-enabled-for-search]
-
-| Level | Allocations | Threads | VCUs |
-| --- | --- | --- | --- |
-| Low | 0 to 1 dynamically | Always 2 | 0 to 16 dynamically |
-| Medium | 1 to 2 (if threads=16), dynamically | Maximum (for example, 16) | 8 to 256 dynamically |
-| High | 1 to 32 (if threads=16), dynamically, for Search<br>1 to 128 for Security and Observability | Maximum (for example, 16) | 8 to 4096 for Search<br>8 to 1024 for Security and Observability |
-
-
-#### Adaptive resources disabled [adaptive-resources-disabled-for-search]
-
-| Level | Allocations | Threads | VCUs |
-| --- | --- | --- | --- |
-| Low | 1 statically | Always 2 | 16 |
-| Medium | 2 statically (if threads=16) | Maximum (for example, 16) | 256 |
-| High | 32 statically (if threads=16) for Search<br>No static allocations for Security and Observability | Maximum (for example, 16) | 4096 for Search<br>No static allocations for Security and Observability |
-
diff --git a/raw-migrated-files/toc.yml b/raw-migrated-files/toc.yml
index 963189d266..9d8c39c9d6 100644
--- a/raw-migrated-files/toc.yml
+++ b/raw-migrated-files/toc.yml
@@ -28,7 +28,6 @@ toc:
- file: cloud/cloud-enterprise/ece-add-user-settings.md
- file: cloud/cloud-enterprise/ece-administering-deployments.md
- file: cloud/cloud-enterprise/ece-api-console.md
- - file: cloud/cloud-enterprise/ece-autoscaling.md
- file: cloud/cloud-enterprise/ece-change-deployment.md
- file: cloud/cloud-enterprise/ece-configuring-keystore.md
- file: cloud/cloud-enterprise/ece-create-deployment.md
@@ -70,7 +69,6 @@ toc:
- file: cloud/cloud-heroku/ech-access-kibana.md
- file: cloud/cloud-heroku/ech-activity-page.md
- file: cloud/cloud-heroku/ech-add-user-settings.md
- - file: cloud/cloud-heroku/ech-autoscaling.md
- file: cloud/cloud-heroku/ech-configuring-keystore.md
- file: cloud/cloud-heroku/ech-custom-repository.md
- file: cloud/cloud-heroku/ech-delete-deployment.md
@@ -104,7 +102,6 @@ toc:
- file: cloud/cloud/ec-access-kibana.md
- file: cloud/cloud/ec-activity-page.md
- file: cloud/cloud/ec-add-user-settings.md
- - file: cloud/cloud/ec-autoscaling.md
- file: cloud/cloud/ec-billing-stop.md
- file: cloud/cloud/ec-cloud-ingest-data.md
- file: cloud/cloud/ec-configuring-keystore.md
@@ -163,7 +160,6 @@ toc:
- file: docs-content/serverless/elasticsearch-ingest-data-file-upload.md
- file: docs-content/serverless/elasticsearch-ingest-data-through-api.md
- file: docs-content/serverless/general-billing-stop-project.md
- - file: docs-content/serverless/general-ml-nlp-auto-scale.md
- file: docs-content/serverless/general-sign-up-trial.md
- file: docs-content/serverless/index-management.md
- file: docs-content/serverless/intro.md
diff --git a/redirects.yml b/redirects.yml
index c81e1c8ef4..3f902ad7cd 100644
--- a/redirects.yml
+++ b/redirects.yml
@@ -6,6 +6,8 @@ redirects:
'solutions/search/search-approaches/near-real-time-search.md': '!manage-data/data-store/near-real-time-search.md'
## deploy-manage
+ 'deploy-manage/autoscaling/ec-autoscaling-api-example.md': '!deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md'
+ 'deploy-manage/autoscaling/ece-autoscaling-api-example.md': '!deploy-manage/autoscaling/autoscaling-in-ece-and-ech.md'
'deploy-manage/deploy/elastic-cloud/ec-configure-deployment-settings.md': '!deploy-manage/deploy/elastic-cloud/ec-customize-deployment-components.md'
'deploy-manage/users-roles/cluster-or-deployment-auth/user-authentication.md':
anchors:
@@ -14,6 +16,10 @@ redirects:
'http-authentication':
'deploy-manage/deploy/cloud-enterprise/deploy-large-installation-cloud.md': '!deploy-manage/deploy/cloud-enterprise/deploy-large-installation.md'
+## explore-analyze
+ 'explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md': '!deploy-manage/autoscaling/trained-model-autoscaling.md'
+
+
## reference
'reference/security/elastic-defend/index.md': 'solutions/security/configure-elastic-defend.md'
'reference/security/elastic-defend/elastic-endpoint-deploy-reqs.md': 'solutions/security/configure-elastic-defend/elastic-defend-requirements.md'
diff --git a/solutions/search/semantic-search/semantic-search-semantic-text.md b/solutions/search/semantic-search/semantic-search-semantic-text.md
index b6a4580fec..f00f157ad3 100644
--- a/solutions/search/semantic-search/semantic-search-semantic-text.md
+++ b/solutions/search/semantic-search/semantic-search-semantic-text.md
@@ -132,4 +132,4 @@ As a result, you receive the top 10 documents that are closest in meaning to the
* If you want to use `semantic_text` in hybrid search, refer to [this notebook](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb) for a step-by-step guide.
* For more information on how to optimize your ELSER endpoints, refer to [the ELSER recommendations](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md#elser-recommendations) section in the model documentation.
-* To learn more about model autoscaling, refer to the [trained model autoscaling](/explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md) page.
+* To learn more about model autoscaling, refer to the [trained model autoscaling](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) page.