File: `serverless/pages/ml-nlp-auto-scale.mdx`
There are two ways to enable autoscaling:

- through APIs by enabling adaptive allocations
- in Kibana by enabling adaptive resources

Trained model autoscaling is available for both serverless and Cloud deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.

Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.

## Enabling autoscaling through APIs - adaptive allocations
Model allocations are independent units of work for NLP tasks.
If you set a static number of allocations, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources.
Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process.
This can help you to manage performance and cost more easily.
(Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
When the load is high, additional model allocations are automatically created as needed.
When the load is low, a model allocation is automatically removed.
You can explicitly set the minimum and maximum number of allocations; autoscaling will occur within these limits.
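As a sketch of what this looks like through the APIs (the ELSER model ID and the allocation limits below are placeholder values, not recommendations), adaptive allocations can be requested when starting a deployment, and the limits can be adjusted later with the update endpoint:

```console
# Start a deployment with adaptive allocations enabled
POST _ml/trained_models/.elser_model_2/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 4
  }
}

# Later, change the limits on the running deployment
POST _ml/trained_models/.elser_model_2/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 2,
    "max_number_of_allocations": 8
  }
}
```

Within these limits, the number of allocations then rises and falls with the load on the deployment.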
You can enable adaptive resources for your models when starting or updating the model deployment.
Adaptive resources make it possible for Elasticsearch to scale up or down the available resources based on the load on the process.
This can help you to manage performance and cost more easily.
When adaptive resources are enabled, the number of VCUs that the model deployment uses is set automatically based on the current load.
When the load is high, the number of VCUs that the process can use is automatically increased.
When the load is low, the number of VCUs that the process can use is automatically decreased.
You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level's range.
The used resources for trained model deployments depend on three factors:
- the use case you optimize the model deployment for (ingest or search)
- whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
The following tables show you the number of allocations, threads, and VCUs available on Serverless when adaptive resources are enabled or disabled.
### Deployments on serverless optimized for ingest
In case of ingest-optimized deployments, we maximize the number of model allocations.
| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
| High | - 1 to 512 for Search <br /> - 1 to 128 for Security and Observability | 1 | - 8 to 4096 for Search <br /> - 8 to 1024 for Security and Observability |

#### Adaptive Resources Disabled (Search Only)

| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| High | - 512 for Search <br /> - No static allocations for Security and Observability | 1 | - 4096 for Search <br /> - No static allocations for Security and Observability |

### Deployments on Serverless Optimized for Search

In case of search-optimized deployments, we maximize the number of threads.

| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| Low | 0 to 1 dynamically | Always 2 | 0 to 16 dynamically |
| Medium | 1 to 2 (if threads=16), dynamically | Maximum (e.g., 16) | 8 to 256 dynamically |
| High | - 1 to 32 (if threads=16), dynamically <br /> - 1 to 128 for Security and Observability | Maximum (e.g., 16) | - 8 to 4096 for Search <br /> - 8 to 1024 for Security and Observability |

#### Adaptive Resources Disabled (Search Only)

| Level | Allocations | Threads per allocation | VCUs |
| --- | --- | --- | --- |
| Medium | 2 statically (if threads=16) | Maximum (e.g., 16) | 256 |
| High | - 32 statically (if threads=16) for Search <br /> - No static allocations for Security and Observability | Maximum (e.g., 16) | - 4096 for Search <br /> - No static allocations for Security and Observability |
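Whichever way autoscaling is enabled, you can observe how many allocations a deployment is currently using. As an illustrative sketch (the ELSER model ID is a placeholder), the trained model statistics API returns deployment stats for this purpose:

```console
# Inspect the current scaling state of a deployment
GET _ml/trained_models/.elser_model_2/_stats
```

The `deployment_stats` section of the response reports the deployment's current and target allocation counts, which you can watch rise and fall as load changes.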