Commit 9fdee40

Updates document based on feedback
1 parent 031c91f commit 9fdee40

File tree

1 file changed (+33, -33 lines)


serverless/pages/ml-nlp-auto-scale.mdx

Lines changed: 33 additions & 33 deletions
@@ -13,20 +13,20 @@ There are two ways to enable autoscaling:
 - in Kibana by enabling adaptive resources
 
 
-Trained model autoscaling is available for Search, Observability, and Security projects on serverless deployments. However, these projects handle processing power differently, which impacts their costs and resource limits.
+Trained model autoscaling is available for both serverless and Cloud deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.
 
-Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (vCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
+Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
 
 ## Enabling autoscaling through APIs - adaptive allocations
 
 Model allocations are independent units of work for NLP tasks.
-If you set the numbers of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources.
+If you set a static number of allocations, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources.
 Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process.
 This can help you to manage performance and cost more easily.
 (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
 
 When adaptive allocations are enabled, the number of allocations of the model is set automatically based on the current load.
-When the load is high, a new model allocation is automatically created.
+When the load is high, additional model allocations are automatically created as needed.
 When the load is low, a model allocation is automatically removed.
 You can explicitly set the minimum and maximum number of allocations; autoscaling will occur within these limits.
 
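The adaptive allocations described in this hunk are enabled through the start trained model deployment API. A minimal sketch in the docs' console syntax (the model ID `my_elser_model` and the min/max bounds are illustrative, not part of this commit):

```console
POST _ml/trained_models/my_elser_model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 10
  }
}
```

With these bounds, the number of allocations scales between 1 and 10 as the load on the deployment changes.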
@@ -63,8 +63,8 @@ You can enable adaptive resources for your models when starting or updating the
 Adaptive resources make it possible for Elasticsearch to scale up or down the available resources based on the load on the process.
 This can help you to manage performance and cost more easily.
 When adaptive resources are enabled, the number of VCUs that the model deployment uses is set automatically based on the current load.
-When the load is high, the number of vCUs that the process can use is automatically increased.
-When the load is low, the number of vCUs that the process can use is automatically decreased.
+When the load is high, the number of VCUs that the process can use is automatically increased.
+When the load is low, the number of VCUs that the process can use is automatically decreased.
 
 You can choose from three levels of resource usage for your trained model deployment; autoscaling will occur within the selected level's range.
 
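The hunk above notes that adaptive resources can be set when starting or updating a deployment. At the API level, the counterpart knob on a running deployment is the update trained model deployment API; a hedged sketch (model ID and bounds are illustrative):

```console
POST _ml/trained_models/my_elser_model/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 3,
    "max_number_of_allocations": 10
  }
}
```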
@@ -85,44 +85,44 @@ The used resources for trained model deployments depend on three factors:
 - the use case you optimize the model deployment for (ingest or search)
 - whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
 
-The following tables show you the number of allocations, threads, and vCUs available on Serverless when adaptive resources are enabled or disabled.
+The following tables show you the number of allocations, threads, and VCUs available on Serverless when adaptive resources are enabled or disabled.
 
 ### Deployments on serverless optimized for ingest
 
 In case of ingest-optimized deployments, we maximize the number of model allocations.
 
-#### Adaptive resources enabled
+### Adaptive Resources Enabled
 
-| Level | Allocations | Threads | vCUs |
-|--------|------------------------------------------------------|---------|------------------------------------------------------|
-| Low | 0 to 2 dynamically | 1 | 0 to 2 dynamically |
-| Medium | 1 to 32 dynamically | 1 | 1 to 32 dynamically |
-| High | 1 to 512 for Search <br /> 1 to 128 for Security and Observability | 1 | 1 to 512 for Search <br /> 1 to 128 for Security and Observability |
+#### Ingest-Optimized Deployments
 
-#### Adaptive resources disabled (Search only)
+| Level | Allocations | Threads | VCUs |
+|--------|------------------------------------------------------|---------|-----------------------------------------------------|
+| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
+| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
+| High | - 1 to 512 for Search <br /> - 1 to 128 for Security and Observability | 1 | - 8 to 4096 for Search <br /> - 8 to 1024 for Security and Observability |
 
-| Level | Allocations | Threads | vCUs |
-|--------|------------------------------------------------------|---------|------------------------------------------------------|
-| Low | Exactly 2 | 1 | 2 |
-| Medium | Exactly 32 | 1 | 32 |
-| High | 512 for Search <br /> No static allocations for Security and Observability | 1 | 512 for Search <br /> No static allocations for Security and Observability |
+#### Adaptive Resources Disabled (Search Only)
 
-### Deployments on serverless optimized for search
+| Level | Allocations | Threads | VCUs |
+|--------|------------------------------------------------------|---------|-----------------------------------------------------|
+| Low | Exactly 2 | 1 | 16 |
+| Medium | Exactly 32 | 1 | 256 |
+| High | - 512 for Search <br /> - No static allocations for Security and Observability | 1 | - 4096 for Search <br /> - No static allocations for Security and Observability |
 
-In case of search-optimized deployments, we maximize the number of threads.
+### Deployments on Serverless Optimized for Search
 
-#### Adaptive resources enabled
+#### Adaptive Resources Enabled
 
-| Level | Allocations | Threads | vCUs |
-|--------|------------------------------------------------------|---------|------------------------------------------------------|
-| Low | 0 to 1 dynamically | Always 2 | 0 to 2 dynamically |
-| Medium | 1 to 2 (if threads=16), dinamically | Maximum (for example, 16) | 1 to 32 dynamically |
-| High | 1 to 32 (if threads=16), dinamically | Maximum (for example, 16) | 1 to 512 in Search <br /> 1 to 128 for Security and Observability |
+| Level | Allocations | Threads | VCUs |
+|--------|------------------------------------------------------|---------|-----------------------------------------------------|
+| Low | 0 to 1 dynamically | Always 2 | 0 to 16 dynamically |
+| Medium | 1 to 2 (if threads=16), dynamically | Maximum (e.g., 16) | 8 to 256 dynamically |
+| High | - 1 to 32 (if threads=16), dynamically <br /> - 1 to 128 for Security and Observability | Maximum (e.g., 16) | - 8 to 4096 for Search <br /> - 8 to 1024 for Security and Observability |
 
-#### Adaptive resources disabled
+#### Adaptive Resources Disabled
 
-| Level | Allocations | Threads | vCUs |
-|--------|---------------------------------------------------------|------------------------|------------------------------------------------------|
-| Low | 1 statically | Always 2 | 2 |
-| Medium | 2 statically (if threads=16) | Maximum (for example, 16) | 32 |
-| High | 32 statically (if threads=16) for Search <br /> No static allocations for Security and Observability | Maximum (for example, 16) | 512 for Search <br /> No static allocations for Security and Observability |
+| Level | Allocations | Threads | VCUs |
+|--------|------------------------------------------------------|---------|-----------------------------------------------------|
+| Low | 1 statically | Always 2 | 16 |
+| Medium | 2 statically (if threads=16) | Maximum (e.g., 16) | 256 |
+| High | - 32 statically (if threads=16) for Search <br /> - No static allocations for Security and Observability | Maximum (e.g., 16) | - 4096 for Search <br /> - No static allocations for Security and Observability |
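A pattern worth noting in the revised tables: every updated VCU figure equals allocations × threads × 8 (for example, 32 allocations × 1 thread → 256 VCUs, and 32 × 16 → 4096). A small sketch of that relation; the 8-VCU ratio is inferred from the tables in this commit, not a documented pricing formula:

```python
# The VCU figures in the updated tables are consistent with a fixed ratio
# of 8 VCUs per allocation-thread. This constant is an inference from the
# tables above, not an official pricing formula.
VCUS_PER_ALLOCATION_THREAD = 8


def estimated_vcus(allocations: int, threads_per_allocation: int) -> int:
    """Estimate the VCUs a trained model deployment consumes."""
    return allocations * threads_per_allocation * VCUS_PER_ALLOCATION_THREAD


# Ingest-optimized, Medium level, static: 32 allocations x 1 thread
print(estimated_vcus(32, 1))   # 256
# Search-optimized, High level, Search project: 32 allocations x 16 threads
print(estimated_vcus(32, 16))  # 4096
```

The same check reproduces every updated cell, e.g. Low/static ingest (2 × 1 → 16) and High/static Search ingest (512 × 1 → 4096).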
