Commit ba65e22

trained model autoscaling restructure
1 parent 905ad9e commit ba65e22

1 file changed: +73 −58

deploy-manage/autoscaling/trained-model-autoscaling.md

Lines changed: 73 additions & 58 deletions
@@ -20,7 +20,7 @@ There are two ways to enable autoscaling:
 To fully leverage model autoscaling in {{ech}}, {{ece}}, and {{eck}}, it is highly recommended to enable [{{es}} deployment autoscaling](../../deploy-manage/autoscaling.md).
 ::::
 
-Trained model autoscaling is available for both serverless and Cloud deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.
+Trained model autoscaling is available for both {{serverless-short}} and Cloud deployments. In serverless deployments, processing power is managed differently across Search, Observability, and Security projects, which impacts their costs and resource limits.
 
 Security and Observability projects are only charged for data ingestion and retention. They are not charged for processing power (VCU usage), which is used for more complex operations, like running advanced search models. For example, in Search projects, models such as ELSER require significant processing power to provide more accurate search results.
 
@@ -43,7 +43,7 @@ You can enable adaptive allocations by using:
 If the new allocations fit on the current {{ml}} nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your {{ml}} node will be scaled up if {{ml}} autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
 
 :::{note}
-When you create inference endpoints on Serverless using Kibana, adaptive allocations are automatically turned on, and there is no option to disable them.
+When you create inference endpoints on {{serverless-short}} using Kibana, adaptive allocations are automatically turned on, and there is no option to disable them.
 :::
 
 ### Optimizing for typical use cases [optimizing-for-typical-use-cases]
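The note in this hunk refers to creating {{infer}} endpoints, which is where adaptive allocations are configured. As a sketch of what that request can look like with the create inference endpoint API (the endpoint name and allocation bounds here are illustrative, not part of this commit):

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}
```

With `"enabled": true`, the number of allocations then scales dynamically between the configured minimum and maximum based on load.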
@@ -68,31 +68,31 @@ Refer to the tables in the [Model deployment resource matrix](#model-deployment-
 
 Search projects are given access to more processing resources, while Security and Observability projects have lower limits. This difference is reflected in the UI configuration: Search projects have higher resource limits compared to Security and Observability projects to accommodate their more complex operations.
 
-On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in Kibana for Observability and Security projects.
+On {{serverless-short}}, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in Kibana for Observability and Security projects.
 
 ## Model deployment resource matrix [model-deployment-resource-matrix]
 
 The used resources for trained model deployments depend on three factors:
 
-* your cluster environment (Serverless, Cloud, or on-premises)
+* your cluster environment ({{serverless-short}}, Cloud, or on-premises)
 * the use case you optimize the model deployment for (ingest or search)
 * whether model autoscaling is enabled with adaptive allocations/resources to have dynamic resources, or disabled for static resources
 
 If you use {{es}} on-premises, vCPUs level ranges are derived from the `total_ml_processors` and `max_single_ml_node_processors` values. Use the [get {{ml}} info API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-info) to check these values. The following tables show you the number of allocations, threads, and vCPUs available in Cloud when adaptive resources are enabled or disabled.
 
 ::::{note}
-On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
+On {{serverless-short}}, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
 ::::
 
-### Deployments in Cloud optimized for ingest [_deployments_in_cloud_optimized_for_ingest]
-```{applies_to}
-deployment:
-  ech: all
-```
+### Ingest optimized
 
 In case of ingest-optimized deployments, we maximize the number of model allocations.
 
-#### Adaptive resources enabled [_adaptive_resources_enabled]
+#### Adaptive resources enabled
+
+::::{tab-set}
+
+:::{tab-item} Cloud
 
 | Level | Allocations | Threads | vCPUs |
 | --- | --- | --- | --- |
@@ -102,89 +102,104 @@ In case of ingest-optimized deployments, we maximize the number of model allocat
 
 \* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
 
-#### Adaptive resources disabled [_adaptive_resources_disabled]
+:::
 
-| Level | Allocations | Threads | vCPUs |
+:::{tab-item} {{serverless-short}}
+
+| Level | Allocations | Threads | VCUs |
 | --- | --- | --- | --- |
-| Low | 2 if available, otherwise 1, statically | 1 | 2 if available |
-| Medium | the smaller of 32 or the limit set in the Cloud console, statically | 1 | 32 if available |
-| High | Maximum available set in the Cloud console *, statically | 1 | Maximum available set in the Cloud console, statically |
+| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
+| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
+| High | 1 to 512 for Search<br> 1 to 128 for Security and Observability<br> | 1 | 8 to 4096 for Search<br> 8 to 1024 for Security and Observability<br> |
 
-\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
+:::
 
-### Deployments in Cloud optimized for search [_deployments_in_cloud_optimized_for_search]
-```{applies_to}
-deployment:
-  ech: all
-```
+::::
 
-In case of search-optimized deployments, we maximize the number of threads. The maximum number of threads that can be claimed depends on the hardware your architecture has.
+#### Adaptive resources disabled
+
+::::{tab-set}
 
-#### Adaptive resources enabled [_adaptive_resources_enabled_2]
+:::{tab-item} Cloud
 
 | Level | Allocations | Threads | vCPUs |
 | --- | --- | --- | --- |
-| Low | 1 | 2 | 2 |
-| Medium | 1 to 2 (if threads=16) dynamically | maximum that the hardware allows (for example, 16) | 1 to 32 dynamically |
-| High | 1 to limit set in the Cloud console *, dynamically | maximum that the hardware allows (for example, 16) | 1 to limit set in the Cloud console, dynamically |
+| Low | 2 if available, otherwise 1, statically | 1 | 2 if available |
+| Medium | the smaller of 32 or the limit set in the Cloud console, statically | 1 | 32 if available |
+| High | Maximum available set in the Cloud console *, statically | 1 | Maximum available set in the Cloud console, statically |
 
 \* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
 
-#### Adaptive resources disabled [_adaptive_resources_disabled_2]
+:::
 
-| Level | Allocations | Threads | vCPUs |
-| --- | --- | --- | --- |
-| Low | 1 if available, statically | 2 | 2 if available |
-| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
-| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |
+:::{tab-item} {{serverless-short}}
 
-\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
+| Level | Allocations | Threads | VCUs |
+| --- | --- | --- | --- |
+| Low | Exactly 2 | 1 | 16 |
+| Medium | Exactly 32 | 1 | 256 |
+| High | 512 for Search<br> No static allocations for Security and Observability<br> | 1 | 4096 for Search<br> No static allocations for Security and Observability<br> |
 
-### Deployments on serverless optimized for ingest [deployments-on-serverless-optimized-for-ingest]
-```{applies_to}
-serverless: all
-```
+:::
 
-In case of ingest-optimized deployments, we maximize the number of model allocations.
+::::
 
+### Search optimized
 
-#### Adaptive resources enabled [adaptive-resources-enabled]
+In case of search-optimized deployments, we maximize the number of threads. The maximum number of threads that can be claimed depends on the hardware your architecture has.
 
-| Level | Allocations | Threads | VCUs |
-| --- | --- | --- | --- |
-| Low | 0 to 2 dynamically | 1 | 0 to 16 dynamically |
-| Medium | 1 to 32 dynamically | 1 | 8 to 256 dynamically |
-| High | 1 to 512 for Search<br> 1 to 128 for Security and Observability<br> | 1 | 8 to 4096 for Search<br> 8 to 1024 for Security and Observability<br> |
+#### Adaptive resources enabled
 
+::::{tab-set}
 
-#### Adaptive resources disabled (Search only) [adaptive-resources-disabled-search-only]
+:::{tab-item} Cloud
 
-| Level | Allocations | Threads | VCUs |
+| Level | Allocations | Threads | vCPUs |
 | --- | --- | --- | --- |
-| Low | Exactly 2 | 1 | 16 |
-| Medium | Exactly 32 | 1 | 256 |
-| High | 512 for Search<br> No static allocations for Security and Observability<br> | 1 | 4096 for Search<br> No static allocations for Security and Observability<br> |
-
+| Low | 1 | 2 | 2 |
+| Medium | 1 to 2 (if threads=16) dynamically | maximum that the hardware allows (for example, 16) | 1 to 32 dynamically |
+| High | 1 to limit set in the Cloud console *, dynamically | maximum that the hardware allows (for example, 16) | 1 to limit set in the Cloud console, dynamically |
 
-### Deployments on serverless optimized for Search [deployments-on-serverless-optimized-for-search]
-```{applies_to}
-serverless: all
-```
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
 
+:::
 
-#### Adaptive resources enabled [adaptive-resources-enabled-for-search]
+:::{tab-item} {{serverless-short}}
 
 | Level | Allocations | Threads | VCUs |
 | --- | --- | --- | --- |
 | Low | 0 to 1 dynamically | Always 2 | 0 to 16 dynamically |
 | Medium | 1 to 2 (if threads=16), dynamically | Maximum (for example, 16) | 8 to 256 dynamically |
 | High | 1 to 32 (if threads=16), dynamically<br> 1 to 128 for Security and Observability<br> | Maximum (for example, 16) | 8 to 4096 for Search<br> 8 to 1024 for Security and Observability<br> |
 
+:::
 
-#### Adaptive resources disabled [adaptive-resources-disabled-for-search]
+::::
+
+#### Adaptive resources disabled
+
+::::{tab-set}
+
+:::{tab-item} Cloud
+
+| Level | Allocations | Threads | vCPUs |
+| --- | --- | --- | --- |
+| Low | 1 if available, statically | 2 | 2 if available |
+| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
+| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |
+
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
+
+:::
+
+:::{tab-item} {{serverless-short}}
 
 | Level | Allocations | Threads | VCUs |
 | --- | --- | --- | --- |
 | Low | 1 statically | Always 2 | 16 |
 | Medium | 2 statically (if threads=16) | Maximum (for example, 16) | 256 |
-| High | 32 statically (if threads=16) for Search<br> No static allocations for Security and Observability<br> | Maximum (for example, 16) | 4096 for Search<br> No static allocations for Security and Observability<br> |
+| High | 32 statically (if threads=16) for Search<br> No static allocations for Security and Observability<br> | Maximum (for example, 16) | 4096 for Search<br> No static allocations for Security and Observability<br> |
+
+:::
+
+::::
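The footnote repeated throughout these tables states that the Cloud console's vCPU limit implies the allocation count: allocations equal the vCPU limit divided by the threads per allocation. A minimal sketch of that arithmetic (the function name is illustrative, not from the docs):

```python
def allocations_from_vcpu_limit(vcpu_limit: int, threads_per_allocation: int) -> int:
    """Allocation count implied by a vCPU limit: vCPUs divided by threads per allocation."""
    return vcpu_limit // threads_per_allocation

# Ingest-optimized deployments use 1 thread per allocation,
# so a 32-vCPU limit implies 32 allocations.
print(allocations_from_vcpu_limit(32, 1))   # 32

# Search-optimized deployments maximize threads (for example, 16),
# so the same 32-vCPU limit implies only 2 allocations.
print(allocations_from_vcpu_limit(32, 16))  # 2
```

This is why, at the same vCPU budget, the ingest-optimized tables show many allocations while the search-optimized tables show few allocations with many threads each.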

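The adaptive allocations behavior that this commit's tables describe can also be changed on an existing trained model deployment. A hedged sketch using the update trained model deployment API (the model ID and allocation bounds are illustrative):

```console
POST _ml/trained_models/.elser_model_2/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 0,
    "max_number_of_allocations": 32
  }
}
```

Setting the minimum to 0 allows the deployment to scale down to zero allocations when idle, matching the "0 to N dynamically" rows in the serverless tables.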