
Commit 47acfae

kosabogi, szabosteve, and alaudazzi authored
Adds information about the importance of adaptive allocations (elastic#1454)
### [📸 Preview](https://docs-v3-preview.elastic.dev/elastic/docs-content/pull/1454/explore-analyze/elastic-inference/inference-api)

### Description

This PR updates the Inference integration documentation to:

- Clearly state that not enabling adaptive allocations can result in unnecessary resource usage and higher costs.
- Expand the scope of the page to cover not only third-party service integrations, but also the Elasticsearch service.

### Related issue: elastic#1393

---------

Co-authored-by: István Zoltán Szabó <[email protected]>
Co-authored-by: Arianna Laudazzi <[email protected]>
1 parent 20bf46f commit 47acfae

File tree

1 file changed: +24, -8 lines changed

explore-analyze/elastic-inference/inference-api.md

Lines changed: 24 additions & 8 deletions
@@ -9,15 +9,16 @@ products:
   - id: kibana
 ---
 
-# Integrate with third-party services
+# Inference integrations
 
-{{es}} provides a machine learning [inference API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-get-1) to create and manage inference endpoints to integrate with machine learning models provide by popular third-party services like Amazon Bedrock, Anthropic, Azure AI Studio, Cohere, Google AI, Mistral, OpenAI, Hugging Face, and more.
+{{es}} provides a machine learning [inference API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-get-1) to create and manage inference endpoints that integrate with services such as Elasticsearch (for built-in NLP models like [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [E5](/explore-analyze/machine-learning/nlp/ml-nlp-e5.md)), as well as popular third-party services like Amazon Bedrock, Anthropic, Azure AI Studio, Cohere, Google AI, Mistral, OpenAI, Hugging Face, and more.
 
-Learn how to integrate with specific services in the subpages of this section.
+You can create a new inference endpoint:
 
-## Inference endpoints UI [inference-endpoints]
+- using the [Create an inference endpoint API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-put-1)
+- through the [Inference endpoints UI](#add-inference-endpoints).
 
-You can also manage inference endpoints using the UI.
+## Inference endpoints UI [inference-endpoints]
 
 The **Inference endpoints** page provides an interface for managing inference endpoints.
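
For reference, here is a minimal sketch of the API call the first new bullet points to, assuming an endpoint backed by the built-in ELSER model on the `elasticsearch` service; the endpoint name `my-elser-endpoint`, the thread count, and the allocation bounds are illustrative, not part of this change:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    }
  }
}
```

With `min_number_of_allocations` set to `0`, the deployment can scale all the way down when idle, which is the cost-saving behavior this PR documents.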

@@ -33,7 +34,7 @@ Available actions:
 * Copy the inference endpoint ID
 * Delete endpoints
 
-## Add new inference endpoint
+## Add new inference endpoint [add-inference-endpoints]
 
 To add a new inference endpoint using the UI:

@@ -42,18 +43,33 @@ To add a new inference endpoint using the UI:
 1. Provide the required configuration details.
 1. Select **Save** to create the endpoint.
 
+If your inference endpoint uses a model deployed in Elastic’s infrastructure, such as ELSER, E5, or a model uploaded through Eland, you can configure [adaptive allocations](#adaptive-allocations) to dynamically adjust resource usage based on the current demand.
+
 ## Adaptive allocations [adaptive-allocations]
 
 Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
+This feature is only supported for models deployed in Elastic’s infrastructure, such as ELSER, E5, or models uploaded through Eland. It is not available for third-party services (for example, Alibaba Cloud, Cohere, or OpenAI), because those models are hosted externally and not deployed within your Elasticsearch cluster.
 
 When adaptive allocations are enabled:
 
 * The number of allocations scales up automatically when the load increases.
 * Allocations scale down to a minimum of 0 when the load decreases, saving resources.
 
-For more information about adaptive allocations and resources, refer to the trained model autoscaling documentation.
+### Allocation scaling behavior
+
+The behavior of allocations depends on several factors:
+
+- Deployment type (Elastic Cloud Hosted, Elastic Cloud Enterprise, or Serverless)
+- Usage level (low, medium, or high)
+- Optimization type ([ingest](/deploy-manage/autoscaling/trained-model-autoscaling.md#ingest-optimized) or [search](/deploy-manage/autoscaling/trained-model-autoscaling.md#search-optimized))
+
+::::{important}
+If you enable adaptive allocations and set the `min_number_of_allocations` to a value greater than `0`, you will be charged for the machine learning resources, even if no inference requests are sent.
+
+However, setting the `min_number_of_allocations` to a value greater than `0` keeps the model always available without scaling delays. Choose the configuration that best fits your workload and availability needs.
+::::
 
-% TO DO: Add a link to trained model autoscaling when the page is available.%
+For more information about adaptive allocations and resources, refer to the [trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) documentation.
 
 ## Default {{infer}} endpoints [default-enpoints]
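
As a companion sketch of the trade-off called out in the `{important}` box above, adaptive allocations for a model already deployed in Elastic’s infrastructure can also be adjusted through the trained model deployment update API; the model ID and allocation bounds below are illustrative:

```console
POST _ml/trained_models/.elser_model_2/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    // a minimum above 0 keeps the model always available, but is billed even when idle
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 8
  }
}
```

Keeping `min_number_of_allocations` at `0` instead lets the deployment scale to zero when there is no load, at the cost of a cold-start delay on the first request after an idle period.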