explore-analyze/elastic-inference/inference-api.md (24 additions, 8 deletions)
@@ -9,15 +9,16 @@ products:
   - id: kibana
 ---

-# Integrate with third-party services
+# Inference integrations

-{{es}} provides a machine learning [inference API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-get-1) to create and manage inference endpoints to integrate with machine learning models provided by popular third-party services like Amazon Bedrock, Anthropic, Azure AI Studio, Cohere, Google AI, Mistral, OpenAI, Hugging Face, and more.
+{{es}} provides a machine learning [inference API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-get-1) to create and manage inference endpoints that integrate with services such as Elasticsearch (for built-in NLP models like [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [E5](/explore-analyze/machine-learning/nlp/ml-nlp-e5.md)), as well as popular third-party services like Amazon Bedrock, Anthropic, Azure AI Studio, Cohere, Google AI, Mistral, OpenAI, Hugging Face, and more.

-Learn how to integrate with specific services in the subpages of this section.
+You can create a new inference endpoint:

-## Inference endpoints UI [inference-endpoints]
+- using the [Create an inference endpoint API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-put-1)
+- through the [Inference endpoints UI](#add-inference-endpoints)

-You can also manage inference endpoints using the UI.
+## Inference endpoints UI [inference-endpoints]

 The **Inference endpoints** page provides an interface for managing inference endpoints.
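As a minimal sketch of the first option above: a single request to the Create an inference endpoint API defines an endpoint. The `cohere` service, endpoint ID, and settings shown here are illustrative assumptions, not values taken from this change:

```console
// Hypothetical example: the service, endpoint ID, and settings below
// are assumptions for illustration, not taken from this diff.
PUT _inference/text_embedding/my-embedding-endpoint
{
  "service": "cohere",
  "service_settings": {
    "api_key": "<your-cohere-api-key>",
    "model_id": "embed-english-v3.0"
  }
}
```

The path encodes the task type (`text_embedding`) and the endpoint ID; once created, the endpoint is referenced by that ID, for example from ingest pipelines or semantic search queries.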
@@ -33,7 +34,7 @@ Available actions:
 * Copy the inference endpoint ID
 * Delete endpoints

-## Add new inference endpoint
+## Add new inference endpoint [add-inference-endpoints]

 To add a new inference endpoint using the UI:
@@ -42,18 +43,33 @@ To add a new inference endpoint using the UI:
 1. Provide the required configuration details.
 1. Select **Save** to create the endpoint.

+If your inference endpoint uses a model deployed in Elastic’s infrastructure, such as ELSER, E5, or a model uploaded through Eland, you can configure [adaptive allocations](#adaptive-allocations) to dynamically adjust resource usage based on the current demand.
+
 ## Adaptive allocations [adaptive-allocations]

 Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
+This feature is only supported for models deployed in Elastic’s infrastructure, such as ELSER, E5, or models uploaded through Eland. It is not available for third-party services (for example, Alibaba Cloud, Cohere, or OpenAI), because those models are hosted externally and not deployed within your Elasticsearch cluster.

 When adaptive allocations are enabled:

 * The number of allocations scales up automatically when the load increases.
 * Allocations scale down to a minimum of 0 when the load decreases, saving resources.

-For more information about adaptive allocations and resources, refer to the trained model autoscaling documentation.
+### Allocation scaling behavior
+
+The behavior of allocations depends on several factors:
+
+- Deployment type (Elastic Cloud Hosted, Elastic Cloud Enterprise, or Serverless)
+- Usage level (low, medium, or high)
+- Optimization type ([ingest](/deploy-manage/autoscaling/trained-model-autoscaling.md#ingest-optimized) or [search](/deploy-manage/autoscaling/trained-model-autoscaling.md#search-optimized))
+
+::::{important}
+If you enable adaptive allocations and set the `min_number_of_allocations` to a value greater than `0`, you will be charged for the machine learning resources, even if no inference requests are sent.
+
+However, setting the `min_number_of_allocations` to a value greater than `0` keeps the model always available without scaling delays. Choose the configuration that best fits your workload and availability needs.
+::::

-% TO DO: Add a link to trained model autoscaling when the page is available.%
+For more information about adaptive allocations and resources, refer to the [trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) documentation.
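To make the allocation settings concrete, here is a minimal sketch of enabling adaptive allocations when creating an endpoint backed by a model deployed in Elastic’s infrastructure. The endpoint ID, model choice (E5), and allocation bounds are assumptions for illustration:

```console
// Hypothetical example: the endpoint ID, model, and allocation bounds
// are illustrative assumptions, not taken from this diff.
PUT _inference/text_embedding/my-e5-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    }
  }
}
```

With `min_number_of_allocations` set to `0`, the deployment can scale to zero when idle, so no resources are held, but the first request after a quiet period waits for an allocation to start. Raising the minimum to `1` trades standing cost for availability, as the important note above describes.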