
Commit 5a068eb

szabosteve and maxjakob authored
Adds ELSER on EIS conceptual docs (#2226)
## Overview

This PR:

* adds initial docs about the Elastic Inference Service (EIS)
* adds docs about ELSER on EIS
* moves the default inference endpoint list to the top of the Inference integrations page
* lists `.elser-2-elastic` under default inference endpoints
* links the EIS docs page on the Elastic Inference landing page
* changes `inference` to `{{infer}}`

---------

Co-authored-by: Max Jakob <[email protected]>
1 parent b744895 commit 5a068eb

File tree

5 files changed: +120 -30 lines changed

explore-analyze/elastic-inference.md

Lines changed: 14 additions & 3 deletions
@@ -7,7 +7,18 @@ navigation_title: Elastic Inference

 # Elastic {{infer-cap}}

-There are several ways to perform {{infer}} in the {{stack}}. This page provides a brief overview of the different methods:
+## Overview

-* [Using the {{infer}} API](elastic-inference/inference-api.md)
-* [Trained models deployed in your cluster](machine-learning/nlp/ml-nlp-overview.md)
+{{infer-cap}} is the process of using a trained {{ml}} model to make predictions or perform operations - such as text embedding or reranking - on your data.
+You can use {{infer}} at ingest time (for example, to create embeddings from the textual data you ingest) or at search time (to perform [semantic search](/solutions/search/semantic-search.md) based on the embeddings created previously).
+There are several ways to perform {{infer}} in the {{stack}}, depending on the underlying {{infer}} infrastructure and the interface you use:
+
+- **{{infer-cap}} infrastructure:**
+
+  - [Elastic {{infer-cap}} Service](elastic-inference/eis.md): a managed service that runs {{infer}} outside your cluster resources.
+  - [Trained models deployed in your cluster](machine-learning/nlp/ml-nlp-overview.md): models that run on your own {{ml}} nodes.
+
+- **Access methods:**
+
+  - [The `semantic_text` workflow](/solutions/search/semantic-search/semantic-search-semantic-text.md): a simplified method that uses the {{infer}} API behind the scenes to enable semantic search.
+  - [The {{infer}} API](elastic-inference/inference-api.md): a general-purpose API that enables you to run {{infer}} using EIS, your own models, or third-party services.
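For illustration, a minimal sketch of the `semantic_text` workflow described above. The index name `my-semantic-index` and field name `content` are hypothetical; with no `inference_id` specified, the field falls back to a default {{infer}} endpoint:

```console
PUT my-semantic-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text"
      }
    }
  }
}
```

Documents indexed into `content` are then chunked and embedded automatically at ingest time, and can be queried with a `semantic` query.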
explore-analyze/elastic-inference/eis.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+---
+navigation_title: Elastic Inference Service (EIS)
+applies_to:
+  stack: ga 9.0
+  serverless: ga
+---
+
+# Elastic {{infer-cap}} Service [elastic-inference-service-eis]
+
+The Elastic {{infer-cap}} Service (EIS) enables you to leverage AI-powered search as a service without deploying a model in your cluster.
+With EIS, you don't need to manage the infrastructure and resources required for {{ml}} {{infer}} by adding, configuring, and scaling {{ml}} nodes.
+Instead, you can use {{ml}} models for ingest, search, and chat independently of your {{es}} infrastructure.
+
+## AI features powered by EIS [ai-features-powered-by-eis]
+
+* Your Elastic deployment or project comes with a default [`Elastic Managed LLM` connector](https://www.elastic.co/docs/reference/kibana/connectors-kibana/elastic-managed-llm). This connector is used in the AI Assistant, Attack Discovery, Automatic Import, and Search Playground.
+
+* You can use [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) to perform semantic search as a service (ELSER on EIS). {applies_to}`stack: preview 9.1` {applies_to}`serverless: preview`
+
+## Region and hosting [eis-regions]
+
+Requests through the `Elastic Managed LLM` are currently proxied to AWS Bedrock in AWS US regions, beginning with `us-east-1`.
+The request routing does not restrict the location of your deployments.
+
+ELSER requests are managed by Elastic's own EIS infrastructure.
+
+## ELSER via Elastic {{infer-cap}} Service (ELSER on EIS) [elser-on-eis]
+
+```{applies_to}
+stack: preview 9.1
+serverless: preview
+```
+
+ELSER on EIS enables you to use the ELSER model without using {{ml}} nodes in your infrastructure, which simplifies the semantic search and hybrid search experience.
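For illustration, a sketch of calling the default ELSER on EIS endpoint directly through the {{infer}} API. The endpoint ID `.elser-2-elastic` is listed under the default {{infer}} endpoints elsewhere in this commit; the input text is arbitrary:

```console
POST _inference/sparse_embedding/.elser-2-elastic
{
  "input": "What is semantic search?"
}
```

The response contains the weighted tokens (a sparse vector) that ELSER produces for the input text.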
+
+### Private preview access
+
+Private preview access is available by submitting [this form](https://docs.google.com/forms/d/e/1FAIpQLSfp2rLsayhw6pLVQYYp4KM6BFtaaljplWdYowJfflpOICgViA/viewform).
+
+### Limitations
+
+While we encourage experimentation, we do not recommend implementing production use cases on top of this feature while it is in Technical Preview.
+
+#### Access
+
+This feature is being gradually rolled out to Serverless and Cloud Hosted customers.
+It may not be available to all users at launch.
+
+#### Uptime
+
+There are no uptime guarantees during the Technical Preview.
+While Elastic will address issues promptly, the feature may be unavailable for extended periods.
+
+#### Throughput and latency
+
+{{infer-cap}} throughput via this endpoint is expected to exceed that of {{infer}} operations on an ML node.
+However, throughput and latency are not guaranteed.
+Performance may vary during the Technical Preview.
+
+#### Batch size
+
+Batches are limited to a maximum of 16 documents.
+This is particularly relevant when using the [_bulk API](https://www.elastic.co/docs/api/doc/elasticsearch/v9/operation/operation-bulk) for data ingestion.
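For example, a bulk ingestion sketch that stays within the 16-document limit (the index name `my-index` and document contents are hypothetical; larger jobs would be split client-side into multiple requests of at most 16 documents each):

```console
POST _bulk
{ "index": { "_index": "my-index", "_id": "1" } }
{ "content": "Semantic search finds results by meaning rather than by exact keywords." }
{ "index": { "_index": "my-index", "_id": "2" } }
{ "content": "Hybrid search combines lexical and semantic retrieval." }
```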

explore-analyze/elastic-inference/inference-api.md

Lines changed: 38 additions & 26 deletions
@@ -9,18 +9,39 @@ products:
   - id: kibana
 ---

-# Inference integrations
+# {{infer-cap}} integrations

-{{es}} provides a machine learning [inference API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-get-1) to create and manage inference endpoints that integrate with services such as Elasticsearch (for built-in NLP models like [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [E5](/explore-analyze/machine-learning/nlp/ml-nlp-e5.md)), as well as popular third-party services like Amazon Bedrock, Anthropic, Azure AI Studio, Cohere, Google AI, Mistral, OpenAI, Hugging Face, and more.
+{{es}} provides a machine learning [{{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/v9/group/endpoint-inference) to create and manage {{infer}} endpoints that integrate with services such as {{es}} (for built-in NLP models like [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) and [E5](/explore-analyze/machine-learning/nlp/ml-nlp-e5.md)), as well as popular third-party services like Amazon Bedrock, Anthropic, Azure AI Studio, Cohere, Google AI, Mistral, OpenAI, Hugging Face, and more.

-You can create a new inference endpoint:
+You can use the default {{infer}} endpoints included in your deployment or create a new {{infer}} endpoint:

-- using the [Create an inference endpoint API](https://www.elastic.co/docs/api/doc/elasticsearch/v8/operation/operation-inference-put-1)
+- using the [Create an inference endpoint API](https://www.elastic.co/docs/api/doc/elasticsearch/v9/operation/operation-inference-put)
 - through the [Inference endpoints UI](#add-inference-endpoints).
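For illustration, a minimal sketch of creating an endpoint with the create API. The endpoint name `my-openai-embeddings` and the model choice are hypothetical, and `<api-key>` is a placeholder:

```console
PUT _inference/text_embedding/my-openai-embeddings
{
  "service": "openai",
  "service_settings": {
    "api_key": "<api-key>",
    "model_id": "text-embedding-3-small"
  }
}
```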

-## Inference endpoints UI [inference-endpoints]
+## Default {{infer}} endpoints [default-enpoints]
+
+Your {{es}} deployment contains preconfigured {{infer}} endpoints, which makes them easier to use when defining `semantic_text` fields or using {{infer}} processors. These endpoints come in two forms:
+
+- **Elastic Inference Service (EIS) endpoints**, which provide {{infer}} as a managed service and do not consume resources from your own nodes.
+
+- **ML node-based endpoints**, which run on your dedicated {{ml}} nodes.
+
+The following sections list the default {{infer}} endpoints, identified by their `inference_id` and grouped by whether they are EIS-based or ML node-based.
+
+### Default endpoints for the Elastic {{infer-cap}} Service (EIS)
+
+- `.elser-2-elastic`: uses the [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) trained model through the Elastic {{infer-cap}} Service for `sparse_embedding` tasks (recommended for English language text). The `model_id` is `.elser_model_2`. {applies_to}`stack: preview 9.1` {applies_to}`serverless: preview`
+
+### Default endpoints used on ML nodes
+
+- `.elser-2-elasticsearch`: uses the [ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md) built-in trained model for `sparse_embedding` tasks (recommended for English language text). The `model_id` is `.elser_model_2_linux-x86_64`.
+- `.multilingual-e5-small-elasticsearch`: uses the [E5](../../explore-analyze/machine-learning/nlp/ml-nlp-e5.md) built-in trained model for `text_embedding` tasks (recommended for non-English language texts). The `model_id` is `.e5_model_2_linux-x86_64`.

-The **Inference endpoints** page provides an interface for managing inference endpoints.
+Use the `inference_id` of the endpoint in a [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field definition or when creating an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md). The API call automatically downloads and deploys the model, which might take a couple of minutes. Default {{infer}} endpoints have adaptive allocations enabled. For these models, the minimum number of allocations is `0`. If there is no {{infer}} activity that uses the endpoint, the number of allocations automatically scales down to `0` after 15 minutes.
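For illustration, a sketch of a `semantic_text` field that references a default endpoint explicitly (the index and field names are hypothetical):

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elasticsearch"
      }
    }
  }
}
```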
+
+## {{infer-cap}} endpoints UI [inference-endpoints]
+
+The **{{infer-cap}} endpoints** page provides an interface for managing {{infer}} endpoints.

 :::{image} /explore-analyze/images/kibana-inference-endpoints-ui.png
 :alt: Inference endpoints UI
@@ -29,31 +50,31 @@ The **Inference endpoints** page provides an interface for managing inference en
 Available actions:

-* Add new endpoint
-* View endpoint details
-* Copy the inference endpoint ID
-* Delete endpoints
+- Add new endpoint
+- View endpoint details
+- Copy the {{infer}} endpoint ID
+- Delete endpoints

-## Add new inference endpoint [add-inference-endpoints]
+## Add new {{infer}} endpoint [add-inference-endpoints]

-To add a new interference endpoint using the UI:
+To add a new {{infer}} endpoint using the UI:

 1. Select the **Add endpoint** button.
 1. Select a service from the drop down menu.
 1. Provide the required configuration details.
 1. Select **Save** to create the endpoint.

-If your inference endpoint uses a model deployed in Elastic’s infrastructure, such as ELSER, E5, or a model uploaded through Eland, you can configure [adaptive allocations](#adaptive-allocations) to dynamically adjust resource usage based on the current demand.
+If your {{infer}} endpoint uses a model deployed in Elastic’s infrastructure, such as ELSER, E5, or a model uploaded through Eland, you can configure [adaptive allocations](#adaptive-allocations) to dynamically adjust resource usage based on the current demand.

 ## Adaptive allocations [adaptive-allocations]

-Adaptive allocations allow inference services to dynamically adjust the number of model allocations based on the current load.
-This feature is only supported for models deployed in Elastic’s infrastructure, such as ELSER, E5, or models uploaded through Eland. It is not available for third-party services (for example, Alibaba Cloud, Cohere, or OpenAI), because those models are hosted externally and not deployed within your Elasticsearch cluster.
+Adaptive allocations allow {{infer}} services to dynamically adjust the number of model allocations based on the current load.
+This feature is only supported for models deployed in Elastic’s infrastructure, such as ELSER, E5, or models uploaded through Eland. It is not available for models used through the Elastic {{infer-cap}} Service (EIS) or third-party services (for example, Alibaba Cloud, Cohere, or OpenAI), because those models are not deployed within your Elasticsearch cluster.

 When adaptive allocations are enabled:

-* The number of allocations scales up automatically when the load increases.
-* Allocations scale down to a minimum of 0 when the load decreases, saving resources.
+- The number of allocations scales up automatically when the load increases.
+- Allocations scale down to a minimum of 0 when the load decreases, saving resources.

 ### Allocation scaling behavior
@@ -71,15 +92,6 @@ However, setting the `min_number_of_allocations` to a value greater than `0` kee

 For more information about adaptive allocations and resources, refer to the [trained model autoscaling](/deploy-manage/autoscaling/trained-model-autoscaling.md) documentation.
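For illustration, a sketch of creating an ELSER endpoint with adaptive allocations enabled. The endpoint name `my-elser-endpoint` and the allocation bounds are hypothetical choices:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2_linux-x86_64",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 0,
      "max_number_of_allocations": 4
    }
  }
}
```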

-## Default {{infer}} endpoints [default-enpoints]
-
-Your {{es}} deployment contains preconfigured {{infer}} endpoints which makes them easier to use when defining `semantic_text` fields or using {{infer}} processors. The following list contains the default {{infer}} endpoints listed by `inference_id`:
-
-* `.elser-2-elasticsearch`: uses the [ELSER](../../explore-analyze/machine-learning/nlp/ml-nlp-elser.md) built-in trained model for `sparse_embedding` tasks (recommended for English language tex). The `model_id` is `.elser_model_2_linux-x86_64`.
-* `.multilingual-e5-small-elasticsearch`: uses the [E5](../../explore-analyze/machine-learning/nlp/ml-nlp-e5.md) built-in trained model for `text_embedding` tasks (recommended for non-English language texts). The `model_id` is `.e5_model_2_linux-x86_64`.
-
-Use the `inference_id` of the endpoint in a [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field definition or when creating an [{{infer}} processor](elasticsearch://reference/enrich-processor/inference-processor.md). The API call will automatically download and deploy the model which might take a couple of minutes. Default {{infer}} enpoints have adaptive allocations enabled. For these models, the minimum number of allocations is `0`. If there is no {{infer}} activity that uses the endpoint, the number of allocations will scale down to `0` automatically after 15 minutes.

 ## Configuring chunking [infer-chunking-config]

 {{infer-cap}} endpoints have a limit on the amount of text they can process at once, determined by the model's input capacity. Chunking is the process of splitting the input text into pieces that remain within these limits.
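For illustration, a sketch of overriding the chunking behavior when creating an endpoint. The endpoint name and the specific values are hypothetical; `chunking_settings` with a `sentence` strategy splits the input at sentence boundaries:

```console
PUT _inference/sparse_embedding/my-chunking-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2_linux-x86_64",
    "num_allocations": 1,
    "num_threads": 1
  },
  "chunking_settings": {
    "strategy": "sentence",
    "max_chunk_size": 250,
    "sentence_overlap": 1
  }
}
```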

explore-analyze/machine-learning/nlp/ml-nlp-elser.md

Lines changed: 4 additions & 1 deletion
@@ -32,7 +32,10 @@ This approach provides a more understandable search experience compared to vecto
 To use ELSER, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated.

 ::::{note}
-The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in {{ech}} if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
+- You can use the ELSER model through the [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md). If you use ELSER on EIS, you don't need to manage the infrastructure and resources required by the ELSER model because it doesn't use the resources of your nodes.
+- The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in {{ech}} if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself.
 ::::

 Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](../../../deploy-manage/autoscaling/trained-model-autoscaling.md) to learn more.

explore-analyze/toc.yml

Lines changed: 1 addition & 0 deletions
@@ -122,6 +122,7 @@ toc:
 - file: transforms/transform-limitations.md
 - file: elastic-inference.md
   children:
+    - file: elastic-inference/eis.md
     - file: elastic-inference/inference-api.md
 - file: machine-learning.md
   children:
