14 changes: 14 additions & 0 deletions explore-analyze/elastic-inference.md
@@ -0,0 +1,14 @@
---
applies_to:
stack: ga
serverless: ga
navigation_title: Elastic Inference
---

# Elastic {{infer-cap}}

There are several ways to perform {{infer}} in the {{stack}}. This page provides a brief overview of the different methods:

* [Using EIS (Elastic Inference Service)](elastic-inference/eis.md)
* [Using the {{infer}} API](elastic-inference/inference-api.md)
* [Trained models deployed in your cluster](machine-learning/nlp/ml-nlp-overview.md)
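
For example, once an {{infer}} endpoint exists, you can run {{infer}} against it with a single request. The following is a minimal sketch; `my-elser-endpoint` is a hypothetical endpoint ID:

```console
POST _inference/sparse_embedding/my-elser-endpoint
{
  "input": "What is semantic search?"
}
```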
10 changes: 10 additions & 0 deletions explore-analyze/elastic-inference/eis.md
@@ -0,0 +1,10 @@
---
applies_to:
stack: ga
serverless: ga
navigation_title: Elastic Inference Service (EIS)
---

# Elastic {{infer-cap}} Service

This page documents the Elastic {{infer-cap}} Service (EIS).
@@ -38,7 +38,7 @@ Creates an {{infer}} endpoint to perform an {{infer}} task with the `elastic` se
::::{note}
The `chat_completion` task type only supports streaming and only through the `_stream` API.

For more information on how to use the `chat_completion` task type, please refer to the [chat completion documentation](/solutions/search/inference-api/chat-completion-inference-api.md).
For more information on how to use the `chat_completion` task type, please refer to the [chat completion documentation](chat-completion-inference-api.md).

::::
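
Building on the note above, a sketch of a streamed `chat_completion` request, assuming an existing endpoint with the hypothetical ID `my-chat-endpoint`:

```console
POST _inference/chat_completion/my-chat-endpoint/_stream
{
  "messages": [
    {
      "role": "user",
      "content": "Say this is a test"
    }
  ]
}
```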

@@ -27,7 +27,7 @@ When adaptive allocations are enabled, the number of allocations of the model is

You can enable adaptive allocations by using:

* the create inference endpoint API for [ELSER](../../../solutions/search/inference-api/elser-inference-integration.md), [E5 and models uploaded through Eland](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) that are used as {{infer}} services.
* the create inference endpoint API for [ELSER](../../elastic-inference/inference-api/elser-inference-integration.md), [E5 and models uploaded through Eland](../../elastic-inference/inference-api/elasticsearch-inference-integration.md) that are used as {{infer}} services.
* the [start trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment) or [update trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-update-trained-model-deployment) APIs for trained models that are deployed on {{ml}} nodes.

If the new allocations fit on the current {{ml}} nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your {{ml}} node will be scaled up if {{ml}} autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
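
For example, a sketch of enabling adaptive allocations on an existing deployment through the update trained model deployment API — the deployment ID and allocation limits are illustrative:

```console
POST _ml/trained_models/.elser_model_2/deployment/_update
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 0,
    "max_number_of_allocations": 8
  }
}
```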
4 changes: 2 additions & 2 deletions explore-analyze/machine-learning/nlp/ml-nlp-e5.md
@@ -13,7 +13,7 @@ EmbEddings from bidirEctional Encoder rEpresentations - or E5 - is a {{nlp}} mo

[Semantic search](../../../solutions/search/semantic-search.md) provides search results based on contextual meaning and user intent, rather than exact keyword matches.

E5 has two versions: one cross-platform version which runs on any hardware and one version which is optimized for Intel® silicon. The **Model Management** > **Trained Models** page shows you which version of E5 is recommended to deploy based on your cluster’s hardware. However, the recommended way to use E5 is through the [{{infer}} API](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) as a service which makes it easier to download and deploy the model and you don’t need to select from different versions.
E5 has two versions: one cross-platform version which runs on any hardware and one version which is optimized for Intel® silicon. The **Model Management** > **Trained Models** page shows you which version of E5 is recommended to deploy based on your cluster’s hardware. However, the recommended way to use E5 is through the [{{infer}} API](../../elastic-inference/inference-api/elasticsearch-inference-integration.md) as a service, which makes it easier to download and deploy the model, and you don’t need to select from different versions.

Refer to the model cards of the [multilingual-e5-small](https://huggingface.co/elastic/multilingual-e5-small) and the [multilingual-e5-small-optimized](https://huggingface.co/elastic/multilingual-e5-small-optimized) models on HuggingFace for further information including licensing.

@@ -44,7 +44,7 @@ PUT _inference/text_embedding/my-e5-model

The API request automatically initiates the model download and then deploys the model.
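
For reference, a request like the one above might look as follows — a sketch; the exact `service_settings` are described in the linked documentation:

```console
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": ".multilingual-e5-small"
  }
}
```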

Refer to the [`elasticsearch` {{infer}} service documentation](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) to learn more about the available settings.
Refer to the [`elasticsearch` {{infer}} service documentation](../../elastic-inference/inference-api/elasticsearch-inference-integration.md) to learn more about the available settings.

After you create the E5 {{infer}} endpoint, it’s ready to use for semantic search. The easiest way to perform semantic search in the {{stack}} is to [follow the `semantic_text` workflow](../../../solutions/search/semantic-search/semantic-search-semantic-text.md).
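
For instance, a minimal index mapping for that workflow could look like this — a sketch in which `my-index` and the field name are placeholders:

```console
PUT my-index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-e5-model"
      }
    }
  }
}
```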

6 changes: 3 additions & 3 deletions explore-analyze/machine-learning/nlp/ml-nlp-elser.md
@@ -39,7 +39,7 @@ Enabling trained model autoscaling for your ELSER deployment is recommended. Ref

Compared to the initial version of the model, ELSER v2 offers improved retrieval accuracy and more efficient indexing. This enhancement is attributed to the extension of the training data set, which includes high-quality question and answer pairs, and to the improved FLOPS regularizer, which reduces the cost of computing the similarity between a query and a document.

ELSER v2 has two versions: one cross-platform version which runs on any hardware and one version which is optimized for Intel® silicon. The **Model Management** > **Trained Models** page shows you which version of ELSER v2 is recommended to deploy based on your cluster’s hardware. However, the recommended way to use ELSER is through the [{{infer}} API](../../../solutions/search/inference-api/elser-inference-integration.md) as a service which makes it easier to download and deploy the model and you don’t need to select from different versions.
ELSER v2 has two versions: one cross-platform version which runs on any hardware and one version which is optimized for Intel® silicon. The **Model Management** > **Trained Models** page shows you which version of ELSER v2 is recommended to deploy based on your cluster’s hardware. However, the recommended way to use ELSER is through the [{{infer}} API](../../elastic-inference/inference-api/elser-inference-integration.md) as a service, which makes it easier to download and deploy the model, and you don’t need to select from different versions.

If you want to learn more about the ELSER V2 improvements, refer to [this blog post](https://www.elastic.co/search-labs/blog/introducing-elser-v2-part-1).

@@ -74,7 +74,7 @@ PUT _inference/sparse_embedding/my-elser-model

The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocations.
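
A sketch of a request like the one above, with adaptive allocations enabled — the allocation limits are illustrative:

```console
PUT _inference/sparse_embedding/my-elser-model
{
  "service": "elasticsearch",
  "service_settings": {
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    },
    "num_threads": 1,
    "model_id": ".elser_model_2"
  }
}
```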

Refer to the [ELSER {{infer}} integration documentation](../../../solutions/search/inference-api/elser-inference-integration.md) to learn more about the available settings.
Refer to the [ELSER {{infer}} integration documentation](../../elastic-inference/inference-api/elser-inference-integration.md) to learn more about the available settings.

After you create the ELSER {{infer}} endpoint, it’s ready to use for semantic search. The easiest way to perform semantic search in the {{stack}} is to [follow the `semantic_text` workflow](../../../solutions/search/semantic-search/semantic-search-semantic-text.md).

@@ -306,7 +306,7 @@ To gain the biggest value out of ELSER trained models, consider to follow this l
## Benchmark information [elser-benchmarks]

::::{important}
The recommended way to use ELSER is through the [{{infer}} API](../../../solutions/search/inference-api/elser-inference-integration.md) as a service.
The recommended way to use ELSER is through the [{{infer}} API](../../elastic-inference/inference-api/elser-inference-integration.md) as a service.
::::

The following sections provide information about how ELSER performs on different hardware and compare its performance to {{es}} BM25 and other strong baselines.
2 changes: 1 addition & 1 deletion explore-analyze/machine-learning/nlp/ml-nlp-overview.md
@@ -16,7 +16,7 @@ Elastic offers a wide range of possibilities to leverage natural language proces

You can **integrate NLP models from different providers** such as Cohere, HuggingFace, or OpenAI and use them as a service through the [semantic_text](../../../solutions/search/semantic-search/semantic-search-semantic-text.md) workflow. You can also use [ELSER](ml-nlp-elser.md) (the retrieval model trained by Elastic) and [E5](ml-nlp-e5.md) in the same way.
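
For example, a sketch of registering a provider-hosted model as an {{infer}} endpoint — the endpoint name and model are illustrative, and the API key is a placeholder:

```console
PUT _inference/text_embedding/openai-embeddings
{
  "service": "openai",
  "service_settings": {
    "api_key": "<api_key>",
    "model_id": "text-embedding-3-small"
  }
}
```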

The [{{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference) enables you to use the same services with a more complex workflow, for greater control over your configurations settings. This [tutorial](../../../solutions/search/inference-api.md) walks you through the process of using the various services with the {{infer}} API.
The [{{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-inference) enables you to use the same services with a more complex workflow, for greater control over your configuration settings. This [tutorial](../../elastic-inference/inference-api.md) walks you through the process of using the various services with the {{infer}} API.

You can **upload and manage NLP models** using the Eland client and the [{{stack}}](ml-nlp-deploy-models.md). Find the [list of recommended and compatible models here](ml-nlp-model-ref.md). Refer to [*Examples*](ml-nlp-examples.md) to learn more about how to use {{ml}} models deployed in your cluster.

4 changes: 2 additions & 2 deletions explore-analyze/machine-learning/nlp/ml-nlp-rerank.md
@@ -44,7 +44,7 @@ Elastic Rerank is available in Elastic Stack version 8.17+:

## Download and deploy [ml-nlp-rerank-deploy]

To download and deploy Elastic Rerank, use the [create inference API](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) to create an {{es}} service `rerank` endpoint.
To download and deploy Elastic Rerank, use the [create inference API](../../elastic-inference/inference-api/elasticsearch-inference-integration.md) to create an {{es}} service `rerank` endpoint.
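
A sketch of such a request — the endpoint name is hypothetical, and the model ID is assumed to be `.rerank-v1`:

```console
PUT _inference/rerank/my-elastic-rerank
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".rerank-v1",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 4
    }
  }
}
```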

::::{tip}
Refer to this [Python notebook](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/12-semantic-reranking-elastic-rerank.ipynb) for an end-to-end example using Elastic Rerank.
@@ -280,7 +280,7 @@ For detailed benchmark information, including complete dataset results and metho
**Documentation**:

* [Semantic re-ranking in {{es}} overview](../../../solutions/search/ranking/semantic-reranking.md#semantic-reranking-in-es)
* [Inference API example](../../../solutions/search/inference-api/elasticsearch-inference-integration.md#inference-example-elastic-reranker)
* [Inference API example](../../elastic-inference/inference-api/elasticsearch-inference-integration.md#inference-example-elastic-reranker)

**Blogs**:

21 changes: 21 additions & 0 deletions explore-analyze/toc.yml
@@ -116,6 +116,27 @@ toc:
- file: transforms/transform-examples.md
- file: transforms/transform-painless-examples.md
- file: transforms/transform-limitations.md
- file: elastic-inference.md
children:
- file: elastic-inference/eis.md
- file: elastic-inference/inference-api.md
children:
- file: elastic-inference/inference-api/elastic-inference-service-eis.md
- file: elastic-inference/inference-api/alibabacloud-ai-search-inference-integration.md
- file: elastic-inference/inference-api/amazon-bedrock-inference-integration.md
- file: elastic-inference/inference-api/anthropic-inference-integration.md
- file: elastic-inference/inference-api/azure-ai-studio-inference-integration.md
- file: elastic-inference/inference-api/azure-openai-inference-integration.md
- file: elastic-inference/inference-api/chat-completion-inference-api.md
- file: elastic-inference/inference-api/cohere-inference-integration.md
- file: elastic-inference/inference-api/elasticsearch-inference-integration.md
- file: elastic-inference/inference-api/elser-inference-integration.md
- file: elastic-inference/inference-api/google-ai-studio-inference-integration.md
- file: elastic-inference/inference-api/google-vertex-ai-inference-integration.md
- file: elastic-inference/inference-api/huggingface-inference-integration.md
- file: elastic-inference/inference-api/jinaai-inference-integration.md
- file: elastic-inference/inference-api/mistral-inference-integration.md
- file: elastic-inference/inference-api/openai-inference-integration.md
- file: machine-learning.md
children:
- file: machine-learning/setting-up-machine-learning.md
@@ -28,7 +28,7 @@ If you set the minimum number of allocations to 1, you will be charged even if t

You can enable adaptive allocations by using:

* the create inference endpoint API for [ELSER](../../../solutions/search/inference-api/elser-inference-integration.md), [E5 and models uploaded through Eland](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) that are used as inference services.
* the create inference endpoint API for [ELSER](../../../explore-analyze/elastic-inference/inference-api/elser-inference-integration.md), [E5 and models uploaded through Eland](../../../explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md) that are used as inference services.
* the [start trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-start-trained-model-deployment) or [update trained model deployment](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-update-trained-model-deployment) APIs for trained models that are deployed on machine learning nodes.

If the new allocations fit on the current machine learning nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your machine learning node will be scaled up if machine learning autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [inference endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
@@ -23,7 +23,7 @@ The following examples use the:
* `amazon.titan-embed-text-v1` model for [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.md)
* `ops-text-embedding-zh-001` model for [AlibabaCloud AI](https://help.aliyun.com/zh/open-search/search-platform/developer-reference/text-embedding-api-details)

You can use any Cohere and OpenAI models, they are all supported by the {{infer}} API. For a list of recommended models available on HuggingFace, refer to [the supported model list](../../../solutions/search/inference-api/huggingface-inference-integration.md#inference-example-hugging-face-supported-models).
You can use any Cohere and OpenAI models; they are all supported by the {{infer}} API. For a list of recommended models available on HuggingFace, refer to [the supported model list](../../../explore-analyze/elastic-inference/inference-api/huggingface-inference-integration.md).
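
For example, a sketch of configuring a HuggingFace-hosted model as an {{infer}} endpoint — both values are placeholders for your own endpoint URL and access token:

```console
PUT _inference/text_embedding/hugging-face-embeddings
{
  "service": "hugging_face",
  "service_settings": {
    "api_key": "<access_token>",
    "url": "<url_endpoint>"
  }
}
```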

Click the name of the service you want to use on any of the widgets below to review the corresponding instructions.

2 changes: 1 addition & 1 deletion solutions/search/hybrid-semantic-text.md
@@ -14,7 +14,7 @@ This tutorial demonstrates how to perform hybrid search, combining semantic sear

In hybrid search, semantic search retrieves results based on the meaning of the text, while full-text search focuses on exact word matches. By combining both methods, hybrid search delivers more relevant results, particularly in cases where relying on a single approach may not be sufficient.

The recommended way to use hybrid search in the {{stack}} is following the `semantic_text` workflow. This tutorial uses the [`elasticsearch` service](inference-api/elasticsearch-inference-integration.md) for demonstration, but you can use any service and their supported models offered by the {{infer-cap}} API.
The recommended way to use hybrid search in the {{stack}} is to follow the `semantic_text` workflow. This tutorial uses the [`elasticsearch` service](../../explore-analyze/elastic-inference/inference-api/elasticsearch-inference-integration.md) for demonstration, but you can use any service and its supported models offered by the {{infer-cap}} API.
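
As a sketch of the end result, a hybrid query can combine a full-text `match` query with a `semantic` query through reciprocal rank fusion — the index and field names here are placeholders:

```console
GET my-index/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": {
              "match": {
                "content": "muscle soreness after running"
              }
            }
          }
        },
        {
          "standard": {
            "query": {
              "semantic": {
                "field": "semantic_content",
                "query": "muscle soreness after running"
              }
            }
          }
        }
      ]
    }
  }
}
```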


## Create an index mapping [hybrid-search-create-index-mapping]