diff --git a/explore-analyze/machine-learning/nlp.md b/explore-analyze/machine-learning/nlp.md index 44f6f13255..0c9bc6c90a 100644 --- a/explore-analyze/machine-learning/nlp.md +++ b/explore-analyze/machine-learning/nlp.md @@ -13,6 +13,7 @@ You can use {{stack-ml-features}} to analyze natural language data and make pred * [Add NLP {{infer}} to ingest pipelines](nlp/ml-nlp-inference.md) * [API quick reference](nlp/ml-nlp-apis.md) * [ELSER](nlp/ml-nlp-elser.md) +* [Elastic Rerank](nlp/ml-nlp-rerank.md) * [E5](nlp/ml-nlp-e5.md) * [Language identification](nlp/ml-nlp-lang-ident.md) * [Examples](nlp/ml-nlp-examples.md) diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-built-in-models.md b/explore-analyze/machine-learning/nlp/ml-nlp-built-in-models.md index a1cbaefc1e..6aaf6ccd57 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-built-in-models.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-built-in-models.md @@ -10,8 +10,3 @@ There are {{nlp}} models that are available for use in every cluster out-of-the- * [ELSER](ml-nlp-elser.md) trained by Elastic * [E5](ml-nlp-e5.md) * [{{lang-ident-cap}}](ml-nlp-lang-ident.md) - - - - - diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-e5.md b/explore-analyze/machine-learning/nlp/ml-nlp-e5.md index fe269a5142..47b6d1f3e2 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-e5.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-e5.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-e5.html --- - - # E5 [ml-nlp-e5] - EmbEddings from bidirEctional Encoder rEpresentations - or E5 - is a {{nlp}} model that enables you to perform multi-lingual semantic search by using dense vector representations. This model is recommended for non-English language documents and queries. If you want to perform semantic search on English language documents, use the [ELSER](ml-nlp-elser.md) model. [Semantic search](../../../solutions/search/semantic-search.md) provides you search results based on contextual meaning and user intent, rather than exact keyword matches. @@ -17,14 +14,12 @@ E5 has two versions: one cross-platform version which runs on any hardware and o Refer to the model cards of the [multilingual-e5-small](https://huggingface.co/elastic/multilingual-e5-small) and the [multilingual-e5-small-optimized](https://huggingface.co/elastic/multilingual-e5-small-optimized) models on HuggingFace for further information including licensing. - ## Requirements [e5-req] To use E5, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated. Enabling trained model autoscaling for your E5 deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more. - ## Download and deploy E5 [download-deploy-e5] The easiest and recommended way to download and deploy E5 is to use the [{{infer}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-apis.html). @@ -32,8 +27,8 @@ The easiest and recommended way to download and deploy E5 is to use the [{{infer 1. In {{kib}}, navigate to the **Dev Console**. 2. 
Create an {{infer}} endpoint with the `elasticsearch` service by running the following API request:

-    ```console
-    PUT _inference/text_embedding/my-e5-model
+```console
+PUT _inference/text_embedding/my-e5-model
    {
      "service": "elasticsearch",
      "service_settings": {
@@ -42,16 +37,14 @@ The easiest and recommended way to download and deploy E5 is to use the [{{infer
        "model_id": ".multilingual-e5-small"
      }
    }
-    ```
+```

    The API request automatically initiates the model download and then deploys the model.

-
Refer to the [`elasticsearch` {{infer}} service documentation](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) to learn more about the available settings.

After you have created the E5 {{infer}} endpoint, it’s ready to be used for semantic search. The easiest way to perform semantic search in the {{stack}} is to [follow the `semantic_text` workflow](../../../solutions/search/semantic-search/semantic-search-semantic-text.md).

-
### Alternative methods to download and deploy E5 [alternative-download-deploy-e5]

You can also download and deploy the E5 model either from **{{ml-app}}** > **Trained Models**, from **Search** > **Indices**, or by using the trained models API in Dev Console.

@@ -60,7 +53,6 @@ You can also download and deploy the E5 model either from **{{ml-app}}** > **Tra
In most cases, the preferred version is the **Intel and Linux optimized** model; it is recommended to download and deploy that version.
::::

-
::::{dropdown} Using the Trained Models page
#### Using the Trained Models page [trained-model-e5]

@@ -87,7 +79,6 @@ For most cases, the preferred version is the **Intel and Linux optimized** model

::::

-
::::{dropdown} Using the search indices UI
#### Using the search indices UI [elasticsearch-e5]

@@ -111,12 +102,10 @@ Alternatively, you can download and deploy the E5 model to an {{infer}} pipeline
    :class: screenshot
    :::

-
When your E5 model is deployed and started, it is ready to be used in a pipeline.

::::

-
::::{dropdown} Using the trained models API in Dev Console
#### Using the trained models API in Dev Console [dev-console-e5]

@@ -124,38 +113,37 @@ When your E5 model is deployed and started, it is ready to be used in a pipeline

1. In {{kib}}, navigate to the **Dev Console**.
2. Create the E5 model configuration by running the following API call:

-    ```console
-    PUT _ml/trained_models/.multilingual-e5-small
+```console
+PUT _ml/trained_models/.multilingual-e5-small
    {
      "input": {
        "field_names": ["text_field"]
      }
    }
-    ```
+```

    The API call automatically initiates the model download if the model is not downloaded yet.

3. Deploy the model by using the [start trained model deployment API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html) with a deployment ID:

-    ```console
-    POST _ml/trained_models/.multilingual-e5-small/deployment/_start?deployment_id=for_search
-    ```
-
+```console
+POST _ml/trained_models/.multilingual-e5-small/deployment/_start?deployment_id=for_search
+```
::::

-
-
## Deploy the E5 model in an air-gapped environment [air-gapped-install-e5]

-If you want to install E5 in an air-gapped environment, you have the following options: * put the model artifacts into a directory inside the config directory on all master-eligible nodes (for `multilingual-e5-small` and `multilingual-e5-small-linux-x86-64`) * install the model by using HuggingFace (for `multilingual-e5-small` model only). 
+If you want to install E5 in an air-gapped environment, you have the following options: +* put the model artifacts into a directory inside the config directory on all master-eligible nodes (for `multilingual-e5-small` and `multilingual-e5-small-linux-x86-64`) +* install the model by using HuggingFace (for `multilingual-e5-small` model only). ### Model artifact files [e5-model-artifacts] For the `multilingual-e5-small` model, you need the following files in your system: -``` +```url https://ml-models.elastic.co/multilingual-e5-small.metadata.json https://ml-models.elastic.co/multilingual-e5-small.pt https://ml-models.elastic.co/multilingual-e5-small.vocab.json @@ -163,13 +151,12 @@ https://ml-models.elastic.co/multilingual-e5-small.vocab.json For the optimized version, you need the following files in your system: -``` +```url https://ml-models.elastic.co/multilingual-e5-small_linux-x86_64.metadata.json https://ml-models.elastic.co/multilingual-e5-small_linux-x86_64.pt https://ml-models.elastic.co/multilingual-e5-small_linux-x86_64.vocab.json ``` - ### Using file-based access [_using_file_based_access_3] For a file-based access, follow these steps: @@ -178,7 +165,7 @@ For a file-based access, follow these steps: 2. Put the files into a `models` subdirectory inside the `config` directory of your {{es}} deployment. 3. Point your {{es}} deployment to the model directory by adding the following line to the `config/elasticsearch.yml` file: - ``` + ```yml xpack.ml.model_repository: file://${path.home}/config/models/` ``` @@ -190,7 +177,6 @@ For a file-based access, follow these steps: 9. Provide a deployment ID, select the priority, and set the number of allocations and threads per allocation values. 10. Click **Start**. - ### Using the HuggingFace repository [_using_the_huggingface_repository] You can install the `multilingual-e5-small` model in a restricted or closed network by pointing the `eland_import_hub_model` script to the model’s local files. @@ -230,8 +216,6 @@ For an offline install, the model first needs to be cloned locally, Git and [Git Once it’s uploaded to {{es}}, the model will have the ID specified by `--es-model-id`. If it is not set, the model ID is derived from `--hub-model-id`; spaces and path delimiters are converted to double underscores `__`. - - ## Disclaimer [terms-of-use-e5] Customers may add third party trained models for management in Elastic. These models are not owned by Elastic. While Elastic will support the integration with these models in the performance according to the documentation, you understand and agree that Elastic has no control over, or liability for, the third party models or the underlying training data they may utilize. diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-elser.md b/explore-analyze/machine-learning/nlp/ml-nlp-elser.md index 750fc1c2fb..8f8cb44c7a 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-elser.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-elser.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html --- - - # ELSER [ml-nlp-elser] - Elastic Learned Sparse EncodeR - or ELSER - is a retrieval model trained by Elastic that enables you to perform [semantic search](../../../solutions/search/vector/sparse-vector-elser.md) to retrieve more relevant search results. This search type provides you search results based on contextual meaning and user intent, rather than exact keyword matches. 
ELSER is an out-of-domain model which means it does not require fine-tuning on your own data, making it adaptable for various use cases out of the box. @@ -19,15 +16,12 @@ This model is recommended for English language documents and queries. If you wan While ELSER V2 is generally available, ELSER V1 is in [preview] and will remain in technical preview. :::: - - ## Tokens - not synonyms [elser-tokens] ELSER expands the indexed and searched passages into collections of terms that are learned to co-occur frequently within a diverse set of training data. The terms that the text is expanded into by the model *are not* synonyms for the search terms; they are learned associations capturing relevance. These expanded terms are weighted as some of them are more significant than others. Then the {{es}} [sparse vector](https://www.elastic.co/guide/en/elasticsearch/reference/current/sparse-vector.html) (or [rank features](https://www.elastic.co/guide/en/elasticsearch/reference/current/rank-features.html)) field type is used to store the terms and weights at index time, and to search against later. This approach provides a more understandable search experience compared to vector embeddings. However, attempting to directly interpret the tokens and weights can be misleading, as the expansion essentially results in a vector in a very high-dimensional space. Consequently, certain tokens, especially those with low weight, contain information that is intertwined with other low-weight tokens in the representation. In this regard, they function similarly to a dense vector representation, making it challenging to separate their individual contributions. This complexity can potentially lead to misinterpretations if not carefully considered during analysis. - ## Requirements [elser-req] To use ELSER, you must have the [appropriate subscription](https://www.elastic.co/subscriptions) level for semantic search or the trial period activated. @@ -36,10 +30,8 @@ To use ELSER, you must have the [appropriate subscription](https://www.elastic.c The minimum dedicated ML node size for deploying and using the ELSER model is 4 GB in Elasticsearch Service if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself. :::: - Enabling trained model autoscaling for your ELSER deployment is recommended. Refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md) to learn more. - ## ELSER v2 [elser-v2] Compared to the initial version of the model, ELSER v2 offers improved retrieval accuracy and more efficient indexing. This enhancement is attributed to the extension of the training data set, which includes high-quality question and answer pairs and the improved FLOPS regularizer which reduces the cost of computing the similarity between a query and a document. @@ -48,14 +40,12 @@ ELSER v2 has two versions: one cross-platform version which runs on any hardware If you want to learn more about the ELSER V2 improvements, refer to [this blog post](https://www.elastic.co/search-labs/blog/introducing-elser-v2-part-1). - ### Upgrading to ELSER v2 [upgrade-elser-v2] ELSER v2 is not backward compatible. 
If you indexed your data with ELSER v1, you need to reindex it with an ingest pipeline referencing ELSER v2 to be able to use v2 for search. This [tutorial](../../../solutions/search/vector/sparse-vector-elser.md) shows you how to create an ingest pipeline with an {{infer}} processor that uses ELSER v2, and how to reindex your data through the pipeline.

Additionally, the `elasticsearch-labs` GitHub repository contains an interactive [Python notebook](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/model-upgrades/upgrading-index-to-use-elser.ipynb) that walks through upgrading an index to ELSER V2.

-
## Download and deploy ELSER [download-deploy-elser]

The easiest and recommended way to download and deploy ELSER is to use the [{{infer}} API](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-apis.html).

@@ -63,8 +53,8 @@ The easiest and recommended way to download and deploy ELSER is to use the [{{in
1. In {{kib}}, navigate to the **Dev Console**.
2. Create an {{infer}} endpoint with the ELSER service by running the following API request:

-    ```console
-    PUT _inference/sparse_embedding/my-elser-model
+```console
+PUT _inference/sparse_embedding/my-elser-model
    {
      "service": "elasticsearch",
      "service_settings": {
@@ -77,16 +67,14 @@ The easiest and recommended way to download and deploy ELSER is to use the [{{in
        "model_id": ".elser_model_2_linux-x86_64"
      }
    }
-    ```
-
-    The API request automatically initiates the model download and then deploy the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.
+```
+The API request automatically initiates the model download and then deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation.

-Refer to the [ELSER {{infer}} service documentation](../../../solutions/search/inference-api/elser-inference-integration.md) to learn more about the available settings.
+Refer to the [ELSER {{infer}} integration documentation](../../../solutions/search/inference-api/elser-inference-integration.md) to learn more about the available settings.

After you have created the ELSER {{infer}} endpoint, it’s ready to be used for semantic search. The easiest way to perform semantic search in the {{stack}} is to [follow the `semantic_text` workflow](../../../solutions/search/semantic-search/semantic-search-semantic-text.md).

-
### Alternative methods to download and deploy ELSER [alternative-download-deploy]

You can also download and deploy ELSER either from **{{ml-app}}** > **Trained Models**, from **Search** > **Indices**, or by using the trained models API in Dev Console.
@@ -97,7 +85,6 @@ You can also download and deploy ELSER either from **{{ml-app}}** > **Trained Mo

::::

-
::::{dropdown} Using the Trained Models page
#### Using the Trained Models page [trained-model]

@@ -124,7 +111,6 @@ You can also download and deploy ELSER either from **{{ml-app}}** > **Trained Mo

::::

-
::::{dropdown} Using the search indices UI
#### Using the search indices UI [elasticsearch]

@@ -148,10 +134,8 @@ Alternatively, you can download and deploy ELSER to an {{infer}} pipeline using
    :class: screenshot
    :::

-
::::

-
::::{dropdown} Using the trained models API in Dev Console
#### Using the trained models API in Dev Console [dev-console]

@@ -159,30 +143,27 @@ Alternatively, you can download and deploy ELSER to an {{infer}} pipeline using

1. In {{kib}}, navigate to the **Dev Console**.
2. 
Create the ELSER model configuration by running the following API call: - ```console - PUT _ml/trained_models/.elser_model_2 +```console +PUT _ml/trained_models/.elser_model_2 { "input": { "field_names": ["text_field"] } } - ``` +``` The API call automatically initiates the model download if the model is not downloaded yet. 3. Deploy the model by using the [start trained model deployment API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html) with a delpoyment ID: - ```console - POST _ml/trained_models/.elser_model_2/deployment/_start?deployment_id=for_search - ``` +```console +POST _ml/trained_models/.elser_model_2/deployment/_start?deployment_id=for_search +``` You can deploy the model multiple times with different deployment IDs. - :::: - - ## Deploy ELSER in an air-gapped environment [air-gapped-install] If you want to deploy ELSER in a restricted or closed network, you have two options: @@ -190,12 +171,11 @@ If you want to deploy ELSER in a restricted or closed network, you have two opti * create your own HTTP/HTTPS endpoint with the model artifacts on it, * put the model artifacts into a directory inside the config directory on all [master-eligible nodes](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html#master-node). - ### Model artifact files [elser-model-artifacts] For the cross-platform verison, you need the following files in your system: -``` +```url https://ml-models.elastic.co/elser_model_2.metadata.json https://ml-models.elastic.co/elser_model_2.pt https://ml-models.elastic.co/elser_model_2.vocab.json @@ -203,13 +183,12 @@ https://ml-models.elastic.co/elser_model_2.vocab.json For the optimized version, you need the following files in your system: -``` +```url https://ml-models.elastic.co/elser_model_2_linux-x86_64.metadata.json https://ml-models.elastic.co/elser_model_2_linux-x86_64.pt https://ml-models.elastic.co/elser_model_2_linux-x86_64.vocab.json ``` - ### Using an HTTP server [_using_an_http_server] INFO: If you use an existing HTTP server, note that the model downloader only supports passwordless HTTP servers. @@ -231,7 +210,7 @@ You can use any HTTP service to deploy ELSER. This example uses the official Ngi 4. Verify that Nginx runs properly by visiting the following URL in your browser: - ``` + ```url http://{IP_ADDRESS_OR_HOSTNAME}:8080/elser_model_2.metadata.json ``` @@ -239,7 +218,7 @@ You can use any HTTP service to deploy ELSER. This example uses the official Ngi 5. Point your Elasticsearch deployment to the model artifacts on the HTTP server by adding the following line to the `config/elasticsearch.yml` file: - ``` + ```yml xpack.ml.model_repository: http://{IP_ADDRESS_OR_HOSTNAME}:8080 ``` @@ -259,7 +238,6 @@ The HTTP server is only required for downloading the model. After the download h docker stop ml-models ``` - ### Using file-based access [_using_file_based_access] For a file-based access, follow these steps: @@ -268,7 +246,7 @@ For a file-based access, follow these steps: 2. Put the files into a `models` subdirectory inside the `config` directory of your Elasticsearch deployment. 3. Point your Elasticsearch deployment to the model directory by adding the following line to the `config/elasticsearch.yml` file: - ``` + ```yml xpack.ml.model_repository: file://${path.home}/config/models/ ``` @@ -280,7 +258,6 @@ For a file-based access, follow these steps: 9. Provide a deployment ID, select the priority, and set the number of allocations and threads per allocation values. 10. 
Click **Start**. - ## Testing ELSER [_testing_elser] You can test the deployed model in {{kib}}. Navigate to **Model Management** > **Trained Models** from the main menu, or use the [global search field](../../overview/kibana-quickstart.md#_finding_your_apps_and_objects) in {{kib}}. Locate the deployed ELSER model in the list of trained models, then select **Test model** from the Actions menu. @@ -294,7 +271,6 @@ The results contain a list of ten random values for the selected field along wit :class: screenshot ::: - ## Performance considerations [performance] * ELSER works best on small-to-medium sized fields that contain natural language. For connector or web crawler use cases, this aligns best with fields like *title*, *description*, *summary*, or *abstract*. As ELSER encodes the first 512 tokens of a field, it may not provide as relevant of results for large fields. For example, `body_content` on web crawler documents, or body fields resulting from extracting text from office documents with connectors. For larger fields like these, consider "chunking" the content into multiple values, where each chunk can be under 512 tokens. @@ -303,12 +279,10 @@ The results contain a list of ten random values for the selected field along wit To learn more about ELSER performance, refer to the [Benchmark information](#elser-benchmarks). - ## Pre-cleaning input text [pre-cleaning] The quality of the input text significantly affects the quality of the embeddings. To achieve the best results, it’s recommended to clean the input text before generating embeddings. The exact preprocessing you may need to do heavily depends on your text. For example, if your text contains HTML tags, use the [HTML strip processor](https://www.elastic.co/guide/en/elasticsearch/reference/current/htmlstrip-processor.html) in an ingest pipeline to remove unnecessary elements. Always review and clean your input text before ingestion to eliminate any irrelevant entities that might affect the results. - ## Recommendations for using ELSER [elser-recommendations] To gain the biggest value out of ELSER trained models, consider to follow this list of recommendations. @@ -317,39 +291,31 @@ To gain the biggest value out of ELSER trained models, consider to follow this l * Setting `min_allocations` to `0` can save on costs for non-critical use cases or testing environments. * Enabling [autoscaling](ml-nlp-auto-scale.md) through adaptive allocations or adaptive resources makes it possible for {{es}} to scale up or down the available resources of your ELSER deployment based on the load on the process. * Use dedicated, optimized ELSER {{infer}} endpoints for ingest and search use cases. - - * When deploying a trained model in {{kib}}, you can select for which case you want to optimize your ELSER deployment. - * If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`). - * If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for search, set the number of threads to greater than `1`. - - + * When deploying a trained model in {{kib}}, you can select for which case you want to optimize your ELSER deployment. + * If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for ingest, set the number of threads to `1` (`"num_threads": 1`). 
+  * If you use the trained model or {{infer}} APIs and want to optimize your ELSER trained model deployment or {{infer}} endpoint for search, set the number of threads to greater than `1`.

## Further reading [further-readings]

* [Perform semantic search with `semantic_text` using the ELSER endpoint](../../../solutions/search/semantic-search/semantic-search-semantic-text.md)
* [Perform semantic search with ELSER](../../../solutions/search/vector/sparse-vector-elser.md)

-
## Benchmark information [elser-benchmarks]

::::{important}
The recommended way to use ELSER is through the [{{infer}} API](../../../solutions/search/inference-api/elser-inference-integration.md) as a service.
::::

-
The following sections provide information about how ELSER performs on different hardware and compare the model performance to {{es}} BM25 and other strong baselines.

-
### Version overview [version-overview]

ELSER V2 has an **optimized** version that is designed to run only on Linux with an x86-64 CPU architecture and a **cross-platform** version that can be run on any platform.

-
#### ELSER V2 [version-overview-v2]

Besides the performance improvements, the biggest change in ELSER V2 is the introduction of the first platform-specific ELSER model - that is, a model optimized to run only on Linux with an x86-64 CPU architecture. The optimized model is designed to work best on newer Intel CPUs, but it works on AMD CPUs as well. It is recommended to use the new optimized Linux-x86-64 model for all new users of ELSER as it is significantly faster than the cross-platform model, which can be run on any platform.

ELSER V2 produces significantly higher quality embeddings than ELSER V1. Regardless of which ELSER V2 model you use (optimized or cross-platform), the particular embeddings produced are the same.

-
### Qualitative benchmarks [elser-qualitative-benchmarks]

The metric that is used to evaluate ELSER’s ranking ability is the Normalized Discounted Cumulative Gain (NDCG), which can handle multiple relevant documents and fine-grained document ratings. The metric is applied to a fixed-size list of retrieved documents which, in this case, is the top 10 documents (NDCG@10).
@@ -359,9 +325,7 @@ The table below shows the performance of ELSER V2 compared to BM 25. ELSER V2 ha
:::{image} ../../../images/machine-learning-ml-nlp-bm25-elser-v2.png
:alt: ELSER V2 benchmarks compared to BM25
:::
-
-*NDCG@10 for BEIR data sets for BM25 and ELSER V2 - higher values are better)*
-
+*NDCG@10 for BEIR data sets for BM25 and ELSER V2 - higher values are better*

### Hardware benchmarks [elser-hw-benchmarks]

@@ -369,8 +333,6 @@ The table below shows the performance of ELSER V2 compared to BM 25. ELSER V2 ha
While the goal is to create a model that is as performant as possible, retrieval accuracy always takes precedence over speed; this is one of the design principles of ELSER. Consult the tables below to learn more about the expected model performance. The values refer to operations performed on two data sets and different hardware configurations. Your data set has an impact on the model performance. Run tests on your own data to get a more realistic view of the model performance for your use case.
::::

-
-
#### ELSER V2 [_elser_v2]

Overall, the optimized V2 model ingested at a max rate of 26 docs/s, compared with the ELSER V1 max rate of 14 docs/s from the ELSER V1 benchmark, resulting in a 90% increase in throughput. 
@@ -381,7 +343,6 @@ The performance of virtual cores (that is, when the number of allocations is gre The length of the documents in your particular dataset will have a significant impact on your throughput numbers. :::: - Refer to [this blog post](https://www.elastic.co/search-labs/blog/introducing-elser-v2-part-1) to learn more about ELSER V2 improved performance. :::{image} ../../../images/machine-learning-ml-nlp-elser-bm-summary.png diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-model-ref.md b/explore-analyze/machine-learning/nlp/ml-nlp-model-ref.md index 39d9fe7d59..3e5e7f058e 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-model-ref.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-model-ref.md @@ -4,16 +4,12 @@ mapped_pages: - https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html --- - - # Compatible third party models [ml-nlp-model-ref] - ::::{note} The minimum dedicated ML node size for deploying and using the {{nlp}} models is 16 GB in Elasticsearch Service if [deployment autoscaling](../../../deploy-manage/autoscaling.md) is turned off. Turning on autoscaling is recommended because it allows your deployment to dynamically adjust resources based on demand. Better performance can be achieved by using more allocations or more threads per allocation, which requires bigger ML nodes. Autoscaling provides bigger nodes when required. If autoscaling is turned off, you must provide suitably sized nodes yourself. :::: - The {{stack-ml-features}} support transformer models that conform to the standard BERT model interface and use the WordPiece tokenization algorithm. The current list of supported architectures is: @@ -37,7 +33,6 @@ These models are listed by NLP task; for more information about those tasks, ref **Models highlighted in bold** in the list below are recommended for evaluation purposes and to get started with the Elastic {{nlp}} features. - ## Third party fill-mask models [ml-nlp-model-ref-mask] * [BERT base model](https://huggingface.co/bert-base-uncased) @@ -45,7 +40,6 @@ These models are listed by NLP task; for more information about those tasks, ref * [MPNet base model](https://huggingface.co/microsoft/mpnet-base) * [RoBERTa large model](https://huggingface.co/roberta-large) - ## Third party named entity recognition models [ml-nlp-model-ref-ner] * [BERT base NER](https://huggingface.co/dslim/bert-base-NER) @@ -54,7 +48,6 @@ These models are listed by NLP task; for more information about those tasks, ref * [**DistilBERT base uncased finetuned conll03 English**](https://huggingface.co/elastic/distilbert-base-uncased-finetuned-conll03-english) * [DistilBERT fa zwnj base NER](https://huggingface.co/HooshvareLab/distilbert-fa-zwnj-base-ner) - ## Third party question answering models [ml-nlp-model-ref-question-answering] * [BERT large model (uncased) whole word masking finetuned on SQuAD](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad) @@ -62,7 +55,6 @@ These models are listed by NLP task; for more information about those tasks, ref * [Electra base squad2](https://huggingface.co/deepset/electra-base-squad2) * [TinyRoBERTa squad2](https://huggingface.co/deepset/tinyroberta-squad2) - ## Third party sparse embedding models [ml-nlp-model-ref-sparse-embedding] Sparse embedding models should be configured with the `text_expansion` task type. 
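As an illustration, here is a minimal sketch of importing one of the models listed below with the `eland_import_hub_model` script, assuming the Eland CLI is installed locally; the cluster URL and API key are placeholders, and the `--hub-model-id` value can be swapped for any other model in this list:

```bash
# Import a sparse embedding model from Hugging Face and deploy it with the
# text_expansion task type (the URL and API key below are placeholders).
eland_import_hub_model \
  --url "https://YOUR_ELASTICSEARCH_ENDPOINT:443" \
  --es-api-key "YOUR_API_KEY" \
  --hub-model-id hotchpotch/japanese-splade-v2 \
  --task-type text_expansion \
  --start
```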
@@ -71,7 +63,6 @@ Sparse embedding models should be configured with the `text_expansion` task type * [aken12/splade-japanese-v3](https://huggingface.co/aken12/splade-japanese-v3) * [hotchpotch/japanese-splade-v2](https://huggingface.co/hotchpotch/japanese-splade-v2) - ## Third party text embedding models [ml-nlp-model-ref-text-embedding] Text Embedding models are designed to work with specific scoring functions for calculating the similarity between the embeddings they produce. Examples of typical scoring functions are: `cosine`, `dot product` and `euclidean distance` (also known as `l2_norm`). @@ -103,7 +94,6 @@ Using `DPREncoderWrapper`: * [dpr-question_encoder single nq base](https://huggingface.co/facebook/dpr-question_encoder-single-nq-base) * [dpr-question_encoder multiset base](https://huggingface.co/facebook/dpr-question_encoder-multiset-base) - ## Third party text classification models [ml-nlp-model-ref-text-classification] * [BERT base uncased emotion](https://huggingface.co/nateraw/bert-base-uncased-emotion) @@ -113,7 +103,6 @@ Using `DPREncoderWrapper`: * [FinBERT](https://huggingface.co/ProsusAI/finbert) * [Twitter roBERTa base for Sentiment Analysis](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) - ## Third party text similarity models [ml-nlp-model-ref-text-similarity] You can use these text similarity models for [semantic re-ranking](../../../solutions/search/ranking/semantic-reranking.md#semantic-reranking-in-es). @@ -122,7 +111,6 @@ You can use these text similarity models for [semantic re-ranking](../../../solu * [ms marco MiniLM L6 v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) * [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base) - ## Third party zero-shot text classification models [ml-nlp-model-ref-zero-shot] * [BART large mnli](https://huggingface.co/facebook/bart-large-mnli) @@ -133,14 +121,12 @@ You can use these text similarity models for [semantic re-ranking](../../../solu * [NLI RoBERTa base](https://huggingface.co/cross-encoder/nli-roberta-base) * [SqueezeBERT](https://huggingface.co/typeform/squeezebert-mnli) - ## Expected model output [_expected_model_output] Models used for each NLP task type must output tensors of a specific format to be used in the Elasticsearch NLP pipelines. Here are the expected outputs for each task type. - ### Fill mask expected model output [_fill_mask_expected_model_output] Fill mask is a specific kind of token classification; it is the base training task of many transformer models. @@ -151,7 +137,7 @@ Here is an example with a single sequence `"The capital of [MASK] is Paris"` and Should output: -``` +```json [ [ [ 0, 0, 0, 0, 0, 0, 0 ], // The @@ -166,7 +152,6 @@ Should output: The predicted value here for `[MASK]` is `"France"` with a score of 1.2. - ### Named entity recognition expected model output [_named_entity_recognition_expected_model_output] Named entity recognition is a specific token classification task. Each token in the sequence is scored related to a specific set of classification labels. For the Elastic Stack, we use Inside-Outside-Beginning (IOB) tagging. Elastic supports any NER entities as long as they are IOB tagged. The default values are: "O", "B_MISC", "I_MISC", "B_PER", "I_PER", "B_ORG", "I_ORG", "B_LOC", "I_LOC". @@ -177,7 +162,7 @@ The response format must be a float tensor with `shape(, , )`. 
Here is an example with two sequences for a binary classification model of "happy" and "sad": -``` +```json [ [ // happy, sad @@ -215,14 +198,13 @@ Here is an example with two sequences for a binary classification model of "happ ] ``` - ### Zero-shot text classification expected model output [_zero_shot_text_classification_expected_model_output] Zero-shot text classification allows text to be classified for arbitrary labels not necessarily part of the original training. Each sequence is combined with the label given some hypothesis template. The model then scores each of these combinations according to `[entailment, neutral, contradiction]`. The output of the model must be a float tensor with `shape(, , 3)`. Here is an example with a single sequence classified against 4 labels: -``` +```json [ [ // entailment, neutral, contradiction diff --git a/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md b/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md index 24490c339e..421e8dd43c 100644 --- a/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md +++ b/explore-analyze/machine-learning/nlp/ml-nlp-rerank.md @@ -17,35 +17,29 @@ The model can significantly improve search result quality by reordering results When reranking BM25 results, it provides an average 40% improvement in ranking quality on a diverse benchmark of retrieval tasks— matching the performance of models 11x its size. +## Availability and requirements [ml-nlp-rerank-availability] -## Availability and requirements [ml-nlp-rerank-availability] - -::::{warning} +::::{warning} This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features. :::: - - -### Elastic Cloud Serverless [ml-nlp-rerank-availability-serverless] +### Elastic Cloud Serverless [ml-nlp-rerank-availability-serverless] Elastic Rerank is available in {{es}} Serverless projects as of November 25, 2024. - -### Elastic Cloud Hosted and self-managed deployments [ml-nlp-rerank-availability-elastic-stack] +### Elastic Cloud Hosted and self-managed deployments [ml-nlp-rerank-availability-elastic-stack] Elastic Rerank is available in Elastic Stack version 8.17+: * To use Elastic Rerank, you must have the appropriate subscription level or the trial period activated. * A 4GB ML node - ::::{important} + ::::{important} Deploying the Elastic Rerank model in combination with ELSER (or other hosted models) requires at minimum an 8GB ML node. The current maximum size for trial ML nodes is 4GB (defaults to 1GB). :::: - - -## Download and deploy [ml-nlp-rerank-deploy] +## Download and deploy [ml-nlp-rerank-deploy] To download and deploy Elastic Rerank, use the [create inference API](../../../solutions/search/inference-api/elasticsearch-inference-integration.md) to create an {{es}} service `rerank` endpoint. @@ -54,15 +48,13 @@ Refer to this [Python notebook](https://github.com/elastic/elasticsearch-labs/bl :::: - - -### Create an inference endpoint [ml-nlp-rerank-deploy-steps] +### Create an inference endpoint [ml-nlp-rerank-deploy-steps] 1. In {{kib}}, navigate to the **Dev Console**. 2. 
Create an {{infer}} endpoint with the Elastic Rerank service by running: - ```console - PUT _inference/rerank/my-rerank-model +```console +PUT _inference/rerank/my-rerank-model { "service": "elasticsearch", "service_settings": { @@ -75,42 +67,37 @@ Refer to this [Python notebook](https://github.com/elastic/elasticsearch-labs/bl "model_id": ".rerank-v1" } } - ``` - - ::::{note} - The API request automatically downloads and deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation. - :::: +``` +::::{note} +The API request automatically downloads and deploys the model. This example uses [autoscaling](ml-nlp-auto-scale.md) through adaptive allocation. +:::: -::::{note} +::::{note} You might see a 502 bad gateway error in the response when using the {{kib}} Console. This error usually just reflects a timeout, while the model downloads in the background. You can check the download progress in the {{ml-app}} UI. If using the Python client, you can set the `timeout` parameter to a higher value. :::: - After creating the Elastic Rerank {{infer}} endpoint, it’s ready to use with a [`text_similarity_reranker`](https://www.elastic.co/guide/en/elasticsearch/reference/current/retriever.html#text-similarity-reranker-retriever-example-elastic-rerank) retriever. - -## Deploy in an air-gapped environment [ml-nlp-rerank-deploy-verify] +## Deploy in an air-gapped environment [ml-nlp-rerank-deploy-verify] If you want to deploy the Elastic Rerank model in a restricted or closed network, you have two options: * Create your own HTTP/HTTPS endpoint with the model artifacts on it * Put the model artifacts into a directory inside the config directory on all master-eligible nodes. - -### Model artifact files [ml-nlp-rerank-model-artifacts] +### Model artifact files [ml-nlp-rerank-model-artifacts] For the cross-platform version, you need the following files in your system: -``` +```url https://ml-models.elastic.co/rerank-v1.metadata.json https://ml-models.elastic.co/rerank-v1.pt https://ml-models.elastic.co/rerank-v1.vocab.json ``` - -### Using an HTTP server [_using_an_http_server_2] +### Using an HTTP server [_using_an_http_server_2] INFO: If you use an existing HTTP server, note that the model downloader only supports passwordless HTTP servers. @@ -131,7 +118,7 @@ You can use any HTTP service to deploy the model. This example uses the official 4. Verify that Nginx runs properly by visiting the following URL in your browser: - ``` + ```url http://{IP_ADDRESS_OR_HOSTNAME}:8080/rerank-v1.metadata.json ``` @@ -139,7 +126,7 @@ You can use any HTTP service to deploy the model. This example uses the official 5. Point your {{es}} deployment to the model artifacts on the HTTP server by adding the following line to the `config/elasticsearch.yml` file: - ``` + ```yml xpack.ml.model_repository: http://{IP_ADDRESS_OR_HOSTNAME}:8080 ``` @@ -155,8 +142,7 @@ The HTTP server is only required for downloading the model. After the download h docker stop ml-models ``` - -### Using file-based access [_using_file_based_access_2] +### Using file-based access [_using_file_based_access_2] For a file-based access, follow these steps: @@ -164,7 +150,7 @@ For a file-based access, follow these steps: 2. Put the files into a `models` subdirectory inside the `config` directory of your {{es}} deployment. 3. 
Point your {{es}} deployment to the model directory by adding the following line to the `config/elasticsearch.yml` file: - ``` + ```yml xpack.ml.model_repository: file://${path.home}/config/models/ ``` @@ -172,8 +158,7 @@ For a file-based access, follow these steps: 5. [Restart](../../../deploy-manage/maintenance/start-stop-services/full-cluster-restart-rolling-restart-procedures.md#restart-cluster-rolling) the master-eligible nodes one by one. 6. Create an inference endpoint to deploy the model per [these steps](#ml-nlp-rerank-deploy-steps). - -## Limitations [ml-nlp-rerank-limitations] +## Limitations [ml-nlp-rerank-limitations] * English language only * Maximum context window of 512 tokens @@ -182,61 +167,49 @@ For a file-based access, follow these steps: When the combined inputs exceed the 512 token limit, a balanced truncation strategy is used. If both the query and input text are longer than 255 tokens each then both are truncated, otherwise the longest is truncated. - - -## Performance considerations [ml-nlp-rerank-perf-considerations] +## Performance considerations [ml-nlp-rerank-perf-considerations] It’s important to note that if you rerank to depth `n` then you will need to run `n` inferences per query. This will include the document text and will therefore be significantly more expensive than inference for query embeddings. Hardware can be scaled to run these inferences in parallel, but we would recommend shallow reranking for CPU inference: no more than top-30 results. You may find that the preview version is cost prohibitive for high query rates and low query latency requirements. We plan to address performance issues for GA. - -## Model specifications [ml-nlp-rerank-model-specs] +## Model specifications [ml-nlp-rerank-model-specs] * Purpose-built for English language content * Relatively small: 184M parameters (86M backbone + 98M embedding layer) * Matches performance of billion-parameter reranking models * Built directly into {{es}} - no external services or dependencies needed - -## Model architecture [ml-nlp-rerank-arch-overview] +## Model architecture [ml-nlp-rerank-arch-overview] Elastic Rerank is built on the [DeBERTa v3](https://arxiv.org/abs/2111.09543) language model architecture. 
The model employs several key architectural features that make it particularly effective for reranking: * **Disentangled attention mechanism** enables the model to: - - * Process word content and position separately - * Learn more nuanced relationships between query and document text - * Better understand the semantic importance of word positions and relationships + * Process word content and position separately + * Learn more nuanced relationships between query and document text + * Better understand the semantic importance of word positions and relationships * **ELECTRA-style pre-training** uses: + * A GAN-like approach to token prediction + * Simultaneous training of token generation and detection + * Enhanced parameter efficiency compared to traditional masked language modeling - * A GAN-like approach to token prediction - * Simultaneous training of token generation and detection - * Enhanced parameter efficiency compared to traditional masked language modeling - - - -## Training process [ml-nlp-rerank-arch-training] +## Training process [ml-nlp-rerank-arch-training] Here is an overview of the Elastic Rerank model training process: * **Initial relevance extraction** - - * Fine-tunes the pre-trained DeBERTa [CLS] token representation - * Uses a GeLU activation and dropout layer - * Preserves important pre-trained knowledge while adapting to the reranking task + * Fine-tunes the pre-trained DeBERTa [CLS] token representation + * Uses a GeLU activation and dropout layer + * Preserves important pre-trained knowledge while adapting to the reranking task * **Trained by distillation** + * Uses an ensemble of bi-encoder and cross-encoder models as a teacher + * Bi-encoder provides nuanced negative example assessment + * Cross-encoder helps differentiate between positive and negative examples + * Combines strengths of both model types - * Uses an ensemble of bi-encoder and cross-encoder models as a teacher - * Bi-encoder provides nuanced negative example assessment - * Cross-encoder helps differentiate between positive and negative examples - * Combines strengths of both model types - - - -### Training data [ml-nlp-rerank-arch-data] +### Training data [ml-nlp-rerank-arch-data] The training data consists of: @@ -250,14 +223,11 @@ The data preparation process includes: * Basic cleaning and fuzzy deduplication * Multi-stage prompting for diverse topics (on the synthetic portion of the training data only) * Varied query types: + * Keyword search + * Exact phrase matching + * Short and long natural language questions - * Keyword search - * Exact phrase matching - * Short and long natural language questions - - - -### Negative sampling [ml-nlp-rerank-arch-sampling] +### Negative sampling [ml-nlp-rerank-arch-sampling] The model uses an advanced sampling strategy to ensure high-quality rankings: @@ -265,14 +235,11 @@ The model uses an advanced sampling strategy to ensure high-quality rankings: * Uses five negative samples per query - more than typical approaches * Applies probability distribution shaped by document scores for sampling * Deep sampling benefits: + * Improves model robustness across different retrieval depths + * Enhances score calibration + * Provides better handling of document diversity - * Improves model robustness across different retrieval depths - * Enhances score calibration - * Provides better handling of document diversity - - - -### Training optimization [ml-nlp-rerank-arch-optimization] +### Training optimization [ml-nlp-rerank-arch-optimization] The training process 
incorporates several key optimizations: @@ -286,20 +253,17 @@ Implemented parameter averaging along optimization trajectory: * Eliminates need for traditional learning rate scheduling and provides improvement in the final model quality - -## Performance [ml-nlp-rerank-performance] +## Performance [ml-nlp-rerank-performance] Elastic Rerank shows significant improvements in search quality across a wide range of retrieval tasks. - -### Overview [ml-nlp-rerank-performance-overview] +### Overview [ml-nlp-rerank-performance-overview] * Average 40% improvement in ranking quality when reranking BM25 results * 184M parameter model matches performance of 2B parameter alternatives * Evaluated across 21 different datasets using the BEIR benchmark suite - -### Key benchmark results [ml-nlp-rerank-performance-benchmarks] +### Key benchmark results [ml-nlp-rerank-performance-benchmarks] * Natural Questions: 90% improvement * MS MARCO: 85% improvement @@ -308,8 +272,7 @@ Elastic Rerank shows significant improvements in search quality across a wide ra For detailed benchmark information, including complete dataset results and methodology, refer to the [Introducing Elastic Rerank blog](https://www.elastic.co/search-labs/blog/elastic-semantic-reranker-part-2). - -## Further resources [ml-nlp-rerank-resources] +## Further resources [ml-nlp-rerank-resources] **Documentation**: @@ -325,4 +288,3 @@ For detailed benchmark information, including complete dataset results and metho **Python notebooks**: * [End-to-end example using Elastic Rerank in Python](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/search/12-semantic-reranking-elastic-rerank.ipynb) - diff --git a/explore-analyze/machine-learning/nlp/nlp-example.md b/explore-analyze/machine-learning/nlp/nlp-example.md index 73e5b70b0f..f9094ecdc2 100644 --- a/explore-analyze/machine-learning/nlp/nlp-example.md +++ b/explore-analyze/machine-learning/nlp/nlp-example.md @@ -4,11 +4,8 @@ mapped_pages: - https://www.elastic.co/guide/en/elasticsearch/reference/current/nlp-example.html --- - - # NLP example [nlp-example] - This guide focuses on a concrete task: getting a machine learning trained model loaded into Elasticsearch and set up to enrich your documents. Elasticsearch supports many different ways to use machine learning models. In this guide, we will use a trained model to enrich documents at ingest time using ingest pipelines configured within Kibana’s **Content** UI. @@ -32,8 +29,7 @@ Follow the instructions to load a text classification model and set it up to enr * [Summary](#nlp-example-summary) * [Learn more](#nlp-example-learn-more) - -## Create an {{ecloud}} deployment [nlp-example-cloud-deployment] +## Create an {{ecloud}} deployment [nlp-example-cloud-deployment] Your deployment will need a machine learning instance to upload and deploy trained models. @@ -45,7 +41,6 @@ Follow the steps to **Create** a new deployment. Make sure to add capacity to th Enriching documents using machine learning was introduced in Enterprise Search **8.5.0**, so be sure to use version **8.5.0 or later**. - ## Clone Eland [nlp-example-clone-eland] Elastic’s [Eland](https://github.com/elastic/eland) tool makes it easy to upload trained models to your deployment via Docker. @@ -60,8 +55,7 @@ cd eland docker build -t elastic/eland . 
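# The image built above is tagged elastic/eland and bundles the
# eland_import_hub_model script used in the next step.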
``` - -## Deploy the trained model [nlp-example-deploy-model] +## Deploy the trained model [nlp-example-deploy-model] Now that you have a deployment and a way to upload models, you will need to choose a trained model that fits your data. [Hugging Face](https://huggingface.co/) has a large repository of publicly available trained models. The model you choose will depend on your data and what you would like to do with it. @@ -89,8 +83,7 @@ docker run -it --rm --network host \ This script should take roughly 2-3 minutes to run. Once your model has been successfully deployed to your Elastic deployment, navigate to Kibana’s **Trained Models** page to verify it is ready. You can find this page under **Machine Learning > Analytics** menu and then **Trained Models > Model Management**. If you do not see your model in the list, you may need to click **Synchronize your jobs and trained models**. Your model is now ready to be used. - -## Create an index and define an ML inference pipeline [nlp-example-create-index-and-define-ml-inference-pipeline] +## Create an index and define an ML inference pipeline [nlp-example-create-index-and-define-ml-inference-pipeline] We are now ready to use Kibana’s **Content** UI to enrich our documents with inference data. Before we ingest photo comments into Elasticsearch, we will first create an ML inference pipeline. The pipeline will enrich the incoming photo comments with inference data indicating if the comments are positive. @@ -136,8 +129,7 @@ Next, we’ll add an inference pipeline. You can also run example documents through a simulator and review the pipeline before creating it. - -## Index documents [nlp-example-index-documents] +## Index documents [nlp-example-index-documents] At this point, everything is ready to enrich documents at index time. @@ -184,8 +176,7 @@ The document has new fields with the enriched data. The `ml.inference.positivity From here, we can write search queries to boost on `ml.inference.positivity_result.predicted_value`. This field will also be stored in a top-level `positivity_result` field if the model was confident enough. - -## Summary [nlp-example-summary] +## Summary [nlp-example-summary] In this guide, we covered how to: @@ -195,8 +186,7 @@ In this guide, we covered how to: * Enrich documents with inference results from the trained model at ingest time. * Query your search engine and sort by `positivity_result`. - -## Learn more [nlp-example-learn-more] +## Learn more [nlp-example-learn-more] * [Compatible third party models^](ml-nlp-model-ref.md) * [NLP Overview^](ml-nlp-overview.md) @@ -204,4 +194,3 @@ In this guide, we covered how to: * [Deploying a model ML guide^](ml-nlp-deploy-models.md) * [Eland Authentication methods^](ml-nlp-import-model.md#ml-nlp-authentication) * [Adding inference pipelines](inference-processing.md#ingest-pipeline-search-inference-add-inference-processors) -