@@ -9,7 +9,7 @@ There are many advanced configuration options for {{anomaly-jobs}}, some of them

In this guide, you’ll learn how to:

* Understand the impact of configuration options on the performance of {anomaly-jobs}
* Understand the impact of configuration options on the performance of {{anomaly-jobs}}

Prerequisites:

17 changes: 9 additions & 8 deletions explore-analyze/machine-learning/nlp.md
@@ -7,12 +7,13 @@ mapped_pages:

You can use {{stack-ml-features}} to analyze natural language data and make predictions.

* [*Overview*](nlp/ml-nlp-overview.md)
* [*Deploy trained models*](nlp/ml-nlp-deploy-models.md)
* [*Trained model autoscaling*](nlp/ml-nlp-auto-scale.md)
* [*Add NLP {{infer}} to ingest pipelines*](nlp/ml-nlp-inference.md)
* [*API quick reference*](nlp/ml-nlp-apis.md)
* [Overview](nlp/ml-nlp-overview.md)
* [Deploy trained models](nlp/ml-nlp-deploy-models.md)
* [Trained model autoscaling](nlp/ml-nlp-auto-scale.md)
* [Add NLP {{infer}} to ingest pipelines](nlp/ml-nlp-inference.md)
* [API quick reference](nlp/ml-nlp-apis.md)
* [ELSER](nlp/ml-nlp-elser.md)
* [*Examples*](nlp/ml-nlp-examples.md)
* [*Limitations*](nlp/ml-nlp-limitations.md)

* [E5](nlp/ml-nlp-e5.md)
* [Language identification](nlp/ml-nlp-lang-ident.md)
* [Examples](nlp/ml-nlp-examples.md)
* [Limitations](nlp/ml-nlp-limitations.md)
1 change: 0 additions & 1 deletion explore-analyze/machine-learning/nlp/ml-nlp-apis.md
@@ -34,4 +34,3 @@ The {{infer}} APIs have the following base:
* [Delete inference endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-inference-api.html)
* [Get inference endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-inference-api.html)
* [Perform inference](https://www.elastic.co/guide/en/elasticsearch/reference/current/post-inference-api.html)

15 changes: 1 addition & 14 deletions explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md
@@ -16,8 +16,6 @@ There are two ways to enable autoscaling:
To fully leverage model autoscaling, it is highly recommended to enable [{{es}} deployment autoscaling](../../../deploy-manage/autoscaling.md).
::::



## Enabling autoscaling through APIs - adaptive allocations [nlp-model-adaptive-allocations]

Model allocations are independent units of work for NLP tasks. If you set the numbers of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
@@ -31,15 +29,13 @@ You can enable adaptive allocations by using:

If the new allocations fit on the current {{ml}} nodes, they are started immediately. If more resource capacity is needed to create new model allocations and {{ml}} autoscaling is enabled, your {{ml}} node is scaled up to provide enough resources for the new allocations. The number of model allocations can be scaled down to 0, but cannot be scaled up to more than 32 allocations unless you explicitly set a higher maximum. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html).
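
As a rough, non-authoritative sketch, adaptive allocations can be requested when you start a model deployment or create an inference endpoint. The model and endpoint IDs below are placeholders and the limits are assumptions to adjust for your workload:

```console
# Hypothetical example: start a deployment with adaptive allocations enabled
POST _ml/trained_models/my-nlp-model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 8
  }
}

# Hypothetical example: create an inference endpoint with adaptive allocations enabled
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 8
    }
  }
}
```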


### Optimizing for typical use cases [optimize-use-case]

You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of {{infer}} requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.

* If you want to optimize for ingest, set the number of threads to `1` (`"threads_per_allocation": 1`).
* If you want to optimize for search, set the number of threads to greater than `1`. Increasing the number of threads will make the search processes more performant.
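
For example (a hedged sketch; the model ID, deployment IDs, and counts are illustrative assumptions rather than values from this guide), the thread and allocation counts are passed when starting a deployment:

```console
# Assumed ingest-optimized deployment: one thread per allocation, several allocations
POST _ml/trained_models/my-nlp-model/deployment/_start?deployment_id=my-model-for-ingest&threads_per_allocation=1&number_of_allocations=4

# Assumed search-optimized deployment: more threads per allocation, a single allocation
POST _ml/trained_models/my-nlp-model/deployment/_start?deployment_id=my-model-for-search&threads_per_allocation=8&number_of_allocations=1
```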


## Enabling autoscaling in {{kib}} - adaptive resources [nlp-model-adaptive-resources]

You can enable adaptive resources for your models when starting or updating the model deployment. Adaptive resources make it possible for {{es}} to scale up or down the available resources based on the load on the process. This can help you to manage performance and cost more easily. When adaptive resources are enabled, the number of vCPUs that the model deployment uses is set automatically based on the current load. When the load is high, the number of vCPUs that the process can use is automatically increased. When the load is low, the number of vCPUs that the process can use is automatically decreased.
@@ -53,7 +49,6 @@ Refer to the tables in the [Model deployment resource matrix](#auto-scaling-matr
:class: screenshot
:::


## Model deployment resource matrix [auto-scaling-matrix]

The used resources for trained model deployments depend on three factors:
@@ -68,13 +63,10 @@ If you use {{es}} on-premises, vCPUs level ranges are derived from the `total_ml
On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
::::



### Deployments in Cloud optimized for ingest [_deployments_in_cloud_optimized_for_ingest]

For ingest-optimized deployments, we maximize the number of model allocations.


#### Adaptive resources enabled [_adaptive_resources_enabled]

| Level | Allocations | Threads | vCPUs |
@@ -85,7 +77,6 @@ In case of ingest-optimized deployments, we maximize the number of model allocat

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.


#### Adaptive resources disabled [_adaptive_resources_disabled]

| Level | Allocations | Threads | vCPUs |
@@ -96,12 +87,10 @@ In case of ingest-optimized deployments, we maximize the number of model allocat

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.


### Deployments in Cloud optimized for search [_deployments_in_cloud_optimized_for_search]

For search-optimized deployments, we maximize the number of threads. The maximum number of threads that can be claimed depends on your hardware architecture.


#### Adaptive resources enabled [_adaptive_resources_enabled_2]

| Level | Allocations | Threads | vCPUs |
@@ -112,7 +101,6 @@ In case of search-optimized deployments, we maximize the number of threads. The

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.


#### Adaptive resources disabled [_adaptive_resources_disabled_2]

| Level | Allocations | Threads | vCPUs |
@@ -121,5 +109,4 @@ In case of search-optimized deployments, we maximize the number of threads. The
| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.

\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
10 changes: 3 additions & 7 deletions explore-analyze/machine-learning/nlp/ml-nlp-classify-text.md
@@ -11,13 +11,11 @@ These NLP tasks enable you to identify the language of text and classify or labe
* [Text classification](#ml-nlp-text-classification)
* [Zero-shot text classification](#ml-nlp-zero-shot)


## {{lang-ident-cap}} [_lang_ident_cap]
## {{lang-ident-cap}} [_lang_ident_cap]

The {{lang-ident}} model is provided out of the box in your {{es}} cluster. You can find the documentation of the model on the [{{lang-ident-cap}}](ml-nlp-lang-ident.md) page under the Built-in models section.


## Text classification [ml-nlp-text-classification]
## Text classification [ml-nlp-text-classification]

Text classification assigns the input text to one of multiple classes that best describe the text. The classes used depend on the model and the data set that was used to train it. Based on the number of classes, two main types of classification exist: binary classification, where the number of classes is exactly two, and multi-class classification, where the number of classes is more than two.

@@ -39,8 +37,7 @@ Likewise, you might use a trained model to perform multi-class classification an
...
```


## Zero-shot text classification [ml-nlp-zero-shot]
## Zero-shot text classification [ml-nlp-zero-shot]

The zero-shot classification task offers the ability to classify text without training a model on a specific set of classes. Instead, you provide the classes when you deploy the model or at {{infer}} time. It uses a model trained on a large data set that has gained a general language understanding and asks the model how well the labels you provided fit with your text.

@@ -95,4 +92,3 @@ The task returns the following result:
```

Since you can adjust the labels while you perform {{infer}}, this type of task is exceptionally flexible. If you are consistently using the same labels, however, it might be better to use a fine-tuned text classification model.

2 changes: 0 additions & 2 deletions explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md
@@ -22,7 +22,6 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/guide/en/kibana/current/machine-learning-api-sync.html).
::::


You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:

* Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
@@ -31,7 +30,6 @@ You can define the resource usage level of the NLP model during model deployment

For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md).


## Request queues and search priority [infer-request-queues]

Each allocation of a model deployment has a dedicated queue to buffer {{infer}} requests. The size of this queue is determined by the `queue_capacity` parameter in the [start trained model deployment API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html). When the queue reaches its maximum capacity, new requests are declined until some of the queued requests are processed, creating available capacity once again. When multiple ingest pipelines reference the same deployment, the queue can fill up, resulting in rejected requests. Consider using dedicated deployments to prevent this situation.
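
As an illustrative sketch (the model ID and queue size are assumptions, not recommendations), the queue size is set when the deployment is started:

```console
# Hypothetical example: start a deployment with a larger inference request queue
POST _ml/trained_models/my-nlp-model/deployment/_start?queue_capacity=10000
```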
@@ -11,8 +11,3 @@ If you want to perform {{nlp}} tasks in your cluster, you must deploy an appropr
2. [Import the trained model and vocabulary](ml-nlp-import-model.md).
3. [Deploy the model in your cluster](ml-nlp-deploy-model.md).
4. [Try it out](ml-nlp-test-inference.md).





4 changes: 0 additions & 4 deletions explore-analyze/machine-learning/nlp/ml-nlp-extract-info.md
@@ -11,7 +11,6 @@ These NLP tasks enable you to extract information from your unstructured text:
* [Fill-mask](#ml-nlp-mask)
* [Question answering](#ml-nlp-question-answering)


## Named entity recognition [ml-nlp-ner]

The named entity recognition (NER) task can identify and categorize certain entities - typically proper nouns - in your unstructured text. Named entities usually refer to objects in the real world such as persons, locations, organizations, and other miscellaneous entities that are consistently referenced by a proper name.
@@ -53,7 +52,6 @@ The task returns the following result:
...
```


## Fill-mask [ml-nlp-mask]

The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.
@@ -80,7 +78,6 @@ The task returns the following result:
...
```


## Question answering [ml-nlp-question-answering]

The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.
@@ -105,4 +102,3 @@ The answer is shown by the object below:
}
...
```

28 changes: 20 additions & 8 deletions explore-analyze/machine-learning/nlp/ml-nlp-import-model.md
@@ -9,17 +9,14 @@ mapped_pages:
If you want to install a trained model in a restricted or closed network, refer to [these instructions](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-air-gapped).
::::


After you choose a model, you must import it and its tokenizer vocabulary to your cluster. Because of its size, the model must be chunked and imported one chunk at a time so that it can be stored in parts.

::::{note}
Trained models must be in a TorchScript representation for use with {{stack-ml-features}}.
::::


[Eland](https://github.com/elastic/eland) is an {{es}} Python client that provides a simple script to perform the conversion of Hugging Face transformer models to their TorchScript representations, the chunking process, and upload to {{es}}; it is therefore the recommended import method. You can either install the Python Eland client on your machine or use a Docker image to build Eland and run the model import script.


## Import with the Eland client installed [ml-nlp-import-script]

1. Install the [Eland Python client](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/installation.html) with PyTorch extra dependencies.
@@ -30,7 +27,7 @@

2. Run the `eland_import_hub_model` script to download the model from Hugging Face, convert it to TorchScript format, and upload to the {{es}} cluster. For example:

```shell
```
eland_import_hub_model \
--cloud-id <cloud-id> \ <1>
-u <username> -p <password> \ <2>
@@ -43,10 +40,8 @@ Trained models must be in a TorchScript representation for use with {{stack-ml-f
3. Specify the identifier for the model in the Hugging Face model hub.
4. Specify the type of NLP task. Supported values are `fill_mask`, `ner`, `question_answering`, `text_classification`, `text_embedding`, `text_expansion`, `text_similarity`, and `zero_shot_classification`.


For more details, refer to [https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch).


## Import with Docker [ml-nlp-import-docker]

If you want to use Eland without installing it, run the following command:
@@ -65,9 +60,26 @@ docker run -it --rm docker.elastic.co/eland/eland \
--start
```

Replace the `$ELASTICSEARCH_URL` with the URL for your {{es}} cluster. Refer to [Authentication methods](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-authentication.html) to learn more.
Replace the `$ELASTICSEARCH_URL` with the URL for your {{es}} cluster. Refer to [Authentication methods](#ml-nlp-authentication) to learn more.

## Authentication methods [ml-nlp-authentication]

The following authentication options are available when using the import script:

* username/password authentication (specified with the `-u` and `-p` options):

```bash
eland_import_hub_model --url https://<hostname>:<port> -u <username> -p <password> ...
```

* username/password authentication (embedded in the URL):

```bash
eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...
```

* API key authentication:

$$$ml-nlp-authentication$$$
```bash
eland_import_hub_model --url https://<hostname>:<port> --es-api-key <api-key> ...
```
8 changes: 0 additions & 8 deletions explore-analyze/machine-learning/nlp/ml-nlp-inference.md
@@ -12,7 +12,6 @@ After you [deploy a trained model in your cluster](ml-nlp-deploy-models.md), you
3. [Ingest documents](#ml-nlp-inference-ingest-docs).
4. [View the results](#ml-nlp-inference-discover).


## Add an {{infer}} processor to an ingest pipeline [ml-nlp-inference-processor]

In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **Ingest Pipelines**. To open **Ingest Pipelines**, find **{{stack-manage-app}}** in the main menu, or use the [global search field](../../overview/kibana-quickstart.md#_finding_your_apps_and_objects).
@@ -94,8 +93,6 @@ In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **In

3. If everything looks correct, close the panel, and click **Create pipeline**. The pipeline is now ready for use.
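
If you prefer to create the pipeline through the API instead of the {{kib}} UI, a minimal sketch looks like the following; the pipeline name, model ID, and field mapping are placeholders for illustration only:

```console
# Minimal sketch of a pipeline with an inference processor (all names are assumptions)
PUT _ingest/pipeline/my-ner-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my-ner-model",
        "field_map": {
          "message": "text_field"
        }
      }
    }
  ]
}
```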



## Ingest documents [ml-nlp-inference-ingest-docs]

You can now use your ingest pipeline to perform NLP tasks on your data.
@@ -120,7 +117,6 @@ PUT ner-test
To use the `annotated_text` data type in this example, you must install the [mapper annotated text plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-annotated-text.html). For more installation details, refer to [Add plugins provided with {{ess}}](https://www.elastic.co/guide/en/cloud/current/ec-adding-elastic-plugins.html).
::::


You can then use the new pipeline to index some documents. For example, use a bulk indexing request with the `pipeline` query parameter for your NER pipeline:

```console
@@ -168,8 +164,6 @@ However, those web log messages are unlikely to contain enough words for the mod
Set the reindex `size` option to a value smaller than the `queue_capacity` for the trained model deployment. Otherwise, requests might be rejected with a "too many requests" 429 error code.
::::
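
For instance, a hedged sketch of a reindex request that respects this limit might look like the following; the source index, destination index, and pipeline name are assumptions:

```console
# Batch size (source.size) kept well below the deployment's queue_capacity
POST _reindex
{
  "source": {
    "index": "web-logs",
    "size": 50
  },
  "dest": {
    "index": "web-logs-ner",
    "pipeline": "my-ner-pipeline"
  }
}
```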



## View the results [ml-nlp-inference-discover]

Before you can verify the results of the pipelines, you must [create {{data-sources}}](../../find-and-organize/data-views.md). Then you can explore your data in **Discover**:
@@ -190,7 +184,6 @@ In this {{lang-ident}} example, the `ml.inference.predicted_value` contains the

To learn more about ingest pipelines and all of the other processors that you can add, refer to [Ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md).


## Common problems [ml-nlp-inference-common-problems]

If you encounter problems while using your trained model in an ingest pipeline, check the following possible causes:
@@ -201,7 +194,6 @@ If you encounter problems while using your trained model in an ingest pipeline,

These common failure scenarios and others can be captured by adding failure processors to your pipeline. For more examples, refer to [Handling pipeline failures](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures).
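
As a non-authoritative sketch (the pipeline, model, and field names are assumptions), a failure processor on the {{infer}} processor can record the error instead of dropping the document:

```console
# Hypothetical pipeline that keeps the document and records why inference failed
PUT _ingest/pipeline/my-ner-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my-ner-model",
        "on_failure": [
          {
            "set": {
              "description": "Record the failure reason on the document",
              "field": "ml.inference_failure",
              "value": "{{_ingest.on_failure_message}}"
            }
          }
        ]
      }
    }
  ]
}
```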


## Further reading [nlp-example-reading]

* [How to deploy NLP: Text Embeddings and Vector Search](https://www.elastic.co/blog/how-to-deploy-nlp-text-embeddings-and-vector-search)