Commit 368fd9c

[E&A] Refines NLP section (#328)
* [E&A] Refines NLP section. * [E&A] Adds authentication methods.
1 parent 7f83ea7 commit 368fd9c

15 files changed: +36 additions, −73 deletions

explore-analyze/machine-learning/anomaly-detection/anomaly-detection-scale.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ There are many advanced configuration options for {{anomaly-jobs}}, some of them

In this guide, you’ll learn how to:

-* Understand the impact of configuration options on the performance of {anomaly-jobs}
+* Understand the impact of configuration options on the performance of {{anomaly-jobs}}

Prerequisites:


explore-analyze/machine-learning/nlp.md

Lines changed: 9 additions & 8 deletions
@@ -7,12 +7,13 @@ mapped_pages:

You can use {{stack-ml-features}} to analyze natural language data and make predictions.

-* [*Overview*](nlp/ml-nlp-overview.md)
-* [*Deploy trained models*](nlp/ml-nlp-deploy-models.md)
-* [*Trained model autoscaling*](nlp/ml-nlp-auto-scale.md)
-* [*Add NLP {{infer}} to ingest pipelines*](nlp/ml-nlp-inference.md)
-* [*API quick reference*](nlp/ml-nlp-apis.md)
+* [Overview](nlp/ml-nlp-overview.md)
+* [Deploy trained models](nlp/ml-nlp-deploy-models.md)
+* [Trained model autoscaling](nlp/ml-nlp-auto-scale.md)
+* [Add NLP {{infer}} to ingest pipelines](nlp/ml-nlp-inference.md)
+* [API quick reference](nlp/ml-nlp-apis.md)
* [ELSER](nlp/ml-nlp-elser.md)
-* [*Examples*](nlp/ml-nlp-examples.md)
-* [*Limitations*](nlp/ml-nlp-limitations.md)
+* [E5](nlp/ml-nlp-e5.md)
+* [Language identification](nlp/ml-nlp-lang-ident.md)
+* [Examples](nlp/ml-nlp-examples.md)
+* [Limitations](nlp/ml-nlp-limitations.md)

explore-analyze/machine-learning/nlp/ml-nlp-apis.md

Lines changed: 0 additions & 1 deletion
@@ -34,4 +34,3 @@ The {{infer}} APIs have the following base:
* [Delete inference endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/delete-inference-api.html)
* [Get inference endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/get-inference-api.html)
* [Perform inference](https://www.elastic.co/guide/en/elasticsearch/reference/current/post-inference-api.html)
-
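
As a point of reference for the linked APIs, a minimal sketch of a perform-inference call under this base path; the endpoint name, task type, and input are hypothetical and are not part of this commit:

```console
POST _inference/sparse_embedding/my-elser-endpoint
{
  "input": "What are the NLP features in Elasticsearch?"
}
```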

explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md

Lines changed: 1 addition & 14 deletions
@@ -16,8 +16,6 @@ There are two ways to enable autoscaling:
To fully leverage model autoscaling, it is highly recommended to enable [{{es}} deployment autoscaling](../../../deploy-manage/autoscaling.md).
::::

-
-
## Enabling autoscaling through APIs - adaptive allocations [nlp-model-adaptive-allocations]

Model allocations are independent units of work for NLP tasks. If you set the numbers of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
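
For context, a minimal sketch of enabling adaptive allocations when starting a deployment; the model ID and bounds are hypothetical, and it assumes the start trained model deployment API accepts an `adaptive_allocations` request body as described in its reference documentation:

```console
POST _ml/trained_models/my_nlp_model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 8
  }
}
```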
@@ -31,15 +29,13 @@ You can enable adaptive allocations by using:

If the new allocations fit on the current {{ml}} nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your {{ml}} node will be scaled up if {{ml}} autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html).

-
### Optimizing for typical use cases [optimize-use-case]

You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of {{infer}} requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.

* If you want to optimize for ingest, set the number of threads to `1` (`"threads_per_allocation": 1`).
* If you want to optimize for search, set the number of threads to greater than `1`. Increasing the number of threads will make the search processes more performant.

-
## Enabling autoscaling in {{kib}} - adaptive resources [nlp-model-adaptive-resources]

You can enable adaptive resources for your models when starting or updating the model deployment. Adaptive resources make it possible for {{es}} to scale up or down the available resources based on the load on the process. This can help you to manage performance and cost more easily. When adaptive resources are enabled, the number of vCPUs that the model deployment uses is set automatically based on the current load. When the load is high, the number of vCPUs that the process can use is automatically increased. When the load is low, the number of vCPUs that the process can use is automatically decreased.
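
To make the ingest-versus-search tradeoff above concrete, a hedged sketch of a search-optimized start request using the `threads_per_allocation` and `number_of_allocations` parameters of the start trained model deployment API (the model ID and values are illustrative):

```console
POST _ml/trained_models/my_nlp_model/deployment/_start?threads_per_allocation=4&number_of_allocations=1
```

An ingest-optimized deployment would instead keep `threads_per_allocation` at `1` and raise `number_of_allocations`.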
@@ -53,7 +49,6 @@ Refer to the tables in the [Model deployment resource matrix](#auto-scaling-matr
:class: screenshot
:::

-
## Model deployment resource matrix [auto-scaling-matrix]

The used resources for trained model deployments depend on three factors:
@@ -68,13 +63,10 @@ If you use {{es}} on-premises, vCPUs level ranges are derived from the `total_ml
On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
::::

-
-
### Deployments in Cloud optimized for ingest [_deployments_in_cloud_optimized_for_ingest]

In case of ingest-optimized deployments, we maximize the number of model allocations.

-
#### Adaptive resources enabled [_adaptive_resources_enabled]

| Level | Allocations | Threads | vCPUs |
@@ -85,7 +77,6 @@ In case of ingest-optimized deployments, we maximize the number of model allocat

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.

-
#### Adaptive resources disabled [_adaptive_resources_disabled]

| Level | Allocations | Threads | vCPUs |
@@ -96,12 +87,10 @@ In case of ingest-optimized deployments, we maximize the number of model allocat

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.

-
### Deployments in Cloud optimized for search [_deployments_in_cloud_optimized_for_search]

In case of search-optimized deployments, we maximize the number of threads. The maximum number of threads that can be claimed depends on the hardware your architecture has.

-
#### Adaptive resources enabled [_adaptive_resources_enabled_2]

| Level | Allocations | Threads | vCPUs |
@@ -112,7 +101,6 @@ In case of search-optimized deployments, we maximize the number of threads. The

* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.

-
#### Adaptive resources disabled [_adaptive_resources_disabled_2]

| Level | Allocations | Threads | vCPUs |
@@ -121,5 +109,4 @@ In case of search-optimized deployments, we maximize the number of threads. The
| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |

-* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
-
+\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
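
As a worked instance of the footnote's calculation: a 16-vCPU limit with a `threads_per_allocation` of 2 yields 16 / 2 = 8 allocations.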

explore-analyze/machine-learning/nlp/ml-nlp-classify-text.md

Lines changed: 3 additions & 7 deletions
@@ -11,13 +11,11 @@ These NLP tasks enable you to identify the language of text and classify or labe
* [Text classification](#ml-nlp-text-classification)
* [Zero-shot text classification](#ml-nlp-zero-shot)

-
-## {{lang-ident-cap}} [_lang_ident_cap]
+## {{lang-ident-cap}} [_lang_ident_cap]

The {{lang-ident}} model is provided out-of-the box in your {{es}} cluster. You can find the documentation of the model on the [{{lang-ident-cap}}](ml-nlp-lang-ident.md) page under the Built-in models section.

-
-## Text classification [ml-nlp-text-classification]
+## Text classification [ml-nlp-text-classification]

Text classification assigns the input text to one of multiple classes that best describe the text. The classes used depend on the model and the data set that was used to train it. Based on the number of classes, two main types of classification exist: binary classification, where the number of classes is exactly two, and multi-class classification, where the number of classes is more than two.

@@ -39,8 +37,7 @@ Likewise, you might use a trained model to perform multi-class classification an
...
```

-
-## Zero-shot text classification [ml-nlp-zero-shot]
+## Zero-shot text classification [ml-nlp-zero-shot]

The zero-shot classification task offers the ability to classify text without training a model on a specific set of classes. Instead, you provide the classes when you deploy the model or at {{infer}} time. It uses a model trained on a large data set that has gained a general language understanding and asks the model how well the labels you provided fit with your text.
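
A rough sketch of supplying labels at {{infer}} time; the deployed model ID, input text, and labels are hypothetical, and the `zero_shot_classification` block follows the infer trained model API:

```console
POST _ml/trained_models/my-zero-shot-model/_infer
{
  "docs": [
    { "text_field": "This was a disappointing sequel with a few bright moments." }
  ],
  "inference_config": {
    "zero_shot_classification": {
      "labels": ["positive", "negative", "neutral"]
    }
  }
}
```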

@@ -95,4 +92,3 @@ The task returns the following result:
```

Since you can adjust the labels while you perform {{infer}}, this type of task is exceptionally flexible. If you are consistently using the same labels, however, it might be better to use a fine-tuned text classification model.
-

explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md

Lines changed: 0 additions & 2 deletions
@@ -22,7 +22,6 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/guide/en/kibana/current/machine-learning-api-sync.html).
::::

-
You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) being enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, vCPU usage of Cloud deployments derived from the Cloud console and functions as follows:

* Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use
@@ -31,7 +30,6 @@ You can define the resource usage level of the NLP model during model deployment

For the resource levels when adaptive resources are enabled, refer to <[*Trained model autoscaling*](ml-nlp-auto-scale.md).

-
## Request queues and search priority [infer-request-queues]

Each allocation of a model deployment has a dedicated queue to buffer {{infer}} requests. The size of this queue is determined by the `queue_capacity` parameter in the [start trained model deployment API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html). When the queue reaches its maximum capacity, new requests are declined until some of the queued requests are processed, creating available capacity once again. When multiple ingest pipelines reference the same deployment, the queue can fill up, resulting in rejected requests. Consider using dedicated deployments to prevent this situation.
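
For illustration, a hedged example of sizing that queue at deployment time; `queue_capacity` is the parameter named above, and the model ID and value are illustrative:

```console
POST _ml/trained_models/my_nlp_model/deployment/_start?queue_capacity=10000
```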

explore-analyze/machine-learning/nlp/ml-nlp-deploy-models.md

Lines changed: 0 additions & 5 deletions
@@ -11,8 +11,3 @@ If you want to perform {{nlp}} tasks in your cluster, you must deploy an appropr
2. [Import the trained model and vocabulary](ml-nlp-import-model.md).
3. [Deploy the model in your cluster](ml-nlp-deploy-model.md).
4. [Try it out](ml-nlp-test-inference.md).
-
-
-
-
-

explore-analyze/machine-learning/nlp/ml-nlp-extract-info.md

Lines changed: 0 additions & 4 deletions
@@ -11,7 +11,6 @@ These NLP tasks enable you to extract information from your unstructured text:
* [Fill-mask](#ml-nlp-mask)
* [Question answering](#ml-nlp-question-answering)

-
## Named entity recognition [ml-nlp-ner]

The named entity recognition (NER) task can identify and categorize certain entities - typically proper nouns - in your unstructured text. Named entities usually refer to objects in the real world such as persons, locations, organizations, and other miscellaneous entities that are consistently referenced by a proper name.

@@ -53,7 +52,6 @@ The task returns the following result:
...
```

-
## Fill-mask [ml-nlp-mask]

The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.

@@ -80,7 +78,6 @@ The task returns the following result:
...
```

-
## Question answering [ml-nlp-question-answering]

The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.

@@ -105,4 +102,3 @@ The answer is shown by the object below:
}
...
```
-

explore-analyze/machine-learning/nlp/ml-nlp-import-model.md

Lines changed: 20 additions & 8 deletions
@@ -9,17 +9,14 @@ mapped_pages:
If you want to install a trained model in a restricted or closed network, refer to [these instructions](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-air-gapped).
::::

-
After you choose a model, you must import it and its tokenizer vocabulary to your cluster. When you import the model, it must be chunked and imported one chunk at a time for storage in parts due to its size.

::::{note}
Trained models must be in a TorchScript representation for use with {{stack-ml-features}}.
::::

-
[Eland](https://github.com/elastic/eland) is an {{es}} Python client that provides a simple script to perform the conversion of Hugging Face transformer models to their TorchScript representations, the chunking process, and upload to {{es}}; it is therefore the recommended import method. You can either install the Python Eland client on your machine or use a Docker image to build Eland and run the model import script.

-
## Import with the Eland client installed [ml-nlp-import-script]

1. Install the [Eland Python client](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/installation.html) with PyTorch extra dependencies.
@@ -30,7 +27,7 @@ Trained models must be in a TorchScript representation for use with {{stack-ml-f

2. Run the `eland_import_hub_model` script to download the model from Hugging Face, convert it to TorchScript format, and upload to the {{es}} cluster. For example:

-```shell
+```
eland_import_hub_model \
--cloud-id <cloud-id> \ <1>
-u <username> -p <password> \ <2>
@@ -43,10 +40,8 @@ Trained models must be in a TorchScript representation for use with {{stack-ml-f
3. Specify the identifier for the model in the Hugging Face model hub.
4. Specify the type of NLP task. Supported values are `fill_mask`, `ner`, `question_answering`, `text_classification`, `text_embedding`, `text_expansion`, `text_similarity`, and `zero_shot_classification`.

-
For more details, refer to [https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch).

-
## Import with Docker [ml-nlp-import-docker]

If you want to use Eland without installing it, run the following command:
@@ -65,9 +60,26 @@ docker run -it --rm docker.elastic.co/eland/eland \
--start
```

-Replace the `$ELASTICSEARCH_URL` with the URL for your {{es}} cluster. Refer to [Authentication methods](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-authentication.html) to learn more.
+Replace the `$ELASTICSEARCH_URL` with the URL for your {{es}} cluster. Refer to [Authentication methods](#ml-nlp-authentication) to learn more.
+
+## Authentication methods [ml-nlp-authentication]

+The following authentication options are available when using the import script:

+* username/password authentication (specified with the `-u` and `-p` options):
+
+```bash
+eland_import_hub_model --url https://<hostname>:<port> -u <username> -p <password> ...
+```
+
+* username/password authentication (embedded in the URL):
+
+```bash
+eland_import_hub_model --url https://<user>:<password>@<hostname>:<port> ...
+```

+* API key authentication:

-$$$ml-nlp-authentication$$$
+```bash
+eland_import_hub_model --url https://<hostname>:<port> --es-api-key <api-key> ...
+```

explore-analyze/machine-learning/nlp/ml-nlp-inference.md

Lines changed: 0 additions & 8 deletions
@@ -12,7 +12,6 @@ After you [deploy a trained model in your cluster](ml-nlp-deploy-models.md), you
3. [Ingest documents](#ml-nlp-inference-ingest-docs).
4. [View the results](#ml-nlp-inference-discover).

-
## Add an {{infer}} processor to an ingest pipeline [ml-nlp-inference-processor]

In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **Ingest Pipelines**. To open **Ingest Pipelines**, find **{{stack-manage-app}}** in the main menu, or use the [global search field](../../overview/kibana-quickstart.md#_finding_your_apps_and_objects).
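
Pipelines built in this UI correspond to ordinary ingest pipelines; a minimal sketch of the equivalent API call, in which the pipeline name, model ID, and field mapping are hypothetical:

```console
PUT _ingest/pipeline/my-ner-pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my-ner-model",
        "field_map": { "message": "text_field" }
      }
    }
  ]
}
```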
@@ -94,8 +93,6 @@ In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **In

3. If everything looks correct, close the panel, and click **Create pipeline**. The pipeline is now ready for use.

-
-
## Ingest documents [ml-nlp-inference-ingest-docs]

You can now use your ingest pipeline to perform NLP tasks on your data.
@@ -120,7 +117,6 @@ PUT ner-test
To use the `annotated_text` data type in this example, you must install the [mapper annotated text plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-annotated-text.html). For more installation details, refer to [Add plugins provided with {{ess}}](https://www.elastic.co/guide/en/cloud/current/ec-adding-elastic-plugins.html).
::::

-
You can then use the new pipeline to index some documents. For example, use a bulk indexing request with the `pipeline` query parameter for your NER pipeline:

```console
@@ -168,8 +164,6 @@ However, those web log messages are unlikely to contain enough words for the mod
Set the reindex `size` option to a value smaller than the `queue_capacity` for the trained model deployment. Otherwise, requests might be rejected with a "too many requests" 429 error code.
::::
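
Relating the note above to an actual call, a hedged sketch of a batched reindex through an {{infer}} pipeline; the index names, pipeline name, and batch size are hypothetical, and `source.size` is the reindex batch size that should stay below `queue_capacity`:

```console
POST _reindex
{
  "source": { "index": "web-logs", "size": 500 },
  "dest": { "index": "web-logs-enriched", "pipeline": "my-ner-pipeline" }
}
```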

-
-
## View the results [ml-nlp-inference-discover]

Before you can verify the results of the pipelines, you must [create {{data-sources}}](../../find-and-organize/data-views.md). Then you can explore your data in **Discover**:
@@ -190,7 +184,6 @@ In this {{lang-ident}} example, the `ml.inference.predicted_value` contains the

To learn more about ingest pipelines and all of the other processors that you can add, refer to [Ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md).

-
## Common problems [ml-nlp-inference-common-problems]

If you encounter problems while using your trained model in an ingest pipeline, check the following possible causes:
@@ -201,7 +194,6 @@ If you encounter problems while using your trained model in an ingest pipeline,

These common failure scenarios and others can be captured by adding failure processors to your pipeline. For more examples, refer to [Handling pipeline failures](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures).

-
## Further reading [nlp-example-reading]

* [How to deploy NLP: Text Embeddings and Vector Search](https://www.elastic.co/blog/how-to-deploy-nlp-text-embeddings-and-vector-search)
