explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md (1 addition & 14 deletions)
@@ -16,8 +16,6 @@ There are two ways to enable autoscaling:
To fully leverage model autoscaling, it is highly recommended to enable [{{es}} deployment autoscaling](../../../deploy-manage/autoscaling.md).
::::
## Enabling autoscaling through APIs - adaptive allocations [nlp-model-adaptive-allocations]
Model allocations are independent units of work for NLP tasks. If you set the numbers of threads and allocations for a model manually, they remain constant even when not all the available resources are fully used or when the load on the model requires more resources. Instead of setting the number of allocations manually, you can enable adaptive allocations to set the number of allocations based on the load on the process. This can help you to manage performance and cost more easily. (Refer to the [pricing calculator](https://cloud.elastic.co/pricing) to learn more about the possible costs.)
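For example, a minimal sketch of enabling adaptive allocations when starting a deployment; the model ID and allocation bounds are placeholders, and this assumes the `adaptive_allocations` object in the request body of the start trained model deployment API:

```console
POST _ml/trained_models/my_model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 10
  }
}
```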
@@ -31,15 +29,13 @@ You can enable adaptive allocations by using:
If the new allocations fit on the current {{ml}} nodes, they are immediately started. If more resource capacity is needed for creating new model allocations, then your {{ml}} node will be scaled up if {{ml}} autoscaling is enabled to provide enough resources for the new allocation. The number of model allocations can be scaled down to 0. They cannot be scaled up to more than 32 allocations, unless you explicitly set the maximum number of allocations to more. Adaptive allocations must be set up independently for each deployment and [{{infer}} endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html).
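As an illustration of raising the ceiling above the default through an {{infer}} endpoint, the sketch below assumes the `elasticsearch` {{infer}} service and uses placeholder endpoint and model IDs:

```console
PUT _inference/sparse_embedding/my-elser-endpoint
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".elser_model_2",
    "num_threads": 1,
    "adaptive_allocations": {
      "enabled": true,
      "min_number_of_allocations": 1,
      "max_number_of_allocations": 64
    }
  }
}
```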
### Optimizing for typical use cases [optimize-use-case]
You can optimize your model deployment for typical use cases, such as search and ingest. When you optimize for ingest, the throughput will be higher, which increases the number of {{infer}} requests that can be performed in parallel. When you optimize for search, the latency will be lower during search processes.
* If you want to optimize for ingest, set the number of threads to `1` (`"threads_per_allocation": 1`).
* If you want to optimize for search, set the number of threads to greater than `1`. Increasing the number of threads will make the search processes more performant.
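As a rough sketch of how these two profiles map onto the start trained model deployment API (the model ID and the exact numbers are placeholders that depend on your hardware):

```console
# Ingest-optimized: one thread per allocation, more allocations
POST _ml/trained_models/my_model/deployment/_start?threads_per_allocation=1&number_of_allocations=2

# Search-optimized: more threads per allocation, fewer allocations
POST _ml/trained_models/my_model/deployment/_start?threads_per_allocation=8&number_of_allocations=1
```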
## Enabling autoscaling in {{kib}} - adaptive resources [nlp-model-adaptive-resources]
You can enable adaptive resources for your models when starting or updating the model deployment. Adaptive resources make it possible for {{es}} to scale up or down the available resources based on the load on the process. This can help you to manage performance and cost more easily. When adaptive resources are enabled, the number of vCPUs that the model deployment uses is set automatically based on the current load. When the load is high, the number of vCPUs that the process can use is automatically increased. When the load is low, the number of vCPUs that the process can use is automatically decreased.
@@ -53,7 +49,6 @@ Refer to the tables in the [Model deployment resource matrix](#auto-scaling-matr
:class: screenshot
:::
## Model deployment resource matrix [auto-scaling-matrix]
The resources used for trained model deployments depend on three factors:
@@ -68,13 +63,10 @@ If you use {{es}} on-premises, vCPUs level ranges are derived from the `total_ml
On Serverless, adaptive allocations are automatically enabled for all project types. However, the "Adaptive resources" control is not displayed in {{kib}} for Observability and Security projects.
::::
### Deployments in Cloud optimized for ingest [_deployments_in_cloud_optimized_for_ingest]
For ingest-optimized deployments, we maximize the number of model allocations.
@@ -85,7 +77,6 @@ In case of ingest-optimized deployments, we maximize the number of model allocat
* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
@@ -96,12 +87,10 @@ In case of ingest-optimized deployments, we maximize the number of model allocat
* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
### Deployments in Cloud optimized for search [_deployments_in_cloud_optimized_for_search]
For search-optimized deployments, we maximize the number of threads. The maximum number of threads that can be claimed depends on the hardware your architecture has.
@@ -112,7 +101,6 @@ In case of search-optimized deployments, we maximize the number of threads. The
* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
@@ -121,5 +109,4 @@ In case of search-optimized deployments, we maximize the number of threads. The
| Medium | 2 (if threads=16) statically | maximum that the hardware allows (for example, 16) | 32 if available |
| High | Maximum available set in the Cloud console *, statically | maximum that the hardware allows (for example, 16) | Maximum available set in the Cloud console, statically |
\* The Cloud console doesn’t directly set an allocations limit; it only sets a vCPU limit. This vCPU limit indirectly determines the number of allocations, calculated as the vCPU limit divided by the number of threads.
* [Zero-shot text classification](#ml-nlp-zero-shot)
## {{lang-ident-cap}} [_lang_ident_cap]
The {{lang-ident}} model is provided out-of-the-box in your {{es}} cluster. You can find the documentation of the model on the [{{lang-ident-cap}}](ml-nlp-lang-ident.md) page under the Built-in models section.
## Text classification [ml-nlp-text-classification]
Text classification assigns the input text to one of multiple classes that best describe the text. The classes used depend on the model and the data set that was used to train it. Based on the number of classes, two main types of classification exist: binary classification, where the number of classes is exactly two, and multi-class classification, where the number of classes is more than two.
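For illustration, a minimal sketch of calling a deployed text classification model through the infer trained model API; the model ID is a placeholder:

```console
POST _ml/trained_models/my_text_classification_model/_infer
{
  "docs": [
    { "text_field": "This movie was surprisingly good!" }
  ]
}
```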
@@ -39,8 +37,7 @@ Likewise, you might use a trained model to perform multi-class classification an
...
```
## Zero-shot text classification [ml-nlp-zero-shot]
The zero-shot classification task offers the ability to classify text without training a model on a specific set of classes. Instead, you provide the classes when you deploy the model or at {{infer}} time. It uses a model trained on a large data set that has gained a general language understanding and asks the model how well the labels you provided fit with your text.
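For example, a hedged sketch of passing the candidate labels at {{infer}} time by overriding the zero-shot configuration; the model ID and labels are placeholders:

```console
POST _ml/trained_models/my_zero_shot_model/_infer
{
  "docs": [
    { "text_field": "Elasticsearch adds new vector search capabilities in its latest release." }
  ],
  "inference_config": {
    "zero_shot_classification": {
      "labels": [ "technology", "sports", "politics" ],
      "multi_label": false
    }
  }
}
```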
@@ -95,4 +92,3 @@ The task returns the following result:
```
Since you can adjust the labels while you perform {{infer}}, this type of task is exceptionally flexible. If you are consistently using the same labels, however, it might be better to use a fine-tuned text classification model.
explore-analyze/machine-learning/nlp/ml-nlp-deploy-model.md (0 additions & 2 deletions)
@@ -22,7 +22,6 @@ Each deployment will be fine-tuned automatically based on its specific purpose y
Since eland uses APIs to deploy the models, you cannot see the models in {{kib}} until the saved objects are synchronized. You can follow the prompts in {{kib}}, wait for automatic synchronization, or use the [sync {{ml}} saved objects API](https://www.elastic.co/guide/en/kibana/current/machine-learning-api-sync.html).
::::
You can define the resource usage level of the NLP model during model deployment. The resource usage levels behave differently depending on whether [adaptive resources](ml-nlp-auto-scale.md#nlp-model-adaptive-resources) are enabled or disabled. When adaptive resources are disabled but {{ml}} autoscaling is enabled, the vCPU usage of Cloud deployments is derived from the Cloud console and functions as follows:
* Low: This level limits resources to two vCPUs, which may be suitable for development, testing, and demos depending on your parameters. It is not recommended for production use.
@@ -31,7 +30,6 @@ You can define the resource usage level of the NLP model during model deployment
For the resource levels when adaptive resources are enabled, refer to [*Trained model autoscaling*](ml-nlp-auto-scale.md).
## Request queues and search priority [infer-request-queues]
Each allocation of a model deployment has a dedicated queue to buffer {{infer}} requests. The size of this queue is determined by the `queue_capacity` parameter in the [start trained model deployment API](https://www.elastic.co/guide/en/elasticsearch/reference/current/start-trained-model-deployment.html). When the queue reaches its maximum capacity, new requests are declined until some of the queued requests are processed, creating available capacity once again. When multiple ingest pipelines reference the same deployment, the queue can fill up, resulting in rejected requests. Consider using dedicated deployments to prevent this situation.
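For example, a minimal sketch of starting a dedicated deployment with a larger queue; the model ID, deployment ID, and capacity are placeholders:

```console
POST _ml/trained_models/my_model/deployment/_start?deployment_id=my_model_for_ingest&queue_capacity=4096
```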
explore-analyze/machine-learning/nlp/ml-nlp-extract-info.md (0 additions & 4 deletions)
@@ -11,7 +11,6 @@ These NLP tasks enable you to extract information from your unstructured text:
* [Fill-mask](#ml-nlp-mask)
* [Question answering](#ml-nlp-question-answering)
## Named entity recognition [ml-nlp-ner]
The named entity recognition (NER) task can identify and categorize certain entities - typically proper nouns - in your unstructured text. Named entities usually refer to objects in the real world such as persons, locations, organizations, and other miscellaneous entities that are consistently referenced by a proper name.
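As a quick illustration, a sketch of running NER through the infer trained model API; the model ID is a placeholder, and the response lists the recognized entities with their categories:

```console
POST _ml/trained_models/my_ner_model/_infer
{
  "docs": [
    { "text_field": "Elastic was founded in Amsterdam by Shay Banon." }
  ]
}
```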
@@ -53,7 +52,6 @@ The task returns the following result:
...
```
## Fill-mask [ml-nlp-mask]
The objective of the fill-mask task is to predict a missing word from a text sequence. The model uses the context of the masked word to predict the most likely word to complete the text.
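For example, a minimal sketch of a fill-mask {{infer}} call; the model ID is a placeholder, and the mask token (`[MASK]` here) depends on the model's tokenizer:

```console
POST _ml/trained_models/my_fill_mask_model/_infer
{
  "docs": [
    { "text_field": "The capital of France is [MASK]." }
  ]
}
```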
@@ -80,7 +78,6 @@ The task returns the following result:
...
```
## Question answering [ml-nlp-question-answering]
The question answering (or extractive question answering) task makes it possible to get answers to certain questions by extracting information from the provided text.
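As a hedged sketch, the question is supplied as an {{infer}}-time configuration override; the model ID, passage, and question are placeholders:

```console
POST _ml/trained_models/my_question_answering_model/_infer
{
  "docs": [
    { "text_field": "The Amazon rainforest covers most of the Amazon basin in South America." }
  ],
  "inference_config": {
    "question_answering": {
      "question": "Which continent is the Amazon rainforest on?"
    }
  }
}
```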
@@ -105,4 +102,3 @@ The answer is shown by the object below:
explore-analyze/machine-learning/nlp/ml-nlp-import-model.md (20 additions & 8 deletions)
@@ -9,17 +9,14 @@ mapped_pages:
If you want to install a trained model in a restricted or closed network, refer to [these instructions](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-air-gapped).
::::
After you choose a model, you must import it and its tokenizer vocabulary to your cluster. When you import the model, it must be chunked and imported one chunk at a time for storage in parts due to its size.
::::{note}
Trained models must be in a TorchScript representation for use with {{stack-ml-features}}.
::::
[Eland](https://github.com/elastic/eland) is an {{es}} Python client that provides a simple script to perform the conversion of Hugging Face transformer models to their TorchScript representations, the chunking process, and upload to {{es}}; it is therefore the recommended import method. You can either install the Python Eland client on your machine or use a Docker image to build Eland and run the model import script.
## Import with the Eland client installed [ml-nlp-import-script]
1. Install the [Eland Python client](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/installation.html) with PyTorch extra dependencies.
@@ -30,7 +27,7 @@ Trained models must be in a TorchScript representation for use with {{stack-ml-f
2. Run the `eland_import_hub_model` script to download the model from Hugging Face, convert it to TorchScript format, and upload to the {{es}} cluster. For example:
```
eland_import_hub_model \
--cloud-id <cloud-id>\ <1>
-u <username> -p <password>\ <2>
@@ -43,10 +40,8 @@ Trained models must be in a TorchScript representation for use with {{stack-ml-f
3. Specify the identifier for the model in the Hugging Face model hub.
4. Specify the type of NLP task. Supported values are `fill_mask`, `ner`, `question_answering`, `text_classification`, `text_embedding`, `text_expansion`, `text_similarity`, and `zero_shot_classification`.
For more details, refer to [https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch](https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch).
## Import with Docker [ml-nlp-import-docker]
If you want to use Eland without installing it, run the following command:
@@ -65,9 +60,26 @@ docker run -it --rm docker.elastic.co/eland/eland \
--start
```
Replace the `$ELASTICSEARCH_URL` with the URL for your {{es}} cluster. Refer to [Authentication methods](#ml-nlp-authentication) to learn more.
## Authentication methods [ml-nlp-authentication]
The following authentication options are available when using the import script:
* username/password authentication (specified with the `-u` and `-p` options):
## Add an {{infer}} processor to an ingest pipeline [ml-nlp-inference-processor]
In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **Ingest Pipelines**. To open **Ingest Pipelines**, find **{{stack-manage-app}}** in the main menu, or use the [global search field](../../overview/kibana-quickstart.md#_finding_your_apps_and_objects).
@@ -94,8 +93,6 @@ In {{kib}}, you can create and edit pipelines in **{{stack-manage-app}}** > **In
3. If everything looks correct, close the panel, and click **Create pipeline**. The pipeline is now ready for use.
You can now use your ingest pipeline to perform NLP tasks on your data.
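If you prefer the API over the {{kib}} UI, the equivalent request is roughly as sketched below; the pipeline name, model ID, and field mapping are placeholders:

```console
PUT _ingest/pipeline/my_ner_pipeline
{
  "description": "Pipeline with an inference processor for NER",
  "processors": [
    {
      "inference": {
        "model_id": "my_ner_model",
        "target_field": "ml.inference",
        "field_map": {
          "message": "text_field"
        }
      }
    }
  ]
}
```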
@@ -120,7 +117,6 @@ PUT ner-test
To use the `annotated_text` data type in this example, you must install the [mapper annotated text plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-annotated-text.html). For more installation details, refer to [Add plugins provided with {{ess}}](https://www.elastic.co/guide/en/cloud/current/ec-adding-elastic-plugins.html).
::::
You can then use the new pipeline to index some documents. For example, use a bulk indexing request with the `pipeline` query parameter for your NER pipeline:
```console
@@ -168,8 +164,6 @@ However, those web log messages are unlikely to contain enough words for the mod
Set the reindex `size` option to a value smaller than the `queue_capacity` for the trained model deployment. Otherwise, requests might be rejected with a "too many requests" 429 error code.
::::
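A rough sketch of such a reindex request follows; the index names and pipeline name are placeholders, and `source.size` is the batch size that should stay below the deployment's `queue_capacity`:

```console
POST _reindex
{
  "source": {
    "index": "web-logs",
    "size": 500
  },
  "dest": {
    "index": "web-logs-enriched",
    "pipeline": "my_ner_pipeline"
  }
}
```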
## View the results [ml-nlp-inference-discover]
Before you can verify the results of the pipelines, you must [create {{data-sources}}](../../find-and-organize/data-views.md). Then you can explore your data in **Discover**:
@@ -190,7 +184,6 @@ In this {{lang-ident}} example, the `ml.inference.predicted_value` contains the
To learn more about ingest pipelines and all of the other processors that you can add, refer to [Ingest pipelines](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md).
## Common problems [ml-nlp-inference-common-problems]
If you encounter problems while using your trained model in an ingest pipeline, check the following possible causes:
@@ -201,7 +194,6 @@ If you encounter problems while using your trained model in an ingest pipeline,
These common failure scenarios and others can be captured by adding failure processors to your pipeline. For more examples, refer to [Handling pipeline failures](../../../manage-data/ingest/transform-enrich/ingest-pipelines.md#handling-pipeline-failures).
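For illustration, a sketch of a pipeline-level `on_failure` handler that records the error message instead of failing the document; the pipeline, model, and field names are placeholders:

```console
PUT _ingest/pipeline/my_ner_pipeline
{
  "processors": [
    {
      "inference": {
        "model_id": "my_ner_model",
        "field_map": { "message": "text_field" }
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "ml.inference_failure",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}
```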
## Further reading [nlp-example-reading]
* [How to deploy NLP: Text Embeddings and Vector Search](https://www.elastic.co/blog/how-to-deploy-nlp-text-embeddings-and-vector-search)