
Commit aa03c26

Merge pull request #209400 from ssalgadodev/autoMLNLPUpdates
AutoML | NLP doc removed and added features
2 parents 688e3ed + 3442338 commit aa03c26

1 file changed (+9 -4 lines)

articles/machine-learning/how-to-auto-train-nlp-models.md

Lines changed: 9 additions & 4 deletions
```diff
@@ -79,7 +79,11 @@ Task |AutoML job syntax| Description
 ----|----|---
 Multi-class text classification | CLI v2: `text_classification` <br> SDK v2 (preview): `text_classification()`| There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample. <br> <br> For example, classifying a movie script as "Comedy" or "Romantic".
 Multi-label text classification | CLI v2: `text_classification_multilabel` <br> SDK v2 (preview): `text_classification_multilabel()`| There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample<br> <br> For example, classifying a movie script as "Comedy", or "Romantic", or "Comedy and Romantic".
-Named Entity Recognition (NER)| CLI v2:`text_ner` <br> SDK v2 (preview): `text_ner()`| There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence. <br> <br> For example, extracting domain-specific entities from unstructured text, such as contracts or financial documents
+Named Entity Recognition (NER)| CLI v2: `text_ner` <br> SDK v2 (preview): `text_ner()`| There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence. <br> <br> For example, extracting domain-specific entities from unstructured text, such as contracts or financial documents.
+
+## Thresholding
+
+Thresholding is a feature of multi-label text classification that lets users pick the threshold above which a predicted probability leads to a positive label. Lower values allow more labels, which is better when users care more about recall, but can lead to more false positives. Higher values allow fewer labels and are therefore better when users care more about precision, but can lead to more false negatives.
 
 ## Preparing data
 
```
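To illustrate the trade-off described in the thresholding paragraph, the following minimal Python sketch applies a single threshold to per-class probabilities for a multi-label task. It is not the AutoML implementation: the helper `labels_above_threshold`, the example scores, and the choice of a strict `>` comparison are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def labels_above_threshold(
    probabilities: Sequence[Dict[str, float]], threshold: float
) -> List[List[str]]:
    """Return, for each sample, every label whose predicted probability is
    strictly greater than `threshold` (hypothetical helper, not an AutoML API)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [
        sorted(label for label, p in sample.items() if p > threshold)
        for sample in probabilities
    ]


# Hypothetical per-class probabilities for two movie scripts.
scores = [
    {"Comedy": 0.82, "Romantic": 0.41},
    {"Comedy": 0.35, "Romantic": 0.67},
]

# A low threshold keeps more labels (favoring recall).
print(labels_above_threshold(scores, 0.3))  # [['Comedy', 'Romantic'], ['Comedy', 'Romantic']]
print(labels_above_threshold(scores, 0.5))  # [['Comedy'], ['Romantic']]
# A high threshold keeps fewer labels (favoring precision).
print(labels_above_threshold(scores, 0.7))  # [['Comedy'], []]
```

At 0.7 the second script receives no label at all, which is the false-negative risk that comes with a high threshold.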
```diff
@@ -178,9 +182,8 @@ Automated ML's NLP capability is triggered through task specific `automl` type j
 However, there are key differences:
 * You can ignore `primary_metric`, as it is only for reporting purposes. Currently, automated ML only trains one model per run for NLP and there is no model selection.
 * The `label_column_name` parameter is only required for multi-class and multi-label text classification tasks.
-* If the majority of the samples in your dataset contain more than 128 words, it's considered long range. By default, automated ML considers all samples long range text. To disable this feature, include the `enable_long_range_text=False` parameter in your `AutoMLConfig`.
-* If you enable long range text, then a GPU with higher memory is required such as, [NCv3](../virtual-machines/ncv3-series.md) series or [ND](../virtual-machines/nd-series.md) series.
-* The `enable_long_range_text` parameter is only available for multi-class classification tasks.
+* If more than 10% of the samples in your dataset contain more than 128 tokens, the dataset is considered long range.
+* To use the long range text feature, use a GPU SKU of NC6 or higher, such as the [NCv3](../virtual-machines/ncv3-series.md) series or [ND](../virtual-machines/nd-series.md) series.
 
 # [Azure CLI](#tab/cli)
 
```
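The 10% rule in the added bullet can be checked roughly before you submit a job. This minimal Python sketch is an approximation, not how AutoML measures length: it counts whitespace-separated words as tokens, whereas a model's subword tokenizer typically produces more tokens than words, so the sketch can underestimate how many samples are long. `is_long_range` is a hypothetical helper.

```python
from typing import Iterable


def is_long_range(
    texts: Iterable[str], max_tokens: int = 128, fraction: float = 0.10
) -> bool:
    """Return True if more than `fraction` of the samples contain more than
    `max_tokens` tokens (hypothetical helper, not an AutoML API).

    Assumption: tokens are approximated by splitting on whitespace.
    """
    lengths = [len(text.split()) for text in texts]
    if not lengths:
        return False
    long_samples = sum(1 for n in lengths if n > max_tokens)
    return long_samples / len(lengths) > fraction


# Nine short samples plus one 200-word sample: exactly 10% are long,
# which is not *more than* 10%.
samples = ["short review"] * 9 + [" ".join(["word"] * 200)]
print(is_long_range(samples))  # False

# Adding a 150-word sample makes 2 of 11 samples (about 18%) long.
samples.append(" ".join(["word"] * 150))
print(is_long_range(samples))  # True
```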
````diff
@@ -279,6 +282,8 @@ max_concurrent_iterations = number_of_vms
 enable_distributed_dnn_training = True
 ```
 
+In AutoML NLP, only hold-out validation is supported, and it requires a validation dataset.
+
 ---
 
 ## Submit the AutoML job
````
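Because only hold-out validation is supported, you need a separate validation dataset. If you start from a single labeled dataset, one way to produce a hold-out set is to shuffle it and set aside a fraction of the rows. The minimal Python sketch below assumes the data fits in memory as a list of rows; `holdout_split`, the 20% fraction, and the seed are hypothetical choices, and writing each part out and passing it to the job as training and validation data is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(
    rows: Sequence[T], validation_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle `rows` with a fixed seed and split them into
    (training, validation) lists (hypothetical helper, not an AutoML API)."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to create a hold-out split")
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one row in each split.
    n_validation = min(len(shuffled) - 1, max(1, round(len(shuffled) * validation_fraction)))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical labeled rows: (text, label).
rows = [(f"movie script {i}", "Comedy" if i % 2 else "Romantic") for i in range(10)]
train_rows, validation_rows = holdout_split(rows)
print(len(train_rows), len(validation_rows))  # 8 2
```

Fixing the seed makes the split reproducible, so repeated runs evaluate against the same hold-out rows.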
