You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Multi-class text classification | CLI v2: `text_classification` <br> SDK v2 (preview): `text_classification()`| There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample. <br> <br> For example, classifying a movie script as "Comedy" or "Romantic".
81
81
Multi-label text classification | CLI v2: `text_classification_multilabel` <br> SDK v2 (preview): `text_classification_multilabel()`| There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample<br> <br> For example, classifying a movie script as "Comedy", or "Romantic", or "Comedy and Romantic".
82
-
Named Entity Recognition (NER)| CLI v2:`text_ner` <br> SDK v2 (preview): `text_ner()`| There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence. <br> <br> For example, extracting domain-specific entities from unstructured text, such as contracts or financial documents
82
+
Named Entity Recognition (NER)| CLI v2:`text_ner` <br> SDK v2 (preview): `text_ner()`| There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence. <br> <br> For example, extracting domain-specific entities from unstructured text, such as contracts or financial documents.
83
+
84
+
## Thresholding
85
+
86
+
Thresholding is the multi-label feature that allows users to pick the threshold above which the predicted probabilities will lead to a positive label. Lower values allow for more labels, which is better when users care more about recall, but this option could lead to more false positives. Higher values allow fewer labels and hence better for users who care about precision, but this option could lead to more false negatives.
83
87
84
88
## Preparing data
85
89
@@ -178,9 +182,8 @@ Automated ML's NLP capability is triggered through task specific `automl` type j
178
182
However, there are key differences:
179
183
* You can ignore `primary_metric`, as it is only for reporting purposes. Currently, automated ML only trains one model per run for NLP and there is no model selection.
180
184
* The `label_column_name` parameter is only required for multi-class and multi-label text classification tasks.
181
-
* If the majority of the samples in your dataset contain more than 128 words, it's considered long range. By default, automated ML considers all samples long range text. To disable this feature, include the `enable_long_range_text=False` parameter in your `AutoMLConfig`.
182
-
* If you enable long range text, then a GPU with higher memory is required such as, [NCv3](../virtual-machines/ncv3-series.md) series or [ND](../virtual-machines/nd-series.md) series.
183
-
* The `enable_long_range_text` parameter is only available for multi-class classification tasks.
185
+
* If more than 10% of the samples in your dataset contain more than 128 tokens, it's considered long range.
186
+
* In order to use the long range text feature, you should use a NC6 or higher/better SKUs for GPU such as: [NCv3](../virtual-machines/ncv3-series.md) series or [ND](../virtual-machines/nd-series.md) series.
0 commit comments