Skip to content

Commit a03c9af

Browse files
authored
Freshness updates
1 parent a73db06 commit a03c9af

File tree

1 file changed

+24
-24
lines changed

1 file changed

+24
-24
lines changed

articles/machine-learning/how-to-auto-train-nlp-models.md

Lines changed: 24 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -21,9 +21,9 @@ ms.date: 06/15/2023
2121

2222
In this article, you learn how to train natural language processing (NLP) models with [automated ML](concept-automated-ml.md) in Azure Machine Learning. You can create NLP models with automated ML via the Azure Machine Learning Python SDK v2 or the Azure Machine Learning CLI v2.
2323

24-
Automated ML supports NLP which allows ML professionals and data scientists to bring their own text data and build custom models for tasks such as, multi-class text classification, multi-label text classification, and named entity recognition (NER).
24+
Automated ML supports NLP which allows ML professionals and data scientists to bring their own text data and build custom models for NLP tasks. NLP tasks include multi-class text classification, multi-label text classification, and named entity recognition (NER).
2525

26-
You can seamlessly integrate with the [Azure Machine Learning data labeling](how-to-create-text-labeling-projects.md) capability to label your text data or bring your existing labeled data. Automated ML provides the option to use distributed training on multi-GPU compute clusters for faster model training. The resulting model can be operationalized at scale by leveraging Azure Machine Learning's MLOps capabilities.
26+
You can seamlessly integrate with the [Azure Machine Learning data labeling](how-to-create-text-labeling-projects.md) capability to label your text data or bring your existing labeled data. Automated ML provides the option to use distributed training on multi-GPU compute clusters for faster model training. The resulting model can be operationalized at scale using Azure Machine Learning's MLOps capabilities.
2727

2828
## Prerequisites
2929

@@ -33,7 +33,7 @@ You can seamlessly integrate with the [Azure Machine Learning data labeling](how
3333

3434
* Azure subscription. If you don't have an Azure subscription, sign up to try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/) today.
3535

36-
* An Azure Machine Learning workspace with a GPU training compute. To create the workspace, see [Create workspace resources](quickstart-create-resources.md). See [GPU optimized virtual machine sizes](../virtual-machines/sizes-gpu.md) for more details of GPU instances provided by Azure.
36+
* An Azure Machine Learning workspace with a GPU training compute. To create the workspace, see [Create workspace resources](quickstart-create-resources.md). For more information, see [GPU optimized virtual machine sizes](../virtual-machines/sizes-gpu.md) for more details of GPU instances provided by Azure.
3737

3838
> [!WARNING]
3939
> Support for multilingual models and the use of models with longer max sequence length is necessary for several NLP use cases, such as non-english datasets and longer range documents. As a result, these scenarios may require higher GPU memory for model training to succeed, such as the NC_v3 series or the ND series.
@@ -48,15 +48,15 @@ You can seamlessly integrate with the [Azure Machine Learning data labeling](how
4848

4949
* Azure subscription. If you don't have an Azure subscription, sign up to try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/) today.
5050

51-
* An Azure Machine Learning workspace with a GPU training compute. To create the workspace, see [Create workspace resources](quickstart-create-resources.md). See [GPU optimized virtual machine sizes](../virtual-machines/sizes-gpu.md) for more details of GPU instances provided by Azure.
51+
* An Azure Machine Learning workspace with a GPU training compute. To create the workspace, see [Create workspace resources](quickstart-create-resources.md). For more information, see [GPU optimized virtual machine sizes](../virtual-machines/sizes-gpu.md) for more details of GPU instances provided by Azure.
5252

5353
> [!WARNING]
5454
> Support for multilingual models and the use of models with longer max sequence length is necessary for several NLP use cases, such as non-english datasets and longer range documents. As a result, these scenarios may require higher GPU memory for model training to succeed, such as the NC_v3 series or the ND series.
5555
5656
* The Azure Machine Learning Python SDK v2 installed.
5757

5858
To install the SDK you can either,
59-
* Create a compute instance, which automatically installs the SDK and is pre-configured for ML workflows. See [Create and manage an Azure Machine Learning compute instance](how-to-create-manage-compute-instance.md) for more information.
59+
* Create a compute instance, which automatically installs the SDK and is preconfigured for ML workflows. See [Create and manage an Azure Machine Learning compute instance](how-to-create-manage-compute-instance.md) for more information.
6060

6161
* [Install the `automl` package yourself](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/README.md#setup-using-a-local-conda-environment), which includes the [default installation](/python/api/overview/azure/ml/install#default-install) of the SDK.
6262

@@ -72,17 +72,17 @@ Determine what NLP task you want to accomplish. Currently, automated ML supports
7272

7373
Task |AutoML job syntax| Description
7474
----|----|---
75-
Multi-class text classification | CLI v2: `text_classification` <br> SDK v2: `text_classification()`| There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample. <br> <br> For example, classifying a movie script as "Comedy" or "Romantic".
76-
Multi-label text classification | CLI v2: `text_classification_multilabel` <br> SDK v2: `text_classification_multilabel()`| There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample<br> <br> For example, classifying a movie script as "Comedy", or "Romantic", or "Comedy and Romantic".
75+
Multi-class text classification | CLI v2: `text_classification` <br> SDK v2: `text_classification()`| There are multiple possible classes and each sample can be classified as exactly one class. The task is to predict the correct class for each sample. <br> <br> For example, classifying a movie script as "Comedy," or "Romantic".
76+
Multi-label text classification | CLI v2: `text_classification_multilabel` <br> SDK v2: `text_classification_multilabel()`| There are multiple possible classes and each sample can be assigned any number of classes. The task is to predict all the classes for each sample<br> <br> For example, classifying a movie script as "Comedy," or "Romantic," or "Comedy and Romantic".
7777
Named Entity Recognition (NER)| CLI v2:`text_ner` <br> SDK v2: `text_ner()`| There are multiple possible tags for tokens in sequences. The task is to predict the tags for all the tokens for each sequence. <br> <br> For example, extracting domain-specific entities from unstructured text, such as contracts or financial documents.
7878

7979
## Thresholding
8080

81-
Thresholding is the multi-label feature that allows users to pick the threshold above which the predicted probabilities will lead to a positive label. Lower values allow for more labels, which is better when users care more about recall, but this option could lead to more false positives. Higher values allow fewer labels and hence better for users who care about precision, but this option could lead to more false negatives.
81+
Thresholding is the multi-label feature that allows users to pick the threshold which the predicted probabilities will lead to a positive label. Lower values allow for more labels, which is better when users care more about recall, but this option could lead to more false positives. Higher values allow fewer labels and hence better for users who care about precision, but this option could lead to more false negatives.
8282

8383
## Preparing data
8484

85-
For NLP experiments in automated ML, you can bring your data in `.csv` format for multi-class and multi-label classification tasks. For NER tasks, two-column `.txt` files that use a space as the separator and adhere to the CoNLL format are supported. The following sections provide additional detail for the data format accepted for each task.
85+
For NLP experiments in automated ML, you can bring your data in `.csv` format for multi-class and multi-label classification tasks. For NER tasks, two-column `.txt` files that use a space as the separator and adhere to the CoNLL format are supported. The following sections provides details for the data format accepted for each task.
8686

8787
### Multi-class
8888

@@ -157,7 +157,7 @@ rings O
157157

158158
### Data validation
159159

160-
Before training, automated ML applies data validation checks on the input data to ensure that the data can be preprocessed correctly. If any of these checks fail, the run fails with the relevant error message. The following are the requirements to pass data validation checks for each task.
160+
Before a model trains, automated ML applies data validation checks on the input data to ensure that the data can be preprocessed correctly. If any of these checks fail, the run fails with the relevant error message. The following are the requirements to pass data validation checks for each task.
161161

162162
> [!Note]
163163
> Some data validation checks are applicable to both the training and the validation set, whereas others are applicable only to the training set. If the test dataset could not pass the data validation, that means that automated ML couldn't capture it and there is a possibility of model inference failure, or a decline in model performance.
@@ -167,24 +167,24 @@ Task | Data validation check
167167
All tasks | At least 50 training samples are required
168168
Multi-class and Multi-label | The training data and validation data must have <br> - The same set of columns <br>- The same order of columns from left to right <br>- The same data type for columns with the same name <br>- At least two unique labels <br> - Unique column names within each dataset (For example, the training set can't have multiple columns named **Age**)
169169
Multi-class only | None
170-
Multi-label only | - The label column format must be in [accepted format](#multi-label) <br> - At least one sample should have 0 or 2+ labels, otherwise it should be a `multiclass` task <br> - All labels should be in `str` or `int` format, with no overlapping. You should not have both label `1` and label `'1'`
171-
NER only | - The file should not start with an empty line <br> - Each line must be an empty line, or follow format `{token} {label}`, where there is exactly one space between the token and the label and no white space after the label <br> - All labels must start with `I-`, `B-`, or be exactly `O`. Case sensitive <br> - Exactly one empty line between two samples <br> - Exactly one empty line at the end of the file
170+
Multi-label only | - The label column format must be in [accepted format](#multi-label) <br> - At least one sample should have 0 or 2+ labels, otherwise it should be a `multiclass` task <br> - All labels should be in `str` or `int` format, with no overlapping. You shouldn't have both label `1` and label `'1'`
171+
NER only | - The file shouldn't start with an empty line <br> - Each line must be an empty line, or follow format `{token} {label}`, where there's exactly one space between the token and the label and no white space after the label <br> - All labels must start with `I-`, `B-`, or be exactly `O`. Case sensitive <br> - Exactly one empty line between two samples <br> - Exactly one empty line at the end of the file
172172

173173
## Configure experiment
174174

175175
Automated ML's NLP capability is triggered through task specific `automl` type jobs, which is the same workflow for submitting automated ML experiments for classification, regression and forecasting tasks. You would set parameters as you would for those experiments, such as `experiment_name`, `compute_name` and data inputs.
176176

177177
However, there are key differences:
178-
* You can ignore `primary_metric`, as it is only for reporting purposes. Currently, automated ML only trains one model per run for NLP and there is no model selection.
178+
* You can ignore `primary_metric`, as it's only for reporting purposes. Currently, automated ML only trains one model per run for NLP and there is no model selection.
179179
* The `label_column_name` parameter is only required for multi-class and multi-label text classification tasks.
180180
* If more than 10% of the samples in your dataset contain more than 128 tokens, it's considered long range.
181-
* In order to use the long range text feature, you should use a NC6 or higher/better SKUs for GPU such as: [NCv3](../virtual-machines/ncv3-series.md) series or [ND](../virtual-machines/nd-series.md) series.
181+
* In order to use the long range text feature, you should use an NC6 or higher/better SKUs for GPU such as: [NCv3](../virtual-machines/ncv3-series.md) series or [ND](../virtual-machines/nd-series.md) series.
182182

183183
# [Azure CLI](#tab/cli)
184184

185185
[!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
186186

187-
For CLI v2 AutoML jobs you configure your experiment in a YAML file like the following.
187+
For CLI v2 automated ml jobs, you configure your experiment in a YAML file like the following.
188188

189189

190190

@@ -193,7 +193,7 @@ For CLI v2 AutoML jobs you configure your experiment in a YAML file like the fol
193193
[!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]
194194

195195

196-
For AutoML jobs via the SDK, you configure the job with the specific NLP task function. The following example demonstrates the configuration for `text_classification`.
196+
For Automated ML jobs via the SDK, you configure the job with the specific NLP task function. The following example demonstrates the configuration for `text_classification`.
197197
```Python
198198
# general job parameters
199199
compute_name = "gpu-cluster"
@@ -358,15 +358,15 @@ All the pre-trained text DNN models currently available in AutoML NLP for fine-t
358358
* xlnet_base_cased
359359
* xlnet_large_cased
360360

361-
Note that the large models are significantly larger than their base counterparts. They are typically more performant, but they take up more GPU memory and time for training. As such, their SKU requirements are more stringent: we recommend running on ND-series VMs for the best results.
361+
Note that the large models are larger than their base counterparts. They are typically more performant, but they take up more GPU memory and time for training. As such, their SKU requirements are more stringent: we recommend running on ND-series VMs for the best results.
362362

363363
## Supported hyperparameters
364364

365365
The following table describes the hyperparameters that AutoML NLP supports.
366366

367367
| Parameter name | Description | Syntax |
368368
|-------|---------|---------|
369-
| gradient_accumulation_steps | The number of backward operations whose gradients are to be summed up before performing one step of gradient descent by calling the optimizer's step function. <br><br> This is leveraged to use an effective batch size which is gradient_accumulation_steps times larger than the maximum size that fits the GPU. | Must be a positive integer.
369+
| gradient_accumulation_steps | The number of backward operations whose gradients are to be summed up before performing one step of gradient descent by calling the optimizer's step function. <br><br> This is to use an effective batch size, which is gradient_accumulation_steps times larger than the maximum size that fits the GPU. | Must be a positive integer.
370370
| learning_rate | Initial learning rate. | Must be a float in the range (0, 1). |
371371
| learning_rate_scheduler |Type of learning rate scheduler. | Must choose from `linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup`. |
372372
| model_name | Name of one of the supported models. | Must choose from `bert_base_cased, bert_base_uncased, bert_base_multilingual_cased, bert_base_german_cased, bert_large_cased, bert_large_uncased, distilbert_base_cased, distilbert_base_uncased, roberta_base, roberta_large, distilroberta_base, xlm_roberta_base, xlm_roberta_large, xlnet_base_cased, xlnet_large_cased`. |
@@ -380,7 +380,7 @@ All discrete hyperparameters only allow choice distributions, such as the intege
380380

381381
## Configure your sweep settings
382382

383-
You can configure all the sweep-related parameters. Multiple model subspaces can be constructed with hyperparameters conditional to the respective model, as seen below in each example.
383+
You can configure all the sweep-related parameters. Multiple model subspaces can be constructed with hyperparameters conditional to the respective model, as seen in each hyperparameter tuning example.
384384

385385
The same discrete and continuous distribution options that are available for general HyperDrive jobs are supported here. See all nine options in [Hyperparameter tuning a model](how-to-tune-hyperparameters.md#define-the-search-space)
386386

@@ -477,7 +477,7 @@ When sweeping hyperparameters, you need to specify the sampling method to use fo
477477

478478
You can optionally specify the experiment budget for your AutoML NLP training job using the `timeout_minutes` parameter in the `limits` - the amount of time in minutes before the experiment terminates. If none specified, the default experiment timeout is seven days (maximum 60 days).
479479

480-
AutoML NLP also supports `trial_timeout_minutes`, the maximum amount of time in minutes an individual trial can run before being terminated, and `max_nodes`, the maximum number of nodes from the backing compute cluster to leverage for the job. These parameters also belong to the `limits` section.
480+
AutoML NLP also supports `trial_timeout_minutes`, the maximum amount of time in minutes an individual trial can run before being terminated, and `max_nodes`, the maximum number of nodes from the backing compute cluster to use for the job. These parameters also belong to the `limits` section.
481481

482482

483483

@@ -506,7 +506,7 @@ Parameter | Detail
506506
`max_trials` | Parameter for maximum number of configurations to sweep. Must be an integer between 1 and 1000. When exploring just the default hyperparameters for a given model algorithm, set this parameter to 1. The default value is 1.
507507
`max_concurrent_trials`| Maximum number of runs that can run concurrently. If specified, must be an integer between 1 and 100. The default value is 1. <br><br> **NOTE:** <li> The number of concurrent runs is gated on the resources available in the specified compute target. Ensure that the compute target has the available resources for the desired concurrency. <li> `max_concurrent_trials` is capped at `max_trials` internally. For example, if user sets `max_concurrent_trials=4`, `max_trials=2`, values would be internally updated as `max_concurrent_trials=2`, `max_trials=2`.
508508

509-
You can configure all the sweep related parameters as shown in the example below.
509+
You can configure all the sweep related parameters as shown in this example.
510510

511511

512512
[!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
@@ -527,11 +527,11 @@ sweep:
527527

528528
## Known Issues
529529

530-
Dealing with very low scores, or higher loss values:
530+
Dealing with low scores, or higher loss values:
531531

532-
For certain datasets, regardless of the NLP task, the scores produced may be very low, sometimes even zero. This would be accompanied by higher loss values implying that the neural network failed to converge. This can happen more frequently on certain GPU SKUs.
532+
For certain datasets, regardless of the NLP task, the scores produced may be very low, sometimes even zero. This score is accompanied by higher loss values implying that the neural network failed to converge. These scores can happen more frequently on certain GPU SKUs.
533533

534-
While such cases are uncommon, they're possible and the best way to handle it is to leverage hyperparameter tuning and provide a wider range of values, especially for hyperparameters like learning rates. Until our hyperparameter tuning capability is available in production we recommend users, who face such issues, to leverage the NC6 or ND6 compute clusters, where we've found training outcomes to be fairly stable.
534+
While such cases are uncommon, they're possible and the best way to handle it's to leverage hyperparameter tuning and provide a wider range of values, especially for hyperparameters like learning rates. Until our hyperparameter tuning capability is available in production we recommend users experiencing these issues, to use the NC6 or ND6 compute clusters. These clusters typically have training outcomes that are fairly stable.
535535

536536
## Next steps
537537

0 commit comments

Comments
 (0)