- You can specify your dataset language in the featurization section of your configuration YAML file. BERT is also used in the featurization process of automated ML experiment training, learn more about [BERT integration and featurization in automated ML (SDK v1)](./v1/how-to-configure-auto-features.md#bert-integration-in-automated-ml).
+ You can specify your dataset language in the featurization section of your configuration YAML file. BERT is also used in the featurization process of automated ML experiment training, learn more about [BERT integration and featurization in AutoML (SDK v1)](./v1/how-to-configure-auto-features.md#bert-integration-in-automl).
- You can specify your dataset language with the `set_featurization()` method. BERT is also used in the featurization process of automated ML experiment training, learn more about [BERT integration and featurization in automated ML (SDK v1)](./v1/how-to-configure-auto-features.md?view=azureml-api-1&preserve-view=true#bert-integration-in-automated-ml).
+ You can specify your dataset language with the `set_featurization()` method. BERT is also used in the featurization process of automated ML experiment training, learn more about [BERT integration and featurization in AutoML (SDK v1)](./v1/how-to-configure-auto-features.md?view=azureml-api-1&preserve-view=true#bert-integration-in-automl).
articles/machine-learning/v1/how-to-auto-train-nlp-models.md
+1 -1 (1 addition & 1 deletion)
@@ -189,7 +189,7 @@ Multi-class text classification| `'eng'` <br> `'deu'` <br> `'mul'`| English
Named entity recognition (NER)| `'eng'` <br> `'deu'` <br> `'mul'`| English BERT [cased](https://huggingface.co/bert-base-cased) <br> [German BERT](https://huggingface.co/bert-base-german-cased)<br> [Multilingual BERT](https://huggingface.co/bert-base-multilingual-cased) <br><br>For all other languages, automated ML applies multilingual BERT
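The fallback rule in the NER row above can be sketched as a small lookup. This is an illustration only: the checkpoint names are taken from the Hugging Face links in that row, while `NER_BERT_CHECKPOINTS` and `resolve_ner_checkpoint` are hypothetical names that are not part of the Azure ML SDK, and the source does not show how AutoML resolves the model internally.

```python
# Hypothetical lookup mirroring the NER row of the table.
# Checkpoint names come from the Hugging Face links in that row;
# the dictionary and function names are illustrative, not SDK APIs.
NER_BERT_CHECKPOINTS = {
    "eng": "bert-base-cased",
    "deu": "bert-base-german-cased",
    "mul": "bert-base-multilingual-cased",
}


def resolve_ner_checkpoint(dataset_language: str) -> str:
    """Return the checkpoint for a language code.

    Any code other than 'eng' or 'deu' falls back to multilingual BERT,
    matching the "for all other languages" rule stated in the table.
    """
    return NER_BERT_CHECKPOINTS.get(dataset_language, NER_BERT_CHECKPOINTS["mul"])


print(resolve_ner_checkpoint("deu"))
print(resolve_ner_checkpoint("fra"))
```

Under this sketch, an unlisted code such as `'fra'` resolves to the same checkpoint as `'mul'`.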
- You can specify your dataset language in your `FeaturizationConfig`. BERT is also used in the featurization process of automated ML experiment training, learn more about [BERT integration and featurization in automated ML](how-to-configure-auto-features.md#bert-integration-in-automated-ml).
+ You can specify your dataset language in your `FeaturizationConfig`. BERT is also used in the featurization process of automated ML experiment training, learn more about [BERT integration and featurization in AutoML](how-to-configure-auto-features.md#bert-integration-in-automl).
```python
from azureml.automl.core.featurization import FeaturizationConfig
|**Column purpose update**|Override the autodetected feature type for the specified column.|
|**Transformer parameter update**|Update the parameters for the specified transformer. Currently supports *Imputer* (mean, most frequent, and median) and *HashOneHotEncoder*.|
- |**Drop columns**|Specifies columns to drop from being featurized.|
+ |**Drop columns***|Specifies columns to drop from being featurized.|
|**Block transformers**| Specifies block transformers to be used in the featurization process.|
>[!NOTE]
- > The **drop columns** functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data cleansing, before consuming it in your AutoML experiment.
+ > *The **drop columns** functionality is deprecated as of SDK version 1.19. Drop columns from your dataset as part of data cleansing, before consuming it in your AutoML experiment.
- Create the `FeaturizationConfig` object by using API calls:
+ You can create the `FeaturizationConfig` object by using API calls:
```python
featurization_config = FeaturizationConfig()
@@ -241,7 +241,7 @@ Output
### Scaling and normalization
- To understand scaling and normalization, and the selected algorithm with its hyperparameter values, use `fitted_model.steps`.
+ To understand scaling/normalization and the selected algorithm with its hyperparameter values, use `fitted_model.steps`.
The following sample output is from running `fitted_model.steps` for a chosen run:
@@ -334,9 +334,9 @@ If the underlying model doesn't support the `predict_proba()` function or the fo
## BERT integration in AutoML
- [BERT](https://techcommunity.microsoft.com/t5/azure-ai/how-bert-is-integrated-into-azure-automated-machine-learning/ba-p/1194657) is used in the featurization layer of AutoML. In this layer, if a column contains free text or other types of data like timestamps or simple numbers, then featurization is applied accordingly.
+ [Bidirectional Encoder Representations from Transformers (BERT)](https://techcommunity.microsoft.com/t5/azure-ai/how-bert-is-integrated-into-azure-automated-machine-learning/ba-p/1194657) is used in the featurization layer of AutoML. In this layer, if a column contains free text or other types of data like timestamps or simple numbers, then featurization is applied accordingly.
- For BERT, the model is fine-tuned and trained utilizing the user-provided labels. From here, document embeddings are output as features alongside others, like timestamp-based features, day of week.
+ For BERT, the model is fine-tuned and trained by utilizing the user-provided labels. From here, document embeddings are output as features alongside others, like timestamp-based features, day of week.
Learn how to [Set up AutoML to train a natural language processing model with Python](how-to-auto-train-nlp-models.md).
@@ -346,7 +346,7 @@ In order to invoke BERT, set `enable_dnn: True` in your `automl_settings` and us
AutoML takes the following steps for BERT.
- 1. **Preprocesses and tokenizes all text columns**. For example, the `StringCast` transformer can be found in the final model's featurization summary. An example of how to produce the model's featurization summary can be found in [this Jupyter notebook](https://github.com/Azure/azureml-examples/blob/v1-archive/v1/python-sdk/tutorials/automl-with-azureml/classification-text-dnn/auto-ml-classification-text-dnn.ipynb).
+ 1. **Preprocesses and tokenizes all text columns**. For example, the `StringCast` transformer can be found in the final model's featurization summary. An example of how to produce the model's featurization summary can be found in this [Jupyter notebook](https://github.com/Azure/azureml-examples/blob/v1-archive/v1/python-sdk/tutorials/automl-with-azureml/classification-text-dnn/auto-ml-classification-text-dnn.ipynb).
1. **Concatenates all text columns into a single text column**, hence the `StringConcatTransformer` in the final model.
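
As a rough illustration of the two steps listed above, the following plain-Python sketch casts each text cell to a string and then joins the text columns into one. It does not use or reproduce AutoML's `StringCast` or `StringConcatTransformer`. The helper names, the single-space separator, and the handling of missing values are all assumptions made for the example.

```python
from typing import Any, Mapping, Sequence


def string_cast(value: Any) -> str:
    """Cast a cell to text. Treating None as an empty string is an assumption."""
    return "" if value is None else str(value)


def concat_text_columns(
    rows: Sequence[Mapping[str, Any]],
    text_columns: Sequence[str],
    separator: str = " ",  # assumed separator, not taken from AutoML
) -> list:
    """Join the chosen text columns of each row into a single string."""
    combined = []
    for row in rows:
        parts = [string_cast(row.get(col)) for col in text_columns]
        # Skip empty cells so a missing value leaves no stray separator.
        combined.append(separator.join(p for p in parts if p))
    return combined


# Hypothetical sample data with two text columns and one numeric column.
rows = [
    {"title": "Late delivery", "body": "Package arrived 3 days late.", "rating": 2},
    {"title": "Great", "body": None, "rating": 5},
]
combined = concat_text_columns(rows, ["title", "body"])
print(combined)
```

Only the columns named in `text_columns` are joined. The numeric `rating` column is left out of the combined text, in line with the earlier statement that other column types are featurized separately.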