
Commit 360de1c

Peer review edits
1 parent ed9afb6


2 files changed, +27 -17 lines changed

articles/machine-learning/concept-automated-ml.md

Lines changed: 2 additions & 2 deletions
@@ -101,7 +101,7 @@ While model building is automated, you can also [learn how important or relevant
101101

102102
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
103103

104-
For automated machine learning experiments, featurization is applied automatically, but can also be customized based on your data.
104+
For automated machine learning experiments, featurization is applied automatically, but can also be customized based on your data. [Learn more about what featurization is included](how-to-configure-auto-features.md#featurization).
105105

106106
> [!NOTE]
107107
> Automated machine learning featurization steps (feature normalization, handling missing data,
@@ -125,7 +125,7 @@ In every automated machine learning experiment, your data is automatically scale
125125

126126
### Customize featurization
127127

128-
Additional feature engineering techniques such as, encoding and transforms are also available. [Learn more about what featurization is included](how-to-configure-auto-features.md#featurization).
128+
Additional feature engineering techniques, such as encoding and transforms, are also available.
129129

130130
Enable this setting with:
131131

articles/machine-learning/how-to-configure-auto-features.md

Lines changed: 25 additions & 15 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: Feature engineering for auto ML experiments
2+
title: Feature engineering for AutoML experiments
33
titleSuffix: Azure Machine Learning
44
description: Learn what feature engineering options Azure Machine Learning offers with automated ML experiments.
55
author: nibaccam
@@ -13,15 +13,18 @@ ms.date: 05/25/2020
1313
ms.custom: seodec18
1414
---
1515

16-
# Feature engineering with automated machine learning
16+
# Featurization with automated machine learning
1717

1818
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-basic-enterprise-sku.md)]
1919

20-
In this guide, learn what featurization settings are offered, and how to customize them for your [automated machine learning](concept-automated-ml.md) experiments with the [Azure Machine Learning Python SDK](https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py).
20+
In this guide, learn what featurization settings are offered, and how to customize them for your [automated machine learning experiments](concept-automated-ml.md).
2121

22-
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, data scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization in automated machine learning experiments
22+
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, data scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization in automated machine learning experiments.
2323

24-
This article assumes you are already familiar with [how to configure an automated machine learning experiment](how-to-configure-auto-train.md).
24+
This article assumes you are already familiar with how to configure an automated machine learning experiment. See the following articles for details:
25+
26+
* For a code-first experience: [Configure automated ML experiments with the Python SDK](how-to-configure-auto-train.md).
27+
* For a low/no code experience: [Create, review, and deploy automated machine learning models with the Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md).
2528

2629
## Configure featurization
2730

@@ -33,28 +36,28 @@ In every automated machine learning experiment, [automatic scaling and normaliza
3336
> predictions, the same featurization steps applied during training are applied to
3437
> your input data automatically.
3538
36-
In your `AutoMLConfig` object, you can enable/disable the setting `featurization` and further specify the featurization steps that should be used for your experiment.
39+
For experiments configured with the SDK, you can enable or disable the `featurization` setting and further specify the featurization steps to use for your experiment.
3740

3841
The following table shows the accepted settings for featurization in the [AutoMLConfig class](/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig).
3942

4043
Featurization Configuration | Description
4144
------------- | -------------
4245
`"featurization": 'auto'`| Indicates that as part of preprocessing, [data guardrails and featurization steps](#featurization) are performed automatically. **Default setting**.
43-
`"featurization": 'off'`| Indicates featurization step should not be done automatically.
46+
`"featurization": 'off'`| Indicates featurization steps shouldn't be done automatically.
4447
`"featurization":` `'FeaturizationConfig'`| Indicates that customized featurization steps should be used. [Learn how to customize featurization](#customize-featurization).
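
In SDK examples, automated ML settings are often collected in a Python dict and unpacked into `AutoMLConfig`. The following minimal sketch shows the `featurization` values from the table in that form. The `task` and `primary_metric` values are illustrative assumptions, and the `AutoMLConfig` call is left as a comment so the snippet runs without the SDK installed.

```python
# Minimal sketch: the "featurization" values from the table, held in a settings
# dict. Only the "featurization" key comes from the table; "task" and
# "primary_metric" are illustrative assumptions.
automl_settings_auto = {
    "task": "classification",      # assumption: example task
    "primary_metric": "accuracy",  # assumption: example metric
    "featurization": "auto",       # default: guardrails + automatic featurization
}

# Same settings, with automatic featurization turned off.
automl_settings_off = {**automl_settings_auto, "featurization": "off"}

# With the Azure ML SDK installed, a dict like this would typically be unpacked
# into the config object (not executed here):
#   automl_config = AutoMLConfig(**automl_settings_auto,
#                                training_data=train_data,
#                                label_column_name="label")
# For customized featurization, pass a FeaturizationConfig object instead of a string.
```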
4548

4649
<a name="featurization"></a>
4750

4851
## Automatic featurization
4952

50-
The following table summarizes the techniques that are automatically applied to your data by default or when `"featurization": 'auto'` is specified in your `AutoMLConfig` object.
53+
The following table summarizes the techniques that are automatically applied to your data by default, whether you configure your experiment via the SDK or the studio. The same techniques are applied if `"featurization": 'auto'` is specified in your `AutoMLConfig` object.
5154

5255
> [!NOTE]
5356
> If you plan to export your automated ML-created models to an [ONNX model](concept-onnx.md), only the featurization options indicated with an asterisk (*) are supported in the ONNX format. Learn more about [converting models to ONNX](concept-automated-ml.md#use-with-onnx).
5457
5558
|Featurization&nbsp;steps| Description |
5659
| ------------- | ------------- |
57-
|Drop high cardinality or no variance features* |Drop these from training and validation sets, including features with all values missing, same value across all rows or with extremely high cardinality (for example, hashes, IDs, or GUIDs).|
60+
|Drop high cardinality or no variance features* |Drop these from training and validation sets, including features with all values missing, the same value across all rows, or high cardinality (for example, hashes, IDs, or GUIDs).|
5861
|Impute missing values* |For numerical features, impute with average of values in the column.<br/><br/>For categorical features, impute with most frequent value.|
5962
|Generate additional features* |For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.<br/><br/>For Text features: Term frequency based on unigrams, bi-grams, and tri-character-grams.|
6063
|Transform and encode *|Numeric features with few unique values are transformed into categorical features.<br/><br/>One-hot encoding is performed for low-cardinality categorical features; for high-cardinality categorical features, one-hot-hash encoding is performed.|
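
To make the imputation and DateTime rows of the table concrete, here is a plain-Python sketch of those rules. It isn't the AutoML implementation: the function names are hypothetical, and the day-of-week (Monday = 0) and ISO week-number conventions are assumptions, because the table doesn't specify them.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def impute_numeric(values):
    """Replace None with the average of the non-missing values in the column."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_categorical(values):
    """Replace None with the most frequent non-missing value in the column."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def datetime_features(ts: datetime) -> dict:
    """Expand one datetime into the parts listed in the table (hypothetical helper)."""
    iso = ts.isocalendar()
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "day_of_week": ts.weekday(),            # assumption: Monday == 0
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "week_of_year": iso[1],                 # assumption: ISO week number
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }
```

For example, `impute_numeric([1.0, None, 3.0])` fills the gap with 2.0, and `impute_categorical(["a", None, "b", "a"])` fills it with `"a"`.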
@@ -66,11 +69,18 @@ The following table summarizes the techniques that are automatically applied to
6669

6770
## Data guardrails
6871

69-
Data guardrails help you identify potential issues with your data (e.g., missing values, [class imbalance](concept-manage-ml-pitfalls.md#identify-models-with-imbalanced-data)) and help take corrective actions for improved results. Data guardrails are applied when `"featurization": 'auto'` is specified or validation is set to `auto` in your `AutoMLConfig` object.
72+
Data guardrails help you identify potential issues with your data (e.g., missing values, [class imbalance](concept-manage-ml-pitfalls.md#identify-models-with-imbalanced-data)) and help take corrective actions for improved results.
73+
74+
Data guardrails are applied:
75+
76+
* **For SDK experiments**, when either `"featurization": 'auto'` or `validation=auto` is specified in your `AutoMLConfig` object.
77+
* **For studio experiments**, when *Automatic featurization* is enabled.
78+
79+
You can review the data guardrails pertaining to your experiment:
80+
81+
* By setting `show_output=True` when submitting an experiment with the Python SDK.
7082

71-
Users can review data guardrails
72-
* In the studio on the **Data guardrails** tab of an automated ML run.
73-
* By setting ```show_output=True``` when submitting an experiment with the Python SDK.
83+
* In the studio on the **Data guardrails** tab of your automated ML run.
7484

7585
### Data guardrail states
7686

@@ -92,7 +102,7 @@ Guardrail|Status|Condition&nbsp;for&nbsp;trigger
92102
Missing feature values imputation |**Passed** <br><br><br> **Done**| No missing feature values were detected in your training data. Learn more about [missing value imputation.](https://docs.microsoft.com/azure/machine-learning/how-to-use-automated-ml-for-ml-models#advanced-featurization-options) <br><br> Missing feature values were detected in your training data and imputed.
93103
High cardinality feature handling |**Passed** <br><br><br> **Done**| Your inputs were analyzed, and no high cardinality features were detected. Learn more about [high cardinality feature detection.](https://docs.microsoft.com/azure/machine-learning/how-to-use-automated-ml-for-ml-models#advanced-featurization-options) <br><br> High cardinality features were detected in your inputs and were handled.
94104
Validation split handling |**Done**| *The validation configuration was set to 'auto' and the training data contained **fewer** than 20,000 rows.* <br> Each iteration of the trained model was validated through cross-validation. Learn more about [validation data.](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#train-and-validation-data) <br><br> *The validation configuration was set to 'auto' and the training data contained **more** than 20,000 rows.* <br> The input data has been split into a training dataset and a validation dataset for validation of the model.
95-
Class balancing detection |**Passed** <br><br><br><br> **Alerted** | Your inputs were analyzed, and all classes are balanced in your training data. A dataset is considered balanced if each class has good representation in the dataset, as measured by number and ratio of samples. <br><br><br> Imbalanced classes were detected in your inputs. To fix model bias fix the balancing problem. Learn more about [imbalanced data.](https://docs.microsoft.com/azure/machine-learning/concept-manage-ml-pitfalls#identify-models-with-imbalanced-data)
105+
Class balancing detection |**Passed** <br><br><br><br> **Alerted** | Your inputs were analyzed, and all classes are balanced in your training data. A dataset is considered balanced if each class has good representation in the dataset, as measured by number and ratio of samples. <br><br><br> Imbalanced classes were detected in your inputs. To fix model bias, fix the balancing problem. Learn more about [imbalanced data.](https://docs.microsoft.com/azure/machine-learning/concept-manage-ml-pitfalls#identify-models-with-imbalanced-data)
96106
Memory issues detection |**Passed** <br><br><br><br> **Done** |<br> The selected {horizon, lag, rolling window} value(s) were analyzed, and no potential out-of-memory issues were detected. Learn more about time-series [forecasting configurations.](https://docs.microsoft.com/azure/machine-learning/how-to-auto-train-forecast#configure-and-run-experiment) <br><br><br>The selected {horizon, lag, rolling window} values were analyzed and will potentially cause your experiment to run out of memory. The lag or rolling window configurations have been turned off.
97107
Frequency detection |**Passed** <br><br><br><br> **Done** |<br> The time series was analyzed, and all data points are aligned with the detected frequency. <br> <br> The time series was analyzed, and data points that don't align with the detected frequency were found. These data points were removed from the dataset. Learn more about [data preparation for time-series forecasting.](https://docs.microsoft.com/azure/machine-learning/how-to-auto-train-forecast#preparing-data)
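
The validation split and class balancing guardrails can be sketched in plain Python as follows. The function names are hypothetical. The 20,000-row threshold comes from the table, while the handling of exactly 20,000 rows and the `min_ratio` balance threshold are assumptions, because the table defines neither.

```python
from collections import Counter


def validation_strategy(n_rows: int, threshold: int = 20_000) -> str:
    """Pick a validation approach as the 'Validation split handling' guardrail
    describes for validation='auto' (hypothetical helper)."""
    # Table: fewer than 20,000 rows -> cross-validation; more than 20,000 rows
    # -> train/validation split. Exactly 20,000 isn't specified; treating it
    # as a split here is an assumption.
    return "cross_validation" if n_rows < threshold else "train_validation_split"


def is_balanced(labels, min_ratio: float = 0.2) -> bool:
    """Hypothetical balance check: every class needs at least `min_ratio` times
    as many samples as the largest class. The 0.2 default is an illustrative
    assumption, not the service's actual rule."""
    counts = Counter(labels)
    largest = max(counts.values())
    return all(count / largest >= min_ratio for count in counts.values())
```

With these assumptions, a 95/5 label split is flagged as imbalanced, while a 50/40 split passes.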
98108

@@ -129,7 +139,7 @@ featurization_config.add_transformer_params('HashOneHotEncoder', [], {"number_of
129139
* Learn how to set up your automated ML experiments:
130140

131141
* For code experience customers: [Configure automated ML experiments with the Azure Machine Learning SDK](how-to-configure-auto-train.md).
132-
* For limited/no code experience customers: [Create your automated machine learning experiments in Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md).
142+
* For low/no code experience customers: [Create your automated machine learning experiments in Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md).
133143

134144
* Learn more about [how and where to deploy a model](how-to-deploy-and-where.md).
135145
