articles/machine-learning/concept-automated-ml.md
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
For automated machine learning experiments, featurization is applied automatically, but can also be customized based on your data. [Learn more about what featurization is included](how-to-configure-auto-features.md#featurization).
### Customize featurization
Additional feature engineering techniques, such as encoding and transforms, are also available.
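As a hedged sketch, these customizations can be expressed with the SDK's `FeaturizationConfig` object (the column names below are hypothetical placeholders for your own data, not values from this article):

```python
from azureml.automl.core.featurization import FeaturizationConfig

featurization_config = FeaturizationConfig()

# Placeholder column names: replace with columns from your own dataset.
featurization_config.add_column_purpose("my_id_column", "CategoricalHash")
featurization_config.add_transformer_params(
    "Imputer", ["my_numeric_column"], {"strategy": "median"}
)
```

The resulting `featurization_config` would then be passed as the `featurization` setting of an `AutoMLConfig` object.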
articles/machine-learning/how-to-configure-auto-features.md
In this guide, learn what featurization settings are offered, and how to customize them for your [automated machine learning experiments](concept-automated-ml.md).
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, data scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization in automated machine learning experiments.
This article assumes you are already familiar with how to configure an automated machine learning experiment. See the following articles for details:
* For a code-first experience: [Configure automated ML experiments with the Python SDK](how-to-configure-auto-train.md).
* For a low/no code experience: [Create, review, and deploy automated machine learning models with the Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md).
## Configure featurization
> predictions, the same featurization steps applied during training are applied to
> your input data automatically.
For experiments configured with the SDK, you can enable/disable the setting `featurization` and further specify the featurization steps that should be used for your experiment.
The following table shows the accepted settings for featurization in the [AutoMLConfig class](/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig).
Featurization Configuration | Description
------------- | -------------
`"featurization": 'auto'`| Indicates that as part of preprocessing, [data guardrails and featurization steps](#featurization) are performed automatically. **Default setting**.
`"featurization": 'off'`| Indicates featurization steps shouldn't be done automatically.
`"featurization":` `'FeaturizationConfig'`| Indicates that a customized featurization step should be used. [Learn how to customize featurization](#customize-featurization).|
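As a minimal sketch of how the setting is passed (assuming the `azureml-train-automl` SDK; `training_data` and the label column name are placeholders, not values from this article):

```python
from azureml.train.automl import AutoMLConfig

# Sketch only: 'training_data' and 'label' are placeholders for your own dataset.
automl_config = AutoMLConfig(
    task="classification",
    training_data=training_data,
    label_column_name="label",
    # 'auto' (default), 'off', or a FeaturizationConfig object
    featurization="auto",
)
```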
<a name="featurization"></a>
## Automatic featurization
Whether you configure your experiment via the SDK or the studio, the following table summarizes the techniques that are automatically applied to your data by default. The same techniques are applied if `"featurization": 'auto'` is specified in your `AutoMLConfig` object.
> [!NOTE]
> If you plan to export your auto ML created models to an [ONNX model](concept-onnx.md), only the featurization options indicated with an * are supported in the ONNX format. Learn more about [converting models to ONNX](concept-automated-ml.md#use-with-onnx).
|Featurization steps| Description |
| ------------- | ------------- |
|Drop high cardinality or no variance features*|Drop these from training and validation sets, including features with all values missing, same value across all rows or with high cardinality (for example, hashes, IDs, or GUIDs).|
|Impute missing values*|For numerical features, impute with average of values in the column.<br/><br/>For categorical features, impute with most frequent value.|
|Generate additional features*|For DateTime features: Year, Month, Day, Day of week, Day of year, Quarter, Week of the year, Hour, Minute, Second.<br/><br/>For Text features: Term frequency based on unigrams, bi-grams, and tri-character-grams.|
|Transform and encode *|Numeric features with few unique values are transformed into categorical features.<br/><br/>One-hot encoding is performed for low cardinality categorical; for high cardinality, one-hot-hash encoding.|
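The DateTime expansion described in the table can be illustrated with plain Python. This is only a sketch of the idea, not the service's implementation:

```python
from datetime import datetime

def expand_datetime(ts: datetime) -> dict:
    """Derive calendar features analogous to automated ML's DateTime expansion."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "day_of_week": ts.weekday(),            # Monday == 0
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "week_of_year": ts.isocalendar()[1],    # ISO week number
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
    }

# 2020-07-14 09:30:00 is a Tuesday in Q3.
features = expand_datetime(datetime(2020, 7, 14, 9, 30, 0))
```

A single timestamp column thus becomes ten numeric features that tree-based and linear models can consume directly.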
## Data guardrails
Data guardrails help you identify potential issues with your data (e.g., missing values, [class imbalance](concept-manage-ml-pitfalls.md#identify-models-with-imbalanced-data)) and help take corrective actions for improved results.
Data guardrails are applied:
* **For SDK experiments**, when either the parameter `"featurization": 'auto'` or `validation=auto` is specified in your `AutoMLConfig` object.
* **For studio experiments**, when *Automatic featurization* is enabled.
You can review the data guardrails pertaining to your experiment:
* By setting `show_output=True` when submitting an experiment with the Python SDK.
* In the studio on the **Data guardrails** tab of your automated ML run.
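For the SDK path, a minimal sketch (assuming an existing `Workspace` object `ws` and an `AutoMLConfig` named `automl_config`; the experiment name here is a placeholder):

```python
from azureml.core.experiment import Experiment

# Sketch only: 'ws' and 'automl_config' are assumed to already exist.
experiment = Experiment(ws, "automl-featurization-demo")

# show_output=True streams guardrail and iteration details to the console.
run = experiment.submit(automl_config, show_output=True)
```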
Guardrail|Status|Condition for trigger
---|---|---
Missing feature values imputation |**Passed** <br><br><br> **Done**| No missing feature values were detected in your training data. Learn more about [missing value imputation.](https://docs.microsoft.com/azure/machine-learning/how-to-use-automated-ml-for-ml-models#advanced-featurization-options) <br><br> Missing feature values were detected in your training data and imputed.
High cardinality feature handling |**Passed** <br><br><br> **Done**| Your inputs were analyzed, and no high cardinality features were detected. Learn more about [high cardinality feature detection.](https://docs.microsoft.com/azure/machine-learning/how-to-use-automated-ml-for-ml-models#advanced-featurization-options) <br><br> High cardinality features were detected in your inputs and were handled.
Validation split handling |**Done**| *The validation configuration was set to 'auto' and the training data contained **less** than 20,000 rows.* <br> Each iteration of the trained model was validated through cross-validation. Learn more about [validation data.](https://docs.microsoft.com/azure/machine-learning/how-to-configure-auto-train#train-and-validation-data) <br><br> *The validation configuration was set to 'auto' and the training data contained **more** than 20,000 rows.* <br> The input data has been split into a training dataset and a validation dataset for validation of the model.
Class balancing detection |**Passed** <br><br><br><br> **Alerted** | Your inputs were analyzed, and all classes are balanced in your training data. A dataset is considered balanced if each class has good representation in the dataset, as measured by number and ratio of samples. <br><br><br> Imbalanced classes were detected in your inputs. To fix model bias, fix the balancing problem. Learn more about [imbalanced data.](https://docs.microsoft.com/azure/machine-learning/concept-manage-ml-pitfalls#identify-models-with-imbalanced-data)
Memory issues detection |**Passed** <br><br><br><br> **Done** |<br> The selected {horizon, lag, rolling window} value(s) were analyzed, and no potential out-of-memory issues were detected. Learn more about time-series [forecasting configurations.](https://docs.microsoft.com/azure/machine-learning/how-to-auto-train-forecast#configure-and-run-experiment) <br><br><br>The selected {horizon, lag, rolling window} values were analyzed and will potentially cause your experiment to run out of memory. The lag or rolling window configurations have been turned off.
Frequency detection |**Passed** <br><br><br><br> **Done** |<br> The time series was analyzed and all data points are aligned with the detected frequency. <br> <br> The time series was analyzed and data points that do not align with the detected frequency were detected. These data points were removed from the dataset. Learn more about [data preparation for time-series forecasting.](https://docs.microsoft.com/azure/machine-learning/how-to-auto-train-forecast#preparing-data)