Commit f4057f5

Merge pull request #116118 from nibaccam/automl-feature
AutoML | New article: Featurization settings standalone
2 parents 03bb04e + f09274e commit f4057f5

11 files changed (+194, -110 lines changed)

articles/machine-learning/concept-automated-ml.md

Lines changed: 16 additions & 15 deletions
@@ -30,14 +30,15 @@ Data scientists, analysts, and developers across industries can use automated ML
3030

3131
### Classification
3232

33-
Classification is a common machine learning task. Classification is a type of supervised learning in which models learn using training data, and apply those learnings to new data. Azure Machine Learning offers featurizations specifically for these tasks, such as deep neural network text featurizers for classification. Learn more about [featurization options](how-to-use-automated-ml-for-ml-models.md#featurization).
33+
Classification is a common machine learning task. Classification is a type of supervised learning in which models learn using training data, and apply those learnings to new data. Azure Machine Learning offers featurizations specifically for these tasks, such as deep neural network text featurizers for classification. Learn more about [featurization options](how-to-configure-auto-features.md#featurization).
3434

3535
The main goal of classification models is to predict which categories new data will fall into based on learnings from its training data. Common classification examples include fraud detection, handwriting recognition, and object detection. Learn more and see an example of [classification with automated machine learning](tutorial-train-models-with-aml.md).
3636

3737
See examples of classification and automated machine learning in these Python notebooks: [Fraud Detection](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb), [Marketing Prediction](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-bank-marketing-all-features/auto-ml-classification-bank-marketing-all-features.ipynb), and [Newsgroup Data Classification](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/classification-text-dnn/auto-ml-classification-text-dnn.ipynb).
3838

3939
### Regression
40-
Similar to classification, regression tasks are also a common supervised learning task. Azure Machine Learning offers [featurizations specifically for these tasks](how-to-use-automated-ml-for-ml-models.md#featurization).
40+
41+
Similar to classification, regression tasks are also a common supervised learning task. Azure Machine Learning offers [featurizations specifically for these tasks](how-to-configure-auto-features.md#featurization).
4142

4243
Unlike classification, where predicted output values are categorical, regression models predict numerical output values based on independent predictors. In regression, the objective is to help establish the relationship among those independent predictor variables by estimating how one variable impacts the others. For example, a regression model can predict automobile price based on features like gas mileage and safety rating. Learn more and see an example of [regression with automated machine learning](tutorial-auto-train-models.md).
4344

@@ -95,21 +96,22 @@ While model building is automated, you can also [learn how important or relevant
9596

9697
> [!VIDEO https://www.microsoft.com/videoplayer/embed/RE2Xc9t]
9798
98-
<a name="preprocess"></a>
9999

100-
## Preprocessing
100+
## Feature engineering
101+
102+
Feature engineering is the process of using domain knowledge of the data to create features that help ML algorithms learn better. In Azure Machine Learning, scaling and normalization techniques are applied to facilitate feature engineering. Collectively, these techniques and feature engineering are referred to as featurization.
101103

102-
In every automated machine learning experiment, your data is preprocessed using the default methods and optionally through advanced preprocessing.
104+
For automated machine learning experiments, featurization is applied automatically, but can also be customized based on your data. [Learn more about what featurization is included](how-to-configure-auto-features.md#featurization).
103105

104106
> [!NOTE]
105-
> Automated machine learning pre-processing steps (feature normalization, handling missing data,
107+
> Automated machine learning featurization steps (feature normalization, handling missing data,
106108
> converting text to numeric, etc.) become part of the underlying model. When using the model for
107-
> predictions, the same pre-processing steps applied during training are applied to
109+
> predictions, the same featurization steps applied during training are applied to
108110
> your input data automatically.
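
The note above can be illustrated with a minimal sketch in plain Python. It is not how automated ML is implemented; the class name `StandardScaledModel` and its one-feature linear fit are hypothetical and exist only to show the idea that statistics learned for featurization during training are stored with the model and reused at prediction time.

```python
from statistics import mean, pstdev


class StandardScaledModel:
    """Hypothetical one-feature model that bundles its own featurization step."""

    def fit(self, xs, ys):
        # Learn the scaling statistics from the training data only.
        self.mu = mean(xs)
        self.sigma = pstdev(xs) or 1.0  # avoid dividing by zero for constant input
        zs = [self._featurize(x) for x in xs]
        # Ordinary least squares on the standardized feature (its mean is zero,
        # so the intercept is simply the mean of the targets).
        y_bar = mean(ys)
        denominator = sum(z * z for z in zs)
        numerator = sum(z * (y - y_bar) for z, y in zip(zs, ys))
        self.slope = numerator / denominator if denominator else 0.0
        self.intercept = y_bar
        return self

    def _featurize(self, x):
        # The same transformation is used for training and for prediction.
        return (x - self.mu) / self.sigma

    def predict(self, xs):
        # Callers pass raw inputs; featurization happens inside the model.
        return [self.intercept + self.slope * self._featurize(x) for x in xs]


model = StandardScaledModel().fit([10.0, 20.0, 30.0, 40.0], [1.0, 2.0, 3.0, 4.0])
predictions = model.predict([25.0, 50.0])
print(predictions)
```

Because the scaling lives inside `predict`, the caller never has to repeat the training-time transformation by hand.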
109111
110-
### Automatic preprocessing (standard)
112+
### Automatic featurization (standard)
111113

112-
In every automated machine learning experiment, your data is automatically scaled or normalized to help algorithms perform well. During model training, one of the following scaling or normalization techniques will be applied to each model. Learn how autoML helps [prevent over-fitting and imbalanced data](concept-manage-ml-pitfalls.md) in your models.
114+
In every automated machine learning experiment, your data is automatically scaled or normalized to help algorithms perform well. During model training, one of the following scaling or normalization techniques will be applied to each model. Learn how AutoML helps [prevent over-fitting and imbalanced data](concept-manage-ml-pitfalls.md) in your models.
113115

114116
|Scaling&nbsp;&&nbsp;normalization| Description |
115117
| ------------- | ------------- |
@@ -121,16 +123,15 @@ In every automated machine learning experiment, your data is automatically scale
121123
| [TruncatedSVDWrapper](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html) |This transformer performs linear dimensionality reduction by means of truncated singular value decomposition (SVD). Contrary to PCA, this estimator does not center the data before computing the singular value decomposition, which means it can work with scipy.sparse matrices efficiently |
122124
| [SparseNormalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html) | Each sample (that is, each row of the data matrix) with at least one non-zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one |
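
As an illustration of the last row in the table, the following sketch rescales each sample independently so that its l1 or l2 norm equals one. It is a plain-Python stand-in written for this example, not the scikit-learn or automated ML implementation, and the function name `normalize_rows` is an assumption. Rows with no non-zero component are left unchanged, matching the description above.

```python
import math


def normalize_rows(rows, norm="l2"):
    """Rescale each row so its chosen norm is one; all-zero rows are kept as-is."""
    normalized = []
    for row in rows:
        if norm == "l1":
            size = sum(abs(value) for value in row)
        elif norm == "l2":
            size = math.sqrt(sum(value * value for value in row))
        else:
            raise ValueError("norm must be 'l1' or 'l2'")
        # A row with no non-zero component has size 0 and is not rescaled.
        normalized.append([value / size for value in row] if size else list(row))
    return normalized


rows = [[3.0, 4.0], [0.0, 0.0], [1.0, -1.0]]
unit_l2 = normalize_rows(rows)
unit_l1 = normalize_rows(rows, norm="l1")
print(unit_l2)
print(unit_l1)
```

Each row is handled on its own, so adding or removing other samples does not change the result for a given row.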
123125

124-
### Advanced preprocessing & featurization
126+
### Customize featurization
125127

126-
Additional advanced preprocessing and featurization are also available, such as data guardrails, encoding, and transforms. [Learn more about what featurization is included](how-to-use-automated-ml-for-ml-models.md#featurization).
127-
Enable this setting with:
128+
Additional feature engineering techniques, such as encoding and transforms, are also available.
128129

129-
+ Azure Machine Learning studio: Enable **Automatic featurization** in the **View additional configuration** section [with these steps](how-to-use-automated-ml-for-ml-models.md#create-and-run-experiment).
130-
131-
+ Python SDK: Specifying `"feauturization": 'auto' / 'off' / 'FeaturizationConfig'` for the [`AutoMLConfig` class](/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig).
130+
Enable this setting with:
132131

132+
+ Azure Machine Learning studio: Enable **Automatic featurization** in the **View additional configuration** section [with these steps](how-to-use-automated-ml-for-ml-models.md#customize-featurization).
133133

134+
+ Python SDK: Specify `"featurization": 'auto' / 'off' / 'FeaturizationConfig'` in your [AutoMLConfig](/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig) object. Learn more about [enabling featurization](how-to-configure-auto-features.md).
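
For the Python SDK option, the settings might be assembled as in the sketch below. The helper `build_automl_settings` is hypothetical and only checks the string values listed above; the `AutoMLConfig` call is shown as a comment because it requires the Azure Machine Learning SDK plus your own training data, and the exact arguments you need depend on your experiment.

```python
# String values listed for the featurization setting. A FeaturizationConfig
# object from the SDK is the third option and is not constructed here.
ALLOWED_STRING_VALUES = {"auto", "off"}


def build_automl_settings(featurization="auto", task="classification"):
    """Hypothetical helper: return keyword arguments intended for AutoMLConfig."""
    if isinstance(featurization, str) and featurization not in ALLOWED_STRING_VALUES:
        raise ValueError("featurization must be 'auto', 'off', or a FeaturizationConfig object")
    return {"task": task, "featurization": featurization}


settings = build_automl_settings("auto")
print(settings)

# With the SDK installed, the settings would be passed along these lines
# (assumption: training_data and label_column_name come from your own workspace):
# automl_config = AutoMLConfig(training_data=..., label_column_name=..., **settings)
```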
134135

135136
## <a name="ensemble"></a> Ensemble models
136137
