
Commit bed0d8b

Merge pull request #115309 from nibaccam/seo-automl
AutoML | SEO round 1
2 parents 61696c4 + a4b8348 commit bed0d8b

8 files changed

+30 -19 lines changed

articles/machine-learning/concept-automated-ml.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -109,7 +109,7 @@ In every automated machine learning experiment, your data is preprocessed using
109109
110110
### Automatic preprocessing (standard)
111111

112-
In every automated machine learning experiment, your data is automatically scaled or normalized to help algorithms perform well. During model training, one of the following scaling or normalization techniques will be applied to each model.
112+
In every automated machine learning experiment, your data is automatically scaled or normalized to help algorithms perform well. During model training, one of the following scaling or normalization techniques will be applied to each model. Learn how autoML helps [prevent over-fitting and imbalanced data](concept-manage-ml-pitfalls.md) in your models.
113113

114114
|Scaling & normalization| Description |
115115
| ------------- | ------------- |

articles/machine-learning/concept-manage-ml-pitfalls.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -53,17 +53,17 @@ In the context of automated ML, the first three items above are **best-practices
5353

5454
Using **more data** is the simplest and best possible way to prevent over-fitting, and as an added bonus typically increases accuracy. When you use more data, it becomes harder for the model to memorize exact patterns, and it is forced to reach solutions that are more flexible to accommodate more conditions. It's also important to recognize **statistical bias**, to ensure your training data doesn't include isolated patterns that won't exist in live-prediction data. This scenario can be difficult to solve, because there may not be over-fitting between your train and test sets, but there may be over-fitting present when compared to live test data.
5555

56-
Target leakage is a similar issue, where you may not see over-fitting between train/test sets, but rather it appears at prediction-time. Target leakage occurs when your model "cheats" during training by having access to data that it shouldn't normally have at prediction-time. For example, if your problem is to predict on Monday what a commodity price will be on Friday, but one of your features accidentally included data from Thursdays, that would be data the model won't have at prediction-time since it cannot see into the future. Target leakage is an easy mistake to miss, but is often characterized by abnormally high accuracy for your problem. If you are attempting to predict stock price and trained a model at 95% accuracy, there is likely target leakage somewhere in your features.
56+
**Target leakage** is a similar issue, where you may not see over-fitting between train/test sets, but rather it appears at prediction-time. Target leakage occurs when your model "cheats" during training by having access to data that it shouldn't normally have at prediction-time. For example, if your problem is to predict on Monday what a commodity price will be on Friday, but one of your features accidentally included data from Thursdays, that would be data the model won't have at prediction-time since it cannot see into the future. Target leakage is an easy mistake to miss, but is often characterized by abnormally high accuracy for your problem. If you are attempting to predict stock price and trained a model at 95% accuracy, there is likely target leakage somewhere in your features.
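The Monday/Friday example above can be turned into a simple sanity check. The sketch below is only an illustration and is not part of automated ML: it assumes you can record, for each feature, the date on which its value becomes known. The function name `leaking_features` and the feature names are hypothetical.

```python
from datetime import date


def leaking_features(feature_dates, prediction_date):
    """Return the names of features observed after the prediction date."""
    return sorted(
        name
        for name, observed_on in feature_dates.items()
        if observed_on > prediction_date
    )


# Hypothetical feature catalogue: feature name -> date the value becomes known.
feature_dates = {
    "last_friday_close": date(2020, 5, 15),
    "monday_open": date(2020, 5, 18),
    "thursday_close": date(2020, 5, 21),
}

# Predictions are made on Monday, 18 May 2020, for the following Friday.
suspects = leaking_features(feature_dates, prediction_date=date(2020, 5, 18))
print(suspects)  # ['thursday_close']
```

Any feature the check flags would not be available at prediction-time, so it is a candidate for removal before training.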
5757

58-
Removing features can also help with over-fitting by preventing the model from having too many fields to use to memorize specific patterns, thus causing it to be more flexible. It can be difficult to measure quantitatively, but if you can remove features and retain the same accuracy, you have likely made the model more flexible and have reduced the risk of over-fitting.
58+
**Removing features** can also help with over-fitting by preventing the model from having too many fields to use to memorize specific patterns, thus causing it to be more flexible. It can be difficult to measure quantitatively, but if you can remove features and retain the same accuracy, you have likely made the model more flexible and have reduced the risk of over-fitting.
5959

6060
### Best practices automated ML implements
6161

62-
Regularization is the process of minimizing a cost function to penalize complex and over-fitted models. There are different types of regularization functions, but in general they all penalize model coefficient size, variance, and complexity. Automated ML uses L1 (Lasso), L2 (Ridge), and ElasticNet (L1 and L2 simultaneously) in different combinations with different model hyperparameter settings that control over-fitting. In simple terms, automated ML will vary how much a model is regulated and choose the best result.
62+
**Regularization** is the process of minimizing a cost function to penalize complex and over-fitted models. There are different types of regularization functions, but in general they all penalize model coefficient size, variance, and complexity. Automated ML uses L1 (Lasso), L2 (Ridge), and ElasticNet (L1 and L2 simultaneously) in different combinations with different model hyperparameter settings that control over-fitting. In simple terms, automated ML will vary how much a model is regulated and choose the best result.
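To make the idea of a penalized cost function concrete, here is a minimal sketch for a linear model. It is a simplified illustration, not the formula automated ML uses internally: the function name `penalized_cost` and the parameters `alpha` (penalty strength) and `l1_ratio` (mix between L1 and L2) are assumptions chosen for this example, and libraries may scale the terms differently.

```python
def penalized_cost(weights, rows, targets, alpha=0.0, l1_ratio=0.0):
    """Mean squared error plus a weighted L1/L2 penalty on the coefficients.

    l1_ratio=1.0 gives a pure L1 (Lasso-style) penalty, l1_ratio=0.0 a pure
    L2 (Ridge-style) penalty, and values in between an ElasticNet-style mix.
    """
    n_rows = len(rows)
    mse = sum(
        (sum(w * x for w, x in zip(weights, row)) - y) ** 2
        for row, y in zip(rows, targets)
    ) / n_rows
    l1 = sum(abs(w) for w in weights)   # grows with coefficient magnitude
    l2 = sum(w * w for w in weights)    # penalizes large coefficients harder
    return mse + alpha * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


rows = [[1.0, 2.0], [2.0, 1.0]]
targets = [5.0, 4.0]
weights = [1.0, 2.0]  # fits the two rows exactly, so the error term is zero

unpenalized = penalized_cost(weights, rows, targets)
ridge_style = penalized_cost(weights, rows, targets, alpha=0.1, l1_ratio=0.0)
lasso_style = penalized_cost(weights, rows, targets, alpha=0.1, l1_ratio=1.0)
```

Even though the weights fit the data perfectly, the penalized costs are greater than zero. An optimizer minimizing this cost therefore prefers smaller coefficients unless larger ones reduce the error enough to pay for the penalty.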
6363

64-
Automated ML also implements explicit model complexity limitations to prevent over-fitting. In most cases this implementation is specifically for decision tree or forest algorithms, where individual tree max-depth is limited, and the total number of trees used in forest or ensemble techniques is limited.
64+
Automated ML also implements explicit **model complexity limitations** to prevent over-fitting. In most cases this implementation is specifically for decision tree or forest algorithms, where individual tree max-depth is limited, and the total number of trees used in forest or ensemble techniques is limited.
6565

66-
Cross-validation (CV) is the process of taking many subsets of your full training data and training a model on each subset. The idea is that a model could get "lucky" and have great accuracy with one subset, but by using many subsets the model won't achieve this high accuracy every time. When doing CV, you provide a validation holdout dataset, specify your CV folds (number of subsets) and automated ML will train your model and tune hyperparameters to minimize error on your validation set. One CV fold could be over-fit, but by using many of them it reduces the probability that your final model is over-fit. The tradeoff is that CV does result in longer training times and thus greater cost, because instead of training a model once, you train it once for each *n* CV subsets.
66+
**Cross-validation (CV)** is the process of taking many subsets of your full training data and training a model on each subset. The idea is that a model could get "lucky" and have great accuracy with one subset, but by using many subsets the model won't achieve this high accuracy every time. When doing CV, you provide a validation holdout dataset, specify your CV folds (number of subsets) and automated ML will train your model and tune hyperparameters to minimize error on your validation set. One CV fold could be over-fit, but by using many of them it reduces the probability that your final model is over-fit. The tradeoff is that CV does result in longer training times and thus greater cost, because instead of training a model once, you train it once for each *n* CV subsets.
6767

6868
> [!NOTE]
6969
> Cross-validation is not enabled by default; it must be configured in automated ML settings. However, after cross-validation is configured and a validation data set has been provided, the process is automated for you. See

articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,9 @@ ms.date: 03/09/2020
1515
# Auto-train a time-series forecast model
1616
[!INCLUDE [aml-applies-to-basic-enterprise-sku](../../includes/aml-applies-to-basic-enterprise-sku.md)]
1717

18-
In this article, you learn how to configure and train a time-series forecasting regression model using automated machine learning in Azure Machine Learning.
18+
In this article, you learn how to configure and train a time-series forecasting regression model using automated machine learning in the [Azure Machine Learning Python SDK](https://docs.microsoft.com/python/api/overview/azure/ml/?view=azure-ml-py).
19+
20+
For a low code experience, see the [Tutorial: Forecast demand with automated machine learning](tutorial-automated-ml-forecast.md) for a time-series forecasting example using automated machine learning in the [Azure Machine Learning studio](https://ml.azure.com/).
1921

2022
Configuring a forecasting model is similar to setting up a standard regression model using automated machine learning, but certain configuration options and pre-processing steps exist for working with time-series data.
2123

@@ -35,7 +37,6 @@ Features extracted from the training data play a critical role. And, automated M
3537

3638
## Time-series and deep learning models
3739

38-
3940
Automated ML's deep learning allows for forecasting univariate and multivariate time series data.
4041

4142
Deep learning models have three intrinsic capabilities:
@@ -47,7 +48,6 @@ Given larger data, deep learning models, such as Microsoft's ForecastTCN, can im
4748

4849
Automated ML provides users with both native time-series and deep learning models as part of the recommendation system.
4950

50-
5151
Models| Description | Benefits
5252
----|----|---
5353
Prophet (Preview)|Prophet works best with time series that have strong seasonal effects and several seasons of historical data. | Accurate & fast, robust to outliers, missing data, and dramatic changes in your time series.
@@ -112,7 +112,7 @@ For time series forecasting Rolling Origin Cross Validation (ROCV) is used to sp
112112

113113
![alt text](./media/how-to-auto-train-forecast/ROCV.svg)
114114

115-
This strategy will preserve the time series data integrity and eliminate the risk of data leakage. ROCV is automatically used for forecasting tasks by passing the training and validation data together and setting the number of cross validation folds using `n_cross_validations`.
115+
This strategy will preserve the time series data integrity and eliminate the risk of data leakage. ROCV is automatically used for forecasting tasks by passing the training and validation data together and setting the number of cross validation folds using `n_cross_validations`. Learn more about how auto ML applies cross validation to [prevent over-fitting models](concept-manage-ml-pitfalls.md#prevent-over-fitting).
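The following is a generic sketch of how a rolling-origin split can be generated. It is not the automated ML implementation, whose details may differ. The function name `rolling_origin_splits` and the `horizon` parameter (number of time steps in each validation window) are assumptions for illustration. The key property is that every training index precedes every validation index, so the model is never validated on data older than what it was trained on.

```python
def rolling_origin_splits(n_points, n_folds, horizon):
    """Yield (train_indices, validation_indices) over time-ordered rows.

    The origin moves forward by `horizon` steps per fold; the training set
    is everything before the origin, the validation set the next `horizon`
    points after it.
    """
    for fold in range(n_folds):
        origin = n_points - (n_folds - fold) * horizon
        if origin <= 0:
            raise ValueError("not enough data points for the requested folds")
        yield list(range(origin)), list(range(origin, origin + horizon))


# Ten time-ordered observations, three folds, two-step validation windows.
splits = list(rolling_origin_splits(n_points=10, n_folds=3, horizon=2))
for train, validation in splits:
    print(len(train), validation)
```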
116116

117117
```python
118118
automl_config = AutoMLConfig(task='forecasting',

articles/machine-learning/how-to-configure-auto-train.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -118,6 +118,7 @@ You can specify separate train and validation sets directly in the `AutoMLConfig
118118

119119
Use the `n_cross_validations` setting to specify the number of cross validations. The training data set will be randomly split into `n_cross_validations` folds of equal size. During each cross validation round, one of the folds will be used for validation of the model trained on the remaining folds. This process repeats for `n_cross_validations` rounds until each fold has been used once as the validation set. The average scores across all `n_cross_validations` rounds will be reported, and the corresponding model will be retrained on the whole training data set.
120120

121+
Learn more about how autoML applies cross validation to [prevent over-fitting models](concept-manage-ml-pitfalls.md#prevent-over-fitting).
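The k-fold procedure described above can be sketched in plain Python. This is an illustration of the splitting logic only, not the automated ML implementation. The function name `k_fold_indices` and the `seed` parameter are hypothetical. When the row count is not divisible by the number of folds, fold sizes in this sketch differ by at most one row.

```python
import random


def k_fold_indices(n_rows, n_folds, seed=0):
    """Yield (train_indices, validation_indices) for each of n_folds rounds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)           # random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        # Train on every fold except the one held out for this round.
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


rounds = list(k_fold_indices(n_rows=10, n_folds=5))
for training, validation in rounds:
    print(len(training), len(validation))
```

Each row appears in exactly one validation fold, which is why the reported score is an average over `n_folds` rounds rather than the result of a single split.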
121122
### Monte Carlo Cross Validation (Repeated Random Sub-Sampling)
122123

123124
Use `validation_size` to specify the percentage of the training dataset that should be used for validation, and use `n_cross_validations` to specify the number of cross validations. During each cross validation round, a subset of size `validation_size` will be randomly selected for validation of the model trained on the remaining data. Finally, the average scores across all `n_cross_validations` rounds will be reported, and the corresponding model will be retrained on the whole training data set. Monte Carlo is not supported for time series forecasting.
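For comparison with k-fold, repeated random sub-sampling can be sketched as follows. Again, this only illustrates the splitting logic under stated assumptions: the function name `monte_carlo_splits` and the `seed` parameter are hypothetical, and `validation_size` is treated as a fraction between 0 and 1.

```python
import random


def monte_carlo_splits(n_rows, validation_size, n_rounds, seed=0):
    """Yield (train_indices, validation_indices) for each random round."""
    rng = random.Random(seed)
    n_validation = max(1, int(round(n_rows * validation_size)))
    for _ in range(n_rounds):
        # Draw a fresh random validation subset every round.
        validation = set(rng.sample(range(n_rows), n_validation))
        training = [i for i in range(n_rows) if i not in validation]
        yield training, sorted(validation)


mc_rounds = list(monte_carlo_splits(n_rows=20, validation_size=0.25, n_rounds=3))
for training, validation in mc_rounds:
    print(len(training), len(validation))
```

Unlike k-fold, the validation subsets are drawn independently, so a row can appear in several validation sets or in none. Shuffling rows this way also ignores time order, which is consistent with this approach not being supported for time series forecasting.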

articles/machine-learning/how-to-use-automated-ml-for-ml-models.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -155,7 +155,7 @@ Kurtosis| Measure of how heavily tailed this column's data is compared to a norm
155155

156156
## Advanced featurization options
157157

158-
Automated machine learning offers preprocessing and data guardrails automatically, to help you identify and manage potential issues with your data.
158+
Automated machine learning offers preprocessing and data guardrails automatically, to help you identify and manage potential issues with your data, like [over-fitting and imbalanced data](concept-manage-ml-pitfalls.md#prevent-over-fitting).
159159

160160
### Preprocessing
161161

articles/machine-learning/how-to-use-labeled-dataset.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,8 @@ description: Learn how to export data labels from your Azure Machine Learning la
55
author: nibaccam
66
ms.author: nibaccam
77
ms.service: machine-learning
8-
ms.topic: how-to
9-
ms.date: 01/21/2020
8+
ms.topic: conceptual
9+
ms.date: 05/14/2020
1010

1111
# Customer intent: As an experienced Python developer, I need to export my data labels and use them for machine learning tasks.
1212
---

articles/machine-learning/tutorial-automated-ml-forecast.md

Lines changed: 12 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Forecast bike sharing demand with automated ML experiment
2+
title: Tutorial:Demand forecasting & AutoML
33
titleSuffix: Azure Machine Learning
44
description: Learn how to train and deploy a demand forecasting model with automated machine learning in Azure Machine Learning studio.
55
services: machine-learning
@@ -9,21 +9,24 @@ ms.topic: tutorial
99
ms.author: sacartac
1010
ms.reviewer: nibaccam
1111
author: cartacioS
12-
ms.date: 01/27/2020
12+
ms.date: 05/19/2020
1313

1414
# Customer intent: As a non-coding data scientist, I want to use automated machine learning to build a demand forecasting model.
1515
---
1616

17-
# Tutorial: Forecast bike sharing demand with automated machine learning
17+
# Tutorial: Forecast demand with automated machine learning
1818
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-enterprise-sku.md)]
1919

20-
In this tutorial, you use automated machine learning, or automated ML, in the Azure Machine Learning studio to create a time series forecasting model to predict rental demand for a bike sharing service.
20+
In this tutorial, you use automated machine learning, or automated ML, in the Azure Machine Learning studio to create a time-series forecasting model to predict rental demand for a bike sharing service.
21+
22+
For a classification model example, see [Tutorial: Create a classification model with automated ML in Azure Machine Learning](tutorial-first-experiment-automated-ml.md).
2123

2224
In this tutorial, you learn how to do the following tasks:
2325

2426
> [!div class="checklist"]
2527
> * Create and load a dataset.
2628
> * Configure and run an automated ML experiment.
29+
> * Specify forecasting settings.
2730
> * Explore the experiment results.
2831
> * Deploy the best model.
2932
@@ -126,7 +129,7 @@ Complete the setup for your automated ML experiment by specifying the machine le
126129

127130
1. Select **date** as your **Time column** and leave **Group by column(s)** blank.
128131

129-
1. Select **View additional configuration settings** and populate the fields as follows. These settings are to better control the training job. Otherwise, defaults are applied based on experiment selection and data.
132+
1. Select **View additional configuration settings** and populate the fields as follows. These settings are to better control the training job and specify settings for your forecast. Otherwise, defaults are applied based on experiment selection and data.
130133

131134

132135
Additional configurations|Description|Value for tutorial
@@ -221,6 +224,10 @@ See this article for steps on how to create a Power BI supported schema to facil
221224
> [!div class="nextstepaction"]
222225
> [Consume a web service](how-to-consume-web-service.md#consume-the-service-from-power-bi)
223226
227+
+ Learn more about [automated machine learning](concept-automated-ml.md).
228+
+ For more information on classification metrics and charts, see the [Understand automated machine learning results](how-to-understand-automated-ml.md#classification) article.
229+
+ Learn more about [featurization](how-to-use-automated-ml-for-ml-models.md#featurization).
230+
+ Learn more about [data profiling](how-to-use-automated-ml-for-ml-models.md#profile).
224231

225232
>[!NOTE]
226233
> This bike share dataset has been modified for this tutorial. This dataset was made available as part of a [Kaggle competition](https://www.kaggle.com/c/bike-sharing-demand/data) and was originally available via [Capital Bikeshare](https://www.capitalbikeshare.com/system-data). It can also be found within the [UCI Machine Learning Database](http://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset).<br><br>

articles/machine-learning/tutorial-first-experiment-automated-ml.md

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,8 @@ In this tutorial, you learn how to create a basic classification model without w
2121

2222
With automated machine learning, you can automate away time intensive tasks. Automated machine learning rapidly iterates over many combinations of algorithms and hyperparameters to help you find the best model based on a success metric of your choosing.
2323

24+
For a time-series forecasting example, see [Tutorial: Demand forecasting & AutoML](tutorial-automated-ml-forecast.md).
25+
2426
In this tutorial, you learn how to do the following tasks:
2527

2628
> [!div class="checklist"]
@@ -219,7 +221,8 @@ In this automated machine learning tutorial, you used Azure Machine Learning's a
219221
> [Consume a web service](how-to-consume-web-service.md#consume-the-service-from-power-bi)
220222
221223
+ Learn more about [automated machine learning](concept-automated-ml.md).
222-
+ For more information on classification metrics and charts, see the [Understand automated machine learning results](how-to-understand-automated-ml.md#classification) article.+ Learn more about [featurization](how-to-use-automated-ml-for-ml-models.md#featurization).
224+
+ For more information on classification metrics and charts, see the [Understand automated machine learning results](how-to-understand-automated-ml.md#classification) article.
225+
+ Learn more about [featurization](how-to-use-automated-ml-for-ml-models.md#featurization).
223226
+ Learn more about [data profiling](how-to-use-automated-ml-for-ml-models.md#profile).
224227

225228
