Commit 8725da3

address review
1 parent 82ace43 commit 8725da3

3 files changed: +48 -15 lines changed


articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 26 additions & 2 deletions
@@ -18,6 +18,10 @@ show_latex: true

[!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]

+> [!div class="op_single_selector" title1="Select the version of the Azure Machine Learning SDK you are using:"]
+> * [v1](./v1/how-to-auto-train-forecast-v1.md)
+> * [v2 (current version)](how-to-auto-train-forecast.md)
+
In this article, you'll learn how to set up AutoML training for time-series forecasting models with Azure Machine Learning automated ML in the [Azure Machine Learning Python SDK](/python/api/overview/azure/ai-ml-readme).

To do so, you:
@@ -141,6 +145,28 @@ Other settings are optional and reviewed in the [optional settings](#optional-se

Optional configurations are available for forecasting tasks, such as enabling deep learning and specifying a target rolling window aggregation. A complete list of parameters is available in the [forecast_settings API doc](/python/api/azure-ai-ml/azure.ai.ml.automl.forecastingjob#azure-ai-ml-automl-forecastingjob-set-forecast-settings).

+#### Model search settings
+
+There are two optional settings that control the model space where AutoML searches for the best model, `allowed_training_algorithms` and `blocked_training_algorithms`. To restrict the search space to a given set of model classes, use `allowed_training_algorithms` as in the following sample:
+
+```python
+# Only search ExponentialSmoothing and ElasticNet models
+forecasting_job.set_training(
+    allowed_training_algorithms=["ExponentialSmoothing", "ElasticNet"]
+)
+```
+
+In this case, the forecasting job _only_ searches over Exponential Smoothing and Elastic Net model classes. To remove a given set of model classes from the search space, use `blocked_training_algorithms` as in the following sample:
+
+```python
+# Search over all model classes except Prophet
+forecasting_job.set_training(
+    blocked_training_algorithms=["Prophet"]
+)
+```
+
+Now, the job searches over all model classes _except_ Prophet. For a list of forecasting model names that are accepted in `allowed_training_algorithms` and `blocked_training_algorithms`, see [supported forecasting models](/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.forecasting) and [supported regression models](/python/api/azureml-train-automl-client/azureml.train.automl.constants.supportedmodels.regression).
+
#### Enable deep learning

AutoML ships with a custom deep neural network (DNN) model called `ForecastTCN`. This model is a [temporal convolutional network](https://arxiv.org/abs/1803.01271), or TCN, that applies common imaging task methods to time series modeling. Namely, one-dimensional "causal" convolutions form the backbone of the network and enable the model to learn complex patterns over long durations in the training history.
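Not part of this commit, but for context on the paragraph above: a minimal sketch of turning on AutoML's DNN models for a forecasting job, assuming the SDK v2 `enable_dnn_training` flag on `set_training()` and an existing `forecasting_job` as in the samples earlier in this diff.

```python
# Illustrative sketch (not part of this diff): enable DNN models such as
# ForecastTCN; assumes `forecasting_job` was already created as in the
# article's other samples.
forecasting_job.set_training(enable_dnn_training=True)
```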
@@ -257,8 +283,6 @@ forecasting_job.set_forecast_settings(
)
```

-One or both of these settings can be set to `"auto"` if you want AutoML to make the determination.
-
### Custom featurization

By default, AutoML augments training data with engineered features to increase the accuracy of the models. See [automated feature engineering](./concept-automl-forecasting-methods.md#automated-feature-engineering) for more information. Some of the preprocessing steps can be customized using the `set_featurization()` method of the forecasting job.
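Not part of this commit: a minimal sketch of what a `set_featurization()` customization can look like on an SDK v2 forecasting job. The column name `"demand"` and the imputation settings are illustrative assumptions.

```python
# Illustrative sketch (not part of this diff): customize imputation for a
# hypothetical "demand" column on an existing forecasting_job.
from azure.ai.ml.automl import ColumnTransformer

transformer_params = {
    "imputer": [
        ColumnTransformer(
            fields=["demand"],
            parameters={"strategy": "constant", "fill_value": 0},
        ),
    ],
}
forecasting_job.set_featurization(
    mode="custom",
    transformer_params=transformer_params,
)
```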

articles/machine-learning/how-to-automl-forecasting-faq.md

Lines changed: 18 additions & 13 deletions
@@ -21,8 +21,8 @@ This article answers common questions about forecasting in AutoML. See the [meth

## How do I start building forecasting models in AutoML?
You can start by reading our guide on [setting up AutoML to train a time-series forecasting model with Python](./how-to-auto-train-forecast.md). We've also provided hands-on examples in several Jupyter notebooks:
-1. [Bike share example](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-bike-share/auto-ml-forecasting-bike-share.ipynb)
-2. [Forecasting using deep learning](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb)
+1. [Bike share example](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-task-bike-share/auto-ml-forecasting-bike-share.ipynb)
+2. [Forecasting using deep learning](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-forecasting-github-dau/auto-ml-forecasting-github-dau.ipynb)
3. [Many models](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-many-models/auto-ml-forecasting-many-models.ipynb)
4. [Forecasting Recipes](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-recipes-univariate/auto-ml-forecasting-univariate-recipe-experiment-settings.ipynb)
5. [Advanced forecasting scenarios](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-forecast-function/auto-ml-forecasting-function.ipynb)
@@ -36,7 +36,7 @@ One common source of slow runtime is training AutoML with default settings on da
## How can I make AutoML faster?
See the ["why is AutoML slow on my data"](#why-is-automl-slow-on-my-data) answer to understand why it may be slow in your case.
Consider the following configuration changes that may speed up your job:
-- Block time series models like ARIMA and Prophet
+- [Block time series models](./how-to-auto-train-forecast.md#model-search-settings) like ARIMA and Prophet
- Turn off look-back features like lags and rolling windows
- Reduce
  - number of trials/iterations
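Not part of this diff: a sketch of how the speed-ups listed above might be expressed on an SDK v2 forecasting job. The blocked model names, column names, and limit values are illustrative assumptions.

```python
# Illustrative sketch (not part of this diff): settings that can shorten
# AutoML forecasting runtime; assumes an existing forecasting_job.
forecasting_job.set_training(
    blocked_training_algorithms=["AutoArima", "Prophet"]  # skip slow time series models
)
forecasting_job.set_forecast_settings(
    time_column_name="date",   # hypothetical time column
    forecast_horizon=14,       # hypothetical horizon
    # leaving target_lags / target_rolling_window_size unset avoids look-back features
)
forecasting_job.set_limits(
    timeout_minutes=180,       # overall experiment timeout
    trial_timeout_minutes=30,  # per-trial timeout
    max_trials=15,             # fewer trials/iterations
)
```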
@@ -49,12 +49,12 @@ Consider the following configuration changes that may speed up your job:

There are four basic configurations supported by AutoML forecasting:

-Configuration|Scenario|Pros|Cons
---|--|--|--
-**Default AutoML**|Recommended if the dataset has a small number of time series that have roughly similar historic behavior.|<li> Simple to configure from code/SDK or AzureML Studio <br> <li> AutoML has the chance to cross-learn across different time series since the regression models pool all series together in training. See the [model grouping](./concept-automl-forecasting-methods.md#model-grouping) section for more information.|<li> Regression models may be less accurate if the time series in the training data have divergent behavior <br> <li> Time series models may take a long time to train if there are a large number of series in the training data. See the ["why is AutoML slow on my data"](#why-is-automl-slow-on-my-data) answer for more information.
-**AutoML with deep learning**|Recommended for datasets with more than 1000 observations and, potentially, numerous time series exhibiting complex patterns. When enabled, AutoML will sweep over temporal convolutional neural network (TCN) models during training. See the [enable deep learning](./how-to-auto-train-forecast.md#enable-deep-learning) section for more information.|<li> Simple to configure from code/SDK or AzureML Studio <br> <li> Cross-learning opportunities since the TCN pools data over all series <br> <li> Potentially higher accuracy due to the large capacity of DNN models. See the [forecasting models in AutoML](./concept-automl-forecasting-methods.md#forecasting-models-in-automl) section for more information.|<li> Training can take much longer due to the complexity of DNN models <br> <li> Series with small amounts of history are unlikely to benefit from these models.
-**Many Models**|Recommended if you need to train and manage a large number of forecasting models in a scalable way. See the [forecasting at scale](./how-to-auto-train-forecast.md#forecasting-at-scale) section for more information.|<li> Scalable <br> <li> Potentially higher accuracy when time series have divergent behavior from one another.|<li> No cross-learning across time series <br> <li> You can't configure or launch Many Models jobs from AzureML Studio, only the code/SDK experience is currently available.
-**Hierarchical Time Series**|HTS is recommended if the series in your data have nested, hierarchical structure and you need to train or make forecasts at aggregated levels of the hierarchy. See the [hierarchical time series forecasting](how-to-auto-train-forecast.md#hierarchical-time-series-forecasting) section for more information.|<li> Training at aggregated levels can reduce noise in the leaf node time series and potentially lead to higher accuracy models. <br> <li> Forecasts can be retrieved for any level of the hierarchy by aggregating or dis-aggregating forecasts from the training level.|You need to provide the aggregation level for training. AutoML doesn't currently have an algorithm to find an optimal level.
+|Configuration|Scenario|Pros|Cons|
+|--|--|--|--|
+|**Default AutoML**|Recommended if the dataset has a small number of time series that have roughly similar historic behavior.|<li> Simple to configure from code/SDK or AzureML Studio <br> <li> AutoML has the chance to cross-learn across different time series since the regression models pool all series together in training. See the [model grouping](./concept-automl-forecasting-methods.md#model-grouping) section for more information.|<li> Regression models may be less accurate if the time series in the training data have divergent behavior <br> <li> Time series models may take a long time to train if there are a large number of series in the training data. See the ["why is AutoML slow on my data"](#why-is-automl-slow-on-my-data) answer for more information.|
+|**AutoML with deep learning**|Recommended for datasets with more than 1000 observations and, potentially, numerous time series exhibiting complex patterns. When enabled, AutoML will sweep over temporal convolutional neural network (TCN) models during training. See the [enable deep learning](./how-to-auto-train-forecast.md#enable-deep-learning) section for more information.|<li> Simple to configure from code/SDK or AzureML Studio <br> <li> Cross-learning opportunities since the TCN pools data over all series <br> <li> Potentially higher accuracy due to the large capacity of DNN models. See the [forecasting models in AutoML](./concept-automl-forecasting-methods.md#forecasting-models-in-automl) section for more information.|<li> Training can take much longer due to the complexity of DNN models <br> <li> Series with small amounts of history are unlikely to benefit from these models.|
+|**Many Models**|Recommended if you need to train and manage a large number of forecasting models in a scalable way. See the [forecasting at scale](./how-to-auto-train-forecast.md#forecasting-at-scale) section for more information.|<li> Scalable <br> <li> Potentially higher accuracy when time series have divergent behavior from one another.|<li> No cross-learning across time series <br> <li> You can't configure or launch Many Models jobs from AzureML Studio, only the code/SDK experience is currently available.|
+|**Hierarchical Time Series**|HTS is recommended if the series in your data have nested, hierarchical structure and you need to train or make forecasts at aggregated levels of the hierarchy. See the [hierarchical time series forecasting](how-to-auto-train-forecast.md#hierarchical-time-series-forecasting) section for more information.|<li> Training at aggregated levels can reduce noise in the leaf node time series and potentially lead to higher accuracy models. <br> <li> Forecasts can be retrieved for any level of the hierarchy by aggregating or dis-aggregating forecasts from the training level.|You need to provide the aggregation level for training. AutoML doesn't currently have an algorithm to find an optimal level.|

> [!NOTE]
> We recommend using compute nodes with GPUs when deep learning is enabled to best take advantage of high DNN capacity. Training time can be much faster in comparison to nodes with only CPUs. See the GPU optimized compute article for more information.
@@ -68,7 +68,8 @@ AutoML uses machine learning best practices, such as cross-validated model selec

- The input data contains **feature columns that are derived from the target with a simple formula**. For example, a feature that is an exact multiple of the target can result in a nearly perfect training score. The model, however, will likely not generalize to out-of-sample data. We advise you to explore the data prior to model training and to drop columns that "leak" the target information.
- The training data uses **features that are not known into the future**, up to the forecast horizon. AutoML's regression models currently assume all features are known to the forecast horizon. We advise you to explore your data prior to training and remove any feature columns that are only known historically.
-- There are **significant structural differences - regime changes - between the training, validation, or test portions of the data**. For example, consider the effect of the COVID-19 pandemic on demand for almost any good during 2020 and 2021; this is a classic example of a regime change. Over-fitting due to regime change is the most challenging issue to address because it's highly scenario dependent and can require deep knowledge to identify. As a first line of defense, try to reserve 10 - 20% of the total history for validation, or cross-validation, data. It isn't always possible to reserve this amount of validation data if the training history is short, but is a best practice. See our guide on [configuring validation](./how-to-auto-train-forecast.md#training-and-validation-data) for more information.
+- There are **significant structural differences - regime changes - between the training, validation, or test portions of the data**. For example, consider the effect of the COVID-19 pandemic on demand for almost any good during 2020 and 2021; this is a classic example of a regime change. Over-fitting due to regime change is the most challenging issue to address because it's highly scenario dependent and can require deep knowledge to identify. As a first line of defense, try to reserve 10 - 20% of the total history for validation, or cross-validation, data. It isn't always possible to reserve this amount of validation data if the training history is short, but is a best practice. See our guide on [configuring validation](./how-to-auto-train-forecast.md#training-and-validation-data) for more information.
+

## What if my time series data doesn't have regularly spaced observations?

@@ -98,7 +99,11 @@ The primary metric is very important since its value on validation data determin
- Add new features that may help predict the target. Subject matter expertise can help greatly when selecting training data.
- Compare validation and test metric values and determine if the selected model is under-fitting or over-fitting the data. This knowledge can guide you to a better training configuration. For example, you might determine that you need to use more cross-validation folds in response to over-fitting.

-### How do I fix an Out-Of-Memory error?
+## Will AutoML always select the same best model given the same training data and configuration?
+
+[AutoML's model search process](./concept-automl-forecasting-sweeping.md#model-sweeping) is not deterministic, so it does not always select the same model given the same data and configuration.
+
+## How do I fix an Out-Of-Memory error?

There are two types of memory issues:
- RAM Out-of-Memory
@@ -110,7 +115,7 @@ For default AutoML settings, RAM Out-of-Memory may be fixed by using compute nod

Disk Out-of-Memory errors may be resolved by deleting the compute cluster and creating a new one.

-### What advanced forecasting scenarios are supported by AutoML?
+## What advanced forecasting scenarios are supported by AutoML?

We support the following advanced prediction scenarios:
- Quantile forecasts
@@ -133,7 +138,7 @@ If your AutoML forecasting job fails, you'll see an error message in the studio
> [!NOTE]
> For Many Models or HTS job, training is usually on multi-node compute clusters. Logs for these jobs are present for each node IP address. You will need to search for error logs in each node in this case. The error logs, along with the driver logs, are in the `user_logs` folder for each node IP.

-### What is a workspace / environment / experiment/ compute instance / compute target?
+## What is a workspace / environment / experiment/ compute instance / compute target?

If you aren't familiar with Azure Machine Learning concepts, start with the ["What is AzureML"](overview-what-is-azure-machine-learning.md) article and the [workspaces](./concept-workspace.md) article.

articles/machine-learning/v1/how-to-auto-train-forecast-v1.md

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@ show_latex: true

[!INCLUDE [sdk v1](../../../includes/machine-learning-sdk-v1.md)]

+> [!div class="op_single_selector" title1="Select the version of the Azure Machine Learning SDK you are using:"]
+> * [v1](how-to-auto-train-forecast-v1.md)
+> * [v2 (current version)](../how-to-auto-train-forecast.md)
+
In this article, you learn how to set up AutoML training for time-series forecasting models with Azure Machine Learning automated ML in the [Azure Machine Learning Python SDK](/python/api/overview/azure/ml/).

To do so, you:

0 commit comments
