Commit c2d284c
Merge pull request #238443 from EricWrightAtWork/erwright/forecasting-component-docs-stage1
Add docs/text for forecasting components and tabular data distributed training
2 parents e0ae4dd + eced7f8 commit c2d284c

17 files changed: +1846 −279 lines

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
---
title: Forecasting at scale
titleSuffix: Azure Machine Learning
description: Learn about different ways to scale forecasting model training
services: machine-learning
author: ericwrightatwork
ms.author: erwright
ms.reviewer: ssalgado
ms.service: machine-learning
ms.subservice: automl
ms.topic: conceptual
ms.custom: contperf-fy21q1, automl, FY21Q4-aml-seo-hack, sdkv2, event-tier1-build-2022
ms.date: 08/01/2023
show_latex: true
---

# Forecasting at scale: many models and distributed training

This article is about training forecasting models on large quantities of historical data. Instructions and examples for training forecasting models in AutoML can be found in our [set up AutoML for time series forecasting](./how-to-auto-train-forecast.md) article.

Time series data can be large due to the number of series in the data, the number of historical observations, or both. **Many models** and hierarchical time series, or **HTS**, are scaling solutions for the former scenario, where the data consists of a large number of time series. In these cases, it can be beneficial for model accuracy and scalability to partition the data into groups and train a large number of independent models in parallel on the groups. Conversely, there are scenarios where one or a small number of high-capacity models is better. **Distributed DNN training** targets this case. We review concepts around these scenarios in the remainder of the article.

## Many models

The many models [components](concept-component.md) in AutoML enable you to train and manage millions of models in parallel. For example, suppose you have historical sales data for a large number of stores. You can use many models to launch parallel AutoML training jobs for each store, as in the following diagram:

:::image type="content" source="./media/how-to-auto-train-forecast/many-models.svg" alt-text="Diagram showing the AutoML many models workflow.":::

The many models training component applies AutoML's [model sweeping and selection](concept-automl-forecasting-sweeping.md) independently to each store in this example. This model independence aids scalability and can benefit model accuracy, especially when the stores have diverging sales dynamics. However, a single model approach may yield more accurate forecasts when there are common sales dynamics. See the [distributed DNN training](#distributed-dnn-training) section for more details on that case.

You can configure the data partitioning, the [AutoML settings](how-to-auto-train-forecast.md#configure-experiment) for the models, and the degree of parallelism for many models training jobs. For examples, see our guide section on [many models components](how-to-auto-train-forecast.md#forecasting-at-scale-many-models).
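
At its core, the many models pattern is "partition the data, then train independent models on the partitions in parallel". The following Python sketch illustrates only that pattern, not the AutoML component itself: a trivial mean forecaster stands in for an AutoML training job, and the column names `store_id` and `sales` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def train_store_model(store_df: pd.DataFrame) -> float:
    # Stand-in for one AutoML training job: fit a trivial
    # mean forecaster on a single store's sales history.
    return float(store_df["sales"].mean())

def many_models_train(df: pd.DataFrame, partition_col: str = "store_id") -> dict:
    # Partition the data into one group per store, then train one
    # independent model per partition in parallel.
    groups = {store: g for store, g in df.groupby(partition_col)}
    with ThreadPoolExecutor() as pool:
        results = pool.map(train_store_model, groups.values())
    return dict(zip(groups.keys(), results))
```

Because each partition's model is trained independently, the work scales out naturally; the AutoML component applies the same idea with full model sweeping per partition.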

## Hierarchical time series forecasting

It's common for time series in business applications to have nested attributes that form a hierarchy. Geography and product catalog attributes are often nested, for instance. Consider an example where the hierarchy has two geographic attributes, state and store ID, and two product attributes, category and SKU:

:::image type="content" source="./media/how-to-auto-train-forecast/hierarchy-data-table.svg" alt-text="Example table of hierarchical time series data.":::

This hierarchy is illustrated in the following diagram:

:::image type="content" source="./media/how-to-auto-train-forecast/data-tree.svg" alt-text="Diagram of data hierarchy for the example data.":::

Importantly, the sales quantities at the leaf (SKU) level add up to the aggregated sales quantities at the state and total sales levels. Hierarchical forecasting methods preserve these aggregation properties when forecasting the quantity sold at any level of the hierarchy. Forecasts with this property are **coherent** with respect to the hierarchy.

AutoML supports the following features for hierarchical time series (HTS):

* **Training at any level of the hierarchy**. In some cases, the leaf-level data may be noisy, but aggregates may be more amenable to forecasting.
* **Retrieving point forecasts at any level of the hierarchy**. If the forecast level is "below" the training level, then forecasts from the training level are disaggregated via [average historical proportions](https://otexts.com/fpp3/single-level.html#average-historical-proportions) or [proportions of historical averages](https://otexts.com/fpp3/single-level.html#proportions-of-the-historical-averages). Training level forecasts are summed according to the aggregation structure when the forecast level is "above" the training level.
* **Retrieving quantile/probabilistic forecasts for levels at or "below" the training level**. Current modeling capabilities support disaggregation of probabilistic forecasts.

HTS components in AutoML are built on top of [many models](#many-models), so HTS shares the scalable properties of many models. For examples, see our guide section on [HTS components](how-to-auto-train-forecast.md#forecasting-at-scale-hierarchical-time-series).
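
The average historical proportions method mentioned above can be sketched in a few lines. This is an illustrative implementation of the textbook method, not the HTS component's internals; the SKU column names are hypothetical. Each leaf's weight is the mean over time of its share of the aggregate, and because the shares sum to one at every time step, the disaggregated forecasts are coherent by construction.

```python
import pandas as pd

def average_historical_proportions(history: pd.DataFrame) -> pd.Series:
    # Per-leaf disaggregation weights: the mean over time of each
    # leaf's share of the aggregate. Shares sum to 1 at every time
    # step, so the averaged proportions also sum to 1.
    shares = history.div(history.sum(axis=1), axis=0)
    return shares.mean(axis=0)

def disaggregate(aggregate_forecast: float, proportions: pd.Series) -> pd.Series:
    # Leaf forecasts sum exactly to the aggregate forecast, so the
    # result is coherent with respect to the hierarchy.
    return aggregate_forecast * proportions
```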

## Distributed DNN training

Data scenarios featuring large numbers of historical observations and/or large numbers of related time series may benefit from a scalable, single model approach. Accordingly, **AutoML supports distributed training and model search on temporal convolutional network (TCN) models**, which are a type of deep neural network (DNN) for time series data. For more information on AutoML's TCN model class, see our [DNN article](concept-automl-forecasting-deep-learning.md).

Distributed DNN training achieves scalability using a data partitioning algorithm that respects time series boundaries. The following diagram illustrates a simple example with two partitions:

:::image type="content" source="./media/concept-automl-forecasting-at-scale/distributed-training-diagram.png" alt-text="Example diagram of a distributed training data partition.":::

During training, the DNN data loaders on each compute load just what they need to complete an iteration of back-propagation; **the whole dataset is never read into memory**. The partitions are further distributed across multiple compute cores (usually GPUs) on possibly multiple nodes to accelerate training. Coordination across computes is provided by the [Horovod](https://horovod.ai/) framework.
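
The key constraint of partitioning that "respects time series boundaries" is that every series is assigned wholly to one partition, so no series is split across workers. The following greedy sketch is hypothetical (AutoML's actual algorithm isn't described here); it only illustrates the constraint while roughly balancing observation counts.

```python
def partition_series(series_lengths: dict, n_workers: int) -> list:
    # Assign each time series wholly to one worker so that no series
    # is split across partitions, greedily balancing total size.
    partitions = [{"series": [], "size": 0} for _ in range(n_workers)]
    # Longest series first, each assigned to the currently lightest worker.
    for name, length in sorted(series_lengths.items(), key=lambda kv: -kv[1]):
        target = min(partitions, key=lambda p: p["size"])
        target["series"].append(name)
        target["size"] += length
    return [p["series"] for p in partitions]
```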

## Next steps

* Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md).
* Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md).
* Learn about [deep learning models](./concept-automl-forecasting-deep-learning.md) for forecasting in AutoML.

articles/machine-learning/concept-automl-forecasting-deep-learning.md

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ ms.service: machine-learning
 ms.subservice: automl
 ms.topic: conceptual
 ms.custom: contperf-fy21q1, automl, FY21Q4-aml-seo-hack, sdkv2, event-tier1-build-2022
-ms.date: 02/24/2023
+ms.date: 08/01/2023
 show_latex: true
 ---

@@ -90,9 +90,9 @@ AutoML executes several preprocessing steps on your data to prepare for model tr
 |--|--|
 |Fill missing data|[Impute missing values and observation gaps](./concept-automl-forecasting-methods.md#missing-data-handling) and optionally [pad or drop short time series](./how-to-auto-train-forecast.md#short-series-handling)|
 |Create calendar features|Augment the input data with [features derived from the calendar](./concept-automl-forecasting-calendar-features.md) like day of the week and, optionally, holidays for a specific country/region.|
-|Encode categorical data|[Label encode](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) strings and other categorical types; this includes all [time series ID columns](./how-to-auto-train-forecast.md#configuration-settings).|
+|Encode categorical data|[Label encode](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html) strings and other categorical types; this includes all [time series ID columns](./how-to-auto-train-forecast.md#forecasting-job-settings).|
 |Target transform|Optionally apply the natural logarithm function to the target depending on the results of certain statistical tests.|
-|Normalization|[Z-score normalize](https://en.wikipedia.org/wiki/Standard_score) all numeric data; normalization is performed per feature and per time series group, as defined by the [time series ID columns](./how-to-auto-train-forecast.md#configuration-settings).
+|Normalization|[Z-score normalize](https://en.wikipedia.org/wiki/Standard_score) all numeric data; normalization is performed per feature and per time series group, as defined by the [time series ID columns](./how-to-auto-train-forecast.md#forecasting-job-settings).

 These steps are included in AutoML's transform pipelines, so they are automatically applied when needed at inference time. In some cases, the inverse operation to a step is included in the inference pipeline. For example, if AutoML applied a $\log$ transform to the target during training, the raw forecasts are exponentiated in the inference pipeline.

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
---
title: Inference and evaluation of forecasting models
titleSuffix: Azure Machine Learning
description: Learn about different ways to generate predictions from and evaluate forecasting models
services: machine-learning
author: ericwrightatwork
ms.author: erwright
ms.reviewer: ssalgado
ms.service: machine-learning
ms.subservice: automl
ms.topic: conceptual
ms.custom: contperf-fy21q1, automl, FY21Q4-aml-seo-hack, sdkv2, event-tier1-build-2022
ms.date: 08/01/2023
show_latex: true
---

# Inference and evaluation of forecasting models

This article introduces concepts related to model inference and evaluation in forecasting tasks. Instructions and examples for training forecasting models in AutoML can be found in our [set up AutoML for time series forecasting](./how-to-auto-train-forecast.md) article.

Once you've used AutoML to train and select a best model, the next step is to generate forecasts and then, if possible, to evaluate their accuracy on a test set held out from the training data. To see how to set up and run forecasting model evaluation in automated machine learning, see our guide on [inference and evaluation components](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).

## Inference scenarios

In machine learning, inference is the process of generating model predictions for new data not used in training. There are multiple ways to generate predictions in forecasting due to the time dependence of the data. The simplest scenario is when the inference period immediately follows the training period and we generate predictions out to the forecast horizon. This scenario is illustrated in the following diagram:

:::image type="content" source="media/concept-automl-forecasting-evaluation/forecast-diagram.png" alt-text="Diagram demonstrating a forecast immediately following the training period.":::

The diagram shows two important inference parameters:

* The **context length**, or the amount of history that the model requires to make a forecast,
* The **forecast horizon**, which is how far ahead in time the forecaster is trained to predict.

Forecasting models usually use some historical information, the context, to make predictions ahead in time up to the forecast horizon. **When the context is part of the training data, AutoML saves what it needs to make forecasts**, so there is no need to explicitly provide it.
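
The two parameters can be made concrete with a toy forecaster (illustrative only, not an AutoML model): it consumes exactly the last `context_length` observations and emits `horizon` predictions.

```python
def forecast(history: list, context_length: int, horizon: int) -> list:
    # Toy forecaster: use only the last `context_length` points (the
    # context) and predict their mean for each of the next `horizon` steps.
    context = history[-context_length:]
    point = sum(context) / len(context)
    return [point] * horizon
```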

There are two other inference scenarios that are more complicated:

* Generating predictions farther into the future than the forecast horizon,
* Getting predictions when there is a gap between the training and inference periods.

We review these cases in the following sub-sections.

### Prediction past the forecast horizon: recursive forecasting

When you need forecasts past the horizon, AutoML applies the model recursively over the inference period. This means that predictions from the model are _fed back as input_ in order to generate predictions for subsequent forecasting windows. The following diagram shows a simple example:

:::image type="content" source="media/concept-automl-forecasting-evaluation/recursive-forecast-diagram.png" alt-text="Diagram demonstrating a recursive forecast on a test set.":::

Here, we generate forecasts on a period three times the length of the horizon by using predictions from one window as the context for the next window.

> [!WARNING]
> Recursive forecasting compounds modeling errors, so predictions become less accurate the farther they are from the original forecast horizon. You may find a more accurate model by re-training with a longer horizon in this case.
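
The recursive scheme can be sketched generically (this is an illustration of the idea, not AutoML's implementation): each window's predictions are appended to the context so they serve as input for the next window.

```python
def recursive_forecast(history, model, context_length, horizon, n_windows):
    # Forecast n_windows * horizon steps ahead by feeding each window's
    # predictions back in as context for the next window.
    context = list(history)
    out = []
    for _ in range(n_windows):
        preds = model(context[-context_length:], horizon)
        out.extend(preds)
        context.extend(preds)  # predictions become future context
    return out
```

With a last-value ("naive") forecaster as `model`, three windows of horizon 2 yield six predictions, each window built on the previous window's output; modeling errors made early on are carried into every later window, which is why accuracy degrades with distance.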

### Prediction with a gap between training and inference periods

Suppose that you've trained a model in the past and you want to use it to make predictions from new observations that weren't yet available during training. In this case, there's a time gap between the training and inference periods:

:::image type="content" source="media/concept-automl-forecasting-evaluation/forecasting-with-gap-diagram.png" alt-text="Diagram demonstrating a forecast with a gap between the training and inference periods.":::

AutoML supports this inference scenario, but **you need to provide the context data in the gap period**, as shown in the diagram. The prediction data passed to the [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines) needs feature values and observed target values in the gap period, and missing or "NaN" values for the target in the inference period. The following table shows an example of this pattern:

:::image type="content" source="media/concept-automl-forecasting-evaluation/forecasting-with-gap-table.png" alt-text="Table showing an example of prediction data when there's a gap between the training and inference periods.":::

Here, known values of the target and features are provided for 2023-05-01 through 2023-05-03. Missing target values starting at 2023-05-04 indicate that the inference period starts at that date.

AutoML uses the new context data to update lag and other lookback features, and also to update models like ARIMA that keep an internal state. This operation _does not_ update or re-fit model parameters.
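
The same pattern can be built with pandas. The column names here are illustrative (a `date` column and a `demand` target): observed target values fill the gap rows, and `NaN` targets mark the dates where forecasts are requested.

```python
import numpy as np
import pandas as pd

# Context rows cover the gap between training and inference, so the
# target is observed there. NaN targets mark the inference period.
prediction_data = pd.DataFrame({
    "date": pd.date_range("2023-05-01", periods=6, freq="D"),
    "demand": [12.0, 14.0, 13.0, np.nan, np.nan, np.nan],
})
```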

## Model evaluation

Evaluation is the process of generating predictions on a test set held out from the training data and computing metrics from these predictions that guide model deployment decisions. Accordingly, there's an inference mode specifically suited for model evaluation: a rolling forecast. We review it in the following sub-section.

### Rolling forecast

A best practice procedure for evaluating a forecasting model is to roll the trained forecaster forward in time over the test set, averaging error metrics over several prediction windows. This procedure is sometimes called a **backtest**, depending on the context. Ideally, the test set for the evaluation is long relative to the model's forecast horizon. Estimates of forecasting error may otherwise be statistically noisy and, therefore, less reliable.

The following diagram shows a simple example with three forecasting windows:

:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-diagram.png" alt-text="Diagram demonstrating a rolling forecast on a test set.":::

The diagram illustrates three rolling evaluation parameters:

* The **context length**, or the amount of history that the model requires to make a forecast,
* The **forecast horizon**, which is how far ahead in time the forecaster is trained to predict,
* The **step size**, which is how far ahead in time the rolling window advances on each iteration on the test set.

Importantly, the context advances along with the forecasting window. This means that actual values from the test set are used to make forecasts when they fall within the current context window. The latest date of actual values used for a given forecast window is called the **origin time** of the window. The following table shows an example output from the three-window rolling forecast with a horizon of three days and a step size of one day:

:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-table.png" alt-text="Example output table from a rolling forecast.":::

With a table like this, we can visualize the forecasts vs. the actuals and compute desired evaluation metrics. AutoML pipelines can generate rolling forecasts on a test set with an [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).

> [!NOTE]
> When the test period is the same length as the forecast horizon, a rolling forecast gives a single window of forecasts up to the horizon.
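
The rolling procedure itself can be sketched as follows (an illustration of the concept, not the inference component's implementation): the window advances by `step_size`, and the context always contains actual test-set values, so errors don't compound as they do in recursive forecasting.

```python
def rolling_forecast(test_series, model, context_length, horizon, step_size):
    # Roll a fixed, already-trained forecaster over a test set,
    # emitting (origin, time, actual, forecast) rows for each window.
    rows = []
    for origin in range(context_length, len(test_series) - horizon + 1, step_size):
        context = test_series[origin - context_length:origin]  # actuals only
        preds = model(context, horizon)
        for k, p in enumerate(preds):
            rows.append({"origin": origin - 1, "t": origin + k,
                         "actual": test_series[origin + k], "forecast": p})
    return rows
```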

## Evaluation metrics

The choice of evaluation summary or metric is usually driven by the specific business scenario. Some common choices include the following:

* Plots of observed target values vs. forecasted values to check that certain dynamics of the data are captured by the model,
* MAPE (mean absolute percentage error) between actual and forecasted values,
* RMSE (root mean squared error), possibly with a normalization, between actual and forecasted values,
* MAE (mean absolute error), possibly with a normalization, between actual and forecasted values.

There are many other possibilities, depending on the business scenario. You may need to create your own post-processing utilities for computing evaluation metrics from inference results or rolling forecasts. For more information on metrics, see our [regression and forecasting metrics](how-to-understand-automated-ml.md#regressionforecasting-metrics) article section.
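
As a starting point for such post-processing utilities, the three numeric metrics above have standard definitions that are straightforward to compute from paired actual and forecast arrays:

```python
import numpy as np

def mape(actual, forecast):
    # Mean absolute percentage error, in percent.
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def rmse(actual, forecast):
    # Root mean squared error.
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return float(np.sqrt(np.mean((actual - forecast) ** 2)))

def mae(actual, forecast):
    # Mean absolute error.
    actual, forecast = np.asarray(actual), np.asarray(forecast)
    return float(np.mean(np.abs(actual - forecast)))
```

Note that MAPE is undefined when any actual value is zero, which is one reason normalized variants of RMSE and MAE are sometimes preferred.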

## Next steps

* Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md).
* Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md).
* Read answers to [frequently asked questions](./how-to-automl-forecasting-faq.md) about forecasting in AutoML.
