This article introduces concepts related to model inference and evaluation in forecasting tasks. Instructions and examples for training forecasting models in AutoML can be found in our [set up AutoML for time series forecasting](./how-to-auto-train-forecast.md) article.
22
+
This article introduces concepts related to model inference and evaluation in forecasting tasks. For instructions and examples for training forecasting models in AutoML, see [Set up AutoML to train a time-series forecasting model with SDK and CLI](./how-to-auto-train-forecast.md).
22
23
23
-
Once you've used AutoML to train and select a best model, the next step is to generate forecasts and then, if possible, to evaluate their accuracy on a test set held out from the training data. To see how to setup and run forecasting model evaluation in automated machine learning, see our guide on [inference and evaluation components](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).
24
+
After you use AutoML to train and select a best model, the next step is to generate forecasts. Then, if possible, evaluate their accuracy on a test set held out from the training data. To see how to set up and run forecasting model evaluation in automated machine learning, see [Orchestrating training, inference, and evaluation](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).
24
25
25
26
## Inference scenarios
26
27
27
-
In machine learning, inference is the process of generating model predictions for new data not used in training. There are multiple ways to generate predictions in forecasting due to the time dependence of the data. The simplest scenario is when the inference period immediately follows the training period and we generate predictions out to the forecast horizon. This scenario is illustrated in the following diagram:
28
+
In machine learning, *inference* is the process of generating model predictions for new data not used in training. There are multiple ways to generate predictions in forecasting due to the time dependence of the data. The simplest scenario is when the inference period immediately follows the training period and you generate predictions out to the forecast horizon. The following diagram illustrates this scenario:
28
29
29
30
:::image type="content" source="media/concept-automl-forecasting-evaluation/forecast-diagram.png" alt-text="Diagram demonstrating a forecast immediately following the training period.":::
30
31
31
32
The diagram shows two important inference parameters:
32
33
33
-
* The **context length**, or the amount of history that the model requires to make a forecast,
34
-
* The **forecast horizon**, which is how far ahead in time the forecaster is trained to predict.
34
+
- The *context length* is the amount of history that the model requires to make a forecast.
35
+
- The *forecast horizon* is how far ahead in time the forecaster is trained to predict.
35
36
36
-
Forecasting models usually use some historical information, the context, to make predictions ahead in time up to the forecast horizon. **When the context is part of the training data, AutoML saves what it needs to make forecasts**, so there is no need to explicitly provide it.
37
+
Forecasting models usually use some historical information, the *context*, to make predictions ahead in time up to the forecast horizon. When the context is part of the training data, AutoML saves what it needs to make forecasts. There's no need to explicitly provide it.
37
38
38
-
There are two other inference scenarios that are more complicated:
39
+
There are two other inference scenarios that are more complicated:
39
40
40
-
* Generating predictions farther into the future than the forecast horizon,
41
-
* Getting predictions when there is a gap between the training and inference periods.
41
+
- Generating predictions farther into the future than the forecast horizon
42
+
- Getting predictions when there's a gap between the training and inference periods
42
43
43
-
We review these cases in the following sub-sections.
44
+
The following subsections review these cases.
44
45
45
-
### Prediction past the forecast horizon: recursive forecasting
46
+
### Predict past the forecast horizon: recursive forecasting
46
47
47
-
When you need forecasts past the horizon, AutoML applies the model recursively over the inference period. This means that predictions from the model are _fed back as input_ in order to generate predictions for subsequent forecasting windows. The following diagram shows a simple example:
48
+
When you need forecasts past the horizon, AutoML applies the model recursively over the inference period. Predictions from the model are *fed back as input* to generate predictions for subsequent forecasting windows. The following diagram shows a simple example:
48
49
49
50
:::image type="content" source="media/concept-automl-forecasting-evaluation/recursive-forecast-diagram.png" alt-text="Diagram demonstrating a recursive forecast on a test set.":::
50
51
51
-
Here, we generate forecasts on a period three times the length of the horizon by using predictions from one window as the context for the next window.
52
+
Here, AutoML generates forecasts on a period three times the length of the horizon. It uses predictions from one window as the context for the next window.
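
The recursive pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not AutoML's implementation: `drift_forecaster` is a hypothetical toy model, and `recursive_forecast` and its parameter names are assumptions made for this example.

```python
import numpy as np


def drift_forecaster(context: np.ndarray, horizon: int) -> np.ndarray:
    """Hypothetical toy model: extend the last observed step linearly."""
    step = context[-1] - context[-2]
    return context[-1] + step * np.arange(1, horizon + 1)


def recursive_forecast(history, model, context_length, horizon, n_steps):
    """Forecast n_steps ahead by feeding predictions back in as context."""
    values = list(history[-context_length:])
    predictions = []
    while len(predictions) < n_steps:
        # The context is the most recent values, which include earlier
        # predictions once the first window has been generated.
        context = np.asarray(values[-context_length:], dtype=float)
        window = model(context, horizon)
        predictions.extend(window)
        values.extend(window)
    return np.asarray(predictions[:n_steps])


history = np.array([10.0, 12.0, 14.0, 16.0])
# Three windows of length 3 cover a period three times the horizon.
forecast = recursive_forecast(
    history, drift_forecaster, context_length=2, horizon=3, n_steps=9
)
```

Because the second and third windows are built only from earlier predictions, any error in the first window carries forward into the later ones.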
52
53
53
54
> [!WARNING]
54
-
> Recursive forecasting compounds modeling errors, so predictions become less accurate the farther they are from the original forecast horizon. You may find a more accurate model by re-training with a longer horizon in this case.
55
+
> Recursive forecasting compounds modeling errors. Predictions become less accurate the farther they are from the original forecast horizon. You might find a more accurate model by retraining with a longer horizon.
55
56
56
-
### Prediction with a gap between training and inference periods
57
+
### Predict with a gap between training and inference periods
57
58
58
-
Suppose that you've trained a model in the past and you want to use it to make predictions from new observations that weren't yet available during training. In this case, there's a time gap between the training and inference periods:
59
+
Suppose that after you train a model, you want to use it to make predictions from new observations that weren't yet available during training. In this case, there's a time gap between the training and inference periods:
59
60
60
61
:::image type="content" source="media/concept-automl-forecasting-evaluation/forecasting-with-gap-diagram.png" alt-text="Diagram demonstrating a forecast with a gap between the training and inference periods.":::
61
62
62
-
AutoML supports this inference scenario, but **you need to provide the context data in the gap period**, as shown in the diagram. The prediction data passed to the [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines) needs values for features and observed target values in the gap and missing values or "NaN" values for the target in the inference period. The following table shows an example of this pattern:
63
-
63
+
AutoML supports this inference scenario, but you need to provide the context data in the gap period, as shown in the diagram. The prediction data passed to the [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines) needs feature values and observed target values in the gap, and missing or `NaN` target values in the inference period. The following table shows an example of this pattern:
64
+
64
65
:::image type="content" source="media/concept-automl-forecasting-evaluation/forecasting-with-gap-table.png" alt-text="Table showing an example of prediction data when there's a gap between the training and inference periods.":::
65
66
66
-
Here, known values of the target and features are provided for 2023-05-01 through 2023-05-03. Missing target values starting at 2023-05-04 indicate that the inference period starts at that date.
67
+
Known values of the target and features are provided for `2023-05-01` through `2023-05-03`. Missing target values starting at `2023-05-04` indicate that the inference period starts at that date.
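
As a rough sketch, prediction data with this shape could be assembled with pandas. The column names (`date`, `feature`, `target`), the numeric values, and the end date of the inference period are assumptions for illustration. Only the gap dates and the inference start date come from the example above.

```python
import numpy as np
import pandas as pd

# Gap period: 2023-05-01 through 2023-05-03 (observed targets provided).
# Inference period: starts 2023-05-04 (targets left missing).
dates = pd.date_range("2023-05-01", "2023-05-06", freq="D")

prediction_data = pd.DataFrame(
    {
        "date": dates,
        "feature": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # hypothetical feature values
        "target": [100.0, 105.0, 103.0, np.nan, np.nan, np.nan],
    }
)

# The first date with a missing target marks the start of the inference period.
inference_start = prediction_data.loc[
    prediction_data["target"].isna(), "date"
].min()
```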
67
68
68
-
AutoML uses the new context data to update lag and other lookback features, and also to update models like ARIMA that keep an internal state. This operation _doesn't_ update or re-fit model parameters.
69
+
AutoML uses the new context data to update lag and other lookback features, and also to update models like ARIMA that keep an internal state. This operation *doesn't* update or refit model parameters.
69
70
70
-
## Model evaluation
71
-
72
-
Evaluation is the process of generating predictions on a test set held-out from the training data and computing metrics from these predictions that guide model deployment decisions. Accordingly, there's an inference mode suited for model evaluation - a rolling forecast. We review it in the following subsection.
71
+
## <a name="rolling-forecast"></a>Model evaluation
73
72
74
-
### Rolling forecast
73
+
*Evaluation* is the process of generating predictions on a test set held out from the training data and computing metrics from these predictions that guide model deployment decisions. Accordingly, there's an inference mode suited for model evaluation: a rolling forecast.
75
74
76
-
A best practice procedure for evaluating a forecasting model is to roll the trained forecaster forward in time over the test set, averaging error metrics over several prediction windows. This procedure is sometimes called a **backtest**, depending on the context. Ideally, the test set for the evaluation is long relative to the model's forecast horizon. Estimates of forecasting error may otherwise be statistically noisy and, therefore, less reliable.
75
+
A best practice procedure for evaluating a forecasting model is to roll the trained forecaster forward in time over the test set, averaging error metrics over several prediction windows. This procedure is sometimes called a *backtest*. Ideally, the test set for the evaluation is long relative to the model's forecast horizon. Estimates of forecasting error might otherwise be statistically noisy and, therefore, less reliable.
77
76
78
77
The following diagram shows a simple example with three forecasting windows:
79
78
80
79
:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-diagram.png" alt-text="Diagram demonstrating a rolling forecast on a test set.":::
81
80
82
81
The diagram illustrates three rolling evaluation parameters:
83
82
84
-
* The **context length**, or the amount of history that the model requires to make a forecast,
85
-
* The **forecast horizon**, which is how far ahead in time the forecaster is trained to predict,
86
-
* The **step size**, which is how far ahead in time the rolling window advances on each iteration on the test set.
83
+
- The *context length* is the amount of history that the model requires to make a forecast.
84
+
- The *forecast horizon* is how far ahead in time the forecaster is trained to predict.
85
+
- The *step size* is how far ahead in time the rolling window advances on each iteration on the test set.
87
86
88
-
Importantly, the context advances along with the forecasting window. This means that actual values from the test set are used to make forecasts when they fall within the current context window. The latest date of actual values used for a given forecast window is called the **origin time** of the window. The following table shows an example output from the three-window rolling forecast with a horizon of three days and a step size of one day:
87
+
The context advances along with the forecasting window. Actual values from the test set are used to make forecasts when they fall within the current context window. The latest date of actual values used for a given forecast window is called the *origin time* of the window. The following table shows an example output from the three-window rolling forecast with a horizon of three days and a step size of one day:
89
88
90
-
:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-table.png" alt-text="Example output table from a rolling forecast.":::
89
+
:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-table.png" alt-text="Table showing example output from a rolling forecast.":::
91
90
92
-
With a table like this, we can visualize the forecasts vs. the actuals and compute desired evaluation metrics. AutoML pipelines can generate rolling forecasts on a test set with an [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).
91
+
With a table like this, you can visualize the forecasts versus the actuals and compute desired evaluation metrics. AutoML pipelines can generate rolling forecasts on a test set with an [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).
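
To make the window layout concrete, the following sketch enumerates the origin time and forecast dates for a rolling forecast with a horizon of three days and a step size of one day. It only builds the skeleton of such a table and doesn't call any AutoML component. The function name, column names, and test-set dates are assumptions for this example, and daily data is assumed.

```python
import pandas as pd


def rolling_windows(test_dates: pd.DatetimeIndex, horizon: int, step: int) -> pd.DataFrame:
    """List (origin_time, forecast_date) pairs for each rolling window."""
    one_step = pd.Timedelta(days=1)  # assumes a daily time series
    rows = []
    for start in range(0, len(test_dates) - horizon + 1, step):
        # The origin time is the latest date of actual values used
        # for this window: one step before the first forecast date.
        origin = test_dates[start] - one_step
        for forecast_date in test_dates[start:start + horizon]:
            rows.append({"origin_time": origin, "forecast_date": forecast_date})
    return pd.DataFrame(rows)


# Hypothetical five-day test set, which yields three windows.
test_dates = pd.date_range("2023-05-01", "2023-05-05", freq="D")
windows = rolling_windows(test_dates, horizon=3, step=1)
```

Joining a frame like this with predicted and actual target values gives one row per origin time and forecast date, which is a convenient shape for plotting and for computing metrics per window.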
93
92
94
93
> [!NOTE]
95
94
> When the test period is the same length as the forecast horizon, a rolling forecast gives a single window of forecasts up to the horizon.
96
95
97
96
## Evaluation metrics
98
97
99
-
The choice of evaluation summary or metric is usually driven by the specific business scenario. Some common choices include the following:
98
+
The specific business scenario usually drives the choice of evaluation summary or metric. Some common choices include the following examples:
100
99
101
-
* Plots of observed target values vs. forecasted values to check that certain dynamics of the data are captured by the model,
102
-
* MAPE (mean absolute percentage error) between actual and forecasted values,
103
-
* RMSE (root mean squared error), possibly with a normalization, between actual and forecasted values,
104
-
* MAE (mean absolute error), possibly with a normalization, between actual and forecasted values.
100
+
- Plots of observed target values versus forecasted values to check that the model captures certain dynamics of the data
101
+
- Mean absolute percentage error (MAPE) between actual and forecasted values
102
+
- Root mean squared error (RMSE), possibly with a normalization, between actual and forecasted values
103
+
- Mean absolute error (MAE), possibly with a normalization, between actual and forecasted values
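
As a minimal sketch, the three error metrics in this list can be computed with NumPy as follows. The sample values are made up, and dividing by the range of the actual values is only one possible normalization, chosen here as an assumption.

```python
import numpy as np


def mape(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute percentage error, in percent. Undefined if any actual is 0."""
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)


def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))


# Made-up values for illustration only.
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 180.0, 400.0])

# One possible normalization: divide by the range of the actual values.
value_range = actual.max() - actual.min()
normalized_rmse = rmse(actual, predicted) / value_range
normalized_mae = mae(actual, predicted) / value_range
```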
105
104
106
-
There are many other possibilities, depending on the business scenario. You may need to create your own post-processing utilities for computing evaluation metrics from inference results or rolling forecasts. For more information on metrics, see our [regression and forecasting metrics](how-to-understand-automated-ml.md#regressionforecasting-metrics) article section.
105
+
There are many other possibilities, depending on the business scenario. You might need to create your own post-processing utilities for computing evaluation metrics from inference results or rolling forecasts. For more information on metrics, see [Regression/forecasting metrics](how-to-understand-automated-ml.md#regressionforecasting-metrics).
107
106
108
-
## Next steps
107
+
## Related content
109
108
110
-
* Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md).
111
-
* Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md).
112
-
* Read answers to [frequently asked questions](./how-to-automl-forecasting-faq.md) about forecasting in AutoML.
109
+
- Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md).
110
+
- Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md).
111
+
- Read answers to [frequently asked questions](./how-to-automl-forecasting-faq.md) about forecasting in AutoML.