|
1 | 1 | ---
|
2 |
| -title: Lagged features for time series forecasting in AutoML |
| 2 | +title: Lag features for time-series forecasting in AutoML |
3 | 3 | titleSuffix: Azure Machine Learning
|
4 |
| -description: Learn how Azure Machine Learning's AutoML forms lag based features for time series forecasting |
| 4 | +description: Explore how automated machine learning (AutoML) in Azure Machine Learning creates lag and rolling window aggregation features for time-series forecasting with regression models. |
5 | 5 | services: machine-learning
|
6 | 6 | author: ssalgadodev
|
7 | 7 | ms.author: ssalgado
|
8 | 8 | ms.reviewer: vlbejan
|
9 | 9 | ms.service: azure-machine-learning
|
10 | 10 | ms.subservice: automl
|
11 |
| -ms.topic: conceptual |
| 11 | +ms.topic: concept-article |
12 | 12 | ms.custom: automl, sdkv1
|
13 |
| -ms.date: 12/15/2022 |
| 13 | +ms.date: 09/25/2024 |
14 | 14 | show_latex: true
|
| 15 | + |
| 16 | +#customer intent: As a developer, I want to use AutoML methods in Azure Machine Learning for creating lag and rolling window aggregation, so I can forecast time-series regression models. |
15 | 17 | ---
|
16 | 18 |
|
17 |
| -# Lagged features for time series forecasting in AutoML |
18 |
| -This article focuses on AutoML's methods for creating lag and rolling window aggregation features for forecasting regression models. Features like these that use past information can significantly increase accuracy by helping the model to learn correlational patterns in time. See the [methods overview article](./concept-automl-forecasting-methods.md) for general information about forecasting methodology in AutoML. Instructions and examples for training forecasting models in AutoML can be found in our [set up AutoML for time series forecasting](./how-to-auto-train-forecast.md) article. |
| 19 | +# Lag features for time-series forecasting in AutoML |
| 20 | + |
| 21 | +This article describes how automated machine learning (AutoML) in Azure Machine Learning creates lag and rolling window aggregation features for forecasting regression models. Features like these, which are built from past values of the series, can significantly increase accuracy by helping the model learn correlational patterns in time. |
| 22 | + |
| 23 | +If you're interested in learning more about the forecasting methodology in AutoML, see [Overview of forecasting methods in AutoML](concept-automl-forecasting-methods.md). To explore training examples for forecasting models in AutoML, see [Set up AutoML to train a time-series forecasting model with the SDK and CLI](how-to-auto-train-forecast.md). |
| 24 | + |
| 25 | +## Lag featurization in AutoML |
19 | 26 |
|
20 |
| -## Lag feature example |
21 |
| -AutoML generates lags with respect to the forecast horizon. The example in this section illustrates this concept. Here, we use a forecast horizon of three and target lag order of one. Consider the following monthly time series: |
| 27 | +AutoML generates lag features that correspond to the forecast horizon. This section explores lag featurization in AutoML for a model with a forecast horizon of three and target lag order of one. The following tables present the model data and lag features for a monthly time series. |
22 | 28 |
|
23 |
| -Table 1: Original time series <a name="tab:original-ts"></a> |
| 29 | +**Table 1**: Original time series |
24 | 30 |
|
25 | 31 | | Date | $y_t$ |
|
26 |
| -|:--- |:--- | |
| 32 | +| :--- | :--- | |
27 | 33 | | 1/1/2001 | 0 |
|
28 | 34 | | 2/1/2001 | 10 |
|
29 | 35 | | 3/1/2001 | 20 |
|
30 | 36 | | 4/1/2001 | 30 |
|
31 | 37 | | 5/1/2001 | 40 |
|
32 | 38 | | 6/1/2001 | 50 |
|
33 | 39 |
|
34 |
| -First, we generate the lag feature for the horizon $h=1$ only. As you continue reading, it will become clear why we use individual horizons in each table. |
35 |
| - |
36 |
| -Table 2: Lag featurization for $h=1$ <a name="tbl:classic-lag-1"></a> |
37 |
| - |
38 |
| -| Date | $y_t$ | Origin | $y_{t-1}$ | $h$ | |
39 |
| -|:--- |:--- |:--- |:--- |:--- | |
40 |
| -| 1/1/2001 | 0 | 12/1/2000 | - | 1 | |
41 |
| -| 2/1/2001 | 10 | 1/1/2001 | 0 | 1 | |
42 |
| -| 3/1/2001 | 20 | 2/1/2001 | 10 | 1 | |
43 |
| -| 4/1/2001 | 30 | 3/1/2001 | 20 | 1 | |
44 |
| -| 5/1/2001 | 40 | 4/1/2001 | 30 | 1 | |
45 |
| -| 6/1/2001 | 50 | 5/1/2001 | 40 | 1 | |
46 |
| - |
47 |
| -Table 2 is generated from Table 1 by shifting the $y_t$ column down by a single observation. We've added a column named `Origin` that has the dates that the lag features originate from. Next, we generate the lagging feature for the forecast horizon $h=2$ only. |
48 |
| - |
49 |
| -Table 3: Lag featurization for $h=2$ <a name="tbl:classic-lag-2"></a> |
50 |
| - |
51 |
| -| Date | $y_t$ | Origin | $y_{t-2}$ | $h$ | |
52 |
| -|:--- |:--- |:--- |:--- |:--- | |
53 |
| -| 1/1/2001 | 0 | 11/1/2000 | - | 2 | |
54 |
| -| 2/1/2001 | 10 | 12/1/2000 | - | 2 | |
55 |
| -| 3/1/2001 | 20 | 1/1/2001 | 0 | 2 | |
56 |
| -| 4/1/2001 | 30 | 2/1/2001 | 10 | 2 | |
57 |
| -| 5/1/2001 | 40 | 3/1/2001 | 20 | 2 | |
58 |
| -| 6/1/2001 | 50 | 4/1/2001 | 30 | 2 | |
59 |
| - |
60 |
| -Table 3 is generated from Table 1 by shifting the $y_t$ column down by two observations. Finally, we will generate the lagging feature for the forecast horizon $h=3$ only. |
61 |
| - |
62 |
| -Table 4: Lag featurization for $h=3$ <a name="tbl:classic-lag-3"></a> |
63 |
| - |
64 |
| -| Date | $y_t$ | Origin | $y_{t-3}$ | $h$ | |
65 |
| -|:--- |:--- |:--- |:--- |:--- | |
66 |
| -| 1/1/2001 | 0 | 10/1/2000 | - | 3 | |
67 |
| -| 2/1/2001 | 10 | 11/1/2000 | - | 3 | |
68 |
| -| 3/1/2001 | 20 | 12/1/2000 | - | 3 | |
69 |
| -| 4/1/2001 | 30 | 1/1/2001 | 0 | 3 | |
70 |
| -| 5/1/2001 | 40 | 2/1/2001 | 10 | 3 | |
71 |
| -| 6/1/2001 | 50 | 3/1/2001 | 20 | 3 | |
72 |
| - |
73 |
| -Next, we concatenate Tables 1, 2, and 3 and rearrange the rows. The result is in the following table: |
74 |
| - |
75 |
| -Table 5: Lag featurization complete <a name="tbl:automl-lag-complete"></a> |
76 |
| - |
77 |
| -| Date | $y_t$ | Origin | $y_{t-1}^{(h)}$ | $h$ | |
78 |
| -|:--- |:--- |:--- |:--- |:--- | |
79 |
| -| 1/1/2001 | 0 | 12/1/2000 | - | 1 | |
80 |
| -| 1/1/2001 | 0 | 11/1/2000 | - | 2 | |
81 |
| -| 1/1/2001 | 0 | 10/1/2000 | - | 3 | |
82 |
| -| 2/1/2001 | 10 | 1/1/2001 | 0 | 1 | |
83 |
| -| 2/1/2001 | 10 | 12/1/2000 | - | 2 | |
84 |
| -| 2/1/2001 | 10 | 11/1/2000 | - | 3 | |
85 |
| -| 3/1/2001 | 20 | 2/1/2001 | 10 | 1 | |
86 |
| -| 3/1/2001 | 20 | 1/1/2001 | 0 | 2 | |
87 |
| -| 3/1/2001 | 20 | 12/1/2000 | - | 3 | |
88 |
| -| 4/1/2001 | 30 | 3/1/2001 | 20 | 1 | |
89 |
| -| 4/1/2001 | 30 | 2/1/2001 | 10 | 2 | |
90 |
| -| 4/1/2001 | 30 | 1/1/2001 | 0 | 3 | |
91 |
| -| 5/1/2001 | 40 | 4/1/2001 | 30 | 1 | |
92 |
| -| 5/1/2001 | 40 | 3/1/2001 | 20 | 2 | |
93 |
| -| 5/1/2001 | 40 | 2/1/2001 | 10 | 3 | |
94 |
| -| 6/1/2001 | 50 | 4/1/2001 | 40 | 1 | |
95 |
| -| 6/1/2001 | 50 | 4/1/2001 | 30 | 2 | |
96 |
| -| 6/1/2001 | 50 | 3/1/2001 | 20 | 3 | |
97 |
| - |
98 |
| - |
99 |
| -In the final table, we've changed the name of the lag column to $y_{t-1}^{(h)}$ to reflect that the lag is generated with respect to a specific horizon. The table shows that the lags we generated with respect to the horizon can be mapped to the conventional ways of generating lags in the previous tables. |
100 |
| - |
101 |
| -Table 5 is an example of the data augmentation that AutoML applies to training data to enable direct forecasting from regression models. When the configuration includes lag features, AutoML creates horizon dependent lags along with an integer-valued horizon feature. This enables AutoML's forecasting regression models to make a prediction at horizon $h$ without regard to the prediction at $h-1$, in contrast to recursively defined models like ARIMA. |
102 |
| - |
103 |
| -> [!NOTE] |
104 |
| -> Generation of horizon dependent lag features adds new _rows_ to the dataset. The number of new rows is proportional to forecast horizon. This dataset size growth can lead to out-of-memory errors on smaller compute nodes or when dataset size is already large. See the [frequently asked questions](./how-to-automl-forecasting-faq.md#how-do-i-fix-an-out-of-memory-error) article for solutions to this problem. |
105 |
| -
|
106 |
| -Another consequence of this lagging strategy is that lag order and forecast horizon are decoupled. If, for example, your forecast horizon is seven, and you want AutoML to use lag features, you do not have to set the lag order to seven to ensure prediction over a full forecast horizon. Since AutoML generates lags with respect to horizon, you can set the lag order to one and AutoML will augment the data so that lags of any order are valid up to forecast horizon. |
107 |
| - |
108 |
| -## Next steps |
109 |
| -* Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md). |
110 |
| -* Browse [AutoML Forecasting Frequently Asked Questions](./how-to-automl-forecasting-faq.md). |
111 |
| -* Learn about [calendar features for time series forecasting in AutoML](./concept-automl-forecasting-calendar-features.md). |
112 |
| -* Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md). |
| 40 | +The first step generates the lag feature for the horizon $h=1$ only. The subsequent tables demonstrate why the process uses individual horizons to complete the lag featurization. |
| 41 | + |
| 42 | +**Table 2**: Lag featurization for horizon $h=1$ |
| 43 | + |
| 44 | +| Date | $y_t$ | Origin | $y_{t-1}$ | $h$ | |
| 45 | +| :--- | :--- | :--- | :--- | :--- | |
| 46 | +| 1/1/2001 | 0 | 12/1/2000 | - | 1 | |
| 47 | +| 2/1/2001 | 10 | 1/1/2001 | 0 | 1 | |
| 48 | +| 3/1/2001 | 20 | 2/1/2001 | 10 | 1 | |
| 49 | +| 4/1/2001 | 30 | 3/1/2001 | 20 | 1 | |
| 50 | +| 5/1/2001 | 40 | 4/1/2001 | 30 | 1 | |
| 51 | +| 6/1/2001 | 50 | 5/1/2001 | 40 | 1 | |
| 52 | + |
| 53 | +AutoML generates the data in Table 2 from the data in Table 1 by shifting the $y_t$ column down by a single observation. Tables 2 through 5 include the **Origin** column to show the dates from which the lag features originate. |
| 54 | + |
| 55 | +The next step generates the lag feature for the forecast horizon $h=2$ only. |
| 56 | + |
| 57 | +**Table 3**: Lag featurization for forecast horizon $h=2$ |
| 58 | + |
| 59 | +| Date | $y_t$ | Origin | $y_{t-2}$ | $h$ | |
| 60 | +| :--- | :--- | :--- | :--- | :--- | |
| 61 | +| 1/1/2001 | 0 | 11/1/2000 | - | 2 | |
| 62 | +| 2/1/2001 | 10 | 12/1/2000 | - | 2 | |
| 63 | +| 3/1/2001 | 20 | 1/1/2001 | 0 | 2 | |
| 64 | +| 4/1/2001 | 30 | 2/1/2001 | 10 | 2 | |
| 65 | +| 5/1/2001 | 40 | 3/1/2001 | 20 | 2 | |
| 66 | +| 6/1/2001 | 50 | 4/1/2001 | 30 | 2 | |
| 67 | + |
| 68 | +AutoML generates the data in Table 3 from the data in Table 1 by shifting the $y_t$ column down by two observations. |
| 69 | + |
| 70 | +The next step generates the lag feature for the forecast horizon $h=3$ only. |
| 71 | + |
| 72 | +**Table 4**: Lag featurization for forecast horizon $h=3$ |
| 73 | + |
| 74 | +| Date | $y_t$ | Origin | $y_{t-3}$ | $h$ | |
| 75 | +| :--- | :--- | :--- | :--- | :--- | |
| 76 | +| 1/1/2001 | 0 | 10/1/2000 | - | 3 | |
| 77 | +| 2/1/2001 | 10 | 11/1/2000 | - | 3 | |
| 78 | +| 3/1/2001 | 20 | 12/1/2000 | - | 3 | |
| 79 | +| 4/1/2001 | 30 | 1/1/2001 | 0 | 3 | |
| 80 | +| 5/1/2001 | 40 | 2/1/2001 | 10 | 3 | |
| 81 | +| 6/1/2001 | 50 | 3/1/2001 | 20 | 3 | |
| 82 | + |
| 83 | +The final step concatenates the data in Tables 2, 3, and 4, and rearranges the rows by date and horizon. |
| 84 | + |
| 85 | +**Table 5**: Lag featurization complete |
| 86 | + |
| 87 | +| Date | $y_t$ | Origin | $y_{t-1}^{(h)}$ | $h$ | |
| 88 | +| :--- | :--- | :--- | :--- | :--- | |
| 89 | +| 1/1/2001 | 0 | 12/1/2000 | - | 1 | |
| 90 | +| 1/1/2001 | 0 | 11/1/2000 | - | 2 | |
| 91 | +| 1/1/2001 | 0 | 10/1/2000 | - | 3 | |
| 92 | +| 2/1/2001 | 10 | 1/1/2001 | 0 | 1 | |
| 93 | +| 2/1/2001 | 10 | 12/1/2000 | - | 2 | |
| 94 | +| 2/1/2001 | 10 | 11/1/2000 | - | 3 | |
| 95 | +| 3/1/2001 | 20 | 2/1/2001 | 10 | 1 | |
| 96 | +| 3/1/2001 | 20 | 1/1/2001 | 0 | 2 | |
| 97 | +| 3/1/2001 | 20 | 12/1/2000 | - | 3 | |
| 98 | +| 4/1/2001 | 30 | 3/1/2001 | 20 | 1 | |
| 99 | +| 4/1/2001 | 30 | 2/1/2001 | 10 | 2 | |
| 100 | +| 4/1/2001 | 30 | 1/1/2001 | 0 | 3 | |
| 101 | +| 5/1/2001 | 40 | 4/1/2001 | 30 | 1 | |
| 102 | +| 5/1/2001 | 40 | 3/1/2001 | 20 | 2 | |
| 103 | +| 5/1/2001 | 40 | 2/1/2001 | 10 | 3 | |
| 104 | +| 6/1/2001 | 50 | 5/1/2001 | 40 | 1 |
| 105 | +| 6/1/2001 | 50 | 4/1/2001 | 30 | 2 | |
| 106 | +| 6/1/2001 | 50 | 3/1/2001 | 20 | 3 | |
| 107 | + |
| 108 | +In Table 5, the lag column is renamed to **$y_{t-1}^{(h)}$** to reflect that the lag is generated with respect to a specific horizon. Table 5 shows how lags generated with respect to the horizon can be mapped to the conventional ways of generating lags in the previous tables. |
| 109 | + |
| 110 | +Table 5 is an example of the data augmentation that AutoML applies to training data to enable direct forecasting from regression models. When the configuration includes lag features, AutoML creates horizon-dependent lags along with an integer-valued horizon feature. AutoML forecasting regression models can make a prediction at horizon $h$ without regard to the prediction at $h-1$, in contrast to recursively defined models like ARIMA. |
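
The construction of Table 5 can be sketched with pandas. This is a minimal illustration of the idea, not AutoML's internal implementation: the function name `horizon_lag_features` and the column name `lag1_h` are hypothetical, and the sketch assumes a monthly series with a target lag order of one, as in the tables above.

```python
import pandas as pd


def horizon_lag_features(y: pd.Series, max_horizon: int) -> pd.DataFrame:
    """Stack one lag-order-one feature per horizon h = 1..max_horizon (hypothetical helper)."""
    frames = []
    for h in range(1, max_horizon + 1):
        frames.append(
            pd.DataFrame(
                {
                    "Date": y.index,
                    "y": y.to_numpy(),
                    # Date from which the lag feature originates (monthly data assumed).
                    "Origin": y.index - pd.DateOffset(months=h),
                    # Lag order one with respect to horizon h: shift down by h observations.
                    "lag1_h": y.shift(h).to_numpy(),
                    # Integer-valued horizon feature.
                    "h": h,
                }
            )
        )
    # Concatenate the per-horizon tables and rearrange rows by date and horizon.
    stacked = pd.concat(frames, ignore_index=True)
    return stacked.sort_values(["Date", "h"]).reset_index(drop=True)


# The series from Table 1.
dates = pd.date_range("2001-01-01", periods=6, freq="MS")
y = pd.Series([0, 10, 20, 30, 40, 50], index=dates)

augmented = horizon_lag_features(y, max_horizon=3)
print(augmented)
```

Each loop iteration reproduces one of Tables 2 through 4, and the final sort yields the row order of Table 5. Missing lag values appear as `NaN` rather than `-`.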
| 111 | + |
| 112 | +## Considerations for lag featurization |
| 113 | + |
| 114 | +There are a few considerations related to lag featurization for a model. Review the following sections to identify potential actions for your scenario. |
| 115 | + |
| 116 | +### Dataset-size growth |
| 117 | + |
| 118 | +When AutoML generates horizon-dependent lag features, it adds new _rows_ to the model dataset. The number of new rows is proportional to the forecast horizon. |
| 119 | + |
| 120 | +The growth in dataset size can lead to out-of-memory errors on smaller compute nodes or when the dataset is already large. You can find solutions to address this issue in the [Frequently Asked Questions (FAQ) for AutoML forecasting](how-to-automl-forecasting-faq.md#how-do-i-fix-an-out-of-memory-error). |
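
As a rough back-of-the-envelope check, the growth can be estimated before training. The sketch below assumes the augmentation produces one row per original row and horizon step, as in Table 5 (6 rows become 18 for a horizon of three). The function name `augmented_row_count` is hypothetical, and the actual row count in AutoML may differ.

```python
def augmented_row_count(n_rows: int, forecast_horizon: int) -> int:
    """Estimate rows after horizon-dependent lag augmentation (assumes one row per horizon step)."""
    return n_rows * forecast_horizon


# Compare a few horizons for a dataset of 100,000 rows.
for horizon in (1, 3, 7, 30):
    print(f"horizon={horizon:>2}: about {augmented_row_count(100_000, horizon):,} rows")
```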
| 121 | + |
| 122 | +### Decoupling of lag order and forecast horizon |
| 123 | + |
| 124 | +The AutoML lagging strategy decouples lag order and forecast horizon. Suppose your forecast horizon is seven, and you want AutoML to use lag features. In this scenario, you don't have to set the lag order to seven to ensure prediction over a full forecast horizon. Because AutoML generates lags with respect to horizon, you can set the lag order to one. AutoML augments the data so lags of any order are valid up to the forecast horizon. |
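
The following sketch illustrates why the decoupling works. It compares, for each step of a seven-period horizon, which observation a conventional lag and a horizon-dependent lag would need at forecast time. Offsets are measured in periods relative to the last observed value, so a positive offset points at a value that isn't known yet. The mapping $y_{t-k}^{(h)} = y_{t-h-k+1}$ matches the tables above for lag order $k=1$; its extension to higher lag orders is an assumption, and the function name `lag_source_offsets` is hypothetical.

```python
def lag_source_offsets(forecast_horizon: int, lag_order: int = 1):
    """Return (h, conventional_offset, horizon_dependent_offset) for h = 1..forecast_horizon.

    Offsets are relative to the last observed period T; values above 0 are not yet observed.
    """
    rows = []
    for h in range(1, forecast_horizon + 1):
        # Conventional lag k for target period T+h reads period T+h-k.
        conventional = h - lag_order
        # Horizon-dependent lag reads period T+h-(h+k-1) = T-k+1 (assumed mapping).
        horizon_dependent = 1 - lag_order
        rows.append((h, conventional, horizon_dependent))
    return rows


offsets = lag_source_offsets(forecast_horizon=7, lag_order=1)
for h, conventional, horizon_dependent in offsets:
    print(f"h={h}: conventional needs T{conventional:+d}, horizon-dependent needs T{horizon_dependent:+d}")
```

With a lag order of one, the conventional lag is available only for $h=1$, while the horizon-dependent lag always reads an observed value.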
| 125 | + |
| 126 | +## Related content |
| 127 | + |
| 128 | +- [Train time-series forecasting models with AutoML](how-to-auto-train-forecast.md) |
| 129 | +- Browse [FAQ for AutoML forecasting](how-to-automl-forecasting-faq.md) |
| 130 | +- Explore [how AutoML uses machine learning to build forecasting models](concept-automl-forecasting-methods.md) |