
Commit 2492a3c

Merge pull request #221586 from EricWrightAtWork/erwright-forecasting-user-guide-vol1
Adding first set of documents for AutoML forecasting user guide
2 parents 8768ea2 + 05b2046 commit 2492a3c

11 files changed, +633 -8 lines changed
Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
---
title: Calendar features for time series forecasting in AutoML
titleSuffix: Azure Machine Learning
description: Learn how Azure Machine Learning's AutoML creates calendar and holiday features
services: machine-learning
author: nivi09
ms.author: nivmishra
ms.reviewer: ssalgado
ms.service: machine-learning
ms.subservice: automl
ms.topic: conceptual
ms.custom: contperf-fy21q1, automl, FY21Q4-aml-seo-hack, sdkv1, event-tier1-build-2022
ms.date: 12/15/2022
---

# Calendar features for time series forecasting in AutoML

This article focuses on the calendar-based features that AutoML creates to increase the accuracy of forecasting regression models. Holidays, for example, can strongly influence how the modeled system behaves, so the time before, during, and after a holiday can bias the series’ patterns. Each holiday generates a window over your existing dataset that the learner can assign an effect to. This is especially useful in scenarios where holidays drive high demand for specific products. See the [methods overview article](./concept-automl-forecasting-methods.md) for more general information about forecasting methodology in AutoML. Instructions and examples for training forecasting models in AutoML can be found in our [set up AutoML for time series forecasting](./how-to-auto-train-forecast.md) article.

As a part of feature engineering, AutoML transforms datetime type columns provided in the training data into new columns of calendar-based features. These features can help regression models learn seasonal patterns at several cadences. AutoML can always create calendar features from the time index of the time series since this is a required column in the training data. Calendar features are also made from other columns with datetime type, if any are present. See the [how AutoML uses your data](./concept-automl-forecasting-methods.md#how-automl-uses-your-data) guide for more information on data requirements.

AutoML considers two categories of calendar features: standard features, which are based entirely on date and time values, and holiday features, which are specific to a country or region of the world. The remainder of this article describes both categories.

## Standard calendar features

The following table shows the full set of AutoML's standard calendar features along with an example output. The example uses the `%Y-%m-%d %H:%M:%S` format for datetime representation.

| Feature name | Description | Example output for 2011-01-01 00:25:30 |
| --- | ----------- | -------------- |
|`year`|Numeric feature representing the calendar year.|2011|
|`year_iso`|Represents the ISO year as defined in ISO 8601. An ISO year starts on the Monday of the first week that contains a Thursday. For example, if January 1 is a Friday, the ISO year begins on January 4. ISO years may differ from calendar years.|2010|
|`half`|Feature indicating whether the date is in the first or second half of the year. It is 1 if the date is prior to July 1 and 2 otherwise.|1|
|`quarter`|Numeric feature representing the quarter of the given date. It takes values 1, 2, 3, or 4, representing the first, second, third, and fourth quarter of the calendar year.|1|
|`month`|Numeric feature representing the calendar month. It takes values 1 through 12.|1|
|`month_lbl`|String feature representing the name of the month.|'January'|
|`day`|Numeric feature representing the day of the month. It takes values 1 through 31.|1|
|`hour`|Numeric feature representing the hour of the day. It takes values 0 through 23.|0|
|`minute`|Numeric feature representing the minute within the hour. It takes values 0 through 59.|25|
|`second`|Numeric feature representing the second of the given datetime. If only a date is provided, the second is assumed to be 0. It takes values 0 through 59.|30|
|`am_pm`|Numeric feature indicating whether the time is AM or PM. It is 0 for times before 12 PM and 1 for times after 12 PM.|0|
|`am_pm_lbl`|String feature indicating whether the time is AM or PM.|'am'|
|`hour12`|Numeric feature representing the hour of the day on a 12-hour clock. It takes values 0 through 12 for the first half of the day and 1 through 11 for the second half.|0|
|`wday`|Numeric feature representing the day of the week. It takes values 0 through 6, where 0 corresponds to Monday.|5|
|`wday_lbl`|String feature representing the name of the day of the week.|'Saturday'|
|`qday`|Numeric feature representing the day within the quarter. It takes values 1 through 92.|1|
|`yday`|Numeric feature representing the day of the year. It takes values 1 through 365, or 1 through 366 in a leap year.|1|
|`week`|Numeric feature representing the [ISO week](https://en.wikipedia.org/wiki/ISO_week_date) as defined in ISO 8601. ISO weeks always start on Monday and end on Sunday. It takes values 1 through 52, or 1 through 53 in years where January 1 falls on a Thursday and in leap years where January 1 falls on a Wednesday.|52|
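
To check the example column, the following minimal Python sketch derives these features from a single `datetime` using only the standard library. It mirrors the table's definitions but is not AutoML's implementation. The function name `standard_calendar_features` is hypothetical. Treating noon as PM is an assumption, because the table doesn't say how 12 PM itself is classified.

```python
from datetime import datetime


def standard_calendar_features(ts):
    """Sketch of the standard calendar features for one timestamp.

    Hypothetical helper that follows the table's definitions; not AutoML's code.
    """
    iso_year, iso_week, _ = ts.isocalendar()
    quarter = (ts.month - 1) // 3 + 1
    quarter_start = datetime(ts.year, 3 * (quarter - 1) + 1, 1)
    # Assumption: noon (hour 12) is treated as PM.
    is_pm = ts.hour >= 12
    return {
        "year": ts.year,
        "year_iso": iso_year,
        "half": 1 if ts.month < 7 else 2,
        "quarter": quarter,
        "month": ts.month,
        "month_lbl": ts.strftime("%B"),
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        "second": ts.second,
        "am_pm": int(is_pm),
        "am_pm_lbl": "pm" if is_pm else "am",
        # Hours 0-12 map to 0-12 and hours 13-23 map to 1-11, as the table describes.
        "hour12": ts.hour if ts.hour <= 12 else ts.hour - 12,
        "wday": ts.weekday(),  # Monday is 0
        "wday_lbl": ts.strftime("%A"),
        "qday": (ts - quarter_start).days + 1,  # days since the quarter began, 1-based
        "yday": ts.timetuple().tm_yday,
        "week": iso_week,
    }


features = standard_calendar_features(datetime(2011, 1, 1, 0, 25, 30))
print(features["year_iso"], features["week"], features["wday"], features["wday_lbl"])
# 2010 52 5 Saturday
```

The ISO year and week come straight from `datetime.isocalendar()`. That is why 2011-01-01, a Saturday, reports ISO year 2010 and week 52.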

The full set of standard calendar features isn't created in all cases. The generated set depends on the frequency of the time series and on whether the training data contains datetime columns in addition to the time index. The following table shows the features created for different column types:

Column purpose | Calendar features
--- | ---
Time index | The full set minus calendar features that have high correlation with other features. For example, if the time series frequency is daily, then any features with a more granular frequency than daily will be removed since they don't provide useful information.
Other datetime column | A reduced set consisting of `Year`, `Month`, `Day`, `DayOfWeek`, `DayOfYear`, `QuarterOfYear`, `WeekOfMonth`, `Hour`, `Minute`, and `Second`. If the column is a date with no time, `Hour`, `Minute`, and `Second` will be 0.
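
The table notes that sub-daily features carry no information for a daily series. The following Python sketch shows why: on a daily index, `hour`, `minute`, and `second` are constant. It uses a simple "drop constant columns" filter as a stand-in. AutoML's actual pruning is described as correlation-based and may differ. The helper names and the feature subset are hypothetical.

```python
from datetime import datetime, timedelta


def calendar_columns(timestamps):
    """Build a hypothetical subset of calendar feature columns for a list of datetimes."""
    return {
        "day": [t.day for t in timestamps],
        "wday": [t.weekday() for t in timestamps],
        "hour": [t.hour for t in timestamps],
        "minute": [t.minute for t in timestamps],
        "second": [t.second for t in timestamps],
    }


def drop_uninformative(columns):
    """Keep only columns that take more than one value across the series.

    A simplified stand-in for AutoML's pruning: a constant column gives a
    regression model nothing to learn from.
    """
    return {name: values for name, values in columns.items() if len(set(values)) > 1}


# A daily time index: every timestamp falls at midnight.
daily_index = [datetime(2011, 1, 1) + timedelta(days=i) for i in range(14)]
kept = drop_uninformative(calendar_columns(daily_index))
print(sorted(kept))  # ['day', 'wday']
```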

## Holiday features

AutoML can optionally create features representing holidays from a specific country or region. These features are configured in AutoML using the `country_or_region_for_holidays` parameter, which accepts an [ISO country code](https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes).

> [!NOTE]
> Holiday features can only be made for time series with daily frequency.

The following table summarizes the holiday features:

Feature name | Description
--- | ---
`Holiday`| String feature that specifies whether a date is a regional or national holiday. Days within some range of a holiday are also marked.
`isPaidTimeOff`| Binary feature that takes value 1 if the day is a "paid time-off holiday" in the given country or region.
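
The window behavior described for `Holiday` can be illustrated with a small Python sketch. The sketch is built on several assumptions. The holiday calendar is a hand-written, hypothetical stand-in for the Azure Open Datasets data. The one-day window size is an assumption, because the article says only that days "within some range" are marked. Labeling nearby days with the holiday's name is also an assumption.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar for illustration: {date: (name, is_paid_time_off)}.
# AutoML itself reads holiday data from Azure Open Datasets.
HOLIDAYS = {
    date(2017, 7, 4): ("Independence Day", True),
}

# Assumption: mark dates up to one day before or after a holiday.
WINDOW_DAYS = 1


def holiday_features(day):
    """Return (holiday_label, is_paid_time_off) for one daily timestamp."""
    for holiday, (name, paid_time_off) in HOLIDAYS.items():
        offset = abs((day - holiday).days)
        if offset == 0:
            return name, int(paid_time_off)
        if offset <= WINDOW_DAYS:
            # Nearby day: labeled with the holiday name, but not a paid day off.
            return name, 0
    return "", 0


start = date(2017, 7, 2)
for i in range(5):
    day = start + timedelta(days=i)
    print(day, holiday_features(day))
```

Only the holiday itself gets `isPaidTimeOff = 1`. The days on either side carry the label, which lets a model learn pre-holiday and post-holiday effects.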

AutoML uses Azure Open Datasets as a source for holiday information. For more information, see the [PublicHolidays](/python/api/azureml-opendatasets/azureml.opendatasets.publicholidays) documentation.

To better understand the holiday feature generation, consider the following example data:

<img src='./media/concept-automl-forecasting-calendar-features/load_forecasting_sample_data_daily.png' alt='sample_data' width=50%></img>

To make US holiday features for this data, we set `country_or_region_for_holidays` to 'US' in the [forecast settings](/python/api/azure-ai-ml/azure.ai.ml.automl.forecastingjob#azure-ai-ml-automl-forecastingjob-set-forecast-settings), as shown in the following code sample. The experiment name and training data path are placeholders:

```python
from azure.ai.ml import automl, Input
from azure.ai.ml.constants import AssetTypes

# Placeholder values: replace with your own experiment name and MLTable training data
exp_name = 'automl-holiday-features'
sample_data = Input(type=AssetTypes.MLTABLE, path='./train-mltable-folder/')

# create a forecasting job
forecasting_job = automl.forecasting(
    compute='test_cluster',  # Name of single or multinode AML compute infrastructure created by user
    experiment_name=exp_name,  # name of experiment
    training_data=sample_data,
    target_column_name='demand',
    primary_metric='NormalizedRootMeanSquaredError',
    n_cross_validations=3,
    enable_model_explainability=True
)

# set custom forecast settings
forecasting_job.set_forecast_settings(
    time_column_name='timeStamp',
    country_or_region_for_holidays='US'
)
```

The generated holiday features look like the following:

<a name='output'><img src='./media/concept-automl-forecasting-calendar-features/sample_dataset_holiday_feature_generated.png' alt='sample_data_output' width=75%></img></a>

Note that generated features have the prefix `_automl_` prepended to their column names. AutoML generally uses this prefix to distinguish input features from engineered features.
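
Because of this prefix, engineered columns can be separated from input columns by name. The following Python sketch uses hypothetical column names built from the sample's `timeStamp` and `demand` columns and the holiday feature names. Inspect your own featurized data for the actual names.

```python
# Hypothetical column names for illustration only.
columns = ["timeStamp", "demand", "_automl_Holiday", "_automl_isPaidTimeOff"]

ENGINEERED_PREFIX = "_automl_"
engineered = [c for c in columns if c.startswith(ENGINEERED_PREFIX)]
inputs = [c for c in columns if not c.startswith(ENGINEERED_PREFIX)]
print(engineered)  # ['_automl_Holiday', '_automl_isPaidTimeOff']
print(inputs)      # ['timeStamp', 'demand']
```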

## Next steps
* Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md).
* Browse [AutoML Forecasting Frequently Asked Questions](./how-to-automl-forecasting-faq.md).
* Learn about [AutoML Forecasting Lagged Features](./concept-automl-forecasting-lags.md).
* Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md).

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
title: Lagged features for time series forecasting in AutoML
titleSuffix: Azure Machine Learning
description: Learn how Azure Machine Learning's AutoML forms lag based features for time series forecasting
services: machine-learning
author: ericwrightatwork
ms.author: vlbejan
ms.reviewer: ssalgado
ms.service: machine-learning
ms.subservice: automl
ms.topic: conceptual
ms.custom: contperf-fy21q1, automl, FY21Q4-aml-seo-hack, sdkv1, event-tier1-build-2022
ms.date: 12/15/2022
show_latex: true
---

# Lagged features for time series forecasting in AutoML
This article focuses on AutoML's methods for creating lag and rolling window aggregation features for forecasting regression models. Features like these that use past information can significantly increase accuracy by helping the model to learn correlational patterns in time. See the [methods overview article](./concept-automl-forecasting-methods.md) for general information about forecasting methodology in AutoML. Instructions and examples for training forecasting models in AutoML can be found in our [set up AutoML for time series forecasting](./how-to-auto-train-forecast.md) article.

## Lag feature example
AutoML generates lags with respect to the forecast horizon. The example in this section illustrates this concept. Here, we use a forecast horizon of three and a target lag order of one. Consider the following monthly time series:

Table 1: Original time series <a name="tab:original-ts"></a>

| Date | $y_t$ |
|:--- |:--- |
| 1/1/2001 | 0 |
| 2/1/2001 | 10 |
| 3/1/2001 | 20 |
| 4/1/2001 | 30 |
| 5/1/2001 | 40 |
| 6/1/2001 | 50 |

First, we generate the lag feature for the horizon $h=1$ only. As you continue reading, it will become clear why we use individual horizons in each table.

Table 2: Lag featurization for $h=1$ <a name="tbl:classic-lag-1"></a>

| Date | $y_t$ | Origin | $y_{t-1}$ | $h$ |
|:--- |:--- |:--- |:--- |:--- |
| 1/1/2001 | 0 | 12/1/2000 | - | 1 |
| 2/1/2001 | 10 | 1/1/2001 | 0 | 1 |
| 3/1/2001 | 20 | 2/1/2001 | 10 | 1 |
| 4/1/2001 | 30 | 3/1/2001 | 20 | 1 |
| 5/1/2001 | 40 | 4/1/2001 | 30 | 1 |
| 6/1/2001 | 50 | 5/1/2001 | 40 | 1 |

Table 2 is generated from Table 1 by shifting the $y_t$ column down by a single observation. We've added a column named `Origin` that has the dates that the lag features originate from. Next, we generate the lag feature for the forecast horizon $h=2$ only.

Table 3: Lag featurization for $h=2$ <a name="tbl:classic-lag-2"></a>

| Date | $y_t$ | Origin | $y_{t-2}$ | $h$ |
|:--- |:--- |:--- |:--- |:--- |
| 1/1/2001 | 0 | 11/1/2000 | - | 2 |
| 2/1/2001 | 10 | 12/1/2000 | - | 2 |
| 3/1/2001 | 20 | 1/1/2001 | 0 | 2 |
| 4/1/2001 | 30 | 2/1/2001 | 10 | 2 |
| 5/1/2001 | 40 | 3/1/2001 | 20 | 2 |
| 6/1/2001 | 50 | 4/1/2001 | 30 | 2 |

Table 3 is generated from Table 1 by shifting the $y_t$ column down by two observations. Finally, we generate the lag feature for the forecast horizon $h=3$ only.

Table 4: Lag featurization for $h=3$ <a name="tbl:classic-lag-3"></a>

| Date | $y_t$ | Origin | $y_{t-3}$ | $h$ |
|:--- |:--- |:--- |:--- |:--- |
| 1/1/2001 | 0 | 10/1/2000 | - | 3 |
| 2/1/2001 | 10 | 11/1/2000 | - | 3 |
| 3/1/2001 | 20 | 12/1/2000 | - | 3 |
| 4/1/2001 | 30 | 1/1/2001 | 0 | 3 |
| 5/1/2001 | 40 | 2/1/2001 | 10 | 3 |
| 6/1/2001 | 50 | 3/1/2001 | 20 | 3 |
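
Each of the single-horizon tables is the same operation with a different shift. The following minimal Python sketch reproduces them from Table 1. It assumes a monthly series with first-of-month dates. The helper names `months_before` and `lag_table` are hypothetical, and `None` stands in for the `-` entries.

```python
from datetime import date


def months_before(d, n):
    """Return the first-of-month date n months before d (monthly series assumed)."""
    index = d.year * 12 + (d.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def lag_table(dates, values, h):
    """Rows of (date, y_t, origin, y_{t-h}, h) for a single horizon h.

    The lag column is the target shifted down by h observations. The first h
    rows have no observed lag value and get None.
    """
    rows = []
    for i, (d, y) in enumerate(zip(dates, values)):
        lag = values[i - h] if i >= h else None
        rows.append((d, y, months_before(d, h), lag, h))
    return rows


dates = [date(2001, m, 1) for m in range(1, 7)]
values = [0, 10, 20, 30, 40, 50]
for row in lag_table(dates, values, h=1):  # use h=2 or h=3 for the other tables
    print(row)
```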

Next, we concatenate Tables 2, 3, and 4 and rearrange the rows so that the rows for each date are grouped together. The result is in the following table:

Table 5: Lag featurization complete <a name="tbl:automl-lag-complete"></a>

| Date | $y_t$ | Origin | $y_{t-1}^{(h)}$ | $h$ |
|:--- |:--- |:--- |:--- |:--- |
| 1/1/2001 | 0 | 12/1/2000 | - | 1 |
| 1/1/2001 | 0 | 11/1/2000 | - | 2 |
| 1/1/2001 | 0 | 10/1/2000 | - | 3 |
| 2/1/2001 | 10 | 1/1/2001 | 0 | 1 |
| 2/1/2001 | 10 | 12/1/2000 | - | 2 |
| 2/1/2001 | 10 | 11/1/2000 | - | 3 |
| 3/1/2001 | 20 | 2/1/2001 | 10 | 1 |
| 3/1/2001 | 20 | 1/1/2001 | 0 | 2 |
| 3/1/2001 | 20 | 12/1/2000 | - | 3 |
| 4/1/2001 | 30 | 3/1/2001 | 20 | 1 |
| 4/1/2001 | 30 | 2/1/2001 | 10 | 2 |
| 4/1/2001 | 30 | 1/1/2001 | 0 | 3 |
| 5/1/2001 | 40 | 4/1/2001 | 30 | 1 |
| 5/1/2001 | 40 | 3/1/2001 | 20 | 2 |
| 5/1/2001 | 40 | 2/1/2001 | 10 | 3 |
| 6/1/2001 | 50 | 5/1/2001 | 40 | 1 |
| 6/1/2001 | 50 | 4/1/2001 | 30 | 2 |
| 6/1/2001 | 50 | 3/1/2001 | 20 | 3 |

In the final table, we've changed the name of the lag column to $y_{t-1}^{(h)}$ to reflect that the lag is generated with respect to a specific horizon. The table shows that the lags we generated with respect to the horizon can be mapped to the conventional ways of generating lags in the previous tables.
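
The link between Table 5 and the per-horizon tables can be written compactly. With target lag order one, the row for date $t$ and horizon $h$ has

$$
\text{Origin} = t - h, \qquad y_{t-1}^{(h)} = y_{t-h},
$$

where $t - h$ counts back $h$ observations in the monthly series. For example, the row for 6/1/2001 with $h = 3$ has origin 3/1/2001 and $y_{t-1}^{(3)} = y_{t-3} = 20$.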

Table 5 is an example of the data augmentation that AutoML applies to training data to enable direct forecasting from regression models. When the configuration includes lag features, AutoML creates horizon-dependent lags along with an integer-valued horizon feature. This enables AutoML's forecasting regression models to make a prediction at horizon $h$ without regard to the prediction at $h-1$, in contrast to recursively defined models like ARIMA.
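
The augmentation can be sketched in Python by emitting one row per (date, horizon) pair. The sketch reproduces Table 5 from Table 1. It is a minimal illustration assuming a monthly series and lag order one, not AutoML's internal code. The helper names are hypothetical.

```python
from datetime import date


def months_before(d, n):
    """Return the first-of-month date n months before d (monthly series assumed)."""
    index = d.year * 12 + (d.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def horizon_augment(dates, values, horizon):
    """Stack per-horizon lag rows into one training table, as in Table 5.

    For each date t and each h in 1..horizon, emit (t, y_t, origin, lag, h),
    where origin = t - h and lag is the observed value at the origin (None if
    the origin precedes the data). The integer column h is the horizon feature.
    """
    rows = []
    for i, (d, y) in enumerate(zip(dates, values)):
        for h in range(1, horizon + 1):
            lag = values[i - h] if i >= h else None
            rows.append((d, y, months_before(d, h), lag, h))
    return rows


dates = [date(2001, m, 1) for m in range(1, 7)]
values = [0, 10, 20, 30, 40, 50]
rows = horizon_augment(dates, values, horizon=3)
print(len(rows))  # 18
for row in rows[-3:]:
    print(row)
```

A single regression model trained on these rows sees `h` as an ordinary input feature. That is what lets it predict each horizon directly.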

> [!NOTE]
> Generation of horizon-dependent lag features adds new _rows_ to the dataset. The number of new rows is proportional to the forecast horizon. This growth in dataset size can lead to out-of-memory errors on smaller compute nodes or when the dataset is already large. See the [frequently asked questions](./how-to-automl-forecasting-faq.md#how-do-i-fix-an-out-of-memory-error) article for solutions to this problem.
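
As a rough sizing guide, each original row is repeated once per horizon step. With $N$ training rows and a forecast horizon $H$, the augmented training set therefore has about

$$
N_{\text{aug}} = N \times H
$$

rows. In the example above, $N = 6$ and $H = 3$ give the $6 \times 3 = 18$ rows of Table 5.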

Another consequence of this lagging strategy is that lag order and forecast horizon are decoupled. For example, if your forecast horizon is seven and you want AutoML to use lag features, you don't have to set the lag order to seven to ensure prediction over the full forecast horizon. Because AutoML generates lags with respect to the horizon, you can set the lag order to one, and AutoML augments the data so that lags of any order are valid up to the forecast horizon.
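
To see why lag order one suffices, consider the forecast origin, which is the last observed time step. For a target at origin + h, the horizon-dependent lag is the value at t - h, which is the origin itself. Every horizon therefore uses only observed data. A conventional lag of one would need the unobserved value at t - 1 for every h > 1. The following Python sketch makes this concrete. The function `direct_lag_features` is hypothetical. Its extension to lag orders above one, which takes the values at and before the origin, is an assumption rather than documented AutoML behavior.

```python
def direct_lag_features(history, horizon, lag_order=1):
    """Horizon-dependent lag features for each h = 1..horizon at prediction time.

    history: observed target values; the last element is the forecast origin.
    Returns {h: [lag values]}, with lags taken relative to t - h (the origin).
    """
    if lag_order > len(history):
        raise ValueError("lag_order exceeds the length of the observed history")
    origin = len(history) - 1  # index of the last observed value
    features = {}
    for h in range(1, horizon + 1):
        t = origin + h  # index of the time step being predicted
        # Indices t - h - k equal origin - k, so every lag is an observed value
        # and no prediction for horizon h - 1 is needed.
        features[h] = [history[t - h - k] for k in range(lag_order)]
    return features


lags = direct_lag_features([0, 10, 20, 30, 40, 50], horizon=7)
print(lags[1], lags[7])  # [50] [50]
```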

## Next steps
* Learn more about [how to set up AutoML to train a time-series forecasting model](./how-to-auto-train-forecast.md).
* Browse [AutoML Forecasting Frequently Asked Questions](./how-to-automl-forecasting-faq.md).
* Learn about [calendar features for time series forecasting in AutoML](./concept-automl-forecasting-calendar-features.md).
* Learn about [how AutoML uses machine learning to build forecasting models](./concept-automl-forecasting-methods.md).
