Commit 8700408

authored
Pics and info for forecasting validation and rolling window
1 parent 0902553 commit 8700408

File tree

1 file changed

+20
-0
lines changed

articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 20 additions & 0 deletions
@@ -101,6 +101,15 @@ test_labels = test_data.pop(label).values
101101
> points, and model accuracy could suffer.
102102
103103
<a name="config"></a>
104+
105+
## Train and validation data
106+
You can specify separate training and validation sets directly in the `AutoMLConfig` constructor.
107+
108+
### Rolling Origin Cross Validation
109+
Rolling Origin Cross Validation (ROCV) is used automatically for forecasting tasks when you pass the training and validation data together and set the number of cross-validation folds with `n_cross_validations`. ROCV splits a time series in a temporally consistent way: it divides the series into training and validation data at an origin time point, and sliding the origin in time generates the cross-validation folds. This strategy preserves the time series data integrity and reduces the risk of data leakage.
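The idea of sliding an origin to generate folds can be illustrated with a minimal sketch in plain Python. It is not the automated ML implementation: the function name `rolling_origin_splits`, the `horizon` parameter, and the choice to advance the origin by one horizon per fold with an expanding training window are assumptions made for illustration only.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_samples: int, n_folds: int, horizon: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each fold.

    Hypothetical helper: rows are assumed to be sorted by time, and the
    origin moves forward by `horizon` rows per fold (an assumption).
    """
    if n_folds < 1 or horizon < 1:
        raise ValueError("n_folds and horizon must be positive")
    first_origin = n_samples - n_folds * horizon
    if first_origin < 1:
        raise ValueError("not enough samples for the requested folds")
    for fold in range(n_folds):
        origin = first_origin + fold * horizon
        train_idx = list(range(origin))                       # everything before the origin
        valid_idx = list(range(origin, origin + horizon))     # the next `horizon` rows
        yield train_idx, valid_idx


# Ten time-ordered rows, three folds, two validation rows per fold.
splits = list(rolling_origin_splits(n_samples=10, n_folds=3, horizon=2))
for train_idx, valid_idx in splits:
    print("train:", train_idx, "validate:", valid_idx)
```

In every fold the training indices precede the validation indices, which is the property that keeps future values out of the training data.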
110+
111+
<insert picture>
112+
104113
## Configure and run experiment
105114

106115
For forecasting tasks, automated machine learning uses pre-processing and estimation steps that are specific to time-series data. The following pre-processing steps will be executed:
@@ -201,6 +210,17 @@ For more information on AML compute and VM sizes that include GPU's, see the [AM
201210

202211
View the [Beverage Production Forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb) for a detailed code example leveraging DNNs.
203212

213+
### Target Rolling Window Aggregation
214+
Often the best information a forecaster can have is the recent value of the target. Creating cumulative statistics of the target may increase the accuracy of your predictions. Target rolling window aggregation allows you to add a rolling aggregation of data values as features. To enable target rolling windows, set `target_rolling_window_size` to your desired integer window size.
215+
216+
An example of this can be seen when predicting energy demand. You might add a rolling window feature of three days to account for thermal changes of heated spaces. In the example below, we've created this window of size three by setting `target_rolling_window_size=3` in the `AutoMLConfig` constructor. The table shows the feature engineering that occurs when window aggregation is applied. Columns for minimum, maximum, and sum are generated on a sliding window of three based on the defined settings. Each row has a new calculated feature. For the timestamp September 8, 2017 4:00 AM, the maximum, minimum, and sum values are calculated using the demand values for September 8, 2017 1:00 AM - 3:00 AM. This window of three shifts along to populate data for the remaining rows.
217+
218+
<insert picture>
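The window arithmetic described above can be sketched in plain Python. This is an illustrative approximation, not the featurization code that automated ML runs: the helper name `target_rolling_features`, the output keys, and the demand values are all made up for the example, and rows without a full window of history are simply left empty.

```python
from typing import Dict, List, Optional


def target_rolling_features(
    target: List[float], window: int
) -> List[Dict[str, Optional[float]]]:
    """Compute min, max, and sum over the `window` values preceding each row.

    Hypothetical helper: the current row's own target value is excluded,
    so each feature uses only earlier observations.
    """
    rows: List[Dict[str, Optional[float]]] = []
    for i in range(len(target)):
        if i < window:
            # Not enough history yet to fill the window.
            rows.append({"min": None, "max": None, "sum": None})
        else:
            past = target[i - window:i]
            rows.append({"min": min(past), "max": max(past), "sum": sum(past)})
    return rows


# Made-up hourly demand values for 1:00 AM through 5:00 AM.
demand = [10, 12, 11, 15, 14]
features = target_rolling_features(demand, window=3)
print(features[3])  # 4:00 AM row, built from the 1:00 AM - 3:00 AM values
```

With these sample values, the 4:00 AM row is derived only from the three preceding hours, matching the way the window shifts along the rows in the description above.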
219+
220+
Generating and using these additional features as extra contextual data helps with the accuracy of the trained model.
221+
222+
View a Python notebook example of leveraging the [target rolling window aggregate feature](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb).
223+
204224
### View feature engineering summary
205225

206226
For time-series task types in automated machine learning, you can view details from the feature engineering process. The following code shows each raw feature along with the following attributes:
