Commit df82295

Merge pull request #112548 from cartacioS/patch-37
Pics and info for forecasting validation and rolling window
2 parents c49d819 + e8bf2d1 commit df82295

3 files changed, +1272 -0 lines changed


articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 30 additions & 0 deletions
@@ -101,6 +101,25 @@ test_labels = test_data.pop(label).values
> points, and model accuracy could suffer.

<a name="config"></a>

## Train and validation data
You can specify separate training and validation sets directly in the `AutoMLConfig` constructor.

### Rolling Origin Cross Validation
For time series forecasting, Rolling Origin Cross Validation (ROCV) is used to split the series in a temporally consistent way. ROCV divides the series into training and validation data using an origin time point: data before the origin is used for training, and data after it is used for validation. Sliding the origin forward in time generates the cross-validation folds.

![alt text](./media/how-to-auto-train-forecast/ROCV.svg)
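
To make the fold construction concrete, here is a minimal sketch of rolling-origin splitting in plain Python. It is an illustration only, not AutoML's internal implementation: the function name `rolling_origin_splits`, the fixed validation `horizon`, and the choice to slide the origin by one horizon per fold are all assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def rolling_origin_splits(
    n_samples: int, n_folds: int, horizon: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each fold.

    Hypothetical helper: each fold trains on every point before an origin
    and validates on the next `horizon` points. The origin then slides
    forward by `horizon` (an assumption; other step sizes are possible).
    """
    if n_folds < 1 or horizon < 1:
        raise ValueError("n_folds and horizon must be positive")
    first_origin = n_samples - n_folds * horizon
    if first_origin < 1:
        raise ValueError("series too short for the requested folds")
    for fold in range(n_folds):
        origin = first_origin + fold * horizon
        # Training data always precedes validation data in time.
        yield list(range(origin)), list(range(origin, origin + horizon))


# Ten time-ordered observations, three folds, two validation points per fold.
for train_idx, valid_idx in rolling_origin_splits(n_samples=10, n_folds=3, horizon=2):
    print(f"train 0..{train_idx[-1]}  validate {valid_idx}")
```

Every validation index is later than every training index in the same fold, which is the property that keeps future values out of training.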

ROCV preserves the integrity of the time series data and eliminates the risk of data leakage. ROCV is used automatically for forecasting tasks when you pass the training and validation data together and set the number of cross-validation folds with `n_cross_validations`.

```python
automl_config = AutoMLConfig(task='forecasting',
                             n_cross_validations=3,
                             ...
                             **time_series_settings)
```
Learn more about [`AutoMLConfig`](#configure-and-run-experiment).

## Configure and run experiment

For forecasting tasks, automated machine learning uses pre-processing and estimation steps that are specific to time-series data. The following pre-processing steps will be executed:
@@ -201,6 +220,17 @@ For more information on AML compute and VM sizes that include GPU's, see the [AM

View the [Beverage Production Forecasting notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-beer-remote/auto-ml-forecasting-beer-remote.ipynb) for a detailed code example leveraging DNNs.

### Target Rolling Window Aggregation
Often the best information a forecaster can have is the recent value of the target. Creating cumulative statistics of the target may increase the accuracy of your predictions. Target rolling window aggregation allows you to add a rolling aggregation of data values as features. To enable target rolling windows, set `target_rolling_window_size` to your desired integer window size.

Predicting energy demand is one example. You might add a rolling window feature of three days to account for thermal changes of heated spaces. In the example below, the window of size three is created by setting `target_rolling_window_size=3` in the `AutoMLConfig` constructor. The table shows the feature engineering that occurs when window aggregation is applied. Columns for minimum, maximum, and sum are generated on a sliding window of three based on the defined settings. Each row gets newly calculated features: for the timestamp September 8, 2017 4:00 AM, the maximum, minimum, and sum values are calculated using the demand values for September 8, 2017 1:00 AM to 3:00 AM. The window of three then shifts along to populate data for the remaining rows.

![alt text](./media/how-to-auto-train-forecast/target-roll.svg)
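
The aggregation in the table can be approximated with a few lines of plain Python. This is a sketch under stated assumptions rather than AutoML's actual featurizer: the helper name `target_rolling_features` and the demand values are made up, and leaving the first rows empty (`None`) when there is not yet a full window is a choice made here for illustration.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[float], Optional[float], Optional[float]]


def target_rolling_features(target: List[float], window: int) -> List[Row]:
    """Return (min, max, sum) of the `window` target values preceding each row.

    Hypothetical helper: the current row's own target is excluded, so each
    feature only uses values that would be known at prediction time.
    """
    features: List[Row] = []
    for i in range(len(target)):
        if i < window:
            # Not enough history yet; assumed to be left empty.
            features.append((None, None, None))
        else:
            past = target[i - window:i]
            features.append((min(past), max(past), sum(past)))
    return features


# Made-up hourly demand values, in time order.
demand = [3000.0, 3100.0, 3200.0, 3300.0, 3400.0]
for value, feats in zip(demand, target_rolling_features(demand, window=3)):
    print(value, feats)
```

With a window of three, the fourth row is built from the first three demand values, and the window then shifts by one row at a time, mirroring the 1:00 AM to 3:00 AM example above.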

Generating these additional features and using them as extra contextual data can improve the accuracy of the trained model.

View a Python code example that uses the [target rolling window aggregate feature](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/automated-machine-learning/forecasting-energy-demand/auto-ml-forecasting-energy-demand.ipynb).

### View feature engineering summary

For time-series task types in automated machine learning, you can view details from the feature engineering process. The following code shows each raw feature along with the following attributes:
