Commit 39a732a

Merge pull request #46626 from matthewconners/patch-5
Updated how to guide with new features and plots
2 parents 5f10740 + 4a1234c commit 39a732a

1 file changed: +172 -26 lines changed

articles/machine-learning/service/how-to-build-deploy-forecast-models.md

Lines changed: 172 additions & 26 deletions
@@ -8,7 +8,7 @@ ms.topic: conceptual
 ms.reviewer: jmartens
 ms.author: mattcon
 author: matthewconners
-ms.date: 05/07/2018
+ms.date: 07/13/2018
 ---

 # Build and deploy forecasting models with Azure Machine Learning
@@ -31,7 +31,7 @@ Consult the [package reference documentation](https://aka.ms/aml-packages/foreca
 - An Azure Machine Learning Model Management account
 - Azure Machine Learning Workbench installed

-If these three are not yet created or installed, follow the [Azure Machine Learning Quickstart and Workbench installation](../service/quickstart-installation.md) article.
+If these three are not yet created or installed, follow the [Azure Machine Learning Quickstart and Workbench installation](../service/quickstart-installation.md) article.

 1. The Azure Machine Learning Package for Forecasting must be installed. Learn how to [install this package here](https://aka.ms/aml-packages/forecasting).

@@ -72,19 +72,20 @@ import pkg_resources
 from datetime import timedelta
 import matplotlib
 matplotlib.use('agg')
+%matplotlib inline
 from matplotlib import pyplot as plt

 from sklearn.linear_model import Lasso, ElasticNet
 from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
 from sklearn.neighbors import KNeighborsRegressor

 from ftk import TimeSeriesDataFrame, ForecastDataFrame, AzureMLForecastPipeline
-from ftk.tsutils import last_n_periods_split
+from ftk.ts_utils import last_n_periods_split

 from ftk.transforms import TimeSeriesImputer, TimeIndexFeaturizer, DropColumns
 from ftk.transforms.grain_index_featurizer import GrainIndexFeaturizer
-from ftk.models import Arima, SeasonalNaive, Naive, RegressionForecaster, ETS
-from ftk.models.forecasterunion import ForecasterUnion
+from ftk.models import Arima, SeasonalNaive, Naive, RegressionForecaster, ETS, BestOfForecaster
+from ftk.models.forecaster_union import ForecasterUnion
 from ftk.model_selection import TSGridSearchCV, RollingOriginValidator

 from azuremltkbase.deployment import AMLSettings
@@ -497,12 +498,11 @@ The [TimeSeriesDataFrame.ts_report](https://docs.microsoft.com/en-us/python/api/


 ```python
-%matplotlib inline
 whole_tsdf.ts_report()
 ```

 -------------------------------- Data Overview ---------------------------------
-<class 'ftk.dataframets.TimeSeriesDataFrame'>
+<class 'ftk.time_series_data_frame.TimeSeriesDataFrame'>
 MultiIndex: 28947 entries, (1990-06-20 23:59:59, 2, dominicks) to (1992-10-07 23:59:59, 137, tropicana)
 Data columns (total 17 columns):
 week 28947 non-null int64
@@ -658,12 +658,6 @@ whole_tsdf.ts_report()

 ![png](./media/how-to-build-deploy-forecast-models/output_15_6.png)

-![png](./media/how-to-build-deploy-forecast-models/output_59_0.png)
-![png](./media/how-to-build-deploy-forecast-models/output_61_0.png)
-![png](./media/how-to-build-deploy-forecast-models/output_63_0.png)
-![png](./media/how-to-build-deploy-forecast-models/output_63_1.png)
-
-

 ## Integrate with external data

@@ -887,7 +881,7 @@ whole_tsdf.head()

 ## Preprocess data and impute missing values

-Start by splitting the data into training set and a testing set with the [ftk.tsutils.last_n_periods_split](https://docs.microsoft.com/en-us/python/api/ftk.ts_utils?view=azure-ml-py-latest) utility function. The resulting testing set contains the last 40 observations of each time series.
+Start by splitting the data into training set and a testing set with the [last_n_periods_split](https://docs.microsoft.com/en-us/python/api/ftk.ts_utils?view=azure-ml-py-latest) utility function. The resulting testing set contains the last 40 observations of each time series.


 ```python
@@ -969,7 +963,7 @@ print(ts_regularity[ts_regularity['regular'] == False])
 [213 rows x 2 columns]


-You can see that most of the series (213 out of 249) are irregular. An [imputation transform](https://docs.microsoft.com/en-us/python/api/ftk.transforms.ts_imputer?view=azure-ml-py-latest) is required to fill in missing sales quantity values. While there are many imputation options, the following sample code uses a linear interpolation.
+You can see that most of the series (213 out of 249) are irregular. An [imputation transform](https://docs.microsoft.com/en-us/python/api/ftk.transforms.ts_imputer.timeseriesimputer?view=azure-ml-py-latest) is required to fill in missing sales quantity values. While there are many imputation options, the following sample code uses a linear interpolation.


 ```python
@@ -1035,7 +1029,7 @@ arima_model = Arima(oj_series_freq, arima_order)

 ### Combine Multiple Models

-The [ForecasterUnion](https://docs.microsoft.com/en-us/python/api/ftk.models.forecaster_union.forecasterunion?view=azure-ml-py-latest) estimator allows you to combine multiple estimators and fit/predict on them using one line of code.
+The [ForecasterUnion](https://docs.microsoft.com/en-us/python/api/ftk.models.forecaster_union?view=azure-ml-py-latest) estimator allows you to combine multiple estimators and fit/predict on them using one line of code.


 ```python
@@ -1200,10 +1194,10 @@ test_feature_tsdf = pipeline_ml.transform(test_tsdf)
 print(train_feature_tsdf.head())
 ```

-F1 2018-05-04 11:00:54,308 INFO azureml.timeseries - pipeline fit_transform started.
-F1 2018-05-04 11:01:02,545 INFO azureml.timeseries - pipeline fit_transform finished. Time elapsed 0:00:08.237301
-F1 2018-05-04 11:01:02,576 INFO azureml.timeseries - pipeline transforms started.
-F1 2018-05-04 11:01:19,048 INFO azureml.timeseries - pipeline transforms finished. Time elapsed 0:00:16.471961
+F1 2018-06-14 23:10:03,472 INFO azureml.timeseries - pipeline fit_transform started.
+F1 2018-06-14 23:10:07,317 INFO azureml.timeseries - pipeline fit_transform finished. Time elapsed 0:00:03.845078
+F1 2018-06-14 23:10:07,317 INFO azureml.timeseries - pipeline transforms started.
+F1 2018-06-14 23:10:16,499 INFO azureml.timeseries - pipeline transforms finished. Time elapsed 0:00:09.182314
 feat price AGE60 EDUC ETHNIC \
 WeekLastDay store brand
 1990-06-20 23:59:59 2 dominicks 1.00 1.59 0.23 0.25 0.11
@@ -1365,14 +1359,17 @@ all_errors.sort_values('MedianAPE')

 Some machine learning models were able to take advantage of the added features and the similarities between series to get better forecast accuracy.

-**Cross-Validation and Parameter Sweeping**
+### Cross Validation, Parameter, and Model Sweeping

-The package adapts some traditional machine learning functions for a forecasting application. [RollingOriginValidator](https://docs.microsoft.com/python/api/ftk.model_selection.cross_validation.rollingoriginvalidator) does cross-validation temporally, respecting what would and would not be known in a forecasting framework.
+The package adapts some traditional machine learning functions for a forecasting application. [RollingOriginValidator](https://docs.microsoft.com/python/api/ftk.model_selection.cross_validation.rollingoriginvalidator?view=azure-ml-py-latest) does cross-validation temporally, respecting what would and would not be known in a forecasting framework.

 In the figure below, each square represents data from one time point. The blue squares represent training and orange squares represent testing in each fold. Testing data must come from the time points after the largest training time point. Otherwise, future data is leaked into training data causing the model evaluation to become invalid.
-
 ![png](./media/how-to-build-deploy-forecast-models/cv_figure.PNG)

+**Parameter Sweeping**
+The [TSGridSearchCV](https://docs.microsoft.com/en-us/python/api/ftk.model_selection.search.tsgridsearchcv?view=azure-ml-py-latest) class exhaustively searches over specified parameter values and uses `RollingOriginValidator` to evaluate parameter performance in order to find the best parameters.
+
+
 ```python
 # Set up the `RollingOriginValidator` to do 2 folds of rolling origin cross-validation
 rollcv = RollingOriginValidator(n_splits=2)
@@ -1391,6 +1388,102 @@ print('Best paramter: {}'.format(randomforest_cv_fitted.best_params_))
 Best paramter: {'estimator__n_estimators': 100}


+**Model Sweeping**
+The `BestOfForecaster` class selects the model with the best performance from a list of given models. Similar to `TSGridSearchCV`, it also uses RollingOriginValidator for cross validation and performance evaluation.
+Here we pass a list of two models to demonstrate the usage of `BestOfForecaster`
+
+
+```python
+best_of_forecaster = BestOfForecaster(forecaster_list=[('naive', naive_model),
+                                                       ('random_forest', random_forest_model)])
+best_of_forecaster_fitted = best_of_forecaster.fit(train_feature_tsdf,
+                                                   validator=RollingOriginValidator(n_step=20, max_horizon=40))
+best_of_forecaster_prediction = best_of_forecaster_fitted.predict(test_feature_tsdf)
+best_of_forecaster_prediction.head()
+```
+
+
+
+
+<table border="1" class="dataframe">
+  <thead>
+    <tr style="text-align: right;">
+      <th></th>
+      <th></th>
+      <th></th>
+      <th></th>
+      <th></th>
+      <th>PointForecast</th>
+      <th>DistributionForecast</th>
+      <th>Quantity</th>
+    </tr>
+    <tr>
+      <th>WeekLastDay</th>
+      <th>store</th>
+      <th>brand</th>
+      <th>ForecastOriginTime</th>
+      <th>ModelName</th>
+      <th></th>
+      <th></th>
+      <th></th>
+    </tr>
+  </thead>
+  <tbody>
+    <tr>
+      <th>1992-01-08 23:59:59</th>
+      <th>2</th>
+      <th>dominicks</th>
+      <th>1992-01-01 23:59:59</th>
+      <th>random_forest</th>
+      <td>9299.20</td>
+      <td>&lt;scipy.stats._distn_infrastructure.rv_frozen o...</td>
+      <td>11712.00</td>
+    </tr>
+    <tr>
+      <th>1992-01-15 23:59:59</th>
+      <th>2</th>
+      <th>dominicks</th>
+      <th>1992-01-01 23:59:59</th>
+      <th>random_forest</th>
+      <td>10259.20</td>
+      <td>&lt;scipy.stats._distn_infrastructure.rv_frozen o...</td>
+      <td>4032.00</td>
+    </tr>
+    <tr>
+      <th>1992-01-22 23:59:59</th>
+      <th>2</th>
+      <th>dominicks</th>
+      <th>1992-01-01 23:59:59</th>
+      <th>random_forest</th>
+      <td>6828.80</td>
+      <td>&lt;scipy.stats._distn_infrastructure.rv_frozen o...</td>
+      <td>6336.00</td>
+    </tr>
+    <tr>
+      <th>1992-01-29 23:59:59</th>
+      <th>2</th>
+      <th>dominicks</th>
+      <th>1992-01-01 23:59:59</th>
+      <th>random_forest</th>
+      <td>16633.60</td>
+      <td>&lt;scipy.stats._distn_infrastructure.rv_frozen o...</td>
+      <td>13632.00</td>
+    </tr>
+    <tr>
+      <th>1992-02-05 23:59:59</th>
+      <th>2</th>
+      <th>dominicks</th>
+      <th>1992-01-01 23:59:59</th>
+      <th>random_forest</th>
+      <td>12774.40</td>
+      <td>&lt;scipy.stats._distn_infrastructure.rv_frozen o...</td>
+      <td>45120.00</td>
+    </tr>
+  </tbody>
+</table>
+
+
+
 **Build the final pipeline**
 Now that you have identified the best model, you can build and fit your final pipeline with all transformers and the best model.
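
Note on the hunk above: the added text says `BestOfForecaster` picks the best model from a list and relies on rolling origin cross-validation to score each candidate. The sketch below illustrates that idea in plain Python only. It does not use the `ftk` API, and every name in it (`rolling_origin_splits`, `best_of`, the two toy forecasters) is hypothetical. The assumption is that each fold trains on data strictly before the forecast origin, tests on the next `horizon` points, and that the model with the lowest average MedianAPE wins.

```python
from statistics import median


def rolling_origin_splits(n_obs, n_splits, horizon):
    """Yield (train_indices, test_indices); test points always follow train points."""
    for k in range(n_splits, 0, -1):
        origin = n_obs - k * horizon  # first test position for this fold
        if origin <= 0:
            continue  # not enough history for this fold
        yield range(0, origin), range(origin, origin + horizon)


def naive_forecast(history, horizon):
    """Hypothetical stand-in model: repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Hypothetical stand-in model: repeat the historical mean."""
    return [sum(history) / len(history)] * horizon


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent."""
    return median(abs(a - p) / abs(a) * 100 for a, p in zip(actual, predicted))


def best_of(models, series, n_splits=2, horizon=3):
    """Return (name, score) for the model with the lowest mean fold MedianAPE."""
    scores = {}
    for name, forecast in models:
        fold_errors = []
        for train_idx, test_idx in rolling_origin_splits(len(series), n_splits, horizon):
            history = [series[i] for i in train_idx]
            actual = [series[i] for i in test_idx]
            fold_errors.append(median_ape(actual, forecast(history, horizon)))
        scores[name] = sum(fold_errors) / len(fold_errors)
    best_name = min(scores, key=scores.get)
    return best_name, scores[best_name]


series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
splits = [(list(tr), list(te)) for tr, te in rolling_origin_splits(len(series), 2, 3)]
best_name, best_score = best_of(
    [("naive", naive_forecast), ("mean", mean_forecast)], series
)
print(best_name, round(best_score, 2))
```

Parameter sweeping follows the same loop, with candidate parameter values taking the place of candidate models.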

@@ -1411,9 +1504,62 @@ print('Median of APE of final pipeline: {0}'.format(final_median_ape))
 Median of APE of final pipeline: 42.54336821266968


-## Operationalization: deploy and consume
+## Visualization
+The `ForecastDataFrame` class provides plotting functions for visualizing and analyzing forecasting results. Use the commonly used charts with your data. Please see the sample notebook below on plotting functions for all the functions available.
+
+The `show_error` function plots performance metrics aggregated by an arbitrary column. By default, the `show_error` function aggregates by the `grain_colnames` of the `ForecastDataFrame`. It's often useful to identify the grains/groups with the best or worst performance, especially when you have a large number of time series. The `performance_percent` argument of `show_error` allows you to specify a performance interval and plot the error of a subset of grains/groups.
+
+Plot the grains with the bottom 5% performance, i.e. top 5% MedianAPE.
+
+
+```python
+fig, ax = best_of_forecaster_prediction.show_error(err_name='MedianAPE', err_fun=calc_median_ape, performance_percent=(0.95, 1))
+```
+
+![png](./media/how-to-build-deploy-forecast-models/output_59_0.png)
+
+
+Plot the grains with the top 5% of performance, i.e. bottom 5% MedianAPE.
+
+
+```python
+fig, ax = best_of_forecaster_prediction.show_error(err_name='MedianAPE', err_fun=calc_median_ape, performance_percent=(0, 0.05))
+```
+
+
+![png](./media/how-to-build-deploy-forecast-models/output_61_0.png)
+
+
+Once you have an idea of the overall performance, you may want to explore individual grains, especially those that performed poorly. The `plot_forecast_by_grain` method plots forecast vs. actual of specified grains. Here, we plot the grain with the best performance and the grain with the worst performance discovered in the `show_error` plot.
+
+
+```python
+fig_ax = best_of_forecaster_prediction.plot_forecast_by_grain(grains=[(33, 'tropicana'), (128, 'minute.maid')])
+```
+
+
+![png](./media/how-to-build-deploy-forecast-models/output_63_0.png)
+
+
+
+![png](./media/how-to-build-deploy-forecast-models/output_63_1.png)
+
+
+
+## Additional Notebooks
+For a deeper dive on the major features of AMLPF, please refer to the following notebooks with more details and examples of each feature:
+[Notebook on TimeSeriesDataFrame](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/Introduction_to_TimeSeriesDataFrames.ipynb)
+[Notebook on Data Wrangling](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/Data_Wrangling_Sample.ipynb)
+[Notebook on Transformers](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/Forecast_Package_Transforms.ipynb)
+[Notebook on Models](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/AMLPF_models_sample_notebook.ipynb)
+[Notebook on Cross Validation](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/Time_Series_Cross_Validation.ipynb)
+[Notebook on Lag Transformer and OriginTime](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/Constructing_Lags_and_Explaining_Origin_Times.ipynb)
+[Notebook on Plotting Functions](https://azuremlftkrelease.blob.core.windows.net/samples/feature_notebooks/Plotting_Functions_in_AMLPF.ipynb)
+
+## Operationalization

-In this section, you deploy a pipeline as an Azure Machine Learning web service and consume it for training and scoring. Scoring the deployed web service retrains the model and generates forecasts on new data.
+In this section, you deploy a pipeline as an Azure Machine Learning web service and consume it for training and scoring.
+Currently, only pipelines that are not fitted are supported for deployment. Scoring the deployed web service retrains the model and generates forecasts on new data.

 ### Set model deployment parameters
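
Note on the hunk above: the new Visualization text describes `performance_percent` as an interval that restricts the `show_error` plot to a subset of grains, with `(0.95, 1)` meaning the worst 5% (highest MedianAPE) and `(0, 0.05)` the best 5%. The sketch below shows one way such a selection could work. It is an assumption about the semantics and not the `ftk` implementation; the helper `grains_in_performance_interval` and the toy data are hypothetical.

```python
from statistics import median


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent."""
    return median(abs(a - p) / abs(a) * 100 for a, p in zip(actual, predicted))


def grains_in_performance_interval(errors_by_grain, performance_percent):
    """Select grains whose error rank falls in the given fractional interval.

    Grains are ranked from lowest error (best) to highest error (worst),
    so (0, 0.05) keeps the best 5% and (0.95, 1) keeps the worst 5%.
    """
    lower, upper = performance_percent
    ranked = sorted(errors_by_grain, key=errors_by_grain.get)
    n = len(ranked)
    start = round(lower * n)
    stop = max(round(upper * n), start + 1)  # always keep at least one grain
    return ranked[start:stop]


# Toy data: 20 (store, brand) grains whose forecasts are off by i percent.
actual = [100.0, 200.0, 400.0]
errors_by_grain = {}
for i in range(1, 21):
    predicted = [a * (1 + i / 100) for a in actual]
    errors_by_grain[(i, "tropicana")] = median_ape(actual, predicted)

worst = grains_in_performance_interval(errors_by_grain, (0.95, 1))
best = grains_in_performance_interval(errors_by_grain, (0, 0.05))
print("worst 5%:", worst)
print("best 5%:", best)
```

Ranking by error and slicing by rank fraction keeps the interval meaningful whatever the scale of the error metric.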

@@ -1480,7 +1626,7 @@ aml_deployment = ForecastWebserviceFactory(deployment_name=deployment_name,
 aml_settings=aml_settings,
 pipeline=pipeline_deploy,
 deployment_working_directory=deployment_working_directory,
-ftk_wheel_loc='https://azuremlpackages.blob.core.windows.net/forecasting/azuremlftk-0.1.18055.3a1-py3-none-any.whl')
+ftk_wheel_loc='https://azuremlftkrelease.blob.core.windows.net/dailyrelease/azuremlftk-0.1.18165.29a1-py3-none-any.whl')
 ```

 ### Create the web service
