articles/machine-learning/how-to-auto-train-forecast.md (73 additions & 70 deletions)
@@ -466,15 +466,16 @@ The table shows resulting feature engineering that occurs when window aggregatio
-You can enable lag and rolling window aggregation features by setting the rolling window size, which was three in the previous example, and the lag orders you wish to create. In the following sample, we set both of these settings to `auto` so that AutoML will automatically determine these values by analyzing the correlation structure of your data:
+You can enable lag and rolling window aggregation features for the target by setting the rolling window size, which was three in the previous example, and the lag orders you want to create. You can also enable lags for features with the `feature_lags` setting. In the following sample, we set all of these settings to `auto` so that AutoML automatically determines them by analyzing the correlation structure of your data:
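In CLI v2 YAML, these settings could look like the fragment below. This is a sketch assuming the v2 AutoML forecasting job schema; check the schema reference for your CLI version before relying on the exact field names:

```yaml
# Hypothetical fragment of an AutoML forecasting job spec (CLI v2);
# field names assume the v2 forecasting-settings schema.
forecasting:
  target_lags: auto
  target_rolling_window_size: auto
  feature_lags: auto
```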
@@ -500,7 +502,7 @@ AutoML has several actions it can take for short series. These actions are confi
|Setting|Description
|---|---
|`auto`| The default value for short series handling. <br> - _If all series are short_, pad the data. <br> - _If not all series are short_, drop the short series.
-|`pad`| If `short_series_handling_config = pad`, then automated ML adds random values to each short series found. The following lists the column types and what they're padded with: <br> - Object columns with NaNs <br> - Numeric columns with 0 <br> - Boolean/logic columns with False <br> - The target column is padded with random values with mean of zero and standard deviation of 1.
+|`pad`| If `short_series_handling_config = pad`, then automated ML adds random values to each short series found. The following lists the column types and what they're padded with: <br> - Object columns with NaNs <br> - Numeric columns with 0 <br> - Boolean/logic columns with False <br> - The target column is padded with white noise.
|`drop`| If `short_series_handling_config = drop`, then automated ML drops the short series, and it isn't used for training or prediction. Predictions for these series return NaNs.
|`None`| No series is padded or dropped.
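The padding rules in the table can be sketched in a few lines. This is a conceptual illustration only, not AutoML's actual implementation; the helper name `pad_short_series` and the choice to prepend the synthetic rows are assumptions:

```python
import numpy as np
import pandas as pd

def pad_short_series(df, target_col, min_length, rng=None):
    """Illustrative padding sketch: prepend rows until the series has min_length rows."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_pad = max(0, min_length - len(df))
    if n_pad == 0:
        return df
    pad = {}
    for col in df.columns:
        if col == target_col:
            # target column: white noise with mean 0 and standard deviation 1
            pad[col] = rng.normal(0.0, 1.0, n_pad)
        elif pd.api.types.is_bool_dtype(df[col]):
            # boolean/logic columns: False
            pad[col] = np.full(n_pad, False)
        elif pd.api.types.is_numeric_dtype(df[col]):
            # numeric columns: 0
            pad[col] = np.zeros(n_pad)
        else:
            # object columns: NaN
            pad[col] = [np.nan] * n_pad
    return pd.concat([pd.DataFrame(pad), df], ignore_index=True)
```

For example, a two-row series padded to length five gains three synthetic rows: zeros in numeric feature columns, NaN in object columns, and white noise in the target.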
@@ -528,7 +530,7 @@ forecasting:
---

>[!WARNING]
->Padding may impact the accuracy of the resulting model, since we are introducing artificial data just to get past training without failures. If many of the series are short, then you may also see some impact in explainability results
+>Padding may impact the accuracy of the resulting model, since we are introducing artificial data to avoid training failures. If many of the series are short, then you may also see some impact in explainability results.
@@ -1083,12 +1085,12 @@ Next, we define a factory function that creates pipelines for orchestration of m
Parameter|Description
--|--
-**instance_count** | Number of compute nodes to use in the training job
-**max_concurrency_per_instance** | Number of AutoML processes to run on each node. Hence, the total concurrency of a many models job is `instance_count * max_concurrency_per_instance`.
-**prs_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
-**enable_event_logger** | Flag to enable event logging.
+**max_nodes** | Number of compute nodes to use in the training job
+**max_concurrency_per_node** | Number of AutoML processes to run on each node. Hence, the total concurrency of a many models job is `max_nodes * max_concurrency_per_node`.
+**parallel_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
**retrain_failed_models** | Flag to enable re-training for failed models. This is useful if you've done previous many models runs that resulted in failed AutoML jobs on some data partitions. When this flag is enabled, many models will only launch training jobs for previously failed partitions.
-**forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and `"rolling"`. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
+**forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and `"rolling"`. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
+**forecast_step** | Step size for rolling forecast. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
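The difference between the two `forecast_mode` values can be sketched with a naive last-value model. This is a simplified illustration of the concepts, not the evaluation component's actual code; see the linked model evaluation article for the authoritative description:

```python
def recursive_forecast(history, horizon, model):
    """Forecast `horizon` steps ahead, feeding each prediction back as input."""
    history = list(history)
    predictions = []
    for _ in range(horizon):
        pred = model(history)
        predictions.append(pred)
        history.append(pred)  # predictions become inputs for later steps
    return predictions

def rolling_forecast(series, train_size, horizon, step, model):
    """Advance the forecast origin through the test set by `step` periods,
    using actual observations (not predictions) up to each origin."""
    forecasts = []
    origin = train_size
    while origin + horizon <= len(series):
        forecasts.append((origin, recursive_forecast(series[:origin], horizon, model)))
        origin += step
    return forecasts

# A naive model that predicts the last observed value
naive = lambda history: history[-1]
```

Here `step` plays the role of `forecast_step`: recursive evaluation makes one multi-step forecast from the end of training, while rolling evaluation repeats the forecast from successive origins using actuals up to each origin.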
The following sample illustrates a factory method for constructing many models training and model evaluation pipelines:
You launch the pipeline job with the following command, assuming the many models pipeline configuration is at the path `./automl-mm-forecasting-pipeline.yml`:
```azurecli
-az ml job create --file automl-mm-forecasting-pipeline.yml
+az ml job create --file automl-mm-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>
```
---
@@ -1359,9 +1362,9 @@ Parameter|Description
**forecast_level** | The level of the hierarchy to retrieve forecasts for
**allocation_method** | Allocation method to use when forecasts are disaggregated. Valid values are `"proportions_of_historical_average"` and `"average_historical_proportions"`.
-**instance_count** | Number of compute nodes to use in the training job
-**max_concurrency_per_instance** | Number of AutoML processes to run on each node. Hence, the total concurrency of an HTS job is `instance_count * max_concurrency_per_instance`.
-**prs_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
+**max_nodes** | Number of compute nodes to use in the training job
+**max_concurrency_per_node** | Number of AutoML processes to run on each node. Hence, the total concurrency of an HTS job is `max_nodes * max_concurrency_per_node`.
+**parallel_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
**forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and `"rolling"`. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
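The two `allocation_method` values correspond to the standard top-down disaggregation proportions from the hierarchical forecasting literature. The sketch below shows what each proportion computes on historical data; it illustrates the textbook definitions, not AutoML's implementation:

```python
import numpy as np

def average_historical_proportions(child, total):
    """Mean over time of the per-period share: (1/T) * sum_t (child_t / total_t)."""
    child, total = np.asarray(child, float), np.asarray(total, float)
    return float(np.mean(child / total))

def proportions_of_historical_average(child, total):
    """Ratio of historical averages: sum_t child_t / sum_t total_t."""
    child, total = np.asarray(child, float), np.asarray(total, float)
    return float(child.sum() / total.sum())

def allocate(total_forecast, proportion):
    """Disaggregate an aggregate-level forecast down to a child series."""
    return [proportion * y for y in total_forecast]
```

For a child series [1, 3] under a total [2, 4], the first method averages the per-period shares (0.5 and 0.75), while the second divides the summed child history by the summed total history.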
You launch the pipeline job with the following command, assuming the HTS pipeline configuration is at the path `./automl-hts-forecasting-pipeline.yml`:
```azurecli
-az ml job create --file automl-hts-forecasting-pipeline.yml
+az ml job create --file automl-hts-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>
```
You can use the stored run ID to return information about the job. The `--web` parameter opens the Azure Machine Learning studio web UI where you can drill into details on the job:
@@ -629,7 +629,7 @@ jobs:
Now, you launch the pipeline run using the following command, assuming the pipeline configuration is at the path `./automl-classification-pipeline.yml`:
```azurecli
-> run_id=$(az ml job create --file automl-classification-pipeline.yml)
```