Commit f19b7c2

address review pt 1
1 parent 0db564a commit f19b7c2

2 files changed (+75, -72 lines)


articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 73 additions & 70 deletions
@@ -466,15 +466,16 @@ The table shows resulting feature engineering that occurs when window aggregatio
 
 ![target rolling window](./media/how-to-auto-train-forecast/target-roll.svg)
 
-You can enable lag and rolling window aggregation features by setting the rolling window size, which was three in the previous example, and the lag orders you wish to create. In the following sample, we set both of these settings to `auto` so that AutoML will automatically determine these values by analyzing the correlation structure of your data:
+You can enable lag and rolling window aggregation features for the target by setting the rolling window size, which was three in the previous example, and the lag orders you want to create. You can also enable lags for features with the `feature_lags` setting. In the following sample, we set all of these settings to `auto` so that AutoML will automatically determine them by analyzing the correlation structure of your data:
 
 # [Python SDK](#tab/python)
 
 ```python
 forecasting_job.set_forecast_settings(
     ..., # other settings
     target_lags='auto',
-    target_rolling_window_size='auto'
+    target_rolling_window_size='auto',
+    feature_lags='auto'
 )
 ```
 
@@ -486,6 +487,7 @@ forecasting_job.set_forecast_settings(
 forecasting:
   target_lags: auto
   target_rolling_window_size: auto
+  feature_lags: auto
   # other settings
 ```
 
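If you'd rather control these features directly, the same settings accept explicit values. A minimal sketch, assuming the `set_forecast_settings` call shown above; the lag orders and window size here are illustrative, not prescribed:

```python
forecasting_job.set_forecast_settings(
    ..., # other settings
    target_lags=[1, 2, 3],         # create lags of the target at orders 1, 2, and 3
    target_rolling_window_size=3,  # three-period rolling aggregations, as in the example
    feature_lags='auto'            # let AutoML decide lags for the other features
)
```
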
@@ -500,7 +502,7 @@ AutoML has several actions it can take for short series. These actions are confi
 |Setting|Description
 |---|---
 |`auto`| The default value for short series handling. <br> - _If all series are short_, pad the data. <br> - _If not all series are short_, drop the short series.
-|`pad`| If `short_series_handling_config = pad`, then automated ML adds random values to each short series found. The following lists the column types and what they're padded with: <br> - Object columns with NaNs <br> - Numeric columns with 0 <br> - Boolean/logic columns with False <br> - The target column is padded with random values with mean of zero and standard deviation of 1.
+|`pad`| If `short_series_handling_config = pad`, then automated ML adds random values to each short series found. The following lists the column types and what they're padded with: <br> - Object columns with NaNs <br> - Numeric columns with 0 <br> - Boolean/logic columns with False <br> - The target column is padded with white noise.
 |`drop`| If `short_series_handling_config = drop`, then automated ML drops the short series, and it will not be used for training or prediction. Predictions for these series will return NaN's.
 |`None`| No series is padded or dropped
 
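For reference, a hedged sketch of choosing one of these behaviors explicitly; this assumes `short_series_handling_config` is passed through `set_forecast_settings` like the other forecast settings in this article:

```python
forecasting_job.set_forecast_settings(
    ..., # other settings
    short_series_handling_config='pad'  # or 'auto', 'drop', None per the table above
)
```
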
@@ -528,7 +530,7 @@ forecasting:
 ---
 
 >[!WARNING]
->Padding may impact the accuracy of the resulting model, since we are introducing artificial data just to get past training without failures. If many of the series are short, then you may also see some impact in explainability results
+>Padding may impact the accuracy of the resulting model, since we are introducing artificial data to avoid training failures. If many of the series are short, then you may also see some impact in explainability results.
 
 #### Frequency & target data aggregation
 
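To make the padding behavior concrete, here is an illustrative sketch (not AutoML's internal implementation) of what padding a short series looks like per the table above; the column names are hypothetical:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_pad = 5  # rows needed to bring a short series up to the minimum length

padding = pd.DataFrame({
    "brand": [np.nan] * n_pad,              # object column: padded with NaN
    "price": np.zeros(n_pad),               # numeric column: padded with 0
    "on_sale": [False] * n_pad,             # boolean column: padded with False
    "quantity": rng.standard_normal(n_pad)  # target column: white noise (mean 0, std 1)
})
```
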
@@ -792,7 +794,7 @@ ml_client_metrics_registry = MLClient(
 
 # Get an inference component from the registry
 inference_component = ml_client_registry.components.get(
-    name="forecasting_model_inference",
+    name="automl_forecasting_inference",
     label="latest"
 )
 
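The `ml_client_registry` handle used in this hunk is created earlier in the article; for context, a plausible construction (the registry name is an assumption based on the component URIs in the YAML samples below):

```python
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

# Point an MLClient at the registry that hosts the inference component.
ml_client_registry = MLClient(
    credential=DefaultAzureCredential(),
    registry_name="azureml-preview"
)
```
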
@@ -968,7 +970,7 @@ jobs:
   # Configure the inference node to make rolling forecasts on the test set
   inference_node:
     type: command
-    component: azureml://registries/azureml-preview/components/automl_forecasting_inference@latest
+    component: azureml://registries/azureml-preview/components/automl_forecasting_inference
     inputs:
       target_column_name: ${{parent.inputs.target_column_name}}
       forecast_mode: rolling
@@ -983,7 +985,7 @@ jobs:
   # Configure the metrics calculation node
   compute_metrics:
     type: command
-    component: azureml://registries/azureml/compute_metrics@latest
+    component: azureml://registries/azureml/compute_metrics
     inputs:
       task: "tabular-forecasting"
       ground_truth: ${{parent.jobs.inference_node.outputs.inference_output_file}}
@@ -998,7 +1000,7 @@ Note that AutoML requires training data in [MLTable format](#training-and-valida
 Now, you launch the pipeline run using the following command, assuming the pipeline configuration is at the path `./automl-forecasting-pipeline.yml`:
 
 ```azurecli
-run_id=$(az ml job create --file automl-forecasting-pipeline.yml)
+run_id=$(az ml job create --file automl-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>)
 ```
 
 ---
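The captured `run_id` can then be used to check on the job, mirroring the pattern already used in the `how-to-configure-auto-train.md` diff below:

```azurecli
az ml job show -n $run_id --web
```
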
@@ -1083,12 +1085,12 @@ Next, we define a factory function that creates pipelines for orchestration of m
 
 Parameter|Description
 --|--
-**instance_count** | Number of compute nodes to use in the training job
-**max_concurrency_per_instance** | Number of AutoML processes to run on each node. Hence, the total concurrency of a many models jobs is `instance_count * max_concurrency_per_instance`.
-**prs_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
-**enable_event_logger** | Flag to enable event logging.
+**max_nodes** | Number of compute nodes to use in the training job
+**max_concurrency_per_node** | Number of AutoML processes to run on each node. Hence, the total concurrency of a many models job is `max_nodes * max_concurrency_per_node`.
+**parallel_step_timeout_in_seconds** | Many models component timeout given in seconds.
 **retrain_failed_models** | Flag to enable re-training for failed models. This is useful if you've done previous many models runs that resulted in failed AutoML jobs on some data partitions. When this flag is enabled, many models will only launch training jobs for previously failed partitions.
-**forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and "`rolling`". See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
+**forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and `"rolling"`. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
+**forecast_step** | Step size for rolling forecast. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
 
 The following sample illustrates a factory method for constructing many models training and model evaluation pipelines:
 
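As a quick sizing illustration, using the default values from the factory function below (both 4), a hypothetical worked example of the concurrency arithmetic in the table:

```python
max_nodes = 4                 # compute nodes allocated to the training job
max_concurrency_per_node = 4  # AutoML processes per node
total_concurrency = max_nodes * max_concurrency_per_node  # 16 concurrent AutoML jobs
```
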
@@ -1132,31 +1134,33 @@ def many_models_train_evaluate_factory(
     train_data_input,
     test_data_input,
     automl_config_input,
-    max_concurrency_per_instance=4,
-    prs_step_timeout=3700,
-    instance_count=4,
-    enable_event_logger=True,
+    compute_name,
+    max_concurrency_per_node=4,
+    parallel_step_timeout_in_seconds=3700,
+    max_nodes=4,
     retrain_failed_model=False,
-    forecast_mode="rolling"
+    forecast_mode="rolling",
+    forecast_step=1
 ):
     mm_train_node = mm_train_component(
         raw_data=train_data_input,
         automl_config=automl_config_input,
-        max_concurrency_per_instance=max_concurrency_per_instance,
-        prs_step_timeout_in_seconds=prs_step_timeout,
-        instance_count=instance_count,
-        enable_event_logger=enable_event_logger,
-        retrain_failed_model=retrain_failed_model
+        max_nodes=max_nodes,
+        max_concurrency_per_node=max_concurrency_per_node,
+        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
+        retrain_failed_model=retrain_failed_model,
+        compute_name=compute_name
     )
 
     mm_inference_node = mm_inference_component(
         raw_data=test_data_input,
-        enable_event_logger=enable_event_logger,
-        instance_count=instance_count,
-        max_concurrency_per_instance=max_concurrency_per_instance,
-        prs_step_timeout=prs_step_timeout,
+        max_nodes=max_nodes,
+        max_concurrency_per_node=max_concurrency_per_node,
+        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
         optional_train_metadata=mm_train_node.outputs.run_output,
-        forecast_mode=forecast_mode
+        forecast_mode=forecast_mode,
+        forecast_step=forecast_step,
+        compute_name=compute_name
     )
 
     compute_metrics_node = compute_metrics_component(
@@ -1187,9 +1191,10 @@ pipeline_job = many_models_train_evaluate_factory(
     automl_config=Input(
         type="uri_file",
         path="./automl_settings_mm.yml"
-    )
+    ),
+    compute_name="<cluster name>"
 )
-pipeline_job.settings.default_compute = "cluster-name"
+pipeline_job.settings.default_compute = "<cluster name>"
 
 returned_pipeline_job = ml_client.jobs.create_or_update(
     pipeline_job,
@@ -1223,11 +1228,10 @@ inputs:
   automl_config_input:
     type: uri_file
     path: "./automl_settings_mm.yml"
-  max_concurrency_per_instance: 4
-  prs_step_timeout: 3700
-  instance_count: 4
+  max_nodes: 4
+  max_concurrency_per_node: 4
+  parallel_step_timeout_in_seconds: 3700
   forecast_mode: rolling
-  enable_event_logger: True
   retrain_failed_model: False
 
 # pipeline outputs
@@ -1241,30 +1245,29 @@ jobs:
   # Configure AutoML many models training component
   mm_train_node:
     type: command
-    component: azureml://registries/azureml-preview/components/automl_many_models_training@latest
+    component: azureml://registries/azureml-preview/components/automl_many_models_training
     inputs:
       raw_data: ${{parent.inputs.train_data_input}}
       automl_config: ${{parent.inputs.automl_config_input}}
-      instance_count: ${{parent.inputs.instance_count}}
-      max_concurrency_per_instance: ${{parent.inputs.max_concurrency_per_instance}}
-      prs_step_timeout: ${{parent.inputs.prs_step_timeout}}
+      max_nodes: ${{parent.inputs.max_nodes}}
+      max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
+      parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
       retrain_failed_model: ${{parent.inputs.retrain_failed_model}}
     outputs:
       run_output:
        type: uri_folder
 
-
   # Configure the inference node to make rolling forecasts on the test set
   mm_inference_node:
     type: command
-    component: azureml://registries/azureml-preview/components/automl_many_models_inference@latest
+    component: azureml://registries/azureml-preview/components/automl_many_models_inference
     inputs:
       raw_data: ${{parent.inputs.test_data_input}}
-      max_concurrency_per_instance: ${{parent.inputs.max_concurrency_per_instance}}
-      prs_step_timeout: ${{parent.inputs.prs_step_timeout}}
+      max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
+      parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
       forecast_mode: ${{parent.inputs.forecast_mode}}
       forecast_step: 1
-      instance_count: ${{parent.inputs.instance_count}}
+      max_nodes: ${{parent.inputs.max_nodes}}
      optional_train_metadata: ${{parent.jobs.mm_train_node.outputs.run_output}}
     outputs:
       run_output:
@@ -1277,7 +1280,7 @@ jobs:
   # Configure the metrics calculation node
   compute_metrics:
     type: command
-    component: azureml://registries/azureml/components/compute_metrics@latest
+    component: azureml://registries/azureml/components/compute_metrics
     inputs:
       task: "tabular-forecasting"
       ground_truth: ${{parent.jobs.mm_inference_node.outputs.evaluation_data}}
@@ -1290,7 +1293,7 @@ jobs:
 You launch the pipeline job with the following command, assuming the many models pipeline configuration is at the path `./automl-mm-forecasting-pipeline.yml`:
 
 ```azurecli
-az ml job create --file automl-mm-forecasting-pipeline.yml
+az ml job create --file automl-mm-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>
 ```
 
 ---
@@ -1359,9 +1362,9 @@ Parameter|Description
 --|--
 **forecast_level** | The level of the hierarchy to retrieve forecasts for
 **allocation_method** | Allocation method to use when forecasts are disaggregated. Valid values are `"proportions_of_historical_average"` and `"average_historical_proportions"`.
-**instance_count** | Number of compute nodes to use in the training job
-**max_concurrency_per_instance** | Number of AutoML processes to run on each node. Hence, the total concurrency of a HTS job is `instance_count * max_concurrency_per_instance`.
-**prs_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
+**max_nodes** | Number of compute nodes to use in the training job
+**max_concurrency_per_node** | Number of AutoML processes to run on each node. Hence, the total concurrency of an HTS job is `max_nodes * max_concurrency_per_node`.
+**parallel_step_timeout_in_seconds** | HTS component timeout given in seconds.
 **forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and "`rolling`". See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
 
 # [Python SDK](#tab/python)
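A rough numeric sketch of the two allocation methods, inferred from their names; the authoritative definitions are in the HTS documentation, so treat this as illustrative only:

```python
import numpy as np

# Toy history: two SKUs over four periods; the parent level is their sum.
sku_a = np.array([60.0, 55.0, 65.0, 60.0])
sku_b = np.array([40.0, 45.0, 35.0, 40.0])
total = sku_a + sku_b

# proportions_of_historical_average: ratio of each series' historical
# average to the historical average of the parent.
p_a = sku_a.mean() / total.mean()
p_b = sku_b.mean() / total.mean()

# average_historical_proportions: average of the per-period proportions.
q_a = np.mean(sku_a / total)
q_b = np.mean(sku_b / total)

# Disaggregate a parent-level forecast of 100 units down to SKU level.
parent_forecast = 100.0
print(parent_forecast * p_a, parent_forecast * p_b)  # 60.0 40.0
```
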
@@ -1404,25 +1407,25 @@ def hts_train_evaluate_factory(
     train_data_input,
     test_data_input,
     automl_config_input,
-    max_concurrency_per_instance=4,
-    prs_step_timeout=3700,
-    instance_count=4,
+    max_concurrency_per_node=4,
+    parallel_step_timeout_in_seconds=3700,
+    max_nodes=4,
     forecast_mode="rolling",
     forecast_level="SKU",
     allocation_method='proportions_of_historical_average'
 ):
     hts_train = hts_train_component(
         raw_data=train_data_input,
         automl_config=automl_config_input,
-        max_concurrency_per_instance=max_concurrency_per_instance,
-        prs_step_timeout_in_seconds=prs_step_timeout,
-        instance_count=instance_count
+        max_concurrency_per_node=max_concurrency_per_node,
+        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
+        max_nodes=max_nodes
     )
     hts_inference = hts_inference_component(
         raw_data=test_data_input,
-        instance_count=instance_count,
-        max_concurrency_per_instance=max_concurrency_per_instance,
-        prs_step_timeout=prs_step_timeout,
+        max_nodes=max_nodes,
+        max_concurrency_per_node=max_concurrency_per_node,
+        parallel_step_timeout_in_seconds=parallel_step_timeout_in_seconds,
         optional_train_metadata=hts_train.outputs.run_output,
         forecast_level=forecast_level,
         allocation_method=allocation_method,
@@ -1492,9 +1495,9 @@ inputs:
   automl_config_input:
     type: uri_file
     path: "./automl_settings_hts.yml"
-  max_concurrency_per_instance: 4
-  prs_step_timeout: 3700
-  instance_count: 4
+  max_concurrency_per_node: 4
+  parallel_step_timeout_in_seconds: 3700
+  max_nodes: 4
   forecast_mode: rolling
   allocation_method: proportions_of_historical_average
   forecast_level: # forecast level
@@ -1510,13 +1513,13 @@ jobs:
   # Configure AutoML many models training component
   hts_train_node:
     type: command
-    component: azureml://registries/azureml-preview/components/automl_hts_training@latest
+    component: azureml://registries/azureml-preview/components/automl_hts_training
     inputs:
       raw_data: ${{parent.inputs.train_data_input}}
       automl_config: ${{parent.inputs.automl_config_input}}
-      instance_count: ${{parent.inputs.instance_count}}
-      max_concurrency_per_instance: ${{parent.inputs.max_concurrency_per_instance}}
-      prs_step_timeout: ${{parent.inputs.prs_step_timeout}}
+      max_nodes: ${{parent.inputs.max_nodes}}
+      max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
+      parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
     outputs:
       run_output:
         type: uri_folder
@@ -1525,14 +1528,14 @@ jobs:
   # Configure the inference node to make rolling forecasts on the test set
   hts_inference_node:
     type: command
-    component: azureml://registries/azureml-preview/components/automl_hts_inference@latest
+    component: azureml://registries/azureml-preview/components/automl_hts_inference
     inputs:
       raw_data: ${{parent.inputs.test_data_input}}
-      max_concurrency_per_instance: ${{parent.inputs.max_concurrency_per_instance}}
-      prs_step_timeout: ${{parent.inputs.prs_step_timeout}}
+      max_concurrency_per_node: ${{parent.inputs.max_concurrency_per_node}}
+      parallel_step_timeout_in_seconds: ${{parent.inputs.parallel_step_timeout_in_seconds}}
       forecast_mode: ${{parent.inputs.forecast_mode}}
       forecast_step: 1
-      instance_count: ${{parent.inputs.instance_count}}
+      max_nodes: ${{parent.inputs.max_nodes}}
       optional_train_metadata: ${{parent.jobs.hts_train_node.outputs.run_output}}
       forecast_level: ${{parent.inputs.forecast_level}}
       allocation_method: ${{parent.inputs.allocation_method}}
@@ -1547,7 +1550,7 @@ jobs:
   # Configure the metrics calculation node
   compute_metrics:
     type: command
-    component: azureml://registries/azureml/components/compute_metrics@latest
+    component: azureml://registries/azureml/components/compute_metrics
     inputs:
       task: "tabular-forecasting"
       ground_truth: ${{parent.jobs.hts_inference_node.outputs.evaluation_data}}
@@ -1560,7 +1563,7 @@ jobs:
 You launch the pipeline job with the following command, assuming the many models pipeline configuration is at the path `./automl-hts-forecasting-pipeline.yml`:
 
 ```azurecli
-az ml job create --file automl-hts-forecasting-pipeline.yml
+az ml job create --file automl-hts-forecasting-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>
 ```
 
 ---

articles/machine-learning/how-to-configure-auto-train.md

Lines changed: 2 additions & 2 deletions
@@ -471,7 +471,7 @@ returned_job.services["Studio"].endpoint
 In following CLI command, we assume the job YAML configuration is at the path, `./automl-classification-job.yml`:
 
 ```azurecli
-run_id=$(az ml job create --file automl-classification-job.yml)
+run_id=$(az ml job create --file automl-classification-job.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>)
 ```
 
 You can use the stored run ID to return information about the job. The `--web` parameter opens the Azure Machine Learning studio web UI where you can drill into details on the job:
@@ -629,7 +629,7 @@ jobs:
 Now, you launch the pipeline run using the following command, assuming the pipeline configuration is at the path `./automl-classification-pipeline.yml`:
 
 ```azurecli
-> run_id=$(az ml job create --file automl-classification-pipeline.yml)
+> run_id=$(az ml job create --file automl-classification-pipeline.yml -w <Workspace> -g <Resource Group> --subscription <Subscription>)
 > az ml job show -n $run_id --web
 ```
 