articles/machine-learning/concept-automl-forecasting-evaluation.md (3 additions, 3 deletions)
@@ -75,7 +75,7 @@ Evaluation is the process of generating predictions on a test set held-out from
The following diagram shows a simple example with three forecasting windows:
-:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-eval-diagram.png" alt-text="Diagram demonstrating a rolling forecast on a test set.":::
+:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-diagram.png" alt-text="Diagram demonstrating a rolling forecast on a test set.":::
The diagram illustrates three rolling evaluation parameters:
@@ -85,7 +85,7 @@ The diagram illustrates three rolling evaluation parameters:
Importantly, the context advances along with the forecasting window. This means that actual values from the test set are used to make forecasts when they fall within the current context window. The latest date of actual values used for a given forecast window is called the **origin time** of the window. The following table shows an example output from the three-window rolling forecast with a horizon of three days and a step size of one day:
-:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-eval-table.png" alt-text="Example output table from a rolling forecast.":::
+:::image type="content" source="media/concept-automl-forecasting-evaluation/rolling-evaluation-table.png" alt-text="Example output table from a rolling forecast.":::
With a table like this, we can visualize the forecasts vs. the actuals and compute desired evaluation metrics. AutoML pipelines can generate rolling forecasts on a test set with an [inference component](how-to-auto-train-forecast.md#orchestrating-training-inference-and-evaluation-with-components-and-pipelines).
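As a rough sketch of the windowing just described (the dates, column handling, and variable names here are invented for illustration and aren't AutoML's output schema), a three-day horizon with a one-day step produces windows whose origin times advance through the test set:

```python
import pandas as pd

# Illustrative only: enumerate rolling-forecast windows over a small test set.
# Horizon = 3 days, step size = 1 day, three windows, as in the diagram above.
test_dates = pd.date_range("2023-01-01", periods=5, freq="D")
last_training_date = pd.Timestamp("2022-12-31")
horizon, step, n_windows = 3, 1, 3

for window in range(n_windows):
    # Origin time = latest date whose actual value is available to the forecaster.
    origin = last_training_date if window == 0 else test_dates[window * step - 1]
    forecast_dates = test_dates[window * step : window * step + horizon]
    print(f"window {window + 1}: origin={origin.date()}, forecasts={list(forecast_dates.date)}")
```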
@@ -99,7 +99,7 @@ The choice of evaluation summary or metric is usually driven by the specific bus
* Plots of observed target values vs. forecasted values to check that certain dynamics of the data are captured by the model,
* MAPE (mean absolute percentage error) between actual and forecasted values,
* RMSE (root mean squared error), possibly with a normalization, between actual and forecasted values,
-* MAE (mean absolute error), possibly with a normalization, between actual and forecasted values.
+* MAE (mean absolute error), possibly with a normalization, between actual and forecasted values.
There are many other possibilities, depending on the business scenario. You may need to create your own post-processing utilities for computing evaluation metrics from inference results or rolling forecasts. For more information on metrics, see our [regression and forecasting metrics](how-to-understand-automated-ml.md#regressionforecasting-metrics) article section.
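For example, a minimal post-processing sketch along these lines might compute MAPE, RMSE, and MAE from a forecast-versus-actual table; the `actual` and `forecast` column names are assumptions for illustration, not a documented schema:

```python
import numpy as np
import pandas as pd

def evaluation_summary(table: pd.DataFrame) -> dict:
    """Compute MAPE, RMSE, and MAE from assumed 'actual' and 'forecast' columns."""
    error = table["forecast"] - table["actual"]
    return {
        "MAPE": float(np.mean(np.abs(error / table["actual"])) * 100),  # assumes no zero actuals
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "MAE": float(np.mean(np.abs(error))),
    }

# Made-up values purely to exercise the function.
table = pd.DataFrame({"actual": [10.0, 12.0, 9.0], "forecast": [11.0, 10.5, 9.5]})
print(evaluation_summary(table))
```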
articles/machine-learning/how-to-auto-train-forecast.md (2 additions, 2 deletions)
@@ -1043,7 +1043,7 @@ The many models training component accepts a YAML format configuration file of A
Parameter|Description
--|--
-| **partition_column_names** | Column names in the data that, when grouped, define the data partitions. Many models launches an independent training job on each partition.
+| **partition_column_names** | Column names in the data that, when grouped, define the data partitions. The many models training component launches an independent training job on each partition.
| **allow_multi_partitions** | An optional flag that allows training one model per partition when each partition contains more than one unique time series. The default value is False.
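To make the partitioning idea concrete, a rough pandas illustration (invented column names; this only mimics the concept, not the component's implementation) would group the data on the partition columns and treat each group as a separate training input:

```python
import pandas as pd

# Hypothetical sales data; 'store' and 'brand' play the role of partition_column_names.
data = pd.DataFrame({
    "store":    ["A", "A", "A", "A", "B", "B", "B", "B"],
    "brand":    ["x", "x", "y", "y", "x", "x", "y", "y"],
    "quantity": [10, 11, 4, 5, 7, 8, 3, 2],
})

partition_column_names = ["store", "brand"]
for keys, partition in data.groupby(partition_column_names):
    # Conceptually, the many models component would launch one AutoML training job per partition.
    print(f"partition {keys}: {len(partition)} rows")
```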
The following sample provides a configuration template:
@@ -1367,7 +1367,7 @@ Parameter|Description
**forecast_level** | The level of the hierarchy to retrieve forecasts for
**allocation_method** | Allocation method to use when forecasts are disaggregated. Valid values are `"proportions_of_historical_average"` and `"average_historical_proportions"`.
**max_nodes** | Number of compute nodes to use in the training job
-**max_concurrency_per_node** | Number of AutoML processes to run on each node. Hence, the total concurrency of a HTS job is `max_nodes * max_concurrency_per_node`.
+**max_concurrency_per_node** | Number of AutoML processes to run on each node. Hence, the total concurrency of an HTS job is `max_nodes * max_concurrency_per_node`.
**parallel_step_timeout_in_seconds** | Many models component timeout given in number of seconds.
**forecast_mode** | Inference mode for model evaluation. Valid values are `"recursive"` and "`rolling`". See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
**forecast_step** | Step size for rolling forecast. See the [model evaluation article](concept-automl-forecasting-evaluation.md) for more information.
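The two allocation methods are named after the standard top-down disaggregation proportions from hierarchical forecasting. A hedged sketch with toy numbers, assuming those standard definitions (the component's exact computation may differ), shows how they can give different splits:

```python
import numpy as np

# Toy history: rows are time periods, columns are two leaf series under one parent.
history = np.array([
    [10.0, 30.0],
    [20.0, 20.0],
    [60.0, 10.0],
])

# average_historical_proportions: average each period's share of the period total.
per_period_share = history / history.sum(axis=1, keepdims=True)
avg_hist_props = per_period_share.mean(axis=0)                           # ~[0.54, 0.46]

# proportions_of_historical_average: each series' historical mean over the total mean.
props_of_hist_avg = history.mean(axis=0) / history.mean(axis=0).sum()    # [0.6, 0.4]

parent_forecast = 100.0
print("average_historical_proportions:", parent_forecast * avg_hist_props)
print("proportions_of_historical_average:", parent_forecast * props_of_hist_avg)
```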
articles/machine-learning/how-to-configure-auto-train.md (7 additions, 7 deletions)
@@ -352,7 +352,7 @@ The recommendations are similar to those noted for regression scenarios.
### Data featurization
-In every automated ML experiment, your data is automatically transformed to numbers and vectors of numbers plus (i.e. converting text to numeric) also scaled and normalized to help *certain* algorithms that are sensitive to features that are on different scales. This data transformation, scaling and normalization is referred to as featurization.
+In every automated ML experiment, your data is automatically transformed to numbers and vectors of numbers and also scaled and normalized to help algorithms that are sensitive to features that are on different scales. These data transformations are called _featurization_.
> [!NOTE]
> Automated machine learning featurization steps (feature normalization, handling missing data, converting text to numeric, etc.) become part of the underlying model. When using the model for predictions, the same featurization steps applied during training are applied to your input data automatically.
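Conceptually, this is like bundling preprocessing and the estimator into one fitted object, so prediction reuses the transforms learned at training time. The scikit-learn sketch below is only an analogy with invented column names, not AutoML's internal implementation:

```python
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

# Featurization analogue: impute and scale numeric columns, convert a text column to numeric features.
featurize = ColumnTransformer([
    ("numeric", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()), ["age", "income"]),
    ("text", TfidfVectorizer(), "comment"),
])

# Because featurization is part of the fitted pipeline, calling predict() on new data
# automatically applies the same transformations that were learned during training.
model = Pipeline([("featurize", featurize), ("classifier", LogisticRegression())])
```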
@@ -482,11 +482,11 @@ az ml job show -n $run_id --web
---
-### Multiple child runs on clusters
+### Multiple child runs on a cluster
Automated ML experiment child runs can be performed on a cluster that is already running another experiment. However, the timing depends on how many nodes the cluster has, and if those nodes are available to run a different experiment.
-Each node in the cluster acts as an individual virtual machine (VM) that can accomplish a single training run; for automated ML this means a child run. If all the nodes are busy, the new experiment is queued. But if there are free nodes, the new experiment will run automated ML child runs in parallel in the available nodes/VMs.
+Each node in the cluster acts as an individual virtual machine (VM) that can accomplish a single training run; for automated ML this means a child run. If all the nodes are busy, a new experiment is queued. But if there are free nodes, the new experiment will run automated ML child runs in parallel in the available nodes/VMs.
To help manage child runs and when they can be performed, we recommend you create a dedicated cluster per experiment, and match the number of `max_concurrent_iterations` of your experiment to the number of nodes in the cluster. This way, you use all the nodes of the cluster at the same time with the number of concurrent child runs/iterations you want.
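As a hedged illustration with the v2 Python SDK (`azure-ai-ml`), assuming a four-node cluster named `cpu-cluster-4nodes` and placeholder data inputs, the concurrency limit is set to match the node count; here `max_concurrent_trials` serves as the concurrency limit referred to above as `max_concurrent_iterations`:

```python
from azure.ai.ml import Input, automl

# Placeholder names and paths; substitute your own compute target and MLTable data.
classification_job = automl.classification(
    compute="cpu-cluster-4nodes",
    experiment_name="my-concurrency-example",
    training_data=Input(type="mltable", path="./training-data"),
    target_column_name="label",
    primary_metric="accuracy",
)

# Four nodes in the cluster, so allow up to four concurrent child runs (trials).
classification_job.set_limits(max_trials=20, max_concurrent_trials=4)
```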
@@ -658,7 +658,7 @@ Property | Description
training_mode | Indicates training mode; `distributed` or `non_distributed`. Defaults to `non_distributed`.
max_nodes | The number of nodes to use for training by each AutoML trial. This setting must be greater than or equal to 4.
-The following code samples shows an example of these settings for a classification job:
+The following code sample shows an example of these settings for a classification job:
# [Python SDK](#tab/python)
@@ -698,16 +698,16 @@ limits:
### Distributed training for forecasting
-To learn how distributed training works for forecasting tasks, see our [forecasting at scale](concept-automl-forecasting-at-scale.md#distributed-dnn-training) article. To use distributed training for forecasting, you need to set set the `training_mode`, `enable_dnn_training`, `max_nodes`, and optionally the `max_concurrent_trials` properties of the job object.
+To learn how distributed training works for forecasting tasks, see our [forecasting at scale](concept-automl-forecasting-at-scale.md#distributed-dnn-training) article. To use distributed training for forecasting, you need to set the `training_mode`, `enable_dnn_training`, `max_nodes`, and optionally the `max_concurrent_trials` properties of the job object.
Property | Description
-- | --
training_mode | Indicates training mode; `distributed` or `non_distributed`. Defaults to `non_distributed`.
enable_dnn_training | Flag to enable deep neural network models.
max_concurrent_trials | This is the maximum number of trial models to train in parallel. Defaults to 1.
-max_nodes | The total number of nodes to use for training. This setting must be greater than or equal to 2. For forecasting, each trial model is trained using $\text{max}\left(2, \text{floor}( \text{max\_nodes} / \text{max\_concurrent\_trials}) \right)$ nodes.
+max_nodes | The total number of nodes to use for training. This setting must be greater than or equal to 2. For forecasting tasks, each trial model is trained using $\text{max}\left(2, \text{floor}( \text{max\_nodes} / \text{max\_concurrent\_trials}) \right)$ nodes.
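As a worked example with assumed values: for a `max_nodes` of 8 and a `max_concurrent_trials` of 3, each forecasting trial would train on $\text{max}\left(2, \text{floor}(8 / 3)\right) = 2$ nodes.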
-The following code samples shows an example of these settings for a forecasting job:
+The following code sample shows an example of these settings for a forecasting job:
articles/machine-learning/how-to-understand-automated-ml.md (3 additions, 6 deletions)
@@ -88,10 +88,7 @@ weighted_accuracy|Weighted accuracy is accuracy where each sample is weighted by
### Binary vs. multiclass classification metrics
-Automated ML automatically detects if the data is binary and also allows users to activate binary classification metrics even if the data is multiclass by specifying a `true` class. Multiclass classification metrics is reported no matter if a dataset has two classes or more than two classes. Binary classification metrics is only reported when the data is binary, or the users activate the option.
-
-> [!Note]
-> When a binary classification task is detected, we use `numpy.unique` to find the set of labels and the later label will be used as the `true` class. Since there is a sorting procedure in `numpy.unique`, the choice of `true` class will be stable.
+Automated ML automatically detects if the data is binary and also allows users to activate binary classification metrics even if the data is multiclass by specifying a `true` class. Multiclass classification metrics are reported if a dataset has two or more classes. Binary classification metrics are reported only when the data is binary.
Note, multiclass classification metrics are intended for multiclass classification. When applied to a binary dataset, these metrics don't treat any class as the `true` class, as you might expect. Metrics that are clearly meant for multiclass are suffixed with `micro`, `macro`, or `weighted`. Examples include `average_precision_score`, `f1_score`, `precision_score`, `recall_score`, and `AUC`. For example, instead of calculating recall as `tp / (tp + fn)`, the multiclass averaged recall (`micro`, `macro`, or `weighted`) averages over both classes of a binary classification dataset. This is equivalent to calculating the recall for the `true` class and the `false` class separately, and then taking the average of the two.
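A small scikit-learn sketch of that distinction (purely illustrative; AutoML computes its metrics internally):

```python
from sklearn.metrics import recall_score

# A binary dataset where the model misses one positive and no negatives.
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0]

# Binary recall looks only at the positive ("true") class: tp / (tp + fn) = 3/4.
print(recall_score(y_true, y_pred))                    # 0.75

# Macro-averaged recall averages the recall of both classes: (3/4 + 3/3) / 2.
print(recall_score(y_true, y_pred, average="macro"))   # 0.875
```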
@@ -336,13 +333,13 @@ The Azure Machine Learning Responsible AI dashboard provides a single interface
* Machine learning interpretability
* Error analysis
-While model evaluation metrics and charts are good for measuring the general quality of a model, operations such as inspecting you model’s fairness, viewing its explanations (also known as which dataset features a model used to make its predictions), inspecting its errors (what are the blindspots of the model) are essential when practicing responsible AI. That's why automated ML provides a Responsible AI dashboard to help you observe a variety of insights for your model. See how to view the Responsible AI dashboard in the [Azure Machine Learning studio.](how-to-use-automated-ml-for-ml-models.md#responsible-ai-dashboard-preview)
+While model evaluation metrics and charts are good for measuring the general quality of a model, operations such as inspecting the model’s fairness, viewing its explanations (also known as which dataset features a model used to make its predictions), inspecting its errors and potential blind spots are essential when practicing responsible AI. That's why automated ML provides a Responsible AI dashboard to help you observe a variety of insights for your model. See how to view the Responsible AI dashboard in the [Azure Machine Learning studio.](how-to-use-automated-ml-for-ml-models.md#responsible-ai-dashboard-preview)
See how you can generate this [dashboard via the UI or the SDK.](how-to-responsible-ai-insights-sdk-cli.md)
## Model explanations and feature importances
-While model evaluation metrics and charts are good for measuring the general quality of a model, inspecting which dataset features a model used to make its predictions is essential when practicing responsible AI. That's why automated ML provides a model explanations dashboard to measure and report the relative contributions of dataset features. See how to [view the explanations dashboard in the Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md#responsible-ai-dashboard-preview).
+While model evaluation metrics and charts are good for measuring the general quality of a model, inspecting which dataset features a model uses to make predictions is essential when practicing responsible AI. That's why automated ML provides a model explanations dashboard to measure and report the relative contributions of dataset features. See how to [view the explanations dashboard in the Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md#responsible-ai-dashboard-preview).
> [!NOTE]
> Interpretability, best model explanation, is not available for automated ML forecasting experiments that recommend the following algorithms as the best model or ensemble: