
Commit bda75d0

Merge pull request #576 from GitHubber17/300645-aa
Freshness - Machine Learning how-to and concepts
2 parents 5bad00a + cb11ded commit bda75d0

4 files changed, +5 -9 lines changed


articles/machine-learning/concept-automl-forecasting-methods.md

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ where $H$ is the forecast horizon, $l_{\text{max}}$ is the maximum lag order, an
 
 $T_{\text{CV}} = 2H + (n_{\text{CV}} - 1) n_{\text{step}} + \text{max}(l_{\text{max}}, s_{\text{window}}) + 1$,
 
-where $n_{\text{CV}}$ is the number of cross-validation folds and $n_{\text{step}}$ is the CV step size, or offset between CV folds. The basic logic behind these formulas is that you should always have at least a horizon of training observations for each time series, including some padding for lags and cross-validation splits. See [forecasting model selection](./concept-automl-forecasting-sweeping.md#model-selection) for more details on cross-validation for forecasting.
+where $n_{\text{CV}}$ is the number of cross-validation folds and $n_{\text{step}}$ is the CV step size, or offset between CV folds. The basic logic behind these formulas is that you should always have at least a horizon of training observations for each time series, including some padding for lags and cross-validation splits. See [forecasting model selection](./concept-automl-forecasting-sweeping.md#model-selection-in-automl) for more details on cross-validation for forecasting.
 
 ### Missing data handling
 
 AutoML's time series models require regularly spaced observations in time. Regularly spaced, here, includes cases like monthly or yearly observations where the number of days between observations may vary. Prior to modeling, AutoML must ensure there are no missing series values _and_ that the observations are regular. Hence, there are two missing data cases:
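The cross-validation length formula in this hunk reduces to simple arithmetic. The following Python sketch only evaluates $T_{\text{CV}}$ as written; the function name `min_training_length_cv` and its parameter names are hypothetical (not part of the AutoML SDK), and the example values are illustrative.

```python
def min_training_length_cv(horizon: int, n_cv: int, cv_step: int,
                           max_lag: int = 0, window_size: int = 0) -> int:
    """Evaluate T_CV = 2H + (n_CV - 1) * n_step + max(l_max, s_window) + 1.

    Hypothetical helper for illustration; not an AutoML SDK function.
    """
    if horizon < 1 or n_cv < 1 or cv_step < 1:
        raise ValueError("horizon, n_cv, and cv_step must be positive integers")
    return 2 * horizon + (n_cv - 1) * cv_step + max(max_lag, window_size) + 1


# Illustrative values: 14-period horizon, 5 folds, step of 7, maximum lag of 7.
print(min_training_length_cv(horizon=14, n_cv=5, cv_step=7, max_lag=7))  # 64
```

In this example each series would need at least 64 observations: two horizons (28), four step offsets between the five folds (28), seven periods of lag padding, and one extra observation.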

articles/machine-learning/concept-automl-forecasting-sweeping.md

Lines changed: 1 addition & 5 deletions
@@ -10,7 +10,7 @@ ms.service: azure-machine-learning
 ms.subservice: automl
 ms.topic: concept-article
 ms.custom: automl, sdkv1
-ms.date: 09/25/2024
+ms.date: 10/01/2024
 
 #customer intent: As a developer, I want to use AutoML in Azure Machine Learning, so I can search for (sweep) and select forecasting models.
 ---
@@ -19,8 +19,6 @@ ms.date: 09/25/2024
 
 This article describes how automated machine learning (AutoML) in Azure Machine Learning searches for and selects forecasting models. If you're interested in learning more about the forecasting methodology in AutoML, see [Overview of forecasting methods in AutoML](concept-automl-forecasting-methods.md). To explore training examples for forecasting models in AutoML, see [Set up AutoML to train a time-series forecasting model with the SDK and CLI](how-to-auto-train-forecast.md).
 
-<a name="model-sweeping"></a>
-
 ## Model sweeping in AutoML
 
 The central task for AutoML is to train and evaluate several models and choose the best one with respect to the given primary metric. The word "model" in this case refers to both the model class, such as ARIMA or Random Forest, and the specific hyper-parameter settings that distinguish models within a class. For instance, ARIMA refers to a class of models that share a mathematical template and a set of statistical assumptions. Training, or _fitting_, an ARIMA model requires a list of positive integers that specify the precise mathematical form of the model. These values are the hyper-parameters. The models ARIMA(1, 0, 1) and ARIMA(2, 1, 2) have the same class, but different hyper-parameters. These definitions can be separately fit with the training data and evaluated against each other. AutoML searches, or _sweeps_, over different model classes and within classes by varying the hyper-parameters.
@@ -41,8 +39,6 @@ For a description of the different model types, see the [Forecasting models in A
 
 The amount of sweeping by AutoML depends on the forecasting job configuration. You can specify the stopping criteria as a time limit or a limit on the number of trials, or the equivalent number of models. Early termination logic can be used in both cases to stop sweeping if the primary metric isn't improving.
 
-<a name="model-selection"></a>
-
 ## Model selection in AutoML
 
 AutoML follows a three-phase process to search for and select forecasting models:
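This change removes the explicit `<a name="...">` anchors, so links must now use the IDs generated from the heading text, such as `#model-selection-in-automl`. The following Python sketch approximates the common lowercase-and-hyphenate convention for Markdown heading anchors. It's an assumption for illustration: the exact slug rules of the Microsoft Learn publishing pipeline may differ for headings with unusual punctuation or non-ASCII characters, and `heading_slug` is a hypothetical name.

```python
import re


def heading_slug(heading: str) -> str:
    """Approximate the anchor ID generated for a Markdown heading (hypothetical helper)."""
    text = heading.strip().lstrip("#").strip().lower()
    text = re.sub(r"[^\w\- ]", "", text)  # drop punctuation; keep word chars, hyphens, spaces
    return text.replace(" ", "-")         # spaces become hyphens


headings = ["## Model sweeping in AutoML", "## Model selection in AutoML"]
anchors = {heading_slug(h) for h in headings}

# The updated link fragments resolve; the removed explicit anchors no longer do.
for fragment in ["model-sweeping-in-automl", "model-selection-in-automl",
                 "model-sweeping", "model-selection"]:
    print(fragment, fragment in anchors)
```

Under this convention, any remaining link to `#model-sweeping` or `#model-selection` would point at an anchor that no longer exists, which is why the other three files update their links in the same commit.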

articles/machine-learning/how-to-auto-train-forecast.md

Lines changed: 2 additions & 2 deletions
@@ -131,7 +131,7 @@ Add more detail to this configuration in subsequent sections of this how-to guid
 
 ---
 
-You specify [validation data](concept-automated-ml.md#training-validation-and-test-data) in a similar way. Create a MLTable and specify a validation data input. Alternatively, if you don't supply validation data, AutoML automatically creates cross-validation splits from your training data to use for model selection. For more information, see [forecasting model selection](./concept-automl-forecasting-sweeping.md#model-selection). For more information about how much training data you need, see [training data length requirements](./concept-automl-forecasting-methods.md#data-length-requirements).
+You specify [validation data](concept-automated-ml.md#training-validation-and-test-data) in a similar way. Create a MLTable and specify a validation data input. Alternatively, if you don't supply validation data, AutoML automatically creates cross-validation splits from your training data to use for model selection. For more information, see [forecasting model selection](./concept-automl-forecasting-sweeping.md#model-selection-in-automl). For more information about how much training data you need, see [training data length requirements](./concept-automl-forecasting-methods.md#data-length-requirements).
 
 Learn more about how AutoML applies cross validation to [prevent over fitting](concept-manage-ml-pitfalls.md#prevent-overfitting).
 
@@ -587,7 +587,7 @@ forecasting:
 
 #### Custom cross-validation settings
 
-There are two customizable settings that control cross-validation for forecasting jobs: the number of folds, `n_cross_validations`, and the step size defining the time offset between folds, `cv_step_size`. For more information on the meaning of these parameters, see [forecasting model selection](./concept-automl-forecasting-sweeping.md#model-selection).
+There are two customizable settings that control cross-validation for forecasting jobs: the number of folds, `n_cross_validations`, and the step size defining the time offset between folds, `cv_step_size`. For more information on the meaning of these parameters, see [forecasting model selection](./concept-automl-forecasting-sweeping.md#model-selection-in-automl).
 
 By default, AutoML sets both settings automatically based on characteristics of your data. Advanced users might want to set them manually. For example, suppose you have daily sales data and you want your validation setup to consist of five folds with a seven-day offset between adjacent folds. The following code sample shows how to set these values:
 
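The five-fold, seven-day example in this hunk can be pictured as a rolling-origin layout. The following Python sketch lists the training cutoff and validation window for each fold. It assumes the newest fold's validation window ends at the last observation, each older fold shifts back by `cv_step_size` periods, and each validation window spans the forecast horizon; AutoML's internal fold construction may differ. The function `cv_fold_windows` and the 100-day, 14-day-horizon values are hypothetical.

```python
def cv_fold_windows(n_obs: int, horizon: int, n_cross_validations: int,
                    cv_step_size: int) -> list[tuple[int, int]]:
    """Return (train_end, val_end) index pairs, newest fold first.

    Training uses rows [0, train_end); validation uses rows [train_end, val_end).
    Simplified rolling-origin sketch; not AutoML's implementation.
    """
    folds = []
    for k in range(n_cross_validations):
        val_end = n_obs - k * cv_step_size   # each older fold shifts back by the step size
        train_end = val_end - horizon        # validation window spans one horizon
        if train_end <= 0:
            raise ValueError("series is too short for this cross-validation setup")
        folds.append((train_end, val_end))
    return folds


# Hypothetical: 100 days of daily sales, 14-day horizon, 5 folds, 7-day step.
for train_end, val_end in cv_fold_windows(100, 14, 5, 7):
    print(f"train rows 0-{train_end - 1}, validate rows {train_end}-{val_end - 1}")
```

Under these assumptions, adjacent validation windows overlap by seven days because the step size is smaller than the horizon.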
articles/machine-learning/how-to-automl-forecasting-faq.md

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ To choose between them, note that NRMSE penalizes outliers in the training data
 
 ## Will AutoML always select the same best model from the same training data and configuration?
 
-[AutoML's model search process](./concept-automl-forecasting-sweeping.md#model-sweeping) is not deterministic, so it doesn't always select the same model from the same data and configuration.
+[AutoML's model search process](./concept-automl-forecasting-sweeping.md#model-sweeping-in-automl) is not deterministic, so it doesn't always select the same model from the same data and configuration.
 
 ## How do I fix an out-of-memory error?
 
0 commit comments
