articles/machine-learning/concept-automated-ml.md
60 additions & 51 deletions
@@ -18,6 +18,8 @@ Automated machine learning, also referred to as automated ML or AutoML, is the p
18
18
19
19
Traditional machine learning model development is resource-intensive, requiring significant domain knowledge and time to produce and compare dozens of models. With automated machine learning, you'll accelerate the time it takes to get production-ready ML models with great ease and efficiency.
20
20
21
+
<a name="parity"></a>
22
+
21
23
## Ways to use AutoML in Azure Machine Learning
22
24
23
25
Azure Machine Learning offers the following two experiences for working with automated ML. See the following sections to understand [feature availability in each experience](#parity).
@@ -28,10 +30,6 @@ Azure Machine Learning offers the following two experiences for working with aut
28
30
* [Tutorial: Create a classification model with automated ML in Azure Machine Learning](tutorial-first-experiment-automated-ml.md).
29
31
* [Tutorial: Forecast demand with automated machine learning](tutorial-automated-ml-forecast.md)
30
32
31
-
<a name="parity"></a>
32
-
33
-
## AutoML settings and configuration
34
-
35
33
### Experiment settings
36
34
37
35
The following settings allow you to configure your automated ML experiment.
@@ -65,7 +63,7 @@ These settings can be applied to the best model as a result of your automated ML
|**Show best model based on non-primary metric**|✓||
67
65
|**Enable/disable ONNX model compatibility**|✓||
68
-
|**Test the model**| ✓||
66
+
|**Test the model**| ✓|✓ (preview)|
69
67
70
68
### Run control settings
71
69
@@ -183,9 +181,65 @@ You can also inspect the logged run information, which [contains metrics](how-to
183
181
184
182
While model building is automated, you can also [learn how important or relevant features are](how-to-configure-auto-train.md#explain) to the generated models.
## Guidance on local vs. remote managed ML compute targets
189
+
190
+
The web interface for automated ML always uses a remote [compute target](concept-compute-target.md). But when you use the Python SDK, you will choose either a local compute or a remote compute target for automated ML training.
191
+
192
+
* **Local compute**: Training occurs on your local laptop or VM compute.
193
+
* **Remote compute**: Training occurs on Machine Learning compute clusters.
194
+
195
+
### Choose compute target
196
+
Consider these factors when choosing your compute target:
197
+
198
+
* **Choose a local compute**: If your scenario is about initial explorations or demos using small data and short trains (i.e. seconds or a couple of minutes per child run), training on your local computer might be a better choice. There is no setup time, the infrastructure resources (your PC or VM) are directly available.
199
+
* **Choose a remote ML compute cluster**: If you are training with larger datasets like in production training creating models which need longer trains, remote compute will provide much better end-to-end time performance because `AutoML` will parallelize trains across the cluster's nodes. On a remote compute, the start-up time for the internal infrastructure will add around 1.5 minutes per child run, plus additional minutes for the cluster infrastructure if the VMs are not yet up and running.
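
The trade-off between the two options can be made concrete with some rough arithmetic. The sketch below is not part of the SDK: the function names, the "waves of parallel child runs" model, and the 5-minute cluster start-up figure are all assumptions chosen for illustration. Only the roughly 1.5 minutes of start-up per child run comes from the guidance above.

```python
import math


def estimate_local_minutes(n_child_runs, minutes_per_run):
    """Local compute: child runs execute one after another, with no start-up cost."""
    return n_child_runs * minutes_per_run


def estimate_remote_minutes(n_child_runs, minutes_per_run, n_nodes,
                            per_run_startup_minutes=1.5,
                            cluster_startup_minutes=0.0):
    """Remote cluster: assume child runs proceed in waves of n_nodes parallel runs.

    Each wave pays the per-run start-up overhead once (runs in a wave overlap),
    and the cluster start-up is paid once if the VMs are not yet running.
    This wave model is a simplification, not how the service schedules runs.
    """
    waves = math.ceil(n_child_runs / n_nodes)
    return cluster_startup_minutes + waves * (minutes_per_run + per_run_startup_minutes)


# Short trains (0.5 minutes each): local is competitive.
print(estimate_local_minutes(20, 0.5))                                    # 10.0
print(estimate_remote_minutes(20, 0.5, n_nodes=4,
                              cluster_startup_minutes=5.0))               # 15.0

# Longer trains (10 minutes each): parallelism outweighs the overhead.
print(estimate_local_minutes(20, 10.0))                                   # 200.0
print(estimate_remote_minutes(20, 10.0, n_nodes=4,
                              cluster_startup_minutes=5.0))               # 62.5
```

Under these assumed numbers, the fixed start-up overhead dominates for short child runs, while parallel execution dominates once each child run takes several minutes.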
200
+
201
+
### Pros and cons
202
+
Consider these pros and cons when choosing to use local vs. remote.
203
+
204
+
|| Pros (Advantages) |Cons (Handicaps) |
205
+
|---------|---------|---------|
206
+
|**Local compute target**| <li> No environment start-up time | <li> Subset of features<li> Can't parallelize runs <li> Worse for large data. <li>No data streaming while training <li> No DNN-based featurization <li> Python SDK only |
207
+
|**Remote ML compute clusters**| <li> Full set of features <li> Parallelize child runs <li> Large data support<li> DNN-based featurization <li> Dynamic scalability of compute cluster on demand <li> No-code experience (web UI) also available | <li> Start-up time for cluster nodes <li> Start-up time for each child run |
208
+
209
+
### Feature availability
210
+
211
+
More features are available when you use the remote compute, as shown in the table below.

| | Remote | Local |
|---|---|---|
| Data streaming (Large data support, up to 100 GB) | ✓ ||
216
+
| DNN-BERT-based text featurization and training | ✓ ||
217
+
| Out-of-the-box GPU support (training and inference) | ✓ ||
218
+
| Image Classification and Labeling support | ✓ ||
219
+
| Auto-ARIMA, Prophet and ForecastTCN models for forecasting | ✓ ||
220
+
| Multiple runs/iterations in parallel | ✓ ||
221
+
| Create models with interpretability in AutoML studio web experience UI | ✓ ||
222
+
| Feature engineering customization in studio web experience UI| ✓ ||
223
+
| Azure ML hyperparameter tuning | ✓ ||
224
+
| Azure ML Pipeline workflow support | ✓ ||
225
+
| Continue a run | ✓ ||
226
+
| Forecasting | ✓ | ✓ |
227
+
| Create and run experiments in notebooks | ✓ | ✓ |
228
+
| Register and visualize experiment's info and metrics in UI | ✓ | ✓ |
229
+
| Data guardrails | ✓ | ✓ |
230
+
231
+
## Training, validation and test data
232
+
233
+
With automated ML you provide the **training data** to train ML models, and you can specify what type of model validation to perform. Automated ML performs model validation as part of training. That is, automated ML uses **validation data** to tune model hyperparameters based on the applied algorithm to find the best combination that best fits the training data. However, the same validation data is used for each iteration of tuning, which introduces model evaluation bias since the model continues to improve and fit to the validation data.
234
+
235
+
To help confirm that such bias isn't applied to the final recommended model, automated ML supports the use of **test data** to evaluate the final model that automated ML recommends at the end of your experiment. When you provide test data as part of your AutoML experiment configuration, this recommended model is tested by default at the end of your experiment (preview).
236
+
237
+
>[!IMPORTANT]
238
+
> Testing your models with a test dataset to evaluate generated models is a preview feature. This capability is an [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview feature, and may change at any time.
239
+
240
+
Learn how to [configure AutoML experiments to use test data (preview) with the SDK](how-to-configure-cross-validation-data-splits.md#provide-test-data-preview) or with the [Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md#create-and-run-experiment).
241
+
242
+
You can also [test any existing automated ML model (preview)](how-to-configure-auto-train.md#test-existing-automated-ml-model), including models from child runs, by providing your own test data or by setting aside a portion of your training data.
189
243
190
244
## Feature engineering
191
245
@@ -234,51 +288,6 @@ The [Caruana ensemble selection algorithm](http://www.niculescu-mizil.org/papers
234
288
235
289
See the [how-to](how-to-configure-auto-train.md#ensemble) for changing default ensemble settings in automated machine learning.
236
290
237
-
## <a name="local-remote"></a>Guidance on local vs. remote managed ML compute targets
238
-
239
-
The web interface for automated ML always uses a remote [compute target](concept-compute-target.md). But when you use the Python SDK, you will choose either a local compute or a remote compute target for automated ML training.
240
-
241
-
* **Local compute**: Training occurs on your local laptop or VM compute.
242
-
* **Remote compute**: Training occurs on Machine Learning compute clusters.
243
-
244
-
### Choose compute target
245
-
Consider these factors when choosing your compute target:
246
-
247
-
* **Choose a local compute**: If your scenario is about initial explorations or demos using small data and short trains (i.e. seconds or a couple of minutes per child run), training on your local computer might be a better choice. There is no setup time, the infrastructure resources (your PC or VM) are directly available.
248
-
* **Choose a remote ML compute cluster**: If you are training with larger datasets like in production training creating models which need longer trains, remote compute will provide much better end-to-end time performance because `AutoML` will parallelize trains across the cluster's nodes. On a remote compute, the start-up time for the internal infrastructure will add around 1.5 minutes per child run, plus additional minutes for the cluster infrastructure if the VMs are not yet up and running.
249
-
250
-
### Pros and cons
251
-
Consider these pros and cons when choosing to use local vs. remote.
252
-
253
-
|| Pros (Advantages) |Cons (Handicaps) |
254
-
|---------|---------|---------|
255
-
|**Local compute target**| <li> No environment start-up time | <li> Subset of features<li> Can't parallelize runs <li> Worse for large data. <li>No data streaming while training <li> No DNN-based featurization <li> Python SDK only |
256
-
|**Remote ML compute clusters**| <li> Full set of features <li> Parallelize child runs <li> Large data support<li> DNN-based featurization <li> Dynamic scalability of compute cluster on demand <li> No-code experience (web UI) also available | <li> Start-up time for cluster nodes <li> Start-up time for each child run |
257
-
258
-
### Feature availability
259
-
260
-
More features are available when you use the remote compute, as shown in the table below.
You can specify separate **training data and validation data sets** directly in the `AutoMLConfig` constructor. Learn more about [how to configure data splits and cross validation](how-to-configure-cross-validation-data-splits.md) for your AutoML experiments.
89
+
You can specify separate **training data and validation data sets** directly in the `AutoMLConfig` constructor. Learn more about [how to configure training, validation, cross validation, and test data](how-to-configure-cross-validation-data-splits.md) for your AutoML experiments.
90
90
91
91
If you do not explicitly specify a `validation_data` or `n_cross_validations` parameter, automated ML applies default techniques to determine how validation is performed. This determination depends on the number of rows in the dataset assigned to your `training_data` parameter.
92
92
@@ -95,7 +95,15 @@ If you do not explicitly specify a `validation_data` or `n_cross_validation` par
95
95
|**Larger than 20,000 rows**| Train/validation data split is applied. The default is to take 10% of the initial training data set as the validation set. In turn, that validation set is used for metrics calculation.
96
96
|**Smaller than 20,000 rows**| Cross-validation approach is applied. The default number of folds depends on the number of rows. <br> **If the dataset is less than 1,000 rows**, 10 folds are used. <br> **If the rows are between 1,000 and 20,000**, then three folds are used.
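
The default rules in the table above can be summarized as a small decision function. This is only an illustrative sketch: `default_validation_strategy` is a hypothetical helper, not an SDK function, and the table does not say what happens at exactly 20,000 rows, so the sketch assumes that case falls into cross-validation.

```python
def default_validation_strategy(n_rows):
    """Return the default validation approach described in the table above.

    Hypothetical helper for illustration only; the returned dictionary keys
    are made up and do not correspond to AutoMLConfig parameters.
    """
    if n_rows > 20_000:
        # Larger than 20,000 rows: hold out 10% of the training data.
        return {"method": "train_validation_split", "validation_fraction": 0.10}
    if n_rows < 1_000:
        # Fewer than 1,000 rows: 10-fold cross-validation.
        return {"method": "cross_validation", "folds": 10}
    # Between 1,000 and 20,000 rows: 3-fold cross-validation.
    # Assumption: exactly 20,000 rows is treated as part of this range.
    return {"method": "cross_validation", "folds": 3}


for rows in (500, 5_000, 50_000):
    print(rows, default_validation_strategy(rows))
```

Explicitly passing validation data or a fold count to `AutoMLConfig` bypasses these defaults.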
97
97
98
-
At this time, you need to provide your own **test data** for model evaluation. For a code example of bringing your own test data for model evaluation see the **Test** section of [this Jupyter notebook](https://github.com/Azure/azureml-examples/blob/main/python-sdk/tutorials/automl-with-azureml/classification-credit-card-fraud/auto-ml-classification-credit-card-fraud.ipynb).
98
+
99
+
> [!TIP]
100
+
> You can upload **test data (preview)** to evaluate models that automated ML generated for you. These features are [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview capabilities, and may change at any time.
101
+
> Learn how to:
102
+
> * [Pass in test data to your AutoMLConfig object](how-to-configure-cross-validation-data-splits.md#provide-test-data-preview).
103
+
> * [Test the models automated ML generated for your experiment](#test-models-preview).
104
+
>
105
+
> If you prefer a no-code experience, see [step 11 in Set up AutoML with the studio UI](how-to-use-automated-ml-for-ml-models.md#create-and-run-experiment).
106
+
99
107
100
108
### Large data
101
109
@@ -509,9 +517,59 @@ RunDetails(run).show()
509
517
510
518

511
519
520
+
## Test models (preview)
521
+
522
+
>[!IMPORTANT]
523
+
> Testing your models with a test dataset to evaluate automated ML generated models is a preview feature. This capability is an [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview feature, and may change at any time.
524
+
525
+
Passing the `test_data` or `test_size` parameter into `AutoMLConfig` automatically triggers a remote test run that uses the provided test data to evaluate the best model that automated ML recommends. This remote test run happens at the end of the experiment, once the best model is determined. See how to [pass test data into your `AutoMLConfig`](how-to-configure-cross-validation-data-splits.md#provide-test-data-preview).
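
To make the idea of a fractional `test_size` concrete, the following sketch shows what setting aside a portion of the training data looks like as a simple random hold-out. It is plain Python and does not call the SDK; `hold_out_test_split` and its `seed` argument are hypothetical, and this is not necessarily how automated ML performs the split internally.

```python
import random


def hold_out_test_split(rows, test_size, seed=0):
    """Randomly set aside a fraction of rows as test data (illustration only)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)      # reproducible shuffle
    n_test = round(len(rows) * test_size)     # number of rows held out
    test_indices = set(indices[:n_test])
    # Preserve the original row order within each partition.
    train = [row for i, row in enumerate(rows) if i not in test_indices]
    test = [row for i, row in enumerate(rows) if i in test_indices]
    return train, test


rows = list(range(100))
train_rows, test_rows = hold_out_test_split(rows, test_size=0.2)
print(len(train_rows), len(test_rows))
```

The held-out rows are never seen during training or validation, which is what lets the final test run give an unbiased estimate for the recommended model.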
526
+
527
+
### Get test run results
528
+
529
+
You can get the predictions and metrics from the remote test run from the [Azure Machine Learning studio](how-to-use-automated-ml-for-ml-models.md#view-remote-test-run-results-preview) or with the following code.
To test other existing automated ML models, whether from the best run or a child run, use [`ModelProxy()`](/python/api/azureml-train-automl-client/azureml.train.automl.model_proxy.modelproxy) after the main AutoML run has completed. `ModelProxy()` already returns the predictions and metrics, so no further processing is required to retrieve the outputs.
556
+
557
+
> [!NOTE]
558
+
> ModelProxy is an [experimental](/python/api/overview/azure/ml/#stable-vs-experimental) preview class, and may change at any time.
559
+
560
+
The following code demonstrates how to test a model from any run by using the [ModelProxy.test()](/python/api/azureml-train-automl-client/azureml.train.automl.model_proxy.modelproxy#test-test-data--azureml-data-abstract-dataset-abstractdataset--include-predictions-only--bool---false-----typing-tuple-azureml-data-abstract-dataset-abstractdataset--typing-dict-str--typing-any--) method. In the `test()` method, you can use the `include_predictions_only` parameter to specify that you only want to see the predictions of the test run.
561
+
562
+
```python
563
+
from azureml.train.automl.model_proxy import ModelProxy
You can register a model, so you can come back to it for later use.
572
+
After you test a model and confirm you want to use it in production, you can register it for later use and
515
573
516
574
To register a model from an automated ML run, use the [`register_model()`](/python/api/azureml-train-automl-client/azureml.train.automl.run.automlrun#register-model-model-name-none--description-none--tags-none--iteration-none--metric-none-) method.