articles/machine-learning/how-to-auto-train-image-models.md
In this article, you learn how to train computer vision models on image data with automated ML, using the Azure Machine Learning CLI extension v2 or the Azure Machine Learning Python SDK v2.
Automated ML supports model training for computer vision tasks like image classification, object detection, and instance segmentation. Authoring AutoML models for computer vision tasks is currently supported via the Azure Machine Learning Python SDK. The resulting experimentation trials, models, and outputs are accessible from the Azure Machine Learning studio UI. [Learn more about automated ML for computer vision tasks on image data](concept-automated-ml.md).
## Prerequisites
In order to generate computer vision models, you need to bring labeled image data.
If your training data is in a different format (like Pascal VOC or COCO), you can apply the helper scripts included with the sample notebooks to convert the data to JSONL. Learn more about how to [prepare data for computer vision tasks with automated ML](how-to-prepare-datasets-for-automl-images.md).
> [!Note]
> The training data needs to have at least 10 images in order to submit an AutoML job.
> [!Warning]
> For this capability, creating the `MLTable` from data in JSONL format is supported using the SDK and CLI only. Creating the `MLTable` via the UI is not supported at this time.
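For reference, here's a minimal sketch of how JSONL-backed `MLTable` folders might be passed as job inputs with the Python SDK v2; the local folder paths are assumptions used only for illustration.

```python
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes

# Assumed local folders, each containing an MLTable file that points at the JSONL data
my_training_data_input = Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder")
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="./data/validation-mltable-folder")
```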
For computer vision tasks, you can launch [individual trials](#individual-trials), [manual sweeps](#manually-sweeping-model-hyperparameters), or [automatic sweeps](#automatically-sweeping-model-hyperparameters-automode). We recommend starting with an automatic sweep to get a first baseline model. Then, you can try out individual trials with certain models and hyperparameter configurations. Finally, with manual sweeps you can explore multiple hyperparameter values near the more promising models and hyperparameter configurations. This three-step workflow (automatic sweep, individual trials, manual sweeps) avoids searching the entirety of the hyperparameter space, which grows exponentially in the number of hyperparameters.
Automatic sweeps can yield competitive results for many datasets. Additionally, they do not require advanced knowledge of model architectures, they take into account hyperparameter correlations, and they work seamlessly across different hardware setups. All these reasons make them a strong option for the early stage of your experimentation process.
### Primary metric
An AutoML training job uses a primary metric for model optimization and hyperparameter tuning. The primary metric depends on the task type as shown below; other primary metric values are currently not supported.
* Accuracy for image classification
* [Intersection over union](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.jaccard_score.html#sklearn.metrics.jaccard_score) for image classification multilabel
* [Mean average precision](how-to-understand-automated-ml.md#object-detection-and-instance-segmentation-metrics) for image object detection
* [Mean average precision](how-to-understand-automated-ml.md#object-detection-and-instance-segmentation-metrics) for image instance segmentation
### Job limits
You can control the resources spent on your AutoML Image training job by specifying the `timeout_minutes`, `max_trials`, and `max_concurrent_trials` for the job in limit settings as described in the following example.
Parameter | Detail
-----|----
`max_trials` | Parameter for maximum number of trials to sweep. Must be an integer between 1 and 1000. When exploring just the default hyperparameters for a given model architecture, set this parameter to 1. The default value is 1.
`max_concurrent_trials` | Maximum number of trials that can run concurrently. If specified, must be an integer between 1 and 100. The default value is 1. <br><br>**NOTE:**<li> The number of concurrent trials is gated on the resources available in the specified compute target. Ensure that the compute target has the available resources for the desired concurrency. <li> `max_concurrent_trials` is capped at `max_trials` internally. For example, if the user sets `max_concurrent_trials=4` and `max_trials=2`, values would be internally updated as `max_concurrent_trials=2`, `max_trials=2`.
`timeout_minutes` | The amount of time in minutes before the experiment terminates. If not specified, the default experiment timeout_minutes is seven days (maximum 60 days).
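As a rough illustration with the Python SDK v2, these limits might be set as follows; the job object name (`image_object_detection_job`) and the specific values are assumptions, not prescribed settings.

```python
# Assumes a job object created earlier, for example with automl.image_object_detection(...)
image_object_detection_job.set_limits(
    timeout_minutes=60,       # overall experiment timeout
    max_trials=10,            # maximum number of trials to sweep
    max_concurrent_trials=2,  # capped at max_trials internally
)
```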
### Automatically sweeping model hyperparameters (AutoMode)
> [!IMPORTANT]
> This feature is currently in public preview. This preview version is provided without a service-level agreement. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
It is generally hard to predict the best model architecture and hyperparameters for a dataset. Also, in some cases the human time allocated to tuning hyperparameters may be limited. For computer vision tasks, you can specify a number of trials and the system will automatically determine the region of the hyperparameter space to sweep. You do not have to define a hyperparameter search space, a sampling method, or an early termination policy.
A number of trials between 10 and 20 will likely work well on many datasets. The [time budget](#job-limits) for the AutoML job can still be set, but we recommend doing this only if each trial may take a long time.
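To make this concrete, here's a hedged sketch of an AutoMode-style configuration with the Python SDK v2. The task type (object detection), data paths, compute name, and trial counts are all illustrative assumptions.

```python
from azure.ai.ml import MLClient, Input, automl
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# Hypothetical object detection job; swap the factory function for your task type
image_object_detection_job = automl.image_object_detection(
    compute="gpu-cluster",
    experiment_name="automode-fridge-items",
    training_data=Input(type=AssetTypes.MLTABLE, path="./data/training-mltable-folder"),
    validation_data=Input(type=AssetTypes.MLTABLE, path="./data/validation-mltable-folder"),
    target_column_name="label",
    primary_metric="mean_average_precision",
)

# AutoMode: only the number of trials is specified; no search space, sampling
# method, or early termination policy is defined by the user.
image_object_detection_job.set_limits(max_trials=10, max_concurrent_trials=2)
```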
> [!Warning]
> Launching automatic sweeps via the UI is not supported at this time.
### Individual trials
In individual trials, you directly control the model architecture and hyperparameters. The model architecture is passed via the `model_name` parameter.
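For illustration, an individual trial might look like the following Python SDK v2 sketch; the job object, the architecture, and the hyperparameter values are assumptions.

```python
# A single fixed configuration: one architecture, explicit hyperparameters, no sweep
image_object_detection_job.set_training_parameters(
    model_name="yolov5",     # architecture to train in this trial
    learning_rate=0.01,      # assumed value, for illustration only
    number_of_epochs=15,     # assumed value, for illustration only
)
```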
#### Supported model architectures
### Manually sweeping model hyperparameters
You can define the model architectures and hyperparameters to sweep in the parameter space. You can either specify a single model architecture or multiple ones.
* See [Individual trials](#individual-trials) for the list of supported model architectures for each task type.
* See [Hyperparameters for computer vision tasks](reference-automl-images-hyperparameters.md) for the hyperparameters available for each computer vision task type.
* See [details on supported distributions for discrete and continuous hyperparameters](how-to-tune-hyperparameters.md#define-the-search-space).
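Building on the list above, here's a hedged Python SDK v2 sketch of a search space spanning two architectures. The `SearchSpace` import path can vary by SDK version, and the hyperparameter names and ranges are illustrative assumptions.

```python
from azure.ai.ml.automl import SearchSpace   # import path assumed; may differ by SDK version
from azure.ai.ml.sweep import Choice, Uniform

image_object_detection_job.extend_search_space(
    [
        # Sub-space for the YOLOv5 architecture
        SearchSpace(
            model_name=Choice(["yolov5"]),
            learning_rate=Uniform(0.0001, 0.01),
            model_size=Choice(["small", "medium"]),
        ),
        # Sub-space for a Faster R-CNN architecture
        SearchSpace(
            model_name=Choice(["fasterrcnn_resnet50_fpn"]),
            learning_rate=Uniform(0.0001, 0.001),
            optimizer=Choice(["sgd", "adam", "adamw"]),
            min_size=Choice([600, 800]),
        ),
    ]
)
```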
When sweeping hyperparameters, you need to specify the sampling method to use for sweeping over the defined parameter space.
#### Early termination policies
You can automatically end poorly performing trials with an early termination policy. Early termination improves computational efficiency, saving compute resources that would have been otherwise spent on less promising trials. Automated ML for images supports the following early termination policies using the `early_termination` parameter. If no termination policy is specified, all trials are run to completion.
| Early termination policy | AutoML Job syntax |
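As a hedged sketch with the Python SDK v2, random sampling combined with a Bandit early termination policy might be configured as follows; the policy parameter values are illustrative assumptions.

```python
from azure.ai.ml.sweep import BanditPolicy

image_object_detection_job.set_sweep(
    sampling_algorithm="random",
    early_termination=BanditPolicy(
        evaluation_interval=2,  # how often the policy is applied
        slack_factor=0.2,       # allowed slack relative to the best-performing trial
        delay_evaluation=6,     # evaluations to wait before applying the policy
    ),
)
```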
In our experiments, we found that these augmentations help the model to generalize.
## Incremental training (optional)
Once the training job is done, you have the option to further train the model by loading the trained model checkpoint. You can either use the same dataset or a different one for incremental training.
### Pass the checkpoint via job ID
You can pass the job ID that you want to load the checkpoint from.
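As a sketch with the Python SDK v2, and assuming the checkpoint is referenced through a training parameter named `checkpoint_run_id` (the parameter name may differ by SDK version), this might look like the following.

```python
# Assumption: `checkpoint_run_id` is the training parameter that points at the
# completed job whose checkpoint should be loaded; the ID below is a placeholder.
image_object_detection_job.set_training_parameters(
    checkpoint_run_id="<previous-automl-image-job-name>",
)
```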
When you've configured your AutoML Job to the desired settings, you can submit the job.
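A minimal submission sketch with the Python SDK v2, assuming an `MLClient` named `ml_client` and a configured job object:

```python
# Submit the configured AutoML image job to the workspace
returned_job = ml_client.jobs.create_or_update(image_object_detection_job)
print(f"Created job: {returned_job.name}")

# Optionally stream the logs until the job completes
ml_client.jobs.stream(returned_job.name)
```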
## Outputs and evaluation metrics
The automated ML training jobs generate output model files, evaluation metrics, logs, and deployment artifacts like the scoring file and the environment file, which can be viewed from the outputs, logs, and metrics tabs of the child jobs.
> [!TIP]
> Check how to navigate to the job results from the [View job results](how-to-understand-automated-ml.md#view-job-results) section.
For definitions and examples of the performance charts and metrics provided for each job, see [Evaluate automated machine learning experiment results](how-to-understand-automated-ml.md#metrics-for-image-models-preview).
## Register and deploy model
Once the job completes, you can register the model that was created from the best trial (the configuration that resulted in the best primary metric). You can either register the model after downloading it, or register it by specifying the `azureml` path with the corresponding job ID. Note: If you want to change the inference settings described below, you need to download the model, change settings.json, and register using the updated model folder.
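As a hedged sketch of the download-then-register path with the Python SDK v2 — assuming the best trial has already been identified (see the next section) and its output model folder was downloaded locally as an MLflow model — the model name and local path below are placeholders.

```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

model = Model(
    name="od-fridge-items-model",              # hypothetical model name
    path="./artifacts/outputs/mlflow-model/",  # assumed local path of the downloaded model folder
    type=AssetTypes.MLFLOW_MODEL,
)
registered_model = ml_client.models.create_or_update(model)
```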
### Get the best trial
# [Azure CLI](#tab/cli)
Alternatively, you can deploy the model from the [Azure Machine Learning studio UI](https://ml.azure.com/).
Navigate to the model you wish to deploy in the **Models** tab of the automated ML job, select **Deploy**, and then select **Deploy to real-time endpoint**.
By default, all image files are downloaded to disk prior to model training. If the size of the image files is greater than the available disk space, the job will fail. Instead of downloading all images to disk, you can choose to stream image files from Azure storage as they're needed during training. Image files are streamed from Azure storage directly to system memory, bypassing disk. At the same time, as many files as possible from storage are cached on disk to minimize the number of requests to storage.
> [!NOTE]
> If streaming is enabled, ensure the Azure storage account is located in the same region as compute to minimize cost and latency.