
Commit 82371df

Merge pull request #209988 from PhaniShekhar/phmantri/small-od-v2
Add v2 doc for small object detection
2 parents 5ad2cda + fab7b68 commit 82371df

File tree

5 files changed (+218, -67 lines changed)

Lines changed: 65 additions & 39 deletions
@@ -1,18 +1,22 @@
---
title: Use AutoML to detect small objects in images
-description: Set up Azure Machine Learning automated ML to train small object detection models.
+titleSuffix: Azure Machine Learning
+description: Set up Azure Machine Learning automated ML to train small object detection models with the CLI v2 and Python SDK v2 (preview).
author: PhaniShekhar
ms.author: phmantri
ms.service: machine-learning
ms.subservice: automl
ms.topic: how-to
ms.date: 10/13/2021
-ms.custom: sdkv1, event-tier1-build-2022
+ms.custom: sdkv2, event-tier1-build-2022
---

# Train a small object detection model with AutoML (preview)

-[!INCLUDE [sdk v1](../../includes/machine-learning-sdk-v1.md)]
+[!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]
+> [!div class="op_single_selector" title1="Select the version of Azure Machine Learning CLI extension you are using:"]
+> * [v1](v1/how-to-use-automl-small-object-detect-v1.md)
+> * [v2 (current version)](how-to-use-automl-small-object-detect.md)

> [!IMPORTANT]
> This feature is currently in public preview. This preview version is provided without a service-level agreement. Certain features might not be supported or might have constrained capabilities. For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
@@ -21,96 +25,118 @@ In this article, you'll learn how to train an object detection model to detect s
Typically, computer vision models for object detection work well for datasets with relatively large objects. However, due to memory and computational constraints, these models tend to underperform when tasked to detect small objects in high-resolution images. Because high-resolution images are typically large, they are resized before input into the model, which limits their capability to detect smaller objects relative to the initial image size.

-To help with this problem, automated ML supports tiling as part of the public preview computer vision capabilities. The tiling capability in automated ML is based on the concepts in [The Power of Tiling for Small Object Detection](https://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf).
+To help with this problem, automated ML supports tiling as part of the computer vision capabilities. The tiling capability in automated ML is based on the concepts in [The Power of Tiling for Small Object Detection](https://openaccess.thecvf.com/content_CVPRW_2019/papers/UAVision/Unel_The_Power_of_Tiling_for_Small_Object_Detection_CVPRW_2019_paper.pdf).

When tiling, each image is divided into a grid of tiles. Adjacent tiles overlap with each other in the width and height dimensions. The tiles are cropped from the original image as shown in the following image.

-![Tiles generation](./media/how-to-use-automl-small-object-detect/tiles-generation.png)
+:::image type="content" source="./media/how-to-use-automl-small-object-detect/tiles-generation.png" alt-text="Diagram that shows an image being divided into a grid of overlapping tiles.":::

## Prerequisites

* An Azure Machine Learning workspace. To create the workspace, see [Create workspace resources](quickstart-create-resources.md).

* This article assumes some familiarity with how to configure an [automated machine learning experiment for computer vision tasks](how-to-auto-train-image-models.md).

## Supported models

-Small object detection using tiling is currently supported for the following models:
-
-* fasterrcnn_resnet18_fpn
-* fasterrcnn_resnet50_fpn
-* fasterrcnn_resnet34_fpn
-* fasterrcnn_resnet101_fpn
-* fasterrcnn_resnet152_fpn
-* retinanet_resnet50_fpn
+Small object detection using tiling is supported for all models that automated ML for images supports for the object detection task.

## Enable tiling during training

-To enable tiling, you can set the `tile_grid_size` parameter to a value like (3, 2); where 3 is the number of tiles along the width dimension and 2 is the number of tiles along the height dimension. When this parameter is set to (3, 2), each image is split into a grid of 3 x 2 tiles. Each tile overlaps with the adjacent tiles, so that any objects that fall on the tile border are included completely in one of the tiles. This overlap can be controlled by the `tile_overlap_ratio` parameter, which defaults to 25%.
+To enable tiling, set the `tile_grid_size` parameter to a value like '3x2', where 3 is the number of tiles along the width dimension and 2 is the number of tiles along the height dimension. When this parameter is set to '3x2', each image is split into a grid of 3 x 2 tiles. Each tile overlaps with the adjacent tiles, so that any object that falls on a tile border is included completely in at least one of the tiles. This overlap can be controlled by the `tile_overlap_ratio` parameter, which defaults to 25%.

When tiling is enabled, the entire image and the tiles generated from it are passed through the model. These images and tiles are resized according to the `min_size` and `max_size` parameters before being fed to the model. The computation time increases proportionally because of this extra data.

-For example, when the `tile_grid_size` parameter is (3, 2), the computation time would be approximately seven times when compared to no tiling.
+For example, when the `tile_grid_size` parameter is '3x2', the computation time is approximately seven times higher than without tiling.
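The cost arithmetic behind that example can be sketched: with a 'WxH' grid, the model processes the full image plus W x H tiles, so a '3x2' grid means 1 + 6 = 7 inputs per original image. A minimal helper, illustrative only and not part of the Azure ML SDK:

```python
def images_processed(tile_grid_size: str) -> int:
    """Return how many inputs the model sees per original image when
    tiling with a 'WxH' grid: the full image plus W * H tiles."""
    width_tiles, height_tiles = (int(n) for n in tile_grid_size.split("x"))
    return 1 + width_tiles * height_tiles

# A '3x2' grid yields 1 full image + 6 tiles = 7 inputs,
# hence roughly seven times the computation of no tiling.
print(images_processed("3x2"))  # 7
```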
-You can specify the value for `tile_grid_size` in your hyperparameter space as a string.
+You can specify the value for `tile_grid_size` in your training parameters as a string.

+# [CLI v2](#tab/CLI-v2)
+
+[!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
+
+```yaml
+training_parameters:
+  tile_grid_size: '3x2'
+```
+
+# [Python SDK v2 (preview)](#tab/SDK-v2)

```python
-parameter_space = {
-    'model_name': choice('fasterrcnn_resnet50_fpn'),
-    'tile_grid_size': choice('(3, 2)'),
-    ...
-}
+image_object_detection_job.set_training_parameters(
+    tile_grid_size='3x2'
+)
```
+---
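The tiling geometry these parameters describe, a grid of W x H tiles whose neighbors overlap by `tile_overlap_ratio` of the tile size in each dimension, can be sketched as plain arithmetic. This is an illustration of the concept only; the service's exact cropping may differ:

```python
def tile_boxes(image_w, image_h, tile_grid_size="3x2", tile_overlap_ratio=0.25):
    """Compute (left, top, right, bottom) crop boxes for an overlapping tile grid.

    Adjacent tiles overlap by `tile_overlap_ratio` of the tile size in each
    dimension, so an object on a tile border falls fully inside some tile.
    """
    cols, rows = (int(n) for n in tile_grid_size.split("x"))
    # Tile size chosen so the grid spans the image exactly:
    # n*t - (n-1)*r*t = image_size  =>  t = image_size / (n - (n-1)*r)
    tw = image_w / (cols - (cols - 1) * tile_overlap_ratio)
    th = image_h / (rows - (rows - 1) * tile_overlap_ratio)
    boxes = []
    for row in range(rows):
        for col in range(cols):
            left = col * tw * (1 - tile_overlap_ratio)
            top = row * th * (1 - tile_overlap_ratio)
            boxes.append((round(left), round(top), round(left + tw), round(top + th)))
    return boxes

# A '3x2' grid over a 1920x1080 image produces 6 overlapping crops.
print(len(tile_boxes(1920, 1080)))  # 6
```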
The value for the `tile_grid_size` parameter depends on the image dimensions and the size of the objects within the image. For example, a larger number of tiles is helpful when there are smaller objects in the images.

To choose the optimal value for this parameter for your dataset, you can use hyperparameter search. To do so, specify a choice of values for this parameter in your hyperparameter space.

+# [CLI v2](#tab/CLI-v2)
+
+[!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
+
+```yaml
+search_space:
+  - model_name:
+      type: choice
+      values: ['fasterrcnn_resnet50_fpn']
+    tile_grid_size:
+      type: choice
+      values: ['2x1', '3x2', '5x3']
+```
+
+# [Python SDK v2 (preview)](#tab/SDK-v2)

```python
-parameter_space = {
-    'model_name': choice('fasterrcnn_resnet50_fpn'),
-    'tile_grid_size': choice('(2, 1)', '(3, 2)', '(5, 3)'),
-    ...
-}
+image_object_detection_job.extend_search_space(
+    SearchSpace(
+        model_name=Choice(['fasterrcnn_resnet50_fpn']),
+        tile_grid_size=Choice(['2x1', '3x2', '5x3'])
+    )
+)
```
+---
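For intuition, that search space expands to the cross-product of the listed choices, and the sweep samples configurations from it. A plain enumeration of the combinations, for illustration only:

```python
from itertools import product

model_names = ["fasterrcnn_resnet50_fpn"]
tile_grid_sizes = ["2x1", "3x2", "5x3"]

# The sweep samples configurations from this cross-product.
candidates = [
    {"model_name": m, "tile_grid_size": g}
    for m, g in product(model_names, tile_grid_sizes)
]
print(len(candidates))  # 1 model x 3 grid sizes = 3 candidate configurations
```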
## Tiling during inference

When a model trained with tiling is deployed, tiling also occurs during inference. Automated ML uses the `tile_grid_size` value from training to generate the tiles during inference. The entire image and the corresponding tiles are passed through the model, and the object proposals from them are merged to output the final predictions, as in the following image.

-![Object proposals merge](./media/how-to-use-automl-small-object-detect/tiles-merge.png)
+:::image type="content" source="./media/how-to-use-automl-small-object-detect/tiles-merge.png" alt-text="Diagram that shows object proposals from the image and tiles being merged to form the final predictions.":::

> [!NOTE]
> It's possible that the same object is detected from multiple tiles; duplicate detection is done to remove such duplicates.
>
> Duplicate detection is done by running NMS on the proposals from the tiles and the image. When multiple proposals overlap, the one with the highest score is picked and the others are discarded as duplicates. Two proposals are considered to be overlapping when the intersection over union (IoU) between them is greater than the `tile_predictions_nms_thresh` parameter.
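The duplicate-removal step the note describes, keeping the highest-scoring box and discarding overlapping boxes whose IoU exceeds `tile_predictions_nms_thresh`, can be sketched as a simplified single-class NMS. This is an illustration of the concept, not the service's implementation:

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def merge_proposals(proposals, tile_predictions_nms_thresh=0.25):
    """Merge (box, score) proposals gathered from the image and its tiles.

    The highest-scoring proposal wins; any other proposal overlapping it
    above the threshold is discarded as a duplicate of the same object.
    """
    kept = []
    for box, score in sorted(proposals, key=lambda p: p[1], reverse=True):
        if all(iou(box, k) <= tile_predictions_nms_thresh for k, _ in kept):
            kept.append((box, score))
    return kept
```

For example, two near-identical detections of one object coming from two adjacent tiles collapse to the single higher-scoring box.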
You also have the option to enable tiling only during inference without enabling it in training. To do so, set the `tile_grid_size` parameter only during inference, not for training.

Doing so may improve performance for some datasets, and it won't incur the extra cost that comes with tiling at training time.
## Tiling hyperparameters

The following are the parameters you can use to control the tiling feature.

| Parameter Name | Description | Default |
| --------------- |-------------| -------|
-| `tile_grid_size` | The grid size to use for tiling each image. Available for use during training, validation, and inference.<br><br>Tuple of two integers passed as a string, e.g `'(3, 2)'`<br><br> *Note: Setting this parameter increases the computation time proportionally, since all tiles and images are processed by the model.*| no default value |
+| `tile_grid_size` | The grid size to use for tiling each image. Available for use during training, validation, and inference.<br><br>Should be passed as a string in `'3x2'` format.<br><br> *Note: Setting this parameter increases the computation time proportionally, since all tiles and images are processed by the model.*| No default value |
| `tile_overlap_ratio` | Controls the overlap ratio between adjacent tiles in each dimension. When the objects that fall on the tile boundary are too large to fit completely in one of the tiles, increase the value of this parameter so that the objects fit completely in at least one of the tiles.<br> <br> Must be a float in [0, 1).| 0.25 |
| `tile_predictions_nms_thresh` | The intersection over union (IoU) threshold to use to do non-maximum suppression (NMS) while merging predictions from the tiles and the image. Available during validation and inference. Change this parameter if multiple boxes are detected per object in the final predictions. <br><br> Must be a float in [0, 1]. | 0.25 |
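The ranges in the table can be checked before submitting a job. A hypothetical helper, not part of the SDK, that validates the three tiling parameters against the documented constraints:

```python
def validate_tiling_params(tile_grid_size, tile_overlap_ratio=0.25,
                           tile_predictions_nms_thresh=0.25):
    """Validate tiling hyperparameters against their documented ranges."""
    w, _, h = tile_grid_size.partition("x")
    if not (w.isdigit() and h.isdigit() and int(w) > 0 and int(h) > 0):
        raise ValueError("tile_grid_size must be a 'WxH' string, e.g. '3x2'")
    if not 0 <= tile_overlap_ratio < 1:
        raise ValueError("tile_overlap_ratio must be a float in [0, 1)")
    if not 0 <= tile_predictions_nms_thresh <= 1:
        raise ValueError("tile_predictions_nms_thresh must be a float in [0, 1]")

validate_tiling_params("3x2")  # passes silently; '(3, 2)' would raise
```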
## Example notebooks

-See the [object detection sample notebook](https://github.com/Azure/azureml-examples/tree/v2samplesreorg/v1/python-sdk/tutorials/automl-with-azureml/image-object-detection/auto-ml-image-object-detection.ipynb) for detailed code examples of setting up and training an object detection model.
+See the [object detection sample notebook](https://github.com/Azure/azureml-examples/tree/main/sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb) for detailed code examples of setting up and training an object detection model.

>[!NOTE]
> All images in this article are made available in accordance with the permitted use section of the [MIT licensing agreement](https://choosealicense.com/licenses/mit/).
> Copyright © 2020 Roboflow, Inc.

## Next steps

* Learn more about [how and where to deploy a model](/azure/machine-learning/how-to-deploy-managed-online-endpoints).
* For definitions and examples of the performance charts and metrics provided for each job, see [Evaluate automated machine learning experiment results](how-to-understand-automated-ml.md).
* [Tutorial: Train an object detection model (preview) with AutoML and Python](tutorial-auto-train-image-models.md).
* See [what hyperparameters are available for computer vision tasks](reference-automl-images-hyperparameters.md).
-*[Make predictions with ONNX on computer vision models from AutoML](how-to-inference-onnx-automl-image-models.md)
+* [Make predictions with ONNX on computer vision models from AutoML](how-to-inference-onnx-automl-image-models.md)

articles/machine-learning/reference-automl-images-hyperparameters.md

Lines changed: 6 additions & 3 deletions
@@ -36,6 +36,9 @@ This table summarizes hyperparameters specific to the `yolov5` algorithm.
| `multi_scale` | Enable multi-scale images by varying image size by +/- 50%. <br> Must be 0 or 1. <br> <br> *Note: training run may get into CUDA OOM if there is not sufficient GPU memory*. | 0 |
| `box_score_threshold` | During inference, only return proposals with a score greater than `box_score_threshold`. The score is the multiplication of the objectness score and classification probability. <br> Must be a float in the range [0, 1]. | 0.1 |
| `nms_iou_threshold` | IOU threshold used during inference in non-maximum suppression post processing. <br> Must be a float in the range [0, 1]. | 0.5 |
+| `tile_grid_size` | The grid size to use for tiling each image. <br>*Note: tile_grid_size must not be None to enable [small object detection](how-to-use-automl-small-object-detect.md) logic*<br> Should be passed as a string in '3x2' format. Example: --tile_grid_size '3x2' | No default |
+| `tile_overlap_ratio` | Overlap ratio between adjacent tiles in each dimension. <br> Must be a float in the range [0, 1) | 0.25 |
+| `tile_predictions_nms_threshold` | The IOU threshold to use to perform NMS while merging predictions from tiles and image. Used in validation/inference. <br> Must be a float in the range [0, 1] | 0.25 |

@@ -80,13 +83,13 @@ The following table describes the hyperparameters that are model agnostic.
## Image classification (multi-class and multi-label) specific hyperparameters

The following table summarizes hyperparameters for image classification (multi-class and multi-label) tasks.

| Parameter name | Description | Default |
| ------------- |-------------|-----|
| `weighted_loss` | <li> 0 for no weighted loss. <li> 1 for weighted loss with sqrt(class_weights) <li> 2 for weighted loss with class_weights. <li> Must be 0, 1, or 2. | 0 |
| `validation_resize_size` | <li> Image size to which to resize before cropping for the validation dataset. <li> Must be a positive integer. <br> <br> *Notes: <li> `seresnext` doesn't take an arbitrary size. <li> Training run may get into CUDA OOM if the size is too big*. | 256 |
| `validation_crop_size` | <li> Image crop size that's input to your neural network for the validation dataset. <li> Must be a positive integer. <br> <br> *Notes: <li> `seresnext` doesn't take an arbitrary size. <li> *ViT-variants* should have the same `validation_crop_size` and `training_crop_size`. <li> Training run may get into CUDA OOM if the size is too big*. | 224 |
| `training_crop_size` | <li> Image crop size that's input to your neural network for the train dataset. <li> Must be a positive integer. <br> <br> *Notes: <li> `seresnext` doesn't take an arbitrary size. <li> *ViT-variants* should have the same `validation_crop_size` and `training_crop_size`. <li> Training run may get into CUDA OOM if the size is too big*. | 224 |

## Object detection and instance segmentation task specific hyperparameters

@@ -104,7 +107,7 @@ The following hyperparameters are for object detection and instance segmentation
| `box_score_threshold` | During inference, only return proposals with a classification score greater than `box_score_threshold`. <br> Must be a float in the range [0, 1].| 0.3 |
| `nms_iou_threshold` | IOU (intersection over union) threshold used in non-maximum suppression (NMS) for the prediction head. Used during inference. <br>Must be a float in the range [0, 1]. | 0.5 |
| `box_detections_per_image` | Maximum number of detections per image, for all classes. <br> Must be a positive integer.| 100 |
-| `tile_grid_size` | The grid size to use for tiling each image. <br>*Note: tile_grid_size must not be None to enable [small object detection](how-to-use-automl-small-object-detect.md) logic*<br> A tuple of two integers passed as a string. Example: --tile_grid_size "(3, 2)" | No Default |
+| `tile_grid_size` | The grid size to use for tiling each image. <br>*Note: tile_grid_size must not be None to enable [small object detection](how-to-use-automl-small-object-detect.md) logic*<br> Should be passed as a string in '3x2' format. Example: --tile_grid_size '3x2' | No default |
| `tile_overlap_ratio` | Overlap ratio between adjacent tiles in each dimension. <br> Must be a float in the range [0, 1) | 0.25 |
| `tile_predictions_nms_threshold` | The IOU threshold to use to perform NMS while merging predictions from tiles and image. Used in validation/inference. <br> Must be a float in the range [0, 1] | 0.25 |
