
Commit e7e5ac6

eugene123tw, sungmanc, and jaegukhyun authored
MaskRCNN-ConvNeXt Instance Segmentation (#2292)
* add swin-t * update config * simplify data pipeline and use SGD optimizer * update model * add convnext-t * update convnext input size * update custom mask fcn head * add x101 * update * remove swin-t and ResNeXt template * style change * rename convnext * update compressed model yaml * modify nncf compression yaml * update doc * update structure * update pretrained model path * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/data_pipeline.py Co-authored-by: Sungman Cho <[email protected]> * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/tile_pipeline.py Co-authored-by: Sungman Cho <[email protected]> * change Gflops * update doc * fix model url path * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/tile_pipeline.py Co-authored-by: Jaeguk Hyun <[email protected]> * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/model.py Co-authored-by: Jaeguk Hyun <[email protected]> * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/deployment_tile_classifier.py Co-authored-by: Jaeguk Hyun <[email protected]> * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/data_pipeline.py Co-authored-by: Jaeguk Hyun <[email protected]> * Update src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/__init__.py Co-authored-by: Jaeguk Hyun <[email protected]> * fix according to review * update hpo * include convnext tests --------- Co-authored-by: Sungman Cho <[email protected]> Co-authored-by: Jaeguk Hyun <[email protected]>
1 parent 00bc6a8 commit e7e5ac6

14 files changed, +517 -32 lines changed

docs/source/guide/explanation/algorithms/segmentation/instance_segmentation.rst

Lines changed: 16 additions & 8 deletions
@@ -58,15 +58,21 @@ Models
 
 We support the following ready-to-use model templates:
 
-+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
-| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
-+==========================================================================================================================================================================================+============================+=====================+=================+
-| `Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml>`_ | MaskRCNN-EfficientNetB2B | 68.48 | 13.27 |
-+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
-| `Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/resnet50_maskrcnn/template.yaml>`_ | MaskRCNN-ResNet50 | 533.80 | 177.90 |
-+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
+| Template ID | Name | Complexity (GFLOPs) | Model size (MB) |
++==========================================================================================================================================================================================+============================+=====================+=================+
+| `Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml>`_ | MaskRCNN-EfficientNetB2B | 68.48 | 13.27 |
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
+| `Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50 <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/resnet50_maskrcnn/template.yaml>`_ | MaskRCNN-ResNet50 | 533.80 | 177.90 |
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
+| `Custom_Counting_Instance_Segmentation_MaskRCNN_ConvNeXt <https://github.com/openvinotoolkit/training_extensions/blob/develop/src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/template.yaml>`_ | MaskRCNN-ConvNeXt | 266.78 | 192.4 |
++--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+---------------------+-----------------+
 
-``MaskRCNN-ResNet50`` uses `ResNet-50 <https://arxiv.org/abs/1512.03385>`_ as the backbone network for the image features extraction. It has more parameters and FLOPs and needs more time to train, meanwhile providing superior performance in terms of accuracy. ``MaskRCNN-EfficientNetB2B`` uses `EfficientNet-B2 <https://arxiv.org/abs/1905.11946>`_ as the backbone network. It is a good trade-off between accuracy and speed. It is a better choice when training time and computational cost are in priority.
+MaskRCNN-ResNet50 utilizes the `ResNet-50 <https://arxiv.org/abs/1512.03385>`_ architecture as the backbone network for extracting image features. This choice of backbone network results in a higher number of parameters and FLOPs, which consequently requires more training time. However, the model offers superior performance in terms of accuracy.
+
+On the other hand, MaskRCNN-EfficientNetB2B employs the `EfficientNet-B2 <https://arxiv.org/abs/1905.11946>`_ architecture as the backbone network. This selection strikes a balance between accuracy and speed, making it a preferable option when prioritizing training time and computational cost.
+
+Recently, we have made updates to MaskRCNN-ConvNeXt, incorporating the `ConvNeXt backbone <https://arxiv.org/abs/2201.03545>`_. Through our experiments, we have observed that this variant achieves better accuracy compared to MaskRCNN-ResNet50 while utilizing less GPU memory. However, it is important to note that the training time and inference duration may slightly increase. If minimizing training time is a significant concern, we recommend considering a switch to MaskRCNN-EfficientNetB2B.
 
 .. In the table below the `mAP <https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient>`_ metric on some academic datasets using our :ref:`supervised pipeline <instance_segmentation_supervised_pipeline>` is presented. The results were obtained on our templates without any changes. We use 1024x1024 image resolution, for other hyperparameters, please, refer to the related template. We trained each model with single Nvidia GeForce RTX3090.

@@ -77,6 +83,8 @@ We support the following ready-to-use model templates:
 .. +---------------------------+--------------+------------+-----------------+
 .. | MaskRCNN-ResNet50 | N/A | N/A | N/A |
 .. +---------------------------+--------------+------------+-----------------+
+.. | MaskRCNN-ConvNeXt | N/A | N/A | N/A |
+.. +---------------------------+--------------+------------+-----------------+
 
 .. *******************
 .. Tiling Pipeline
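The doc change above presents the template choice as a GFLOPs / model-size / accuracy trade-off. As a purely illustrative restatement (not part of OTX — the TEMPLATES dict and fits_budget helper are hypothetical, only the numbers come from the table), the trade-off can be filtered programmatically:

# Toy sketch: filter the templates from the table above by a compute budget.
TEMPLATES = {
    "MaskRCNN-EfficientNetB2B": {"gflops": 68.48, "size_mb": 13.27},
    "MaskRCNN-ConvNeXt": {"gflops": 266.78, "size_mb": 192.4},
    "MaskRCNN-ResNet50": {"gflops": 533.80, "size_mb": 177.90},
}


def fits_budget(max_gflops: float) -> list:
    """Return the template names whose reported complexity stays within the budget."""
    return [name for name, spec in TEMPLATES.items() if spec["gflops"] <= max_gflops]


print(fits_budget(300.0))  # ['MaskRCNN-EfficientNetB2B', 'MaskRCNN-ConvNeXt']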

docs/source/guide/tutorials/base/how_to_train/instance_segmentation.rst

Lines changed: 1 addition & 0 deletions
@@ -136,6 +136,7 @@ The list of supported templates for instance segmentation is available with the
 +-----------------------+----------------------------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------+
 | INSTANCE_SEGMENTATION | Custom_Counting_Instance_Segmentation_MaskRCNN_ResNet50 | MaskRCNN-ResNet50 | src/otx/algorithms/detection/configs/instance_segmentation/resnet50_maskrcnn/template.yaml |
 | INSTANCE_SEGMENTATION | Custom_Counting_Instance_Segmentation_MaskRCNN_EfficientNetB2B | MaskRCNN-EfficientNetB2B | src/otx/algorithms/detection/configs/instance_segmentation/efficientnetb2b_maskrcnn/template.yaml |
+| INSTANCE_SEGMENTATION | Custom_Counting_Instance_Segmentation_MaskRCNN_ConvNeXt | MaskRCNN-ConvNeXt | src/otx/algorithms/detection/configs/instance_segmentation/convnext_maskrcnn/template.yaml |
 +-----------------------+----------------------------------------------------------------+--------------------------+---------------------------------------------------------------------------------------------------+
 
2. We need to create
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
"""Initialization of ConvNeXt-T-MaskRCNN model for Instance-Segmentation Task."""

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
{
    "base": {
        "find_unused_parameters": true,
        "nncf_config": {
            "target_metric_name": "mAP",
            "input_info": {
                "sample_size": [1, 3, 1024, 1024]
            },
            "compression": [],
            "log_dir": "/tmp"
        }
    },
    "nncf_quantization": {
        "optimizer": {
            "lr": 0.0005
        },
        "nncf_config": {
            "compression": [
                {
                    "algorithm": "quantization",
                    "initializer": {
                        "range": {
                            "num_init_samples": 1000
                        },
                        "batchnorm_adaptation": {
                            "num_bn_adaptation_samples": 1000
                        }
                    }
                }
            ],
            "accuracy_aware_training": {
                "mode": "early_exit",
                "params": {
                    "maximal_absolute_accuracy_degradation": 0.01,
                    "maximal_total_epochs": 20
                }
            }
        }
    },
    "order_of_parts": ["nncf_quantization"]
}
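For context on the layout of this compression config: it keeps a common "base" section plus named parts, and "order_of_parts" lists which parts are overlaid on top of the base to produce the effective NNCF settings. The sketch below is only a minimal illustration of that idea using the Python standard library; the merge() helper, the merge semantics, and the "compression_config.json" filename are assumptions, not the actual OTX/NNCF loading code.

import json
from copy import deepcopy


def merge(dst: dict, src: dict) -> dict:
    """Recursively overlay src onto a copy of dst (assumed merge semantics)."""
    out = deepcopy(dst)
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


with open("compression_config.json", encoding="utf-8") as f:  # filename is an assumption
    cfg = json.load(f)

effective = cfg["base"]
for part in cfg["order_of_parts"]:  # ["nncf_quantization"]
    effective = merge(effective, cfg[part])

print(json.dumps(effective, indent=2))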
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
"""Data Pipeline of ConvNeXt model for Instance-Seg Task."""

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# pylint: disable=invalid-name

__img_size = (1024, 1024)

# TODO: A comparison experiment is needed to determine which value is appropriate for to_rgb.
__img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(type="LoadAnnotations", with_bbox=True, with_mask=True, poly2mask=False),
    dict(type="Resize", img_scale=__img_size, keep_ratio=False),
    dict(type="RandomFlip", flip_ratio=0.5),
    dict(type="Normalize", **__img_norm_cfg),
    dict(type="DefaultFormatBundle"),
    dict(type="Collect", keys=["img", "gt_bboxes", "gt_labels", "gt_masks"]),
]

test_pipeline = [
    dict(type="LoadImageFromFile"),
    dict(
        type="MultiScaleFlipAug",
        img_scale=__img_size,
        flip=False,
        transforms=[
            dict(type="Resize", keep_ratio=False),
            dict(type="RandomFlip"),
            dict(type="Normalize", **__img_norm_cfg),
            dict(type="ImageToTensor", keys=["img"]),
            dict(type="Collect", keys=["img"]),
        ],
    ),
]

__dataset_type = "CocoDataset"

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=__dataset_type,
        ann_file="data/coco/annotations/instances_train2017.json",
        img_prefix="data/coco/train2017",
        pipeline=train_pipeline,
    ),
    val=dict(
        type=__dataset_type,
        test_mode=True,
        ann_file="data/coco/annotations/instances_val2017.json",
        img_prefix="data/coco/val2017",
        pipeline=test_pipeline,
    ),
    test=dict(
        type=__dataset_type,
        test_mode=True,
        ann_file="data/coco/annotations/instances_val2017.json",
        img_prefix="data/coco/val2017",
        pipeline=test_pipeline,
    ),
)
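A side note on the Normalize step in the pipeline above: with to_rgb=True the loaded BGR image is first converted to RGB and each channel is then standardized with the listed mean and std. The snippet below is a plain NumPy re-statement of that arithmetic for clarity, not the mmcv implementation itself.

import numpy as np

mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)

bgr_image = np.random.randint(0, 256, size=(1024, 1024, 3), dtype=np.uint8)
rgb_image = bgr_image[..., ::-1].astype(np.float32)  # to_rgb=True: swap channel order
normalized = (rgb_image - mean) / std                # per-channel standardization

print(normalized.mean(axis=(0, 1)))  # roughly centered around zero for natural images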
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
"""MMDeploy config of ConvNeXt model for Instance-Seg Task."""

_base_ = ["../../base/deployments/base_instance_segmentation_dynamic.py"]

scale_ir_input = True

ir_config = dict(
    output_names=["boxes", "labels", "masks"],
)
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
"""MMDeploy config partitioning ConvNeXt-T MaskRCNN model to tile classifier and MaskRCNN model."""
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

_base_ = ["./deployment.py"]

ir_config = dict(
    output_names=["boxes", "labels", "masks", "tile_prob"],
)

partition_config = dict(
    type="tile_classifier",
    apply_marks=True,
    partition_cfg=[
        dict(
            save_file="tile_classifier.onnx",
            start=["tile_classifier:input"],
            end=["tile_classifier:output"],
            output_names=["tile_prob"],
        )
    ],
)
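The partition_config above asks the deployment tooling to split the tile classifier into its own tile_classifier.onnx with a tile_prob output. A hypothetical sketch of running that exported file with ONNX Runtime follows; the file name and output name come from partition_cfg, while the input name is queried at runtime and the 1x3x1024x1024 float32 input shape is an assumption made only for illustration.

import numpy as np
import onnxruntime as ort

# Load the partitioned tile-classifier model exported during deployment.
session = ort.InferenceSession("tile_classifier.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name           # query the real input name
dummy = np.random.rand(1, 3, 1024, 1024).astype(np.float32)  # assumed input shape

(tile_prob,) = session.run(["tile_prob"], {input_name: dummy})
print("probability that the tile contains objects:", tile_prob)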
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
metric: mAP
search_algorithm: asha
early_stop: None
hp_space:
  learning_parameters.learning_rate:
    param_type: qloguniform
    range:
      - 0.0001
      - 0.01
      - 0.0001
  learning_parameters.batch_size:
    param_type: qloguniform
    range:
      - 2
      - 6
      - 2
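For context, a qloguniform range of the form [low, high, step] is commonly interpreted as "sample log-uniformly between low and high, then quantize to a multiple of step". The sketch below illustrates that interpretation only; the exact sampling semantics are defined by the OTX HPO implementation and may differ.

import math
import random


def sample_qloguniform(low: float, high: float, q: float) -> float:
    """Draw log-uniformly in [low, high], then quantize to the nearest multiple of q."""
    value = math.exp(random.uniform(math.log(low), math.log(high)))
    return round(value / q) * q


# learning_parameters.learning_rate: range [0.0001, 0.01] with step 0.0001
print(sample_qloguniform(0.0001, 0.01, 0.0001))
# learning_parameters.batch_size: range [2, 6] with step 2 -> one of 2, 4, 6
print(int(sample_qloguniform(2, 6, 2)))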
Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
"""Model configuration of ConvNeXt-T-MaskRCNN model for Instance-Seg Task."""

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# pylint: disable=invalid-name

_base_ = [
    "../../../../../recipes/stages/instance-segmentation/incremental.py",
    "../../base/models/detector.py",
]

task = "instance-segmentation"

model = dict(
    type="CustomMaskRCNN",
    backbone=dict(
        type="mmcls.ConvNeXt",
        arch="tiny",
        out_indices=[0, 1, 2, 3],
        drop_path_rate=0.4,
        layer_scale_init_value=1.0,
        gap_before_final_norm=False,
    ),
    neck=dict(type="FPN", in_channels=[96, 192, 384, 768], out_channels=256, num_outs=5),
    rpn_head=dict(
        type="RPNHead",
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(type="AnchorGenerator", scales=[8], ratios=[0.5, 1.0, 2.0], strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type="L1Loss", loss_weight=1.0),
    ),
    roi_head=dict(
        type="CustomRoIHead",
        bbox_roi_extractor=dict(
            type="SingleRoIExtractor",
            roi_layer=dict(type="RoIAlign", output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32],
        ),
        bbox_head=dict(
            type="Shared2FCBBoxHead",
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,
            bbox_coder=dict(
                type="DeltaXYWHBBoxCoder", target_means=[0.0, 0.0, 0.0, 0.0], target_stds=[0.1, 0.1, 0.2, 0.2]
            ),
            reg_class_agnostic=False,
            loss_cls=dict(type="CrossEntropyLoss", use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type="L1Loss", loss_weight=1.0),
        ),
        mask_roi_extractor=dict(
            type="SingleRoIExtractor",
            roi_layer=dict(type="RoIAlign", output_size=14, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32],
        ),
        mask_head=dict(
            type="CustomFCNMaskHead",
            num_convs=4,
            in_channels=256,
            conv_out_channels=256,
            num_classes=80,
            loss_mask=dict(type="CrossEntropyLoss", use_mask=True, loss_weight=1.0),
        ),
    ),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type="CustomMaxIoUAssigner",
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1,
                gpu_assign_thr=300,
            ),
            sampler=dict(type="RandomSampler", num=256, pos_fraction=0.5, neg_pos_ub=-1, add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False,
        ),
        rpn_proposal=dict(
            nms_across_levels=False,
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type="nms", iou_threshold=0.7),
            min_bbox_size=0,
        ),
        rcnn=dict(
            assigner=dict(
                type="CustomMaxIoUAssigner",
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=True,
                ignore_iof_thr=-1,
                gpu_assign_thr=300,
            ),
            sampler=dict(type="RandomSampler", num=512, pos_fraction=0.25, neg_pos_ub=-1, add_gt_as_proposals=True),
            mask_size=28,
            pos_weight=-1,
            debug=False,
        ),
    ),
    test_cfg=dict(
        rpn=dict(
            nms_across_levels=False,
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type="nms", iou_threshold=0.7),
            min_bbox_size=0,
        ),
        rcnn=dict(
            score_thr=0.05, nms=dict(type="nms", iou_threshold=0.5, max_num=100), max_per_img=100, mask_thr_binary=0.5
        ),
    ),
)

load_from = "https://storage.openvinotoolkit.org/\
repositories/openvino_training_extensions/\
models/instance_segmentation/\
mask_rcnn_convnext-t_p4_w7_fpn_fp16.pth"

evaluation = dict(interval=1, metric="mAP", save_best="mAP", iou_thr=[0.5])
ignore = True

custom_imports = dict(imports=["mmcls.models"], allow_failed_imports=False)
fp16 = dict(loss_scale=dict(init_scale=512.0))
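Two sets of numbers in the config above are worth connecting: neck.in_channels matches the four stage widths of ConvNeXt-T, and with mmdetection's AnchorGenerator the 1:1 base anchor edge at each FPN level is roughly scales[0] * stride. The snippet below only prints those derived values for illustration; it is not part of the config.

# Small companion sketch relating numbers from the config above (illustrative only).
convnext_tiny_channels = [96, 192, 384, 768]   # matches neck.in_channels
fpn_strides = [4, 8, 16, 32, 64]               # anchor_generator.strides
anchor_scale = 8                               # anchor_generator.scales[0]

print("FPN input channels per ConvNeXt-T stage:", convnext_tiny_channels)
for level, stride in enumerate(fpn_strides):
    print(f"P{level + 2}: stride={stride:3d}, 1:1 anchor ~{anchor_scale * stride}px")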

0 commit comments
