Commit abd5042

Merge pull request #3846 from pkulzc/master
Internal changes for object detection
2 parents c3b2660 + 143464d commit abd5042

40 files changed, with 1021 additions and 240 deletions

research/object_detection/README.md

Lines changed: 11 additions & 0 deletions
@@ -29,6 +29,7 @@ https://scholar.googleusercontent.com/scholar.bib?q=info:l291WsrB-hQJ:scholar.go

 * Jonathan Huang, github: [jch1](https://github.com/jch1)
 * Vivek Rathod, github: [tombstone](https://github.com/tombstone)
+* Ronny Votel, github: [ronnyvotel](https://github.com/ronnyvotel)
 * Derek Chow, github: [derekjchow](https://github.com/derekjchow)
 * Chen Sun, github: [jesu9](https://github.com/jesu9)
 * Menglong Zhu, github: [dreamdragon](https://github.com/dreamdragon)
@@ -89,6 +90,16 @@ reporting an issue.

 ## Release information

+### April 2, 2018
+
+Supercharge your mobile phones with the next generation mobile object detector!
+We are adding support for MobileNet V2 with SSDLite presented in
+[MobileNetV2: Inverted Residuals and Linear Bottlenecks](https://arxiv.org/abs/1801.04381).
+This model is 35% faster than Mobilenet V1 SSD on a Google Pixel phone CPU (200ms vs. 270ms) at the same accuracy.
+Along with the model definition, we are also releasing a model checkpoint trained on the COCO dataset.
+
+<b>Thanks to contributors</b>: Menglong Zhu, Mark Sandler, Zhichao Lu, Vivek Rathod, Jonathan Huang
+
 ### February 9, 2018

 We now support instance segmentation!! In this API update we support a number of instance segmentation models similar to those discussed in the [Mask R-CNN paper](https://arxiv.org/abs/1703.06870). For further details refer to

research/object_detection/builders/model_builder.py

Lines changed: 29 additions & 6 deletions
@@ -30,6 +30,7 @@
 from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res
 from object_detection.models import faster_rcnn_inception_v2_feature_extractor as frcnn_inc_v2
 from object_detection.models import faster_rcnn_nas_feature_extractor as frcnn_nas
+from object_detection.models import faster_rcnn_pnas_feature_extractor as frcnn_pnas
 from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as frcnn_resnet_v1
 from object_detection.models import ssd_resnet_v1_fpn_feature_extractor as ssd_resnet_v1_fpn
 from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import EmbeddedSSDMobileNetV1FeatureExtractor
@@ -55,6 +56,8 @@
 FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = {
     'faster_rcnn_nas':
     frcnn_nas.FasterRCNNNASFeatureExtractor,
+    'faster_rcnn_pnas':
+    frcnn_pnas.FasterRCNNPNASFeatureExtractor,
     'faster_rcnn_inception_resnet_v2':
     frcnn_inc_res.FasterRCNNInceptionResnetV2FeatureExtractor,
     'faster_rcnn_inception_v2':
@@ -95,13 +98,19 @@ def build(model_config, is_training, add_summaries=True):


 def _build_ssd_feature_extractor(feature_extractor_config, is_training,
-                                 reuse_weights=None):
+                                 reuse_weights=None,
+                                 inplace_batchnorm_update=False):
   """Builds a ssd_meta_arch.SSDFeatureExtractor based on config.

   Args:
     feature_extractor_config: A SSDFeatureExtractor proto config from ssd.proto.
     is_training: True if this feature extractor is being built for training.
     reuse_weights: if the feature extractor should reuse weights.
+    inplace_batchnorm_update: Whether to update batch_norm inplace during
+      training. This is required for batch norm to work correctly on TPUs. When
+      this is false, user must add a control dependency on
+      tf.GraphKeys.UPDATE_OPS for train/loss op in order to update the batch
+      norm moving average parameters.

   Returns:
     ssd_meta_arch.SSDFeatureExtractor based on config.
@@ -126,7 +135,8 @@ def _build_ssd_feature_extractor(feature_extractor_config, is_training,
   return feature_extractor_class(is_training, depth_multiplier, min_depth,
                                  pad_to_multiple, conv_hyperparams,
                                  batch_norm_trainable, reuse_weights,
-                                 use_explicit_padding, use_depthwise)
+                                 use_explicit_padding, use_depthwise,
+                                 inplace_batchnorm_update)


 def _build_ssd_model(ssd_config, is_training, add_summaries):
@@ -140,15 +150,18 @@ def _build_ssd_model(ssd_config, is_training, add_summaries):

   Returns:
     SSDMetaArch based on the config.
+
   Raises:
     ValueError: If ssd_config.type is not recognized (i.e. not registered in
       model_class_map).
   """
   num_classes = ssd_config.num_classes

   # Feature extractor
-  feature_extractor = _build_ssd_feature_extractor(ssd_config.feature_extractor,
-                                                   is_training)
+  feature_extractor = _build_ssd_feature_extractor(
+      feature_extractor_config=ssd_config.feature_extractor,
+      is_training=is_training,
+      inplace_batchnorm_update=ssd_config.inplace_batchnorm_update)

   box_coder = box_coder_builder.build(ssd_config.box_coder)
   matcher = matcher_builder.build(ssd_config.matcher)
@@ -194,21 +207,29 @@ def _build_ssd_model(ssd_config, is_training, add_summaries):


 def _build_faster_rcnn_feature_extractor(
-    feature_extractor_config, is_training, reuse_weights=None):
+    feature_extractor_config, is_training, reuse_weights=None,
+    inplace_batchnorm_update=False):
   """Builds a faster_rcnn_meta_arch.FasterRCNNFeatureExtractor based on config.

   Args:
     feature_extractor_config: A FasterRcnnFeatureExtractor proto config from
       faster_rcnn.proto.
     is_training: True if this feature extractor is being built for training.
     reuse_weights: if the feature extractor should reuse weights.
+    inplace_batchnorm_update: Whether to update batch_norm inplace during
+      training. This is required for batch norm to work correctly on TPUs. When
+      this is false, user must add a control dependency on
+      tf.GraphKeys.UPDATE_OPS for train/loss op in order to update the batch
+      norm moving average parameters.

   Returns:
     faster_rcnn_meta_arch.FasterRCNNFeatureExtractor based on config.

   Raises:
     ValueError: On invalid feature extractor type.
   """
+  if inplace_batchnorm_update:
+    raise ValueError('inplace batchnorm updates not supported.')
   feature_type = feature_extractor_config.type
   first_stage_features_stride = (
       feature_extractor_config.first_stage_features_stride)
@@ -238,6 +259,7 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):

   Returns:
     FasterRCNNMetaArch based on the config.
+
   Raises:
     ValueError: If frcnn_config.type is not recognized (i.e. not registered in
       model_class_map).
@@ -246,7 +268,8 @@ def _build_faster_rcnn_model(frcnn_config, is_training, add_summaries):
   image_resizer_fn = image_resizer_builder.build(frcnn_config.image_resizer)

   feature_extractor = _build_faster_rcnn_feature_extractor(
-      frcnn_config.feature_extractor, is_training)
+      frcnn_config.feature_extractor, is_training,
+      frcnn_config.inplace_batchnorm_update)

   number_of_stages = frcnn_config.number_of_stages
   first_stage_anchor_generator = anchor_generator_builder.build(
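
Reviewer note: the new `inplace_batchnorm_update` parameter comes after `reuse_weights` in both builder signatures, so the way the flag is passed matters. The SSD call site passes it by keyword, while the Faster R-CNN call site in this diff passes it as a third positional argument. The sketch below uses a stub (`build_extractor_stub` is hypothetical and not part of the API) that copies only the parameter order and the guard shown in the diff, to illustrate where each call style binds the value:

```python
def build_extractor_stub(feature_extractor_config, is_training,
                         reuse_weights=None, inplace_batchnorm_update=False):
  """Hypothetical stand-in that mirrors the parameter order in the diff."""
  if inplace_batchnorm_update:
    raise ValueError('inplace batchnorm updates not supported.')
  return {'reuse_weights': reuse_weights,
          'inplace_batchnorm_update': inplace_batchnorm_update}


# A third positional argument binds to reuse_weights, so the guard is skipped.
positional = build_extractor_stub('config', True, True)

# A keyword argument reaches the intended parameter and triggers the guard.
try:
  build_extractor_stub('config', True, inplace_batchnorm_update=True)
  keyword_raised = False
except ValueError:
  keyword_raised = True
```

If the real call behaves like this stub, the positional form would leave the Faster R-CNN guard inactive. That may be why a test config with `inplace_batchnorm_update: true` can still build. Passing the flag by keyword, as the SSD call does, avoids the ambiguity.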

research/object_detection/builders/model_builder_test.py

Lines changed: 70 additions & 0 deletions
@@ -25,6 +25,7 @@
 from object_detection.models import faster_rcnn_inception_resnet_v2_feature_extractor as frcnn_inc_res
 from object_detection.models import faster_rcnn_inception_v2_feature_extractor as frcnn_inc_v2
 from object_detection.models import faster_rcnn_nas_feature_extractor as frcnn_nas
+from object_detection.models import faster_rcnn_pnas_feature_extractor as frcnn_pnas
 from object_detection.models import faster_rcnn_resnet_v1_feature_extractor as frcnn_resnet_v1
 from object_detection.models import ssd_resnet_v1_fpn_feature_extractor as ssd_resnet_v1_fpn
 from object_detection.models.embedded_ssd_mobilenet_v1_feature_extractor import EmbeddedSSDMobileNetV1FeatureExtractor
@@ -297,6 +298,7 @@ def test_create_ssd_resnet_v1_fpn_model_from_config(self):
   def test_create_ssd_mobilenet_v1_model_from_config(self):
     model_text_proto = """
       ssd {
+        inplace_batchnorm_update: true
         feature_extractor {
           type: 'ssd_mobilenet_v1'
           conv_hyperparams {
@@ -519,6 +521,7 @@ def test_create_embedded_ssd_mobilenet_v1_model_from_config(self):
   def test_create_faster_rcnn_resnet_v1_models_from_config(self):
     model_text_proto = """
       faster_rcnn {
+        inplace_batchnorm_update: true
         num_classes: 3
         image_resizer {
           keep_aspect_ratio_resizer {
@@ -726,6 +729,73 @@ def test_create_faster_rcnn_nas_model_from_config(self):
         model._feature_extractor,
         frcnn_nas.FasterRCNNNASFeatureExtractor)

+  def test_create_faster_rcnn_pnas_model_from_config(self):
+    model_text_proto = """
+      faster_rcnn {
+        num_classes: 3
+        image_resizer {
+          keep_aspect_ratio_resizer {
+            min_dimension: 600
+            max_dimension: 1024
+          }
+        }
+        feature_extractor {
+          type: 'faster_rcnn_pnas'
+        }
+        first_stage_anchor_generator {
+          grid_anchor_generator {
+            scales: [0.25, 0.5, 1.0, 2.0]
+            aspect_ratios: [0.5, 1.0, 2.0]
+            height_stride: 16
+            width_stride: 16
+          }
+        }
+        first_stage_box_predictor_conv_hyperparams {
+          regularizer {
+            l2_regularizer {
+            }
+          }
+          initializer {
+            truncated_normal_initializer {
+            }
+          }
+        }
+        initial_crop_size: 17
+        maxpool_kernel_size: 1
+        maxpool_stride: 1
+        second_stage_box_predictor {
+          mask_rcnn_box_predictor {
+            fc_hyperparams {
+              op: FC
+              regularizer {
+                l2_regularizer {
+                }
+              }
+              initializer {
+                truncated_normal_initializer {
+                }
+              }
+            }
+          }
+        }
+        second_stage_post_processing {
+          batch_non_max_suppression {
+            score_threshold: 0.01
+            iou_threshold: 0.6
+            max_detections_per_class: 100
+            max_total_detections: 300
+          }
+          score_converter: SOFTMAX
+        }
+      }"""
+    model_proto = model_pb2.DetectionModel()
+    text_format.Merge(model_text_proto, model_proto)
+    model = model_builder.build(model_proto, is_training=True)
+    self.assertIsInstance(model, faster_rcnn_meta_arch.FasterRCNNMetaArch)
+    self.assertIsInstance(
+        model._feature_extractor,
+        frcnn_pnas.FasterRCNNPNASFeatureExtractor)
+
   def test_create_faster_rcnn_inception_resnet_v2_model_from_config(self):
     model_text_proto = """
       faster_rcnn {

research/object_detection/core/box_list_ops_test.py

Lines changed: 8 additions & 3 deletions
@@ -17,6 +17,7 @@
 import numpy as np
 import tensorflow as tf
 from tensorflow.python.framework import errors
+from tensorflow.python.framework import ops

 from object_detection.core import box_list
 from object_detection.core import box_list_ops
@@ -509,9 +510,13 @@ def test_sort_by_field_invalid_inputs(self):
       with self.assertRaises(ValueError):
         box_list_ops.sort_by_field(boxes, 'misc')

-      with self.assertRaisesWithPredicateMatch(errors.InvalidArgumentError,
-                                               'Incorrect field size'):
-        sess.run(box_list_ops.sort_by_field(boxes, 'weights').get())
+      if ops._USE_C_API:
+        with self.assertRaises(ValueError):
+          box_list_ops.sort_by_field(boxes, 'weights')
+      else:
+        with self.assertRaisesWithPredicateMatch(errors.InvalidArgumentError,
+                                                 'Incorrect field size'):
+          sess.run(box_list_ops.sort_by_field(boxes, 'weights').get())

   def test_visualize_boxes_in_image(self):
     image = tf.zeros((6, 4, 3))

research/object_detection/core/preprocessor.py

Lines changed: 5 additions & 1 deletion
@@ -2279,7 +2279,11 @@ def resize_masks_branch():
         return new_masks

       def reshape_masks_branch():
-        new_masks = tf.reshape(masks, [0, new_size[0], new_size[1]])
+        # The shape function will be computed for both branches of the
+        # condition, regardless of which branch is actually taken. Make sure
+        # that we don't trigger an assertion in the shape function when trying
+        # to reshape a non empty tensor into an empty one.
+        new_masks = tf.reshape(masks, [-1, new_size[0], new_size[1]])
         return new_masks

       masks = tf.cond(num_instances > 0, resize_masks_branch,
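
Reviewer note: the comment in this hunk says that a reshape to a leading dimension of 0 can fail when the input is not empty, while -1 lets that dimension be inferred. The sketch below is an analogy only and uses NumPy rather than TensorFlow. NumPy checks sizes eagerly, whereas the diff concerns TensorFlow's graph-time shape function. The helper `reshape_masks` and the sizes are made up for illustration:

```python
import numpy as np


def reshape_masks(masks, new_size, leading_dim):
  """Reshape masks to [leading_dim, new_size[0], new_size[1]]."""
  return np.reshape(masks, [leading_dim, new_size[0], new_size[1]])


new_size = (2, 2)
empty_masks = np.zeros((0, 4, 4))      # no instances
nonempty_masks = np.zeros((2, 4, 4))   # two instances, 32 elements

# A hard-coded 0 only works when the input really is empty.
empty_shape = reshape_masks(empty_masks, new_size, 0).shape
try:
  reshape_masks(nonempty_masks, new_size, 0)
  zero_dim_failed = False
except ValueError:
  zero_dim_failed = True

# With -1 the leading dimension is inferred, so both inputs are accepted.
inferred_empty_shape = reshape_masks(empty_masks, new_size, -1).shape
inferred_nonempty_shape = reshape_masks(nonempty_masks, new_size, -1).shape
```

In the empty case both forms give the same result, so switching to -1 should not change what the branch returns when it is actually taken.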

research/object_detection/dataset_tools/download_and_preprocess_mscoco.sh

Lines changed: 2 additions & 2 deletions
@@ -64,7 +64,7 @@ cd ${SCRATCH_DIR}
 # Download the images.
 BASE_IMAGE_URL="http://images.cocodataset.org/zips"

-# TRAIN_IMAGE_FILE="train2017.zip"
+TRAIN_IMAGE_FILE="train2017.zip"
 download_and_unzip ${BASE_IMAGE_URL} ${TRAIN_IMAGE_FILE}
 TRAIN_IMAGE_DIR="${SCRATCH_DIR}/train2017"

@@ -91,7 +91,7 @@ download_and_unzip ${BASE_IMAGE_INFO_URL} ${IMAGE_INFO_FILE}

 TESTDEV_ANNOTATIONS_FILE="${SCRATCH_DIR}/annotations/image_info_test-dev2017.json"

-# # Build TFRecords of the image data.
+# Build TFRecords of the image data.
 cd "${CURRENT_DIR}"
 python object_detection/dataset_tools/create_coco_tf_record.py \
   --logtostderr \

research/object_detection/eval_util.py

Lines changed: 3 additions & 1 deletion
@@ -79,7 +79,7 @@ def visualize_detection_results(result_dict,
       data corresponding to each image being evaluated. The following keys
       are required:
       'original_image': a numpy array representing the image with shape
-        [1, height, width, 3]
+        [1, height, width, 3] or [1, height, width, 1]
       'detection_boxes': a numpy array of shape [N, 4]
       'detection_scores': a numpy array of shape [N]
       'detection_classes': a numpy array of shape [N]
@@ -133,6 +133,8 @@ def visualize_detection_results(result_dict,
   category_index = label_map_util.create_category_index(categories)

   image = np.squeeze(result_dict[input_fields.original_image], axis=0)
+  if image.shape[2] == 1:  # If one channel image, repeat in RGB.
+    image = np.tile(image, [1, 1, 3])
   detection_boxes = result_dict[detection_fields.detection_boxes]
   detection_scores = result_dict[detection_fields.detection_scores]
   detection_classes = np.int32((result_dict[
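
Reviewer note: the added lines repeat a single-channel image across three channels so that the visualization code can treat it as RGB. A minimal standalone sketch of that step follows. The helper name `to_three_channels` and the sample array are illustrative and do not come from the commit:

```python
import numpy as np


def to_three_channels(image):
  """Repeat a [height, width, 1] image across three channels."""
  if image.shape[2] == 1:
    image = np.tile(image, [1, 1, 3])
  return image


gray = np.arange(6, dtype=np.uint8).reshape(2, 3, 1)
rgb = to_three_channels(gray)

# Images that already have three channels pass through unchanged.
already_rgb = to_three_channels(np.zeros((2, 3, 3), dtype=np.uint8))
```

Each of the three output channels is a copy of the original one, so the image still renders as grayscale.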

research/object_detection/evaluator.py

Lines changed: 12 additions & 2 deletions
@@ -94,14 +94,24 @@ def _extract_predictions_and_losses(model,
     if fields.InputDataFields.groundtruth_group_of in input_dict:
       groundtruth[fields.InputDataFields.groundtruth_group_of] = (
           input_dict[fields.InputDataFields.groundtruth_group_of])
+    groundtruth_masks_list = None
     if fields.DetectionResultFields.detection_masks in detections:
       groundtruth[fields.InputDataFields.groundtruth_instance_masks] = (
           input_dict[fields.InputDataFields.groundtruth_instance_masks])
+      groundtruth_masks_list = [
+          input_dict[fields.InputDataFields.groundtruth_instance_masks]]
+    groundtruth_keypoints_list = None
+    if fields.DetectionResultFields.detection_keypoints in detections:
+      groundtruth[fields.InputDataFields.groundtruth_keypoints] = (
+          input_dict[fields.InputDataFields.groundtruth_keypoints])
+      groundtruth_keypoints_list = [
+          input_dict[fields.InputDataFields.groundtruth_keypoints]]
     label_id_offset = 1
     model.provide_groundtruth(
         [input_dict[fields.InputDataFields.groundtruth_boxes]],
         [tf.one_hot(input_dict[fields.InputDataFields.groundtruth_classes]
-                    - label_id_offset, depth=model.num_classes)])
+                    - label_id_offset, depth=model.num_classes)],
+        groundtruth_masks_list, groundtruth_keypoints_list)
     losses_dict.update(model.loss(prediction_dict, true_image_shapes))

   result_dict = eval_util.result_dict_for_single_example(
@@ -205,7 +215,7 @@ def _process_batch(tensor_dict, sess, batch_index, counters,
     except tf.errors.InvalidArgumentError:
       logging.info('Skipping image')
       counters['skipped'] += 1
-      return {}
+      return {}, {}
     global_step = tf.train.global_step(sess, tf.train.get_global_step())
     if batch_index < eval_config.num_visualizations:
       tag = 'image-{}'.format(batch_index)
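
Reviewer note: the diff shows only the early return changing from one empty dict to a pair of empty dicts. Assuming the caller unpacks two values (for example a result dict and a losses dict, which is an assumption not visible in this hunk), the sketch below shows why the single-dict form would fail on the skip path. Both functions are hypothetical stand-ins:

```python
def process_batch_single(fail):
  """Early exit returns one dict, while the normal path returns two."""
  if fail:
    return {}
  return {'detection_boxes': []}, {'loss': 0.0}


def process_batch_pair(fail):
  """Both paths return a pair, matching the change in the diff."""
  if fail:
    return {}, {}
  return {'detection_boxes': []}, {'loss': 0.0}


# Unpacking a single empty dict into two names raises ValueError.
try:
  result_dict, losses_dict = process_batch_single(fail=True)
  single_unpacks = True
except ValueError:
  single_unpacks = False

# Returning a pair keeps the skip path compatible with the same caller.
result_dict, losses_dict = process_batch_pair(fail=True)
```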

research/object_detection/g3doc/detection_model_zoo.md

Lines changed: 4 additions & 1 deletion
@@ -19,7 +19,9 @@ In the table below, we list each such pre-trained model including:
   aware that these timings depend highly on one's specific hardware
   configuration (these timings were performed using an Nvidia
   GeForce GTX TITAN X card) and should be treated more as relative timings in
-  many cases.
+  many cases. Also note that desktop GPU timing does not always reflect mobile
+  run time. For example Mobilenet V2 is faster on mobile devices than Mobilenet
+  V1, but is slightly slower on desktop GPU.
 * detector performance on subset of the COCO validation set or Open Images test split as measured by the dataset-specific mAP measure.
   Here, higher is better, and we only report bounding box mAP rounded to the
   nearest integer.
@@ -68,6 +70,7 @@ Some remarks on frozen inference graphs:
 | Model name | Speed (ms) | COCO mAP[^1] | Outputs |
 | ------------ | :--------------: | :--------------: | :-------------: |
 | [ssd_mobilenet_v1_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) | 30 | 21 | Boxes |
+| [ssd_mobilenet_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) | 31 | 22 | Boxes |
 | [ssd_inception_v2_coco](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz) | 42 | 24 | Boxes |
 | [faster_rcnn_inception_v2_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_inception_v2_coco_2018_01_28.tar.gz) | 58 | 28 | Boxes |
 | [faster_rcnn_resnet50_coco](http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz) | 89 | 30 | Boxes |

research/object_detection/g3doc/running_pets.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ environment variable below:
 export YOUR_GCS_BUCKET=${YOUR_GCS_BUCKET}
 ```

-It is also possible to run locally by following
+It is also possible to run locally by following
 [the running locally instructions](running_locally.md).

 ## Installing Tensorflow and the Tensorflow Object Detection API
