
Commit 001a2a6

pkulzc authored and sguada committed
Internal changes for object detection. (#3656)
* Force cast of num_classes to integer. PiperOrigin-RevId: 188335318
* Updating config util to allow overwriting of cosine decay learning rates. PiperOrigin-RevId: 188338852
* Make box_list_ops.py and box_list_ops_test.py work with the C API enabled. The C API has improved shape inference over the original Python code, which causes some previously-working conds to fail. Switching to smart_cond fixes this. Another effect of the improved shape inference is that one of the tested failures gets caught earlier, so the test was modified to reflect this. PiperOrigin-RevId: 188409792
* Fix parallel event file writing issue. Without this change, the event files might get corrupted when multiple evaluations are run in parallel. PiperOrigin-RevId: 188502560
* Deprecating the boolean flag from_detection_checkpoint. Replace it with a string field fine_tune_checkpoint_type in train_config to provide extensibility. fine_tune_checkpoint_type can currently take the value `detection`, `classification`, or others when restore_map is overwritten. PiperOrigin-RevId: 188518685
* Automated g4 rollback of changelist 188502560. PiperOrigin-RevId: 188519969
* Introducing eval metrics specs for COCO mask metrics. This allows metrics to be computed in TensorFlow using the tf.learn Estimator. PiperOrigin-RevId: 188528485
* Minor fix to make object_detection/metrics/coco_evaluation.py Python 3 compatible. PiperOrigin-RevId: 188550683
* Updating eval_util to handle eval_metric_ops from multiple `DetectionEvaluator`s. PiperOrigin-RevId: 188560474
* Allow tensor input for new_height and new_width in resize_image. PiperOrigin-RevId: 188561908
* Fix typo in the fine_tune_checkpoint_type name in trainer. PiperOrigin-RevId: 188799033
* Adding mobilenet feature extractor to object detection. PiperOrigin-RevId: 188916897
* Allow label maps to optionally contain an explicit background class with id zero. PiperOrigin-RevId: 188951089
* Fix boundary conditions in random_pad_to_aspect_ratio to ensure that min_scale is always less than max_scale. PiperOrigin-RevId: 189026868
* Fall back on the from_detection_checkpoint option if fine_tune_checkpoint_type isn't set. PiperOrigin-RevId: 189052833
* Add proper names for learning rate schedules so we don't see cryptic names on TensorBoard. PiperOrigin-RevId: 189069837
* Enforcing that all datasets are batched (and then unbatched in the model) with batch_size >= 1. PiperOrigin-RevId: 189117178
* Adding regularization to the total loss returned from DetectionModel.loss(). PiperOrigin-RevId: 189189123
* Standardize the names of loss scalars (for SSD, Faster R-CNN and R-FCN) in both training and eval so they can be compared on TensorBoard. Log localization and classification losses in evaluation. PiperOrigin-RevId: 189189940
* Remove negative test from box list ops test. PiperOrigin-RevId: 189229327
* Add an option to warm up the learning rate in the manual stepping schedule. PiperOrigin-RevId: 189361039
* Replace tf.contrib.slim.tfexample_decoder.LookupTensor with object_detection.data_decoders.tf_example_decoder.LookupTensor. PiperOrigin-RevId: 189388556
* Force regularization summary variables under specific family names. PiperOrigin-RevId: 189393190
* Automated g4 rollback of changelist 188619139. PiperOrigin-RevId: 189396001
* Remove step 0 schedule since we do a hard check for it after cl/189361039. PiperOrigin-RevId: 189396697
* PiperOrigin-RevId: 189040463
* PiperOrigin-RevId: 189059229
* PiperOrigin-RevId: 189214402
* Force regularization summary variables under specific family names. PiperOrigin-RevId: 189393190
* Automated g4 rollback of changelist 188619139. PiperOrigin-RevId: 189396001
* Make slim Python 3 compatible.
* Minor fixes.
* Add TargetAssignment summaries in a separate family. PiperOrigin-RevId: 189407487
* 1. Setting the `family` keyword arg prepends the summary names twice with the same name; directly adding the family suffix to the name avoids this. 2. Make sure the eval losses have the same name. PiperOrigin-RevId: 189434618
* Minor fixes to make object detection TF 1.4 compatible. PiperOrigin-RevId: 189437519
* Call the base of the mobilenet_v1 feature extractor under the right arg scope and set batchnorm is_training based on the value passed in the constructor. PiperOrigin-RevId: 189460890
* Automated g4 rollback of changelist 188409792. PiperOrigin-RevId: 189463882
* Update object detection syncing. PiperOrigin-RevId: 189601955
* Add an option to warm up the learning rate, hold it constant for a certain number of steps, and then cosine-decay it. PiperOrigin-RevId: 189606169
* Let the proposal feature extractor function in faster_rcnn meta architectures return the activations (end_points). PiperOrigin-RevId: 189619301
* Fixed a bug which caused masks to be mostly zeros (caused by detection_boxes being in absolute coordinates if scale_to_absolute=True). PiperOrigin-RevId: 189641294
* Open-sourcing MobileNetV2 + SSDLite. PiperOrigin-RevId: 189654520
* Remove unused files.
1 parent 2913cb2 commit 001a2a6

File tree: 93 files changed, +2106 −462 lines


research/object_detection/builders/dataset_builder.py

Lines changed: 36 additions & 28 deletions
@@ -30,8 +30,8 @@
 from object_detection.utils import dataset_util
 
 
-def _get_padding_shapes(dataset, max_num_boxes, num_classes,
-                        spatial_image_shape):
+def _get_padding_shapes(dataset, max_num_boxes=None, num_classes=None,
+                        spatial_image_shape=None):
   """Returns shapes to pad dataset tensors to before batching.
 
   Args:
@@ -41,23 +41,28 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
     num_classes: Number of classes in the dataset needed to compute shapes for
       padding.
     spatial_image_shape: A list of two integers of the form [height, width]
-      containing expected spatial shape of the imaage.
+      containing expected spatial shape of the image.
 
   Returns:
     A dictionary keyed by fields.InputDataFields containing padding shapes for
     tensors in the dataset.
+
+  Raises:
+    ValueError: If groundtruth classes is neither rank 1 nor rank 2.
   """
-  height, width = spatial_image_shape
+
+  if not spatial_image_shape or spatial_image_shape == [-1, -1]:
+    height, width = None, None
+  else:
+    height, width = spatial_image_shape  # pylint: disable=unpacking-non-sequence
+
   padding_shapes = {
       fields.InputDataFields.image: [height, width, 3],
       fields.InputDataFields.source_id: [],
       fields.InputDataFields.filename: [],
       fields.InputDataFields.key: [],
       fields.InputDataFields.groundtruth_difficult: [max_num_boxes],
       fields.InputDataFields.groundtruth_boxes: [max_num_boxes, 4],
-      fields.InputDataFields.groundtruth_classes: [
-          max_num_boxes, num_classes
-      ],
       fields.InputDataFields.groundtruth_instance_masks: [max_num_boxes, height,
                                                           width],
       fields.InputDataFields.groundtruth_is_crowd: [max_num_boxes],
@@ -69,6 +74,21 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
       fields.InputDataFields.groundtruth_label_scores: [max_num_boxes],
       fields.InputDataFields.true_image_shape: [3]
   }
+  # Determine whether groundtruth_classes are integers or one-hot encodings,
+  # and apply batching appropriately.
+  classes_shape = dataset.output_shapes[
+      fields.InputDataFields.groundtruth_classes]
+  if len(classes_shape) == 1:  # Class integers.
+    padding_shapes[fields.InputDataFields.groundtruth_classes] = [max_num_boxes]
+  elif len(classes_shape) == 2:  # One-hot or k-hot encoding.
+    padding_shapes[fields.InputDataFields.groundtruth_classes] = [
+        max_num_boxes, num_classes]
+  else:
+    raise ValueError('Groundtruth classes must be a rank 1 tensor (classes) or '
+                     'rank 2 tensor (one-hot encodings)')
+
+  if fields.InputDataFields.original_image in dataset.output_shapes:
+    padding_shapes[fields.InputDataFields.original_image] = [None, None, 3]
   if fields.InputDataFields.groundtruth_keypoints in dataset.output_shapes:
     tensor_shape = dataset.output_shapes[fields.InputDataFields.
                                          groundtruth_keypoints]
@@ -87,37 +107,32 @@ def _get_padding_shapes(dataset, max_num_boxes, num_classes,
 
 
 def build(input_reader_config, transform_input_data_fn=None,
-          batch_size=1, max_num_boxes=None, num_classes=None,
+          batch_size=None, max_num_boxes=None, num_classes=None,
           spatial_image_shape=None):
   """Builds a tf.data.Dataset.
 
   Builds a tf.data.Dataset by applying the `transform_input_data_fn` on all
-  records. Optionally, if `batch_size` > 1 and `max_num_boxes`, `num_classes`
-  and `spatial_image_shape` are not None, returns a padded batched
-  tf.data.Dataset.
+  records. Applies a padded batch to the resulting dataset.
 
   Args:
     input_reader_config: A input_reader_pb2.InputReader object.
     transform_input_data_fn: Function to apply to all records, or None if
      no extra decoding is required.
-    batch_size: Batch size. If not None, returns a padded batch dataset.
-    max_num_boxes: Max number of groundtruth boxes needed to computes shapes for
-      padding. This is only used if batch_size is greater than 1.
+    batch_size: Batch size. If None, batching is not performed.
+    max_num_boxes: Max number of groundtruth boxes needed to compute shapes for
+      padding. If None, will use a dynamic shape.
    num_classes: Number of classes in the dataset needed to compute shapes for
-      padding. This is only used if batch_size is greater than 1.
-    spatial_image_shape: a list of two integers of the form [height, width]
+      padding. If None, will use a dynamic shape.
+    spatial_image_shape: A list of two integers of the form [height, width]
      containing expected spatial shape of the image after applying
-      transform_input_data_fn. This is needed to compute shapes for padding and
-      only used if batch_size is greater than 1.
+      transform_input_data_fn. If None, will use dynamic shapes.
 
   Returns:
    A tf.data.Dataset based on the input_reader_config.
 
   Raises:
    ValueError: On invalid input reader proto.
    ValueError: If no input paths are specified.
-    ValueError: If batch_size > 1 and any of (max_num_boxes, num_classes,
-      spatial_image_shape) is None.
   """
   if not isinstance(input_reader_config, input_reader_pb2.InputReader):
     raise ValueError('input_reader_config not of type '
@@ -147,14 +162,7 @@ def process_fn(value):
      functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000),
      process_fn, config.input_path[:], input_reader_config)
 
-  if batch_size > 1:
-    if num_classes is None:
-      raise ValueError('`num_classes` must be set when batch_size > 1.')
-    if max_num_boxes is None:
-      raise ValueError('`max_num_boxes` must be set when batch_size > 1.')
-    if spatial_image_shape is None:
-      raise ValueError('`spatial_image_shape` must be set when batch_size > '
-                       '1 .')
+  if batch_size:
    padding_shapes = _get_padding_shapes(dataset, max_num_boxes, num_classes,
                                         spatial_image_shape)
    dataset = dataset.apply(
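
The padding shapes computed above only matter once the dataset is actually batched with padding; the batching transform itself is cut off at the end of this hunk. The snippet below is a minimal sketch of the general pattern using the standard tf.data padded-batch API, with made-up field names and sizes; it is not the builder's exact call.

# Illustrative sketch only: per-field padding shapes driving a padded batch.
# Field names and sizes here are invented for the example.
import tensorflow as tf

max_num_boxes = 100
padding_shapes = {
    'image': [None, None, 3],                 # pad H/W to the batch maximum
    'groundtruth_boxes': [max_num_boxes, 4],  # pad box count to a fixed size
    'groundtruth_classes': [max_num_boxes],   # rank-1 integer class labels
}

dataset = tf.data.Dataset.from_tensor_slices({
    'image': tf.zeros([8, 320, 320, 3]),
    'groundtruth_boxes': tf.zeros([8, 3, 4]),
    'groundtruth_classes': tf.zeros([8, 3], dtype=tf.int64),
})
# Each tensor is padded up to its entry in padding_shapes before stacking.
dataset = dataset.padded_batch(4, padded_shapes=padding_shapes)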

research/object_detection/builders/dataset_builder_test.py

Lines changed: 8 additions & 8 deletions
@@ -91,7 +91,7 @@ def test_build_tf_record_input_reader(self):
     input_reader_proto = input_reader_pb2.InputReader()
     text_format.Merge(input_reader_text_proto, input_reader_proto)
     tensor_dict = dataset_util.make_initializable_iterator(
-        dataset_builder.build(input_reader_proto)).get_next()
+        dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
 
     sv = tf.train.Supervisor(logdir=self.get_temp_dir())
     with sv.prepare_or_wait_for_session() as sess:
@@ -100,15 +100,15 @@ def test_build_tf_record_input_reader(self):
 
     self.assertTrue(
         fields.InputDataFields.groundtruth_instance_masks not in output_dict)
-    self.assertEquals((4, 5, 3),
+    self.assertEquals((1, 4, 5, 3),
                       output_dict[fields.InputDataFields.image].shape)
-    self.assertEquals([2],
-                      output_dict[fields.InputDataFields.groundtruth_classes])
+    self.assertAllEqual([[2]],
+                        output_dict[fields.InputDataFields.groundtruth_classes])
     self.assertEquals(
-        (1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
+        (1, 1, 4), output_dict[fields.InputDataFields.groundtruth_boxes].shape)
     self.assertAllEqual(
         [0.0, 0.0, 1.0, 1.0],
-        output_dict[fields.InputDataFields.groundtruth_boxes][0])
+        output_dict[fields.InputDataFields.groundtruth_boxes][0][0])
 
   def test_build_tf_record_input_reader_and_load_instance_masks(self):
     tf_record_path = self.create_tf_record()
@@ -124,14 +124,14 @@ def test_build_tf_record_input_reader_and_load_instance_masks(self):
     input_reader_proto = input_reader_pb2.InputReader()
     text_format.Merge(input_reader_text_proto, input_reader_proto)
     tensor_dict = dataset_util.make_initializable_iterator(
-        dataset_builder.build(input_reader_proto)).get_next()
+        dataset_builder.build(input_reader_proto, batch_size=1)).get_next()
 
     sv = tf.train.Supervisor(logdir=self.get_temp_dir())
     with sv.prepare_or_wait_for_session() as sess:
       sv.start_queue_runners(sess)
       output_dict = sess.run(tensor_dict)
       self.assertAllEqual(
-          (1, 4, 5),
+          (1, 1, 4, 5),
          output_dict[fields.InputDataFields.groundtruth_instance_masks].shape)
 
   def test_build_tf_record_input_reader_with_batch_size_two(self):

research/object_detection/builders/model_builder.py

Lines changed: 2 additions & 0 deletions
@@ -36,13 +36,15 @@
 from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
 from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
 from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
+from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
 from object_detection.protos import model_pb2
 
 # A map of names to SSD feature extractors.
 SSD_FEATURE_EXTRACTOR_CLASS_MAP = {
     'ssd_inception_v2': SSDInceptionV2FeatureExtractor,
     'ssd_inception_v3': SSDInceptionV3FeatureExtractor,
     'ssd_mobilenet_v1': SSDMobileNetV1FeatureExtractor,
+    'ssd_mobilenet_v2': SSDMobileNetV2FeatureExtractor,
     'ssd_resnet50_v1_fpn': ssd_resnet_v1_fpn.SSDResnet50V1FpnFeatureExtractor,
     'ssd_resnet101_v1_fpn': ssd_resnet_v1_fpn.SSDResnet101V1FpnFeatureExtractor,
     'ssd_resnet152_v1_fpn': ssd_resnet_v1_fpn.SSDResnet152V1FpnFeatureExtractor,
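
For context, registering the new class in SSD_FEATURE_EXTRACTOR_CLASS_MAP is what lets a pipeline config select it by its type string. A rough sketch of that lookup pattern is below; the helper name is hypothetical, not the builder's actual function.

# Hypothetical helper illustrating the name-to-class lookup the map enables;
# the real builder performs an equivalent lookup internally.
def lookup_ssd_feature_extractor(feature_extractor_type):
  if feature_extractor_type not in SSD_FEATURE_EXTRACTOR_CLASS_MAP:
    raise ValueError('Unknown ssd feature_extractor: %s' %
                     feature_extractor_type)
  return SSD_FEATURE_EXTRACTOR_CLASS_MAP[feature_extractor_type]

# With this commit, a config declaring type: 'ssd_mobilenet_v2' resolves to
# SSDMobileNetV2FeatureExtractor.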

research/object_detection/builders/model_builder_test.py

Lines changed: 76 additions & 0 deletions
@@ -31,6 +31,7 @@
 from object_detection.models.ssd_inception_v2_feature_extractor import SSDInceptionV2FeatureExtractor
 from object_detection.models.ssd_inception_v3_feature_extractor import SSDInceptionV3FeatureExtractor
 from object_detection.models.ssd_mobilenet_v1_feature_extractor import SSDMobileNetV1FeatureExtractor
+from object_detection.models.ssd_mobilenet_v2_feature_extractor import SSDMobileNetV2FeatureExtractor
 from object_detection.protos import model_pb2
 
 FRCNN_RESNET_FEAT_MAPS = {
@@ -368,6 +369,81 @@ def test_create_ssd_mobilenet_v1_model_from_config(self):
     self.assertTrue(model._feature_extractor._batch_norm_trainable)
     self.assertTrue(model._normalize_loc_loss_by_codesize)
 
+  def test_create_ssd_mobilenet_v2_model_from_config(self):
+    model_text_proto = """
+      ssd {
+        feature_extractor {
+          type: 'ssd_mobilenet_v2'
+          conv_hyperparams {
+            regularizer {
+              l2_regularizer {
+              }
+            }
+            initializer {
+              truncated_normal_initializer {
+              }
+            }
+          }
+          batch_norm_trainable: true
+        }
+        box_coder {
+          faster_rcnn_box_coder {
+          }
+        }
+        matcher {
+          argmax_matcher {
+          }
+        }
+        similarity_calculator {
+          iou_similarity {
+          }
+        }
+        anchor_generator {
+          ssd_anchor_generator {
+            aspect_ratios: 1.0
+          }
+        }
+        image_resizer {
+          fixed_shape_resizer {
+            height: 320
+            width: 320
+          }
+        }
+        box_predictor {
+          convolutional_box_predictor {
+            conv_hyperparams {
+              regularizer {
+                l2_regularizer {
+                }
+              }
+              initializer {
+                truncated_normal_initializer {
+                }
+              }
+            }
+          }
+        }
+        normalize_loc_loss_by_codesize: true
+        loss {
+          classification_loss {
+            weighted_softmax {
+            }
+          }
+          localization_loss {
+            weighted_smooth_l1 {
+            }
+          }
+        }
+      }"""
+    model_proto = model_pb2.DetectionModel()
+    text_format.Merge(model_text_proto, model_proto)
+    model = self.create_model(model_proto)
+    self.assertIsInstance(model, ssd_meta_arch.SSDMetaArch)
+    self.assertIsInstance(model._feature_extractor,
+                          SSDMobileNetV2FeatureExtractor)
+    self.assertTrue(model._feature_extractor._batch_norm_trainable)
+    self.assertTrue(model._normalize_loc_loss_by_codesize)
+
   def test_create_embedded_ssd_mobilenet_v1_model_from_config(self):
     model_text_proto = """
       ssd {

research/object_detection/builders/optimizer_builder.py

Lines changed: 6 additions & 4 deletions
@@ -85,7 +85,8 @@ def _create_learning_rate(learning_rate_config):
   learning_rate_type = learning_rate_config.WhichOneof('learning_rate')
   if learning_rate_type == 'constant_learning_rate':
     config = learning_rate_config.constant_learning_rate
-    learning_rate = tf.constant(config.learning_rate, dtype=tf.float32)
+    learning_rate = tf.constant(config.learning_rate, dtype=tf.float32,
+                                name='learning_rate')
 
   if learning_rate_type == 'exponential_decay_learning_rate':
     config = learning_rate_config.exponential_decay_learning_rate
@@ -94,7 +95,7 @@ def _create_learning_rate(learning_rate_config):
         tf.train.get_or_create_global_step(),
         config.decay_steps,
         config.decay_factor,
-        staircase=config.staircase)
+        staircase=config.staircase, name='learning_rate')
 
   if learning_rate_type == 'manual_step_learning_rate':
     config = learning_rate_config.manual_step_learning_rate
@@ -105,7 +106,7 @@ def _create_learning_rate(learning_rate_config):
     learning_rate_sequence += [x.learning_rate for x in config.schedule]
     learning_rate = learning_schedules.manual_stepping(
         tf.train.get_or_create_global_step(), learning_rate_step_boundaries,
-        learning_rate_sequence)
+        learning_rate_sequence, config.warmup)
 
   if learning_rate_type == 'cosine_decay_learning_rate':
     config = learning_rate_config.cosine_decay_learning_rate
@@ -114,7 +115,8 @@ def _create_learning_rate(learning_rate_config):
         config.learning_rate_base,
         config.total_steps,
         config.warmup_learning_rate,
-        config.warmup_steps)
+        config.warmup_steps,
+        config.hold_base_rate_steps)
 
   if learning_rate is None:
     raise ValueError('Learning_rate %s not supported.' % learning_rate_type)
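
The new hold_base_rate_steps argument extends the cosine schedule to three phases: linear warmup, a constant hold at the base rate, then cosine decay. The sketch below illustrates that schedule shape as a plain Python function of the step; it is an illustration of the idea, not the exact formula implemented in learning_schedules.cosine_decay_with_warmup.

# Illustrative only: warmup -> hold -> cosine decay, as a function of step.
# Parameter names mirror the config fields above; details may differ from the
# library implementation.
import math

def cosine_decay_with_warmup_sketch(step, learning_rate_base, total_steps,
                                    warmup_learning_rate, warmup_steps,
                                    hold_base_rate_steps):
  if step < warmup_steps:
    # Linear ramp from warmup_learning_rate up to learning_rate_base.
    slope = (learning_rate_base - warmup_learning_rate) / float(warmup_steps)
    return warmup_learning_rate + slope * step
  if step < warmup_steps + hold_base_rate_steps:
    # Hold the base rate constant for hold_base_rate_steps steps.
    return learning_rate_base
  # Cosine decay from the base rate down to zero over the remaining steps.
  progress = (step - warmup_steps - hold_base_rate_steps) / float(
      total_steps - warmup_steps - hold_base_rate_steps)
  return 0.5 * learning_rate_base * (1 + math.cos(math.pi * progress))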

research/object_detection/builders/optimizer_builder_test.py

Lines changed: 6 additions & 1 deletion
@@ -35,6 +35,7 @@ def testBuildConstantLearningRate(self):
     text_format.Merge(learning_rate_text_proto, learning_rate_proto)
     learning_rate = optimizer_builder._create_learning_rate(
         learning_rate_proto)
+    self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
     with self.test_session():
       learning_rate_out = learning_rate.eval()
       self.assertAlmostEqual(learning_rate_out, 0.004)
@@ -52,19 +53,22 @@ def testBuildExponentialDecayLearningRate(self):
     text_format.Merge(learning_rate_text_proto, learning_rate_proto)
     learning_rate = optimizer_builder._create_learning_rate(
         learning_rate_proto)
+    self.assertTrue(learning_rate.op.name.endswith('learning_rate'))
     self.assertTrue(isinstance(learning_rate, tf.Tensor))
 
   def testBuildManualStepLearningRate(self):
     learning_rate_text_proto = """
       manual_step_learning_rate {
+        initial_learning_rate: 0.002
         schedule {
-          step: 0
+          step: 100
          learning_rate: 0.006
        }
        schedule {
          step: 90000
          learning_rate: 0.00006
        }
+        warmup: true
      }
    """
    learning_rate_proto = optimizer_pb2.LearningRate()
@@ -80,6 +84,7 @@ def testBuildCosineDecayLearningRate(self):
        total_steps: 20000
        warmup_learning_rate: 0.0001
        warmup_steps: 1000
+        hold_base_rate_steps: 20000
      }
    """
    learning_rate_proto = optimizer_pb2.LearningRate()
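
The updated manual-step test combines initial_learning_rate with warmup: true and a first boundary at step 100. One plausible reading, sketched below, is a piecewise-constant schedule whose rate ramps linearly from the initial rate to the first scheduled rate before the first boundary; this is an interpretation for illustration and may differ from the actual manual_stepping implementation.

# Illustrative only: piecewise-constant rates with linear warmup to the first
# boundary. Assumes rates has one more entry than boundaries, with the
# configured initial rate first (e.g. boundaries=[100, 90000],
# rates=[0.002, 0.006, 0.00006]).
def manual_stepping_with_warmup_sketch(step, boundaries, rates):
  if step < boundaries[0]:
    # Linear warmup from the initial rate to the first scheduled rate.
    fraction = step / float(boundaries[0])
    return rates[0] + fraction * (rates[1] - rates[0])
  # Afterwards, use the rate of the most recent boundary that has been reached.
  current = rates[1]
  for boundary, rate in zip(boundaries, rates[1:]):
    if step >= boundary:
      current = rate
  return current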

research/object_detection/core/box_list_ops_test.py

Lines changed: 0 additions & 15 deletions
@@ -727,21 +727,6 @@ def test_concatenate_is_correct(self):
 
 class NonMaxSuppressionTest(tf.test.TestCase):
 
-  def test_with_invalid_scores_field(self):
-    corners = tf.constant([[0, 0, 1, 1],
-                           [0, 0.1, 1, 1.1],
-                           [0, -0.1, 1, 0.9],
-                           [0, 10, 1, 11],
-                           [0, 10.1, 1, 11.1],
-                           [0, 100, 1, 101]], tf.float32)
-    boxes = box_list.BoxList(corners)
-    boxes.add_field('scores', tf.constant([.9, .75, .6, .95, .5]))
-    iou_thresh = .5
-    max_output_size = 3
-    with self.assertRaisesWithPredicateMatch(ValueError,
-                                             'Dimensions must be equal'):
-      box_list_ops.non_max_suppression(boxes, iou_thresh, max_output_size)
-
   def test_select_from_three_clusters(self):
     corners = tf.constant([[0, 0, 1, 1],
                            [0, 0.1, 1, 1.1],
