
Commit 010ae9a

Align vdo-larod model input shapes (#421)

Authored by johan-hultberg-work, danielmyh (Daniel Myhrman), and isak-jakobsson

Co-authored-by: Daniel Myhrman <[email protected]>
Co-authored-by: Isak Jakobsson <[email protected]>

1 parent 5054787 commit 010ae9a

9 files changed: +89 additions, -64 deletions

tensorflow-to-larod-artpec8/README.md

Lines changed: 2 additions & 2 deletions

@@ -71,7 +71,7 @@ If your machine doesn't have the hardware requisites, like not enough GPU to tra

 ### The example model

-In this tutorial, we'll train a simple model with one input and two outputs. The input to the model is a FP32 RGB image scaled to the [0, 1] range and of shape `(480, 270, 3)`.
+In this tutorial, we'll train a simple model with one input and two outputs. The input to the model is a FP32 RGB image scaled to the [0, 1] range and of shape `(256, 256, 3)`.
 The output of the model are two separate tensors of shape `(1,)`, representing the model's confidences for the presence of `person` and `car`. The outputs are configured as such, and not as one tensor with a SoftMax activation, in order to demonstrate how to use multiple outputs.
 However, the general process of making a camera-compatible model is the same irrespective of the dimensions or number of inputs or outputs.

@@ -84,7 +84,7 @@ In order to produce a model with BatchNormalization layers that are fused with t

 Specifically, the convolutional layers need to not use bias, e.g., for Keras Conv2D layers have the `use_bias=False` parameter set, and the layer order needs to be: `convolutional layer -> batch normalization -> relu`.
 This will "fold", or "fuse", the batch normalization, which increases performance.

-The pre-trained model is trained on the MS COCO 2017 **training** dataset, which is significantly larger than the supplied MS COCO 2017 **validation** dataset. After training it for 8 epochs and fine-tuning the model with quantization for 4 epochs, it achieves around 85% validation accuracy on both the people output and the car output with 6.6 million parameters. This model is saved in the frozen graph format in the `/env/output_models` directory.
+To replicate the model training used for the model in [vdo-larod](../vdo-larod/), utilize the MS COCO 2017 **training** dataset, which is significantly larger than the provided MS COCO 2017 **validation** dataset. After training for 12 epochs and fine-tuning the model with quantization for 1 epoch, it achieves good accuracy on both people and cars.

 ### Model training and quantization
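
The README changes above describe the model contract (one `(256, 256, 3)` FP32 input, two `(1,)` confidence outputs) and the foldable-BatchNormalization constraint (`use_bias=False` convolutions in `conv -> batch normalization -> relu` order). As a hedged illustration only, not the repository's actual `model.py`, a minimal Keras sketch of that pattern could look like this:

```python
# Minimal sketch (not part of this commit): a bias-free Conv2D -> BatchNorm -> ReLU
# backbone with a (256, 256, 3) input and two independent (1,) sigmoid outputs,
# matching the structure the README excerpt above describes.
import tensorflow as tf
from tensorflow.keras import layers


def sketch_model(input_shape=(256, 256, 3)):
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for n_filters in (16, 32, 64):
        # use_bias=False plus the conv -> BN -> ReLU order lets the batch
        # normalization be folded ("fused") into the convolution weights.
        x = layers.Conv2D(n_filters, 3, strides=2, padding='same',
                          use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Two separate single-value outputs instead of one softmax tensor.
    person = layers.Dense(1, activation='sigmoid', name='person')(x)
    car = layers.Dense(1, activation='sigmoid', name='car')(x)
    return tf.keras.Model(inputs=inputs, outputs=[person, car])


if __name__ == '__main__':
    sketch_model().summary()
```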

tensorflow-to-larod-artpec8/env/training/model.py

Lines changed: 1 addition & 1 deletion

@@ -54,7 +54,7 @@ def _residual_block(x, n_filters, strides):
     return x


-def create_model(n_blocks=4, n_filters=16, input_shape=(480, 270, 3)):
+def create_model(n_blocks=4, n_filters=16, input_shape=(256, 256, 3)):
     """ Defines and instantiates a model.

     Args:
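
The only change in this hunk is the default `input_shape`. A hedged usage sketch (assuming `create_model` returns a `tf.keras.Model`, which this excerpt does not show):

```python
# Hypothetical usage, not part of the commit: with the updated default, the
# model expects square 256x256 RGB inputs instead of 480x270.
from model import create_model  # assumes the /env/training scripts are importable

model = create_model()    # input_shape defaults to (256, 256, 3) after this change
print(model.input_shape)  # expected: (None, 256, 256, 3)
```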

tensorflow-to-larod-artpec8/env/training/train.py

Lines changed: 3 additions & 3 deletions

@@ -147,9 +147,9 @@ def train_model(data_generator, trained_model_path, model_configuration, train_e
     parser.add_argument('-a', '--annotations', type=str, required=True,
                         help='path to the .json-file containing COCO instance \
                               annotations')
-    parser.add_argument('--input-width', type=int, default=480,
+    parser.add_argument('--input-width', type=int, default=256,
                         help='The width of the model\'s input image')
-    parser.add_argument('--input-height', type=int, default=270,
+    parser.add_argument('--input-height', type=int, default=256,
                         help='The height of the model\'s input image')
     parser.add_argument('-e', '--training-epochs', type=int, default=8,
                         help='number of training epochs')

@@ -158,7 +158,7 @@ def train_model(data_generator, trained_model_path, model_configuration, train_e
     args = parser.parse_args()

     print('Using TensorFlow version: {}'.format(tf.__version__))
-    data_generator = DataGenerator(args.images, args.annotations, batch_size=8,
+    data_generator = DataGenerator(args.images, args.annotations, batch_size=16,
                                    width=args.input_width, height=args.input_height)

     trained_model_path = '/env/models/fp32_model/model'

tensorflow-to-larod-artpec8/env/training/utils.py

Lines changed: 2 additions & 6 deletions

@@ -33,8 +33,8 @@ class SimpleCOCODataGenerator(Sequence):
     reprocesses it to simply output whether a certain class exists in
     a given image.
     """
-    def __init__(self, samples_dir, annotation_path, width=480, height=270,
-                 batch_size=2, shuffle=True, balance=True):
+    def __init__(self, samples_dir, annotation_path, width=256, height=256,
+                 batch_size=16, shuffle=True, balance=True):
         """ Initializes the data generator.

         Args:

@@ -169,10 +169,6 @@ def _generate_batch(self, batch_annotations):
             img_path = os.path.join(self.samples_dir, annotation['file_name'])
             img = Image.open(img_path).resize((self.width, self.height))

-            # Horizontal flipping with p=0.5
-            if np.random.random() >= 0.5:
-                img = img.transpose(Image.FLIP_LEFT_RIGHT)
-
             X[i, ] = np.array(img)
             y_person[i, ] = annotation['has_person']
             y_car[i, ] = annotation['has_car']

vdo-larod/Dockerfile

Lines changed: 1 addition & 1 deletion

@@ -16,7 +16,7 @@ ARG CHIP
 # Download the pretrained model
 ARG MODEL_BUCKET=https://acap-ml-models.s3.amazonaws.com/tensorflow_to_larod_resnet
 RUN if [ "$CHIP" = artpec8 ] || [ "$CHIP" = artpec9 ] || [ "$CHIP" = cpu ] ; then \
-        curl -o model.tflite $MODEL_BUCKET/custom_resnet_artpec8_car_human_480x270.tflite ; \
+        curl -o model.tflite $MODEL_BUCKET/custom_resnet_artpec8_car_human_256.tflite ; \
     elif [ "$CHIP" = edgetpu ]; then \
         curl -o model.tflite $MODEL_BUCKET/custom_resnet_edgetpu_car_human_256.tflite ; \
     elif [ "$CHIP" = cv25 ]; then \

vdo-larod/README.md

Lines changed: 77 additions & 48 deletions

@@ -33,8 +33,8 @@ See the manifest.json.* files to change the configuration on chip, image size, n

 ## Which backends and models are supported?

-Unless you modify the app to your own needs you should only use our pretrained model that takes 480x270 (256x256 for Ambarella CV25 and Google TPU) RGB (interleaved or planar) images as input,
-and that outputs an array of 2 confidence scores of person and car in the format of `float32`.
+Unless you modify the app to your own needs you should only use our pretrained model that takes 256x256 RGB (interleaved or planar) images as input,
+and that outputs an array of 2 confidence scores of person and car in the format of `uint8`.

 You can run the example with any inference backend as long as you can provide it with a model as described above.
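
The updated paragraph above pins the model's I/O contract to a single 256x256 RGB input and two `uint8` confidence outputs (person and car). As a rough sanity check (not part of the commit), a downloaded `.tflite` file can be inspected with the TensorFlow Lite Python interpreter; the file name below is illustrative:

```python
# Hedged sketch: confirm a .tflite model matches the contract described above
# (one 256x256x3 input, two single-value confidence outputs).
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print('input :', detail['shape'], detail['dtype'])   # expect [1 256 256 3]
for detail in interpreter.get_output_details():
    print('output:', detail['shape'], detail['dtype'])   # expect two [1 1] tensors
```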

@@ -280,67 +280,96 @@ In previous larod versions, the chip was referred to as a number instead of a st
 ```sh
 ----- Contents of SYSTEM_LOG for 'vdo_larod' -----

-vdo_larod[584171]: Starting /usr/local/packages/vdo_larod/vdo_larod
-vdo_larod[584171]: chooseStreamResolution: We select stream w/h=480 x 270 based on VDO channel info.
-vdo_larod[584171]: Creating VDO image provider and creating stream 480 x 270
-vdo_larod[584171]: Setting up larod connection with chip axis-a8-dlpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
-vdo_larod[584171]: Loading the model... This might take up to 5 minutes depending on your device model.
-vdo_larod[584171]: Model loaded successfully
-vdo_larod[584171]: Created mmaped model output 0 with size 1
-vdo_larod[584171]: Created mmaped model output 1 with size 1
-vdo_larod[584171]: Start fetching video frames from VDO
-
-vdo_larod[584171]: Ran pre-processing for 2 ms
-vdo_larod[584171]: Ran inference for 16 ms
-vdo_larod[584171]: Person detected: 65.14% - Car detected: 11.92%
-
-vdo_larod[4165]: Exit /usr/local/packages/vdo_larod/vdo_larod
+vdo_larod[141742]: Starting /usr/local/packages/vdo_larod/vdo_larod
+vdo_larod[141742]: choose_stream_resolution: We select stream w/h=480 x 270 based on VDO channel info.
+vdo_larod[141742]: Creating VDO image provider and creating stream 480 x 270
+vdo_larod[141742]: 'buffer.count'-----: <uint32 2>
+vdo_larod[141742]: 'dynamic.framerate': <true>
+vdo_larod[141742]: 'format'-----------: <uint32 3>
+vdo_larod[141742]: 'framerate'--------: <30.0>
+vdo_larod[141742]: 'height'-----------: <uint32 270>
+vdo_larod[141742]: 'input'------------: <uint32 1>
+vdo_larod[141742]: 'socket.blocking'--: <false>
+vdo_larod[141742]: 'width'------------: <uint32 480>
+vdo_larod[141742]: Dump of vdo stream settings map =====
+vdo_larod[141742]: Setting up larod connection with chip axis-a8-dlpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
+vdo_larod[141742]: Loading the model... This might take up to 5 minutes depending on your device model.
+vdo_larod[141742]: Model loaded successfully
+vdo_larod[141742]: Calculate crop image
+vdo_larod[141742]: Crop input image X=105 Y=0 (270 x 270)
+vdo_larod[141742]: Created mmaped model output 0 with size 1
+vdo_larod[141742]: Created mmaped model output 1 with size 1
+
+vdo_larod[141742]: Ran pre-processing for 3 ms
+vdo_larod[141742]: Ran inference for 14 ms
+vdo_larod[141742]: Person detected: 100.00% - Car detected: 3.14%
+
+vdo_larod[141742]: Exit /usr/local/packages/vdo_larod/vdo_larod
 ```

 #### Output - ARTPEC-9 with TensorFlow Lite

 ```sh
 ----- Contents of SYSTEM_LOG for 'vdo_larod' -----

-vdo_larod[584171]: Starting /usr/local/packages/vdo_larod/vdo_larod
-vdo_larod[584171]: chooseStreamResolution: We select stream w/h=480 x 270 based on VDO channel info.
-vdo_larod[584171]: Creating VDO image provider and creating stream 480 x 270
-vdo_larod[584171]: Setting up larod connection with chip a9-dlpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
-vdo_larod[584171]: Loading the model... This might take up to 5 minutes depending on your device model.
-vdo_larod[584171]: Model loaded successfully
-vdo_larod[584171]: Created mmaped model output 0 with size 1
-vdo_larod[584171]: Created mmaped model output 1 with size 1
-vdo_larod[584171]: Start fetching video frames from VDO
-
-vdo_larod[584171]: Ran pre-processing for 2 ms
-vdo_larod[584171]: Ran inference for 7 ms
-vdo_larod[584171]: Person detected: 65.14% - Car detected: 11.92%

-vdo_larod[4165]: Exit /usr/local/packages/vdo_larod/vdo_larod
+vdo_larod[3991067]: Starting /usr/local/packages/vdo_larod/vdo_larod
+vdo_larod[3991067]: choose_stream_resolution: We select stream w/h=480 x 360 based on VDO channel info.
+vdo_larod[3991067]: Creating VDO image provider and creating stream 480 x 360
+vdo_larod[3991067]: 'buffer.count'-----: <uint32 2>
+vdo_larod[3991067]: 'dynamic.framerate': <true>
+vdo_larod[3991067]: 'format'-----------: <uint32 3>
+vdo_larod[3991067]: 'framerate'--------: <30.0>
+vdo_larod[3991067]: 'height'-----------: <uint32 360>
+vdo_larod[3991067]: 'input'------------: <uint32 1>
+vdo_larod[3991067]: 'socket.blocking'--: <false>
+vdo_larod[3991067]: 'width'------------: <uint32 480>
+vdo_larod[3991067]: Dump of vdo stream settings map =====
+vdo_larod[3991067]: Setting up larod connection with chip a9-dlpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
+vdo_larod[3991067]: Loading the model... This might take up to 5 minutes depending on your device model.
+vdo_larod[3991067]: Model loaded successfully
+vdo_larod[3991067]: Calculate crop image
+vdo_larod[3991067]: Crop input image X=0 Y=60 (360 x 360)
+vdo_larod[3991067]: Created mmaped model output 0 with size 1
+vdo_larod[3991067]: Created mmaped model output 1 with size 1
+vdo_larod[3991067]: Start fetching video frames from VDO
+vdo_larod[3991067]: Ran pre-processing for 13 ms
+vdo_larod[3991067]: Ran inference for 5 ms
+vdo_larod[3991067]: Person detected: 100.00% - Car detected: 3.14%
+
+vdo_larod[3991067]: Exit /usr/local/packages/vdo_larod/vdo_larod
 ```

 #### Output - CPU with TensorFlow Lite

 ```sh
 ----- Contents of SYSTEM_LOG for 'vdo_larod' -----

-vdo_larod[584171]: Starting /usr/local/packages/vdo_larod/vdo_larod
-vdo_larod[584171]: chooseStreamResolution: We select stream w/h=480 x 270 based on VDO channel info.
-vdo_larod[584171]: Creating VDO image provider and creating stream 480 x 270
-vdo_larod[584171]: Setting up larod connection with chip cpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
-vdo_larod[584171]: Loading the model... This might take up to 5 minutes depending on your device model.
-vdo_larod[584171]: Model loaded successfully
-vdo_larod[584171]: Created mmaped model output 0 with size 1
-vdo_larod[584171]: Created mmaped model output 1 with size 1
-vdo_larod[584171]: Start fetching video frames from VDO
-
-vdo_larod[584171]: Ran pre-processing for 3 ms
-vdo_larod[584171]: Ran inference for 2594 ms
-vdo_larod[584171]: Change VDO stream framerate to 1.000000 because of too long inference time
-vdo_larod[584171]: Person detected: 65.14% - Car detected: 11.92%
-
-vdo_larod[4165]: Exit /usr/local/packages/vdo_larod/vdo_larod
+vdo_larod[145071]: Starting /usr/local/packages/vdo_larod/vdo_larod
+vdo_larod[145071]: choose_stream_resolution: We select stream w/h=480 x 270 based on VDO channel info.
+vdo_larod[145071]: Creating VDO image provider and creating stream 480 x 270
+vdo_larod[145071]: Dump of vdo stream settings map =====
+vdo_larod[145071]: 'buffer.count'-----: <uint32 2>
+vdo_larod[145071]: 'dynamic.framerate': <true>
+vdo_larod[145071]: 'format'-----------: <uint32 3>
+vdo_larod[145071]: 'framerate'--------: <30.0>
+vdo_larod[145071]: 'height'-----------: <uint32 270>
+vdo_larod[145071]: 'input'------------: <uint32 1>
+vdo_larod[145071]: 'socket.blocking'--: <false>
+vdo_larod[145071]: 'width'------------: <uint32 480>
+vdo_larod[145071]: Setting up larod connection with chip cpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
+vdo_larod[145071]: Loading the model... This might take up to 5 minutes depending on your device model.
+vdo_larod[145071]: Model loaded successfully
+vdo_larod[145071]: Calculate crop image
+vdo_larod[145071]: Crop input image X=105 Y=0 (270 x 270)
+vdo_larod[145071]: Created mmaped model output 0 with size 1
+vdo_larod[145071]: Created mmaped model output 1 with size 1
+vdo_larod[145071]: Start fetching video frames from VDO
+vdo_larod[145071]: Ran pre-processing for 3 ms
+vdo_larod[145071]: Ran inference for 545 ms
+vdo_larod[145071]: Person detected: 100.00% - Car detected: 3.14%
+
+vdo_larod[145071]: Exit /usr/local/packages/vdo_larod/vdo_larod
 ```

 #### Output - Google TPU

vdo-larod/app/manifest.json.artpec8

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
     "appName": "vdo_larod",
     "vendor": "Axis Communications",
     "embeddedSdkVersion": "3.0",
-    "runOptions": "axis-a8-dlpu-tflite /usr/local/packages/vdo_larod/model/model.tflite 480 270",
+    "runOptions": "axis-a8-dlpu-tflite /usr/local/packages/vdo_larod/model/model.tflite 256 256",
     "vendorUrl": "https://www.axis.com",
     "runMode": "never",
     "version": "1.0.0"

vdo-larod/app/manifest.json.artpec9

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
     "appName": "vdo_larod",
     "vendor": "Axis Communications",
    "embeddedSdkVersion": "3.0",
-    "runOptions": "a9-dlpu-tflite /usr/local/packages/vdo_larod/model/model.tflite 480 270",
+    "runOptions": "a9-dlpu-tflite /usr/local/packages/vdo_larod/model/model.tflite 256 256",
     "vendorUrl": "https://www.axis.com",
     "runMode": "never",
     "version": "1.0.0"

vdo-larod/app/manifest.json.cpu

Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@
     "appName": "vdo_larod",
     "vendor": "Axis Communications",
     "embeddedSdkVersion": "3.0",
-    "runOptions": "cpu-tflite /usr/local/packages/vdo_larod/model/model.tflite 480 270",
+    "runOptions": "cpu-tflite /usr/local/packages/vdo_larod/model/model.tflite 256 256",
     "vendorUrl": "https://www.axis.com",
     "runMode": "never",
     "version": "1.0.0"
