Copyright (C) 2021, Axis Communications AB, Lund, Sweden. All Rights Reserved.
This README file explains how to build an ACAP application that uses:
- the Video capture API (VDO) to fetch frames from e.g. a camera
- the Machine learning API (Larod) to load a graph model and run preprocessing and classification inferences
This is achieved by using the containerized API and toolchain images.
Together with this README file, you should be able to find a directory called app. That directory contains the "vdo_larod" application source code, which can easily be compiled and run with the help of the tools and the step-by-step instructions below.

Prerequisites:

- Axis camera equipped with CPU or DLPU
- Docker
This application opens a client to VDO and starts fetching frames in RGB or YUV format, depending on the platform; VDO is queried to determine whether a format is supported. The application requests the desired resolution from VDO, bounded only by the minimum and maximum resolutions VDO reports. When adapting this example, verify that the resulting image looks correct on the camera in use.
Steps in application:
- Fetch image data from VDO.
- If needed, preprocess the images (crop to the size required by the neural network, scale, and color convert) using larod with either the cpu-proc (libyuv) or ambarella-cvflow-proc backend.
- Run inferences using the trained model on a specific chip with the preprocessing output as input on a larod backend specified by a command-line argument.
- Measure the total inference time (preprocessing plus inference time) and determine whether the framerate of the VDO stream needs to be adjusted.
- Print the model's confidence scores for the presence of a person and a car in the image.
- Repeat until the user ends the application.
See the manifest.json.* files to change the configuration of chip, image size, number of iterations, and model path.
Unless you modify the app to your own needs, you should only use our pretrained model, which takes 256x256 RGB (interleaved or planar) images as input and outputs an array of two uint8 confidence scores, one for person and one for car.
You can run the example with any inference backend as long as you can provide it with a model as described above.
These instructions will guide you on how to execute the code. Below is the structure and scripts used in the example:
```
vdo-larod
├── app
│   ├── imgprovider.c
│   ├── imgprovider.h
│   ├── LICENSE
│   ├── Makefile
│   ├── manifest.json.artpec8
│   ├── manifest.json.artpec9
│   ├── manifest.json.cpu
│   ├── manifest.json.cv25
│   ├── manifest.json.edgetpu
│   ├── model.c
│   ├── model.h
│   ├── panic.c
│   ├── panic.h
│   └── vdo_larod.c
├── Dockerfile
└── README.md
```

- app/imgprovider.c/h - Implementation of the VDO parts, written in C.
- app/LICENSE - Text file which lists all open source licensed source code distributed with the application.
- app/Makefile - Makefile containing the build and link instructions for building the ACAP application.
- app/manifest.json.artpec8 - Defines the application and its configuration when building for artpec8 DLPU with TensorFlow Lite.
- app/manifest.json.artpec9 - Defines the application and its configuration when building for artpec9 DLPU with TensorFlow Lite.
- app/manifest.json.cpu - Defines the application and its configuration when building for CPU with TensorFlow Lite.
- app/manifest.json.cv25 - Defines the application and its configuration when building for cv25 DLPU.
- app/manifest.json.edgetpu - Defines the application and its configuration when building for Google TPU.
- app/panic.c/h - Utility for exiting the program on error.
- app/vdo_larod.c - Application using larod, written in C.
- Dockerfile - Docker file specifying the Axis toolchain and API container used to build the example.
- README.md - Step by step instructions on how to run the example.
- The example shows how to run on a device's DLPU or CPU, but for good performance it's recommended to use products with a DLPU. See links to search for products with DLPU support in Axis device compatibility.
- This application was not written to optimize performance.
- The pretrained models only output the confidence scores of two classes, i.e., person and car. For pretrained models that classify a higher number of classes, visit the Axis Model Zoo.
Below are step-by-step instructions on how to execute the program, starting with the generation of the .eap file and ending with running it on a device:
Note
Depending on the network your local build machine is connected to, you may need to add proxy settings for Docker. See Proxy in build time.
Depending on the selected chip, different trained models are used when running larod. In this example, model files are downloaded from an AWS S3 bucket when building the application. Which model is used is configured through attributes in manifest.json and the CHIP parameter in the Dockerfile.
The attributes in manifest.json that configure the model are:
- runOptions, which contains the application command-line options.
- friendlyName, a user friendly package name which is also part of the .eap filename.
The CHIP argument in the Dockerfile also needs to be changed depending on the model. This argument controls which files are included in the package, e.g., model files. These files are copied to the application directory during installation.
Different devices support different chips and models.
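As a hypothetical illustration of where these attributes sit (not the exact contents of the repository's manifest files; the schema version and values here are assumptions):

```json
{
  "schemaVersion": "1.3",
  "acapPackageConf": {
    "setup": {
      "appName": "vdo_larod",
      "friendlyName": "vdo_larod_artpec8",
      "runOptions": "<application command-line options, e.g. chip and model path>"
    }
  }
}
```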
Building is done using the following commands:
```sh
docker build --platform=linux/amd64 --tag <APP_IMAGE> --build-arg CHIP=<CHIP> .
docker cp $(docker create --platform=linux/amd64 <APP_IMAGE>):/opt/app ./build
```

- `<APP_IMAGE>` is the name to tag the image with, e.g., `vdo_larod:1.0`.
- `<CHIP>` is the chip type. Supported values are `artpec9`, `artpec8`, `cpu`, `cv25`, and `edgetpu`.
- `<ARCH>` is the architecture. Supported values are `armv7hf` (default) and `aarch64`.
See the following sections for build commands for each chip.
To build a package for ARTPEC-8 with TensorFlow Lite, run the following commands standing in your working directory:

```sh
docker build --build-arg ARCH=aarch64 --build-arg CHIP=artpec8 --tag <APP_IMAGE> .
docker cp $(docker create --platform=linux/amd64 <APP_IMAGE>):/opt/app ./build
```

To build a package for ARTPEC-9 with TensorFlow Lite, run the following commands standing in your working directory:

```sh
docker build --build-arg ARCH=aarch64 --build-arg CHIP=artpec9 --tag <APP_IMAGE> .
docker cp $(docker create --platform=linux/amd64 <APP_IMAGE>):/opt/app ./build
```

To build a package for CPU with TensorFlow Lite, run the following commands standing in your working directory:

```sh
docker build --build-arg CHIP=cpu --tag <APP_IMAGE> .
docker cp $(docker create --platform=linux/amd64 <APP_IMAGE>):/opt/app ./build
```

To build a package for Google TPU instead, run the following commands standing in your working directory:

```sh
docker build --build-arg CHIP=edgetpu --tag <APP_IMAGE> .
docker cp $(docker create --platform=linux/amd64 <APP_IMAGE>):/opt/app ./build
```

To build a package for CV25, run the following commands standing in your working directory:

```sh
docker build --build-arg ARCH=aarch64 --build-arg CHIP=cv25 --tag <APP_IMAGE> .
docker cp $(docker create --platform=linux/amd64 <APP_IMAGE>):/opt/app ./build
```

The working directory now contains a build folder with the following files of importance:
```
vdo-larod
├── build
│   ├── imgprovider.c
│   ├── imgprovider.h
│   ├── lib
│   ├── LICENSE
│   ├── Makefile
│   ├── manifest.json
│   ├── manifest.json.artpec8
│   ├── manifest.json.artpec9
│   ├── manifest.json.cpu
│   ├── manifest.json.edgetpu
│   ├── manifest.json.cv25
│   ├── model
│   │   └── model.tflite / model.bin
│   ├── package.conf
│   ├── package.conf.orig
│   ├── panic.c
│   ├── panic.h
│   ├── param.conf
│   ├── vdo_larod*
│   ├── vdo_larod_{cpu,edgetpu}_1_0_0_armv7hf.eap / vdo_larod_{cv25,artpec8,artpec9}_1_0_0_aarch64.eap
│   ├── vdo_larod_{cpu,edgetpu}_1_0_0_LICENSE.txt / vdo_larod_{cv25,artpec8,artpec9}_1_0_0_LICENSE.txt
│   └── vdo_larod.c
```

- build/manifest.json - Defines the application and its configuration.
- build/model - Folder containing models used in this application.
- build/model/model.tflite - Trained model file used for ARTPEC-8, ARTPEC-9, and CPU, or trained model file used for Google TPU, depending on `<CHIP>`.
- build/model/model.bin - Trained model file used for CV25.
- build/package.conf - Defines the application and its configuration.
- build/package.conf.orig - Defines the application and its configuration, original file.
- build/param.conf - File containing application parameters.
- build/vdo_larod - Application executable binary file.

If chip `artpec8` has been built:

- build/vdo_larod_artpec8_1_0_0_aarch64.eap - Application package .eap file.
- build/vdo_larod_artpec8_1_0_0_LICENSE.txt - Copy of LICENSE file.

If chip `artpec9` has been built:

- build/vdo_larod_artpec9_1_0_0_aarch64.eap - Application package .eap file.
- build/vdo_larod_artpec9_1_0_0_LICENSE.txt - Copy of LICENSE file.

If chip `cpu` has been built:

- build/vdo_larod_cpu_1_0_0_armv7hf.eap - Application package .eap file.
- build/vdo_larod_cpu_1_0_0_LICENSE.txt - Copy of LICENSE file.

If chip `edgetpu` has been built:

- build/vdo_larod_edgetpu_1_0_0_armv7hf.eap - Application package .eap file.
- build/vdo_larod_edgetpu_1_0_0_LICENSE.txt - Copy of LICENSE file.

If chip `cv25` has been built:

- build/vdo_larod_cv25_1_0_0_aarch64.eap - Application package .eap file.
- build/vdo_larod_cv25_1_0_0_LICENSE.txt - Copy of LICENSE file.
Note
For detailed information on how to build, install, and run ACAP applications, refer to the official ACAP documentation: Build, install, and run.
Browse to the application page of the Axis device:

```
http://<AXIS_DEVICE_IP>/index.html#apps
```

- Click on the tab `Apps` in the device GUI
- Enable `Allow unsigned apps` toggle
- Click `(+ Add app)` button to upload the application file
- Browse to the newly built ACAP application, depending on architecture:
  - `vdo_larod_cv25_1_0_0_aarch64.eap`
  - `vdo_larod_artpec8_1_0_0_aarch64.eap`
  - `vdo_larod_artpec9_1_0_0_aarch64.eap`
  - `vdo_larod_cpu_1_0_0_armv7hf.eap`
  - `vdo_larod_edgetpu_1_0_0_armv7hf.eap`
- Click `Install`
- Run the application by enabling the `Start` switch
The application is now installed on the device and named "vdo_larod_".
Application log can be found directly at:

```
http://<AXIS_DEVICE_IP>/axis-cgi/admin/systemlog.cgi?appname=vdo_larod
```

Depending on the selected chip, different output is received.
In previous larod versions, the chip was referred to as a number instead of a string. See the table below to understand the mapping:
| Chips | Larod 1 (int) | Larod 3 |
|---|---|---|
| CPU with TensorFlow Lite | 2 | cpu-tflite |
| Google TPU | 4 | google-edge-tpu-tflite |
| Ambarella CVFlow (NN) | 6 | ambarella-cvflow |
| ARTPEC-8 DLPU | 12 | axis-a8-dlpu-tflite |
| ARTPEC-9 DLPU | - | a9-dlpu-tflite |
----- Contents of SYSTEM_LOG for 'vdo_larod' -----
vdo_larod[141742]: Starting /usr/local/packages/vdo_larod/vdo_larod
vdo_larod[141742]: Setting up larod connection with chip axis-a8-dlpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
vdo_larod[141742]: Loading the model... This might take up to 5 minutes depending on your device model.
vdo_larod[141742]: Model loaded successfully
vdo_larod[3991067]: Detected model format RGB and input resolution 256x256
vdo_larod[141742]: Created mmaped model output 0 with size 1
vdo_larod[141742]: Created mmaped model output 1 with size 1
vdo_larod[141742]: choose_stream_resolution: We select stream w/h=256 x 256 with format yuv based on VDO channel info.
vdo_larod[141742]: Dump of vdo stream settings map =====
vdo_larod[141742]: 'buffer.count'-----: <uint32 2>
vdo_larod[141742]: 'dynamic.framerate': <true>
vdo_larod[141742]: 'format'-----------: <uint32 3>
vdo_larod[141742]: 'framerate'--------: <30.0>
vdo_larod[141742]: 'height'-----------: <uint32 256>
vdo_larod[141742]: 'input'------------: <uint32 1>
vdo_larod[141742]: 'socket.blocking'--: <false>
vdo_larod[141742]: 'width'------------: <uint32 256>
vdo_larod[141742]: Ran pre-processing for 3 ms
vdo_larod[141742]: Ran inference for 14 ms
vdo_larod[141742]: Person detected: 100.00% - Car detected: 3.14%
vdo_larod[141742]: Exit /usr/local/packages/vdo_larod/vdo_larod

----- Contents of SYSTEM_LOG for 'vdo_larod' -----
vdo_larod[3991067]: Starting /usr/local/packages/vdo_larod/vdo_larod
vdo_larod[3991067]: Setting up larod connection with chip a9-dlpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
vdo_larod[3991067]: Loading the model... This might take up to 5 minutes depending on your device model.
vdo_larod[3991067]: Model loaded successfully
vdo_larod[3991067]: Detected model format RGB and input resolution 256x256
vdo_larod[3991067]: Created mmaped model output 0 with size 1
vdo_larod[3991067]: Created mmaped model output 1 with size 1
vdo_larod[3991067]: choose_stream_resolution: We select stream w/h=256 x 256 with format rgb interleaved based on VDO channel info.
vdo_larod[3991067]: Dump of vdo stream settings map =====
vdo_larod[3991067]: 'buffer.count'-----: <uint32 2>
vdo_larod[3991067]: 'dynamic.framerate': <true>
vdo_larod[3991067]: 'format'-----------: <uint32 8>
vdo_larod[3991067]: 'framerate'--------: <30.0>
vdo_larod[3991067]: 'height'-----------: <uint32 256>
vdo_larod[3991067]: 'input'------------: <uint32 1>
vdo_larod[3991067]: 'socket.blocking'--: <false>
vdo_larod[3991067]: 'width'------------: <uint32 256>
vdo_larod[3991067]: Start fetching video frames from VDO
vdo_larod[3991067]: Ran inference for 5 ms
vdo_larod[3991067]: Person detected: 100.00% - Car detected: 3.14%
vdo_larod[3991067]: Exit /usr/local/packages/vdo_larod/vdo_larod

----- Contents of SYSTEM_LOG for 'vdo_larod' -----
vdo_larod[145071]: Starting /usr/local/packages/vdo_larod/vdo_larod
vdo_larod[145071]: Setting up larod connection with chip cpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
vdo_larod[3991067]: Loading the model... This might take up to 5 minutes depending on your device model.
vdo_larod[3991067]: Model loaded successfully
vdo_larod[3991067]: Detected model format RGB and input resolution 256x256
vdo_larod[3991067]: Created mmaped model output 0 with size 1
vdo_larod[3991067]: Created mmaped model output 1 with size 1
vdo_larod[3991067]: choose_stream_resolution: We select stream w/h=256 x 256 with format rgb interleaved based on VDO channel info.
vdo_larod[3991067]: Dump of vdo stream settings map =====
vdo_larod[3991067]: 'buffer.count'-----: <uint32 2>
vdo_larod[3991067]: 'dynamic.framerate': <true>
vdo_larod[3991067]: 'format'-----------: <uint32 8>
vdo_larod[3991067]: 'framerate'--------: <30.0>
vdo_larod[3991067]: 'height'-----------: <uint32 256>
vdo_larod[3991067]: 'input'------------: <uint32 1>
vdo_larod[3991067]: 'socket.blocking'--: <false>
vdo_larod[3991067]: 'width'------------: <uint32 256>
vdo_larod[3991067]: Start fetching video frames from VDO
vdo_larod[3991067]: Ran inference for 340 ms
vdo_larod[3991067]: Person detected: 100.00% - Car detected: 3.14%
vdo_larod[145071]: Exit /usr/local/packages/vdo_larod/vdo_larod

----- Contents of SYSTEM_LOG for 'vdo_larod' -----
vdo_larod[584171]: Starting /usr/local/packages/vdo_larod/vdo_larod
vdo_larod[584171]: Setting up larod connection with chip google-edge-tpu-tflite and model file /usr/local/packages/vdo_larod/model/model.tflite
vdo_larod[584171]: Loading the model... This might take up to 5 minutes depending on your device model.
vdo_larod[584171]: Model loaded successfully
vdo_larod[584171]: Detected model format RGB and input resolution 256x256
vdo_larod[584171]: Created mmaped model output 0 with size 1
vdo_larod[584171]: Created mmaped model output 1 with size 1
vdo_larod[584171]: chooseStreamResolution: We select stream w/h=256 x 256 with format yuv based on VDO channel info.
vdo_larod[3991067]: Dump of vdo stream settings map =====
vdo_larod[3991067]: 'buffer.count'-----: <uint32 2>
vdo_larod[3991067]: 'dynamic.framerate': <true>
vdo_larod[3991067]: 'format'-----------: <uint32 3>
vdo_larod[3991067]: 'framerate'--------: <30.0>
vdo_larod[3991067]: 'height'-----------: <uint32 256>
vdo_larod[3991067]: 'input'------------: <uint32 1>
vdo_larod[3991067]: 'socket.blocking'--: <false>
vdo_larod[3991067]: 'width'------------: <uint32 256>
vdo_larod[584171]: Use preprocessing with input format yuv and output format rgb-interleaved
vdo_larod[584171]: Start fetching video frames from VDO
vdo_larod[584171]: Ran pre-processing for 2 ms
vdo_larod[584171]: Ran inference for 16 ms
vdo_larod[584171]: Person detected: 65.14% - Car detected: 11.92%
vdo_larod[4165]: Exit /usr/local/packages/vdo_larod/vdo_larod

----- Contents of SYSTEM_LOG for 'vdo_larod' -----
vdo_larod[584171]: Starting /usr/local/packages/vdo_larod/vdo_larod
vdo_larod[584171]: Setting up larod connection with chip ambarella-cvflow and model file /usr/local/packages/vdo_larod/model/model.bin
vdo_larod[584171]: Loading the model... This might take up to 5 minutes depending on your device model.
vdo_larod[584171]: Model loaded successfully
vdo_larod[584171]: Detected model format PLANAR RGB and input resolution 256x256
vdo_larod[584171]: Created mmaped model output 0 with size 32
vdo_larod[584171]: Created mmaped model output 1 with size 32
vdo_larod[584171]: chooseStreamResolution: We select stream w/h=256 x 256 with format planar rgb based on VDO channel info.
vdo_larod[3991067]: Dump of vdo stream settings map =====
vdo_larod[3991067]: 'buffer.count'-----: <uint32 2>
vdo_larod[3991067]: 'dynamic.framerate': <true>
vdo_larod[3991067]: 'format'-----------: <uint32 9>
vdo_larod[3991067]: 'framerate'--------: <30.0>
vdo_larod[3991067]: 'height'-----------: <uint32 256>
vdo_larod[3991067]: 'input'------------: <uint32 1>
vdo_larod[3991067]: 'socket.blocking'--: <false>
vdo_larod[3991067]: 'width'------------: <uint32 256>
vdo_larod[584171]: Start fetching video frames from VDO
vdo_larod[584171]: Ran inference for 50 ms
vdo_larod[584171]: Person detected: 65.14% - Car detected: 11.92%
vdo_larod[584171]: Exit /usr/local/packages/vdo_larod/vdo_larod

Buffers are allocated and tracked by VDO and larod, so they are automatically handled as efficiently as possible. The libyuv backend will map each buffer once and never copy. The VProc backend and any inference backend that supports dma-bufs will use them to achieve both zero copy and zero mapping. Inference backends that do not support dma-bufs will map each buffer once and never copy, just like libyuv. Note also that the input tensors of the inference model are used as output tensors for the preprocessing model, to avoid copying data.
The application, however, does no pipelining of preprocessing and inference, but uses the synchronous liblarod API call larodRunJob() in the interest of simplicity. One could implement pipelining using larodRunJobAsync() and thus improve performance, at the cost of some added complexity in the program.
- This is an example of test data, which depends on the selected device and chip.
- One full-screen banana has been used for testing.
- Running inference is much faster on ARTPEC-8 and Google TPU than on the CPU.
- Converting images takes almost the same time on all chips.
- Objects with a score of less than 60% are generally not good enough to be used as classification results.