Commit c63ae40

Merge pull request #1929 from Arnaud-de-Grandmaison-ARM/ai-camera-pipeline
[New] Add a LP on accelerating AI Camera pipelines with KleidiAI and KleidiCV

---
title: Prerequisites
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Host machine requirements

This Learning Path demonstrates the benefits of using KleidiCV and KleidiAI in applications running on Arm, so you will need an AArch64 machine. The instructions in this Learning Path assume an Ubuntu Linux distribution.
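
You can confirm that your machine is AArch64 by checking the reported architecture:

```BASH { output_lines="2" }
uname -m
aarch64
```
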
## Install software required for this Learning Path

You need the following tools:
- `git`, the version control system, for cloning the AI camera pipelines codebase
- `git lfs`, an extension to `git` that manages large files by storing lightweight references in the repository instead of the files themselves
- `docker`, an open-source containerization platform
- `libomp`, LLVM's OpenMP runtime library

### `git` and `git lfs`

These tools can be installed by running the following command, depending on your machine's OS:

{{< tabpane code=true >}}
{{< tab header="Linux/Ubuntu" language="bash">}}
sudo apt install git git-lfs
{{< /tab >}}
{{< tab header="macOS" language="bash">}}
brew install git git-lfs
{{< /tab >}}
{{< /tabpane >}}

### `docker`

Start by checking that `docker` is installed on your machine by typing the following command in a terminal:

```BASH { output_lines="2" }
docker --version
Docker version 27.3.1, build ce12230
```

If the above command fails with a message similar to "`docker: command not found`", then follow the steps from the [Docker Install Guide](https://learn.arm.com/install-guides/docker/).

{{% notice Note %}}
You might need to log in again or restart your machine for the changes to take effect.
{{% /notice %}}

Once you have confirmed that Docker is installed on your machine, you can check that it is operating normally with the following:

```BASH { output_lines="2-27" }
docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
478afc919002: Pull complete
Digest: sha256:305243c734571da2d100c8c8b3c3167a098cab6049c9a5b066b6021a60fcb966
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker followed these steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/
```

### `libomp`

`libomp` can be installed by running the following command, depending on your machine's OS:

{{< tabpane code=true >}}
{{< tab header="Linux/Ubuntu" language="bash">}}
sudo apt install libomp-19-dev
{{< /tab >}}
{{< tab header="macOS" language="bash">}}
brew install libomp
{{< /tab >}}
{{< /tabpane >}}
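
If you want to confirm that the OpenMP runtime is usable, you can compile and run a minimal test program. This is an optional check, not part of the Learning Path itself, and it assumes a `clang++` toolchain is installed:

```cpp
// omp_check.cpp: a minimal OpenMP smoke test (optional verification only).
#include <omp.h>

#include <cstdio>

int main() {
    // Each thread prints its id: seeing more than one line confirms that
    // the OpenMP runtime is installed and working.
    #pragma omp parallel
    std::printf("Hello from thread %d of %d\n", omp_get_thread_num(),
                omp_get_num_threads());
    return 0;
}
```

Compile and run it with `clang++ -fopenmp omp_check.cpp -o omp_check && ./omp_check`; you should see one line per hardware thread.
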
---
title: Overview
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## KleidiAI

[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized, performance-critical routines, also known as micro-kernels, for artificial intelligence (AI) workloads tailored to Arm CPUs.

These routines are tuned to exploit the capabilities of specific Arm hardware architectures, aiming to maximize performance. The KleidiAI library is designed for ease of adoption into C and C++ machine learning (ML) and AI frameworks, and a number of AI frameworks already take advantage of KleidiAI to improve performance on Arm platforms.

## KleidiCV

The open-source [KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) library provides high-performance image-processing functions for AArch64. It is designed to be simple to integrate into a wide variety of projects, and some computer vision frameworks (such as OpenCV) take advantage of KleidiCV to improve performance on Arm platforms.

## The AI camera pipelines

The AI camera pipelines are two example applications, implemented with a combination of AI and computer vision (CV) computations:
- Background Blur
- Low Light Enhancement

For both applications:
- The input and output images are stored in `ppm` (portable pixmap) format, with 3 channels (red, green, and blue) and 256 color levels each, also known as `RGB8`.
- The images are first converted to the `YUV420` color space, where the background blur or low-light enhancement takes place. Once processing is done, the images are converted back to `RGB8` and saved in `ppm` format. A minimal sketch of this color-space round trip is shown after this list.
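
For illustration only, here is a minimal OpenCV sketch of that `RGB8` to `YUV420` round trip. This is not code from the pipelines; the file names are placeholders, and OpenCV loads `ppm` files as 3-channel BGR by default:

```cpp
// yuv_roundtrip.cpp: illustrative RGB8 <-> YUV420 round trip with OpenCV.
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

int main() {
    // Load a ppm file; OpenCV returns a 3-channel, 8-bit BGR image.
    cv::Mat bgr = cv::imread("input.ppm", cv::IMREAD_COLOR);

    // Convert to planar YUV 4:2:0 (I420): this is the color space in which
    // the pipelines apply background blur or low-light enhancement.
    cv::Mat yuv;
    cv::cvtColor(bgr, yuv, cv::COLOR_BGR2YUV_I420);

    // ... process the Y/U/V planes here ...

    // Convert back to 8-bit BGR and save the result as a ppm file.
    cv::Mat out;
    cv::cvtColor(yuv, out, cv::COLOR_YUV2BGR_I420);
    cv::imwrite("output.ppm", out);
    return 0;
}
```
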
### Background Blur

The pipeline implemented for background blur looks like this:

![example image alt-text#center](blur_pipeline.png "Figure 1: Background Blur Pipeline Diagram")

### Low Light Enhancement

The pipeline implemented for low-light enhancement is adapted from the LiveHDR+ pipeline originally proposed by Google Research in 2017, and looks like this:

![example image alt-text#center](lle_pipeline.png "Figure 2: Low Light Enhancement Pipeline Diagram")

The Low-Resolution Coefficient Prediction Network (implemented with TFLite) includes computations such as:
- strided convolutions
- local feature extraction with convolutional layers
- global feature extraction with convolutional and fully-connected layers
- add, convolve, and reshape operations
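
Both pipelines run their networks through TFLite with the XNNPack delegate, which is the path that KleidiAI accelerates. As a hedged illustration of how such a model is typically loaded and delegated in C++, consider the sketch below; this is not code from the pipelines, and `model.tflite` is a placeholder name:

```cpp
// tflite_xnnpack_sketch.cpp: illustrative TFLite + XNNPack delegate usage.
#include <memory>

#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"

int main() {
    // Load the model from disk.
    auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");

    // Build an interpreter with the builtin operator resolver.
    tflite::ops::builtin::BuiltinOpResolver resolver;
    std::unique_ptr<tflite::Interpreter> interpreter;
    tflite::InterpreterBuilder(*model, resolver)(&interpreter);

    // Route supported operators through XNNPack; when XNNPack is built with
    // KleidiAI enabled, its micro-kernels run on Arm CPUs.
    TfLiteXNNPackDelegateOptions opts = TfLiteXNNPackDelegateOptionsDefault();
    TfLiteDelegate* delegate = TfLiteXNNPackDelegateCreate(&opts);
    interpreter->ModifyGraphWithDelegate(delegate);

    interpreter->AllocateTensors();
    // ... fill the input tensors, then run inference:
    interpreter->Invoke();

    // Destroy the interpreter before the delegate it references.
    interpreter.reset();
    TfLiteXNNPackDelegateDelete(delegate);
    return 0;
}
```
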
---
title: Build the Pipelines
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Download the AI Camera Pipelines Project

```BASH
git clone https://git.gitlab.arm.com/kleidi/kleidi-examples/ai-camera-pipelines.git ai-camera-pipelines.git
```

Check out the data files:

```BASH
cd ai-camera-pipelines.git
git lfs install
git lfs pull
```

## Create a Build Container

The pipelines are built from within a container, so you first need to build the container image:

```BASH
docker build -t ai-camera-pipelines -f docker/Dockerfile --build-arg DOCKERHUB_MIRROR=docker.io --build-arg CI_UID=$(id -u) .
```

## Build the AI Camera Pipelines

Start a shell in the container you just built:

```BASH
docker run --rm --volume $PWD:/home/cv-examples/example -it ai-camera-pipelines
```

Execute the following commands, then leave the container:

```BASH
ENABLE_SME2=0
TENSORFLOW_GIT_TAG=ddceb963c1599f803b5c4beca42b802de5134b44

# Build flatbuffers
git clone https://github.com/google/flatbuffers.git
cd flatbuffers
git checkout v24.3.25
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=../install
cmake --build . -j16
cmake --install .
cd ../..

# Build the pipelines
mkdir build
cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DARMNN_TFLITE_PARSER=0 -DTENSORFLOW_GIT_TAG=$TENSORFLOW_GIT_TAG -DTFLITE_HOST_TOOLS_DIR=../flatbuffers/install/bin -DENABLE_SME2=$ENABLE_SME2 -DENABLE_KLEIDICV:BOOL=ON -DXNNPACK_ENABLE_KLEIDIAI:BOOL=ON -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -S ../example -B .
cmake --build . -j16
cmake --install .

# Package and export the pipelines.
cd ..
tar cfz example/install.tar.gz install

# Leave the container (ctrl+D)
```

Note the following options on the `cmake` configuration command line:
- `-DENABLE_SME2=$ENABLE_SME2` with `ENABLE_SME2=0`: SME2 is not (yet) enabled, but stay tuned!
- `-DARMNN_TFLITE_PARSER=0`: configures the `ai-camera-pipelines` repository to use TFLite (with XNNPack) instead of Arm NN
- `-DENABLE_KLEIDICV:BOOL=ON`: KleidiCV is enabled
- `-DXNNPACK_ENABLE_KLEIDIAI:BOOL=ON`: TFLite+XNNPack will use KleidiAI

## Install the Pipelines

```BASH
cd $HOME
tar xfz ai-camera-pipelines.git/install.tar.gz
mv install ai-camera-pipelines
```
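
As a quick sanity check, you can list the installed tree; based on the commands used in the next section, it should contain `bin/` and `resources/` directories:

```BASH
ls ai-camera-pipelines
```
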
---
title: Run the Pipelines
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In the previous section, you built the AI Camera Pipelines. In this section, you will run them to transform an image. Each application takes three arguments: the input image, the output image, and the TFLite model to use.

## Background Blur

```BASH
cd $HOME/ai-camera-pipelines
bin/cinematic_mode resources/test_input2.ppm test_output2.ppm resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
```

![example image alt-text#center](test_input2.png "Figure 3: Original picture")
![example image alt-text#center](test_output2.png "Figure 4: Picture with blur applied")

## Low Light Enhancement

```BASH
cd $HOME/ai-camera-pipelines
bin/low_light_image_enhancement resources/test_input2.ppm test_output2_lime.ppm resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
```

![example image alt-text#center](test_output2_lime.png "Figure 5: Picture with low light enhancement applied")
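
The outputs are written as `ppm` files. If your image viewer does not support `ppm`, you can convert them to `png` with ImageMagick, assuming it is installed:

```BASH
convert test_output2.ppm test_output2.png
```
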
---
title: Performance
weight: 7

### FIXED, DO NOT MODIFY
layout: learningpathall
---

The applications built in the previous section have a *benchmark* mode that runs the core function multiple times in a hot loop:

- `ai-camera-pipelines/bin/cinematic_mode_benchmark`
- `ai-camera-pipelines/bin/low_light_image_enhancement_benchmark`

The performance of the camera pipelines has been improved by using KleidiCV and KleidiAI:
- KleidiCV improves the performance of OpenCV with computation kernels optimized for Arm processors.
- KleidiAI improves the performance of TFLite+XNNPack with computation kernels dedicated to AI tasks on Arm processors.

## Performance with KleidiCV and KleidiAI

By default, the OpenCV library is built with KleidiCV support, and TFLite+XNNPack is built with KleidiAI support, so you can measure the performance of the applications you have already built:

```BASH
$ bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 2023.39 ms

$ bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 54.3546 ms
```

The output shows that:
- `cinematic_mode_benchmark` performed 20 iterations in 2023.39 ms (about 101 ms per iteration),
- `low_light_image_enhancement_benchmark` performed 20 iterations in 54.3546 ms (about 2.7 ms per iteration).

## Performance without KleidiCV and KleidiAI

Now re-run the build steps from the previous section, changing the CMake invocation to use `-DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF` so that KleidiCV and KleidiAI are *not* used, as shown below.
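
For reference, only the two Kleidi options change in the configure step from the previous section:

```BASH
cmake -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../install -DARMNN_TFLITE_PARSER=0 -DTENSORFLOW_GIT_TAG=$TENSORFLOW_GIT_TAG -DTFLITE_HOST_TOOLS_DIR=../flatbuffers/install/bin -DENABLE_SME2=$ENABLE_SME2 -DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -S ../example -B .
```
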
You can then run the benchmarks again:

```BASH
$ bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 2029.25 ms

$ bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Total run time over 20 iterations: 79.431 ms
```

Putting those numbers together in a table for easy comparison:

| Benchmark                                 | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
|-------------------------------------------|---------------------------|------------------------|
| `cinematic_mode_benchmark`                | 2029.25 ms                | 2023.39 ms             |
| `low_light_image_enhancement_benchmark`   | 79.431 ms                 | 54.3546 ms             |

As the table shows, the blur pipeline (`cinematic_mode_benchmark`) benefits only marginally from KleidiCV+KleidiAI, whereas the low-light enhancement pipeline runs about 1.46x faster, a reduction in run time of roughly 32% (from 79.431 ms to 54.3546 ms).

## Future Performance Uplift with SME2

A further benefit of using KleidiCV and KleidiAI is that whenever the hardware adds support for new and more powerful instructions, applications get a performance uplift without requiring complex software changes: KleidiCV and KleidiAI act as abstraction layers that build on hardware improvements to boost future performance. One such *free* performance boost is expected in the coming months, as processors implementing SME2 become available.

---
title: AI Camera Pipelines

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for developers who want to improve the performance of camera pipelines using KleidiAI and KleidiCV.

learning_objectives:
    - Compile and run camera pipeline applications
    - Use KleidiCV and KleidiAI to boost the performance of camera pipelines

prerequisites:
    - CMake
    - Git + Git LFS
    - Docker

author: Arnaud de Grandmaison

test_images:
    - ubuntu:latest
test_link: null
test_maintenance: false

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
    - Cortex-A
tools_software_languages:
    - C++
operatingsystems:
    - Linux
    - macOS
    - Windows

further_reading:
    - resource:
        title: Accelerate Generative AI Workloads Using KleidiAI
        link: https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer
        type: website
    - resource:
        title: LLM Inference on Android with KleidiAI, MediaPipe, and XNNPACK
        link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/kleidiai-on-android-with-mediapipe-and-xnnpack/
        type: website
    - resource:
        title: Vision LLM Inference on Android with KleidiAI and MNN
        link: https://learn.arm.com/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/
        type: website

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has a weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---

---
# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21                  # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps"         # Always the same
layout: "learningpathall"   # All files under learning paths have this same wrapper
---
