Commit 30709e3

Merge pull request #2303 from Arnaud-de-Grandmaison-ARM/denoizing
[ai-camera-pipelines] Add the neural denoising pipeline.
2 parents 368adc0 + 097fb67 commit 30709e3

File tree

8 files changed: +134 -44 lines changed

content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/2-overview.md

Lines changed: 39 additions & 8 deletions
@@ -12,27 +12,30 @@ layout: learningpathall
[KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) is an open-source library that provides optimized, performance-critical routines - also known as micro-kernels - for artificial intelligence (AI) workloads on Arm CPUs.

These routines are tuned to take full advantage of specific Arm hardware architectures to maximize performance. The [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) library is designed for easy integration into C or C++ machine learning (ML) and AI frameworks.

Several popular AI frameworks already take advantage of [KleidiAI](https://gitlab.arm.com/kleidi/kleidiai) to improve performance on Arm platforms.

## KleidiCV

[KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) is an open-source library that provides high-performance image processing functions for AArch64.

It is designed to be lightweight and simple to integrate into a wide variety of projects. Some computer vision frameworks, such as OpenCV, leverage [KleidiCV](https://gitlab.arm.com/kleidi/kleidicv) to accelerate image processing on Arm devices.

## AI camera pipelines

-This Learning Path provides two example applications that combine AI and computer vision (CV) techniques:
-- Background Blur.
-- Low-Light Enhancement.
+This Learning Path provides three example applications that combine AI and computer vision (CV) techniques:
+- Background Blur
+- Low-Light Enhancement
+- Neural Denoising
+
+## Background Blur and Low-Light Enhancement

Both applications:
-- Use input and output images that are stored in `ppm` (Portable Pixmap format), with three RGB channels (Red, Green, and Blue). Each channel supports 256 intensity levels (0-255) commonly referred to as `RGB8`.
+- Use input and output images that are stored in `png` format, with three RGB channels (Red, Green, and Blue). Each channel supports 256 intensity levels (0-255), commonly referred to as `RGB8`.
- Convert the images to the `YUV420` color space for processing.
- Apply the relevant effect (background blur or low-light enhancement).
-- Convert the processed images back to `RGB8` and save them as `ppm` files.
+- Convert the processed images back to `RGB8` and save them as `.png` files.
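The color-space round trip in the steps above can be sketched in Python with NumPy. This is an illustrative sketch assuming BT.601 full-range conversion coefficients; the actual pipelines implement the conversion in C++:

```python
import numpy as np

def rgb8_to_yuv420(rgb):
    """Convert an RGB8 image (H, W, 3) to planar YUV420.
    BT.601 full-range coefficients are an assumption for illustration."""
    r, g, b = (rgb[..., i].astype(np.float32) for i in range(3))
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0
    # 4:2:0 chroma subsampling: average each 2x2 block of the U and V planes
    def sub(p):
        return p.reshape(p.shape[0] // 2, 2, p.shape[1] // 2, 2).mean(axis=(1, 3))
    return tuple(np.clip(p, 0.0, 255.0).astype(np.uint8) for p in (y, sub(u), sub(v)))

rgb = np.zeros((1080, 1920, 3), dtype=np.uint8)
y_plane, u_plane, v_plane = rgb8_to_yuv420(rgb)
print(y_plane.shape, u_plane.shape)  # (1080, 1920) (540, 960)
```

The luma plane keeps full resolution while each chroma plane is quartered, which is why `YUV420` is a common intermediate format for camera effects.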
### Background Blur

@@ -50,4 +53,32 @@ The Low-Resolution Coefficient Prediction Network (implemented with LiteRT) perf
- Strided convolutions.
- Local feature extraction using convolutional layers.
- Global feature extraction using convolutional and fully connected layers.
- Add, convolve, and reshape operations.
## Neural Denoising

Every smartphone photographer has seen it: images that look sharp in daylight fall apart in dim lighting. This is because the _signal-to-noise ratio (SNR)_ drops dramatically when sensors capture fewer photons. At 1000 lux, the signal dominates and images look clean; at 1 lux, readout noise becomes visible as grain, color speckles, and loss of fine detail.
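The SNR drop can be made concrete with a quick shot-noise estimate. This is a sketch: the 5 electrons of read noise is an illustrative assumption, not a measured sensor value:

```python
import math

def snr_db(photons, read_noise=5.0):
    """Shot-noise-limited SNR in dB: signal N over sqrt(N + read_noise^2).
    The read_noise default is an illustrative assumption."""
    return 20.0 * math.log10(photons / math.sqrt(photons + read_noise ** 2))

print(round(snr_db(10_000), 1))  # bright scene: signal dominates, image looks clean
print(round(snr_db(10), 1))      # dim scene: noise is clearly visible as grain
```

Because shot noise grows only as the square root of the signal, a thousand-fold drop in light costs roughly 35 dB of SNR, which is exactly the regime where denoising has to do the heavy lifting.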
That's why _neural camera denoising_ is one of the most critical - and computationally demanding - steps in a camera pipeline. Done well, it transforms noisy frames into sharp, vibrant captures. Done poorly, it leaves smudges and artifacts that ruin the shot.
As depicted in the diagram below, the Neural Denoising pipeline uses two algorithms to process the frames:
- either temporally, with an algorithm named `ultralite` in the code repository,
- or spatially, with an algorithm named `collapsenet` in the code repository,
- or both combined.

Temporal denoising uses previous frames as history to inform the denoising of the current frame.

![example image alt-text#center](denoising_pipeline.png "Neural Denoising Pipeline Diagram")
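The idea of using frame history can be sketched as a running average over previous frames. This is a generic illustration of temporal accumulation, not the actual `ultralite` algorithm:

```python
import numpy as np

def temporal_denoise(history, alpha=0.2):
    """Blend a frame history with an exponential moving average.
    `alpha` weights the newest frame; a generic sketch, not `ultralite`."""
    acc = history[0].astype(np.float32)
    for frame in history[1:]:
        acc = (1.0 - alpha) * acc + alpha * frame.astype(np.float32)
    return np.clip(acc, 0, 255).astype(np.uint8)

# Averaging noisy observations of a flat gray frame reduces the noise.
rng = np.random.default_rng(0)
frames = [np.clip(128 + rng.normal(0, 20, (64, 64)), 0, 255).astype(np.uint8)
          for _ in range(8)]
print(temporal_denoise(frames).std() < frames[0].std())  # True
```

Averaging statistically independent noise across frames is what lets a temporal denoiser recover detail that no single noisy frame contains.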
The Neural Denoising application works on frames as emitted by a camera sensor in Bayer format:
- the input frames are in RGGB 1080x1920x4 format,
- the output frames are in YGGV 4x1080x1920 format.
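The RGGB 1080x1920x4 layout can be understood as a 4K Bayer mosaic split into its four color planes. This is a sketch: the mosaic size and the R, Gr, Gb, B channel ordering are assumptions for illustration:

```python
import numpy as np

def pack_rggb(bayer):
    """Split a Bayer RGGB mosaic (2H, 2W) into four planes stacked as (H, W, 4).
    The R, Gr, Gb, B channel order is an assumption for illustration."""
    return np.stack(
        (bayer[0::2, 0::2],   # R  (even rows, even cols)
         bayer[0::2, 1::2],   # Gr (even rows, odd cols)
         bayer[1::2, 0::2],   # Gb (odd rows, even cols)
         bayer[1::2, 1::2]),  # B  (odd rows, odd cols)
        axis=-1)

mosaic = np.zeros((2160, 3840), dtype=np.uint16)  # hypothetical 4K sensor readout
print(pack_rggb(mosaic).shape)  # (1080, 1920, 4)
```

Packing each 2x2 Bayer cell into four channels gives the network aligned color planes to work on instead of an interleaved mosaic.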

content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/3-build.md

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ Build the Docker container used to compile the pipelines:
docker build -t ai-camera-pipelines -f docker/Dockerfile \
  --build-arg DOCKERHUB_MIRROR=docker.io \
  --build-arg CI_UID=$(id -u) \
-  docker
+  docker/
```

## Build the AI Camera Pipelines
@@ -45,7 +45,7 @@ Inside the container, run the following commands:
```bash
ENABLE_SME2=0
-TENSORFLOW_GIT_TAG=ddceb963c1599f803b5c4beca42b802de5134b44
+TENSORFLOW_GIT_TAG="v2.19.0"

# Build flatbuffers
git clone https://github.com/google/flatbuffers.git

content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/4-run.md

Lines changed: 37 additions & 8 deletions
@@ -9,26 +9,55 @@ layout: learningpathall
## Apply transformations

-In the previous section, you built the AI Camera Pipelines. In this section, you'll run them to apply transformations to an input image.
+In the previous section, you built the AI Camera Pipelines. In this section, you'll run them to apply transformations to an input image or input frames.

First, create a Python virtual environment and install the required packages:

```bash
cd $HOME/ai-camera-pipelines
python3 -m venv venv
. venv/bin/activate
pip install -r ai-camera-pipelines.git/docker/python-requirements.txt
```
### Background Blur

-Run the background blur pipeline:
+Run the Background Blur pipeline, using `resources/test_input.png` as the input image, and write the transformed image to `test_output.png`:

```bash
cd $HOME/ai-camera-pipelines
-bin/cinematic_mode resources/test_input2.ppm test_output2.ppm resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
+bin/cinematic_mode resources/test_input.png test_output.png resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
```

-![example image alt-text#center](test_input2.png "Original picture")
-![example image alt-text#center](test_output2.png "Picture with blur applied")
+![example image alt-text#center](test_input2.png "Input image")
+![example image alt-text#center](test_output2.png "Image with blur applied")

### Low-Light Enhancement

Run the Low-Light Enhancement pipeline, using `resources/test_input.png` as the input image, and write the transformed image to `test_output2_lime.png`:

```bash
cd $HOME/ai-camera-pipelines
-bin/low_light_image_enhancement resources/test_input2.ppm test_output2_lime.ppm resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+bin/low_light_image_enhancement resources/test_input.png test_output2_lime.png resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l1_loss_float32.tflite
```

![example image alt-text#center](test_input2.png "Input image")
![example image alt-text#center](test_output2_lime.png "Image with low-light enhancement applied")

### Neural Denoising

When the SME extension is not available, only temporal neural denoising is available, so that is what you will run for now - but stay tuned, as SME support is coming soon:

```bash
./scripts/run_neural_denoiser_temporal.sh
```
-![example image alt-text#center](test_input2.png "Original picture")
-![example image alt-text#center](test_output2_lime.png "Picture with low-light enhancement applied")

The input frames are processed in three steps:
- first, they are converted from `.png` files in the `resources/test-lab-sequence/` directory to the sensor format (RGGB Bayer) and written to `neural_denoiser_io/input_noisy*`,
- then, those frames are processed by the Neural Denoiser and written to `neural_denoiser_io/output_denoised*`,
- last, the denoised frames are converted back to `.png` for easy visualization in the `test-lab-sequence-out` directory.

![example image alt-text#center](denoising_input_0010.png "Original frame")
![example image alt-text#center](denoising_output_0010.png "Frame with temporal denoising applied")

content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/5-performances.md

Lines changed: 55 additions & 25 deletions
@@ -12,19 +12,19 @@ The application you built earlier includes a *benchmark mode* that runs the core
- `ai-camera-pipelines/bin/cinematic_mode_benchmark`
- `ai-camera-pipelines/bin/low_light_image_enhancement_benchmark`
+- `ai-camera-pipelines/bin/neural_denoiser_temporal_benchmark_4K`

These benchmarks demonstrate the performance improvements enabled by KleidiCV and KleidiAI:
- KleidiCV enhances OpenCV performance with computation kernels optimized for Arm processors.
- KleidiAI accelerates LiteRT + XNNPack inference using AI-optimized micro-kernels tailored for Arm CPUs.

## Performance with KleidiCV and KleidiAI

By default, the OpenCV library is built with KleidiCV support, and LiteRT+xnnpack is built with KleidiAI support.

You can run the benchmarks using the applications you built earlier.

-Run the first benchmark:
+Run the Background Blur benchmark:

```bash
bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
```
@@ -34,25 +34,38 @@ The output is similar to:
```output
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 2023.39 ms
+Total run time over 20 iterations: 2028.745 ms
```

-Run the second benchmark:
+Run the Low-Light Enhancement benchmark:

```bash
-bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l1_loss_float32.tflite
```

The output is similar to:

```output
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 54.3546 ms
+Total run time over 20 iterations: 58.2126 ms
```

Last, run the Neural Denoising benchmark:

```bash
bin/neural_denoiser_temporal_benchmark_4K 20
```

The output is similar to:

```output
Total run time over 20 iterations: 37.6839 ms
```
From these results, you can see that:
-- `cinematic_mode_benchmark` performed 20 iterations in 1985.99 ms.
-- `low_light_image_enhancement_benchmark` performed 20 iterations in 52.3448 ms.
+- `cinematic_mode_benchmark` performed 20 iterations in 2028.745 ms.
+- `low_light_image_enhancement_benchmark` performed 20 iterations in 58.2126 ms.
+- `neural_denoiser_temporal_benchmark_4K` performed 20 iterations in 37.6839 ms.

## Benchmark results without KleidiCV and KleidiAI

@@ -61,7 +74,7 @@ To measure the performance without these optimizations, recompile the pipelines
-DENABLE_KLEIDICV:BOOL=OFF -DXNNPACK_ENABLE_KLEIDIAI:BOOL=OFF
```

-Re-run the first benchmark:
+Re-run the Background Blur benchmark:

```bash
bin/cinematic_mode_benchmark 20 resources/depth_and_saliency_v3_2_assortedv2_w_augment_mobilenetv2_int8_only_ptq.tflite
```
@@ -71,35 +84,52 @@ The new output is similar to:
```output
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 2029.25 ms
+Total run time over 20 iterations: 2030.5525 ms
```

-Re-run the second benchmark:
+Re-run the Low-Light Enhancement benchmark:

```bash
-bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l2_loss_int8_only_ptq.tflite
+bin/low_light_image_enhancement_benchmark 20 resources/HDRNetLIME_lr_coeffs_v1_1_0_mixed_low_light_perceptual_l1_loss_float32.tflite
```

The new output is similar to:

```output
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
-Total run time over 20 iterations: 79.431 ms
+Total run time over 20 iterations: 58.0613 ms
```
-### Comparison table
-
-| Benchmark                                | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
-|------------------------------------------|---------------------------|------------------------|
-| `cinematic_mode_benchmark`               | 2029.25 ms                | 2023.39 ms             |
-| `low_light_image_enhancement_benchmark`  | 79.431 ms                 | 54.3546 ms             |
-
-As shown, the background blur pipeline (`cinematic_mode_benchmark`) gains only a small improvement, while the low-light enhancement pipeline sees a significant ~30% performance uplift when KleidiCV and KleidiAI are enabled.
-
-## Future performance uplift with SME2
-
-A major benefit of using KleidiCV and KleidiAI is that they can automatically leverage new Arm architecture features - such as SME2 (Scalable Matrix Extension v2) - without requiring changes to your application code.
-
-As KleidiCV and KleidiAI operate as performance abstraction layers, any future hardware instruction support can be utilized by simply rebuilding the application. This enables better performance on newer processors without additional engineering effort.

Re-run the Neural Denoising benchmark:

```bash
bin/neural_denoiser_temporal_benchmark_4K 20
```

The new output is similar to:

```output
Total run time over 20 iterations: 38.0813 ms
```
### Comparison table and future performance uplift with SME2

| Benchmark                                | Without KleidiCV+KleidiAI | With KleidiCV+KleidiAI |
|------------------------------------------|---------------------------|------------------------|
| `cinematic_mode_benchmark`               | 2030.5525 ms              | 2028.745 ms (-0.09%)   |
| `low_light_image_enhancement_benchmark`  | 58.0613 ms                | 58.2126 ms (+0.26%)    |
| `neural_denoiser_temporal_benchmark_4K`  | 38.0813 ms                | 37.6839 ms (-1.04%)    |
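The percentage deltas in the table can be reproduced directly from the two timing columns (relative change of the optimized run against the baseline):

```python
def delta_pct(without_ms, with_ms):
    """Relative change (in %) of the optimized run against the baseline;
    negative means the optimized run is faster."""
    return round((with_ms - without_ms) / without_ms * 100.0, 2)

print(delta_pct(2030.5525, 2028.745))  # cinematic_mode_benchmark: -0.09
print(delta_pct(58.0613, 58.2126))     # low_light_image_enhancement_benchmark: 0.26
print(delta_pct(38.0813, 37.6839))     # neural_denoiser_temporal_benchmark_4K: -1.04
```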
As shown, the Background Blur (`cinematic_mode_benchmark`) and Neural Denoising pipelines gain only a minor improvement, while the Low-Light Enhancement pipeline sees a minor performance degradation (+0.26%) when KleidiCV and KleidiAI are enabled.

A major benefit of using KleidiCV and KleidiAI, though, is that they can automatically leverage new Arm architecture features - such as SME2 (Scalable Matrix Extension v2) - without requiring changes to your application code.

As KleidiCV and KleidiAI operate as performance abstraction layers, any future hardware instruction support can be utilized by simply rebuilding the application. This enables better performance on newer processors without additional engineering effort.

content/learning-paths/mobile-graphics-and-gaming/ai-camera-pipelines/_index.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
---
-title: Accelerate Background Blur and Low-Light Camera Effects
+title: Accelerate Denoising, Background Blur and Low-Light Camera Effects

minutes_to_complete: 30