Commit 73f88e0

Merge pull request #2592 from madeline-underwood/kleidi2
Kleidi2_JA to sign off
2 parents a1ca04e + 4234a33 commit 73f88e0

File tree

3 files changed

+83
-45
lines changed

content/learning-paths/laptops-and-desktops/kleidicv-on-mac/_index.md

Lines changed: 1 addition & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -1,10 +1,6 @@
11
---
22
title: Build and test KleidiCV on macOS
33

4-
draft: true
5-
cascade:
6-
draft: true
7-
84
minutes_to_complete: 30
95

106
who_is_this_for: This is an introductory topic for software developers who want to build and test KleidiCV on macOS.
@@ -16,8 +12,8 @@ learning_objectives:
1612

1713
prerequisites:
1814
- A Mac with Apple Silicon (M4 generation or newer)
19-
- Basic familiarity with command-line tools
2015
- Xcode command line tools installed
16+
- Basic familiarity with using the Terminal and command-line tools
2117

2218
author: Jett Zhou
2319

content/learning-paths/laptops-and-desktops/kleidicv-on-mac/build-1.md

Lines changed: 40 additions & 28 deletions
@@ -7,42 +7,47 @@ layout: learningpathall
77

88
## Introduction
99

10-
Arm KleidiCV is an open-source library of optimized, performance-critical routines for Arm CPUs. You can integrate it into any Computer Vision (CV) framework to get the best performance for CV workloads on Arm, with no action needed by application developers.
10+
Arm KleidiCV is an open-source library that provides fast, optimized routines for Arm CPUs. You can use KleidiCV with any computer vision (CV) framework to boost performance for CV workloads on Arm systems.
1111

12-
Each KleidiCV function has different implementations targeting Neon, SVE2 (Scalable Vector Extension), or Streaming SVE and SME2 (Scalable Matrix Extension). KleidiCV automatically detects the hardware it is running on and selects the best implementation. You can use KleidiCV as a lightweight standalone image processing library or as part of the OpenCV library.
12+
KleidiCV includes multiple optimized implementations for each function, targeting Arm Neon, SVE2 (Scalable Vector Extension 2), and SME2 (Scalable Matrix Extension 2) instruction sets. The library automatically detects your hardware and chooses the fastest available code path, so you don't need to adjust your code for different Arm CPUs.
1313

14-
Since the Apple M4 family is based on the Armv9.2‑A architecture, it supports the Scalable Matrix Extension (SME) for accelerating matrix computations. In this Learning Path, you will build and test KleidiCV to understand how the backend implementation is called for the KleidiCV functions.
14+
You can use KleidiCV as a standalone image processing library or integrate it with OpenCV for broader computer vision support. On Apple M4 processors, which use the Armv9.2‑A architecture and support SME, you'll see improved performance for matrix operations. In this Learning Path, you'll build and test KleidiCV to observe how it selects the best backend for your hardware.
1515

16-
## Host environment
16+
## Set up your environment
1717

18-
The host machine is a MacBook Pro (Apple Silicon M4), and the operating system version is detailed below.
18+
To follow this example, you'll need a MacBook Pro with an Apple Silicon M4 processor.
1919

20-
You can find this information on your Mac by selecting the **Apple menu** in the top-left corner of your screen, then selecting **About This Mac**. Alternatively, run the following command in a terminal:
20+
To check your operating system version, follow these steps:
21+
22+
- Select the **Apple menu** in the top-left corner of your screen
23+
- Select **About This Mac**
24+
- Alternatively, open a terminal and run:
2125

2226
```console
2327
sw_vers
2428
```
25-
2629
The output is similar to:
2730

2831
```output
2932
ProductName: macOS
3033
ProductVersion: 15.5
3134
BuildVersion: 24F74
3235
```
36+
### Install CMake
3337

34-
If CMake is not already installed on your host machine, you can install it using Homebrew.
38+
If CMake is not already installed on your host machine, you can install it using Homebrew:
3539

3640
```bash
3741
brew install cmake
3842
```
39-
40-
You can verify the host architecture features as outlined below, confirming that `FEAT_SME` is supported:
43+
To check which Arm architecture features your Mac supports, run the following command in your terminal:
4144

4245
```bash
4346
sysctl -a | grep hw.optional.arm.FEAT
4447
```
4548

49+
Look for `hw.optional.arm.FEAT_SME: 1` in the output. If you see this line, your system supports SME (Scalable Matrix Extension). If the value is `0`, SME is not available on your hardware.
50+
4651
The output is:
4752

4853
```output
@@ -96,11 +101,11 @@ hw.optional.arm.FEAT_SME_F64F64: 1
96101
hw.optional.arm.FEAT_SME_I16I64: 1
97102
```
98103

99-
If you don't have an M4 Mac you will not see the `FEAT_SME` flags set to 1.
104+
If your Mac does not have an M4 processor, you won't see the `FEAT_SME` flags set to `1`. In that case, SME (Scalable Matrix Extension) features are not available on your hardware, and KleidiCV will use other optimized code paths instead.
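
As a minimal sketch of the check described above, the following helper reads `sysctl`-style output and reports whether one feature flag is set to `1`. The function name `feat_status` is hypothetical and introduced only for illustration; the key format `hw.optional.arm.FEAT_SME: 1` is taken from the output shown earlier.

```bash
# Report whether a given Arm feature flag is set to 1 in sysctl-style
# output read from standard input.
# feat_status is a hypothetical helper name used only in this sketch.
feat_status() {
  # $1 is a feature name such as FEAT_SME, as it appears in the sysctl output.
  # The trailing ": 1" anchor keeps FEAT_SME from matching FEAT_SME2.
  if grep -q "^hw\.optional\.arm\.$1: 1\$"; then
    echo "supported"
  else
    echo "not reported"
  fi
}

# Query the running system. On machines that do not expose these keys,
# the result is simply "not reported".
sysctl -a 2>/dev/null | feat_status FEAT_SME
```

You can pass any other flag name from the listing, for example `FEAT_SME2`, to check it the same way.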
100105

101-
## Create a workspace.
106+
## Create a workspace
102107

103-
You can use an environment variable to define your workspace.
108+
You can use an environment variable to define your workspace:
104109

105110
```bash
106111
export WORKSPACE=<your-workspace-directory>
@@ -113,18 +118,18 @@ mkdir $HOME/kleidi
113118
export WORKSPACE=$HOME/kleidi
114119
```
115120

116-
## Download the Software
121+
## Download the software
117122

118123
To set up KleidiCV and OpenCV, first download the source code from GitLab.
119124

120-
In your $WORKSPACE directory, clone KleidiCV using the v0.6.0 release tag.
125+
In your `$WORKSPACE` directory, clone KleidiCV at the `0.6.0` release tag:
121126

122127
```bash
123128
cd $WORKSPACE
124129
git clone -b 0.6.0 https://git.gitlab.arm.com/kleidi/kleidicv.git
125130
```
126131

127-
Clone the OpenCV repository into $WORKSPACE using the v4.12.0 release tag.
132+
Clone the OpenCV repository into `$WORKSPACE` and check out the `4.12.0` release tag:
128133

129134
```bash
130135
cd $WORKSPACE
@@ -133,25 +138,28 @@ cd opencv
133138
git checkout 4.12.0
134139
```
135140

136-
Apply the patch for OpenCV version 4.12.
141+
Apply the patch for OpenCV version 4.12:
137142

138143
```bash
139144
patch -p1 < ../kleidicv/adapters/opencv/opencv-4.12.patch
140145
patch -p1 < ../kleidicv/adapters/opencv/extra_benchmarks/opencv-4.12.patch
141146
```
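
One way to confirm that the patches landed is to list the files that now differ from the tagged checkout. This is a sketch rather than part of the original steps; it assumes the workspace layout used above and falls back to `$HOME/kleidi` if `WORKSPACE` is not set.

```bash
# Assumed workspace location; the earlier step exports WORKSPACE.
WORKSPACE="${WORKSPACE:-$HOME/kleidi}"
OPENCV_DIR="$WORKSPACE/opencv"

if [ -d "$OPENCV_DIR/.git" ]; then
  # Lists files that differ from the checked-out tag, including any
  # new untracked files that the patches created.
  git -C "$OPENCV_DIR" status --short
else
  echo "No OpenCV checkout found at $OPENCV_DIR"
fi
```

An empty listing would suggest that the patches were not applied to this checkout.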
142147

148+
## Build options
143149

144-
## Build Options
150+
KleidiCV provides several CMake options to control which instruction sets and features are enabled during the build.
145151

146-
* KLEIDICV_ENABLE_SVE2 - Enable Scalable Vector Extension 2 code paths. This is on by default for some popular compilers known to support SVE2 but otherwise off by default.
147-
- KLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS - Limit Scalable Vector Extension 2 code paths to cases where it is expected to provide a benefit over other code paths. On by default. Has no effect if KLEIDICV_ENABLE_SVE2 is off.
148-
* KLEIDICV_BENCHMARK - Enable building KleidiCV benchmarks. The benchmarks use Google Benchmark which will be downloaded automatically. Off by default.
149-
* KLEIDICV_ENABLE_SME2 - Enable Scalable Matrix Extension 2 and Streaming Scalable Vector Extension code paths. Off by default while the ACLE SME specification is in beta.
150-
- KLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS - Limit Scalable Matrix Extension 2 code paths to cases where it is expected to provide a benefit over other code paths. On by default. Has no effect if KLEIDICV_ENABLE_SME2 is off.
152+
Here are the most important options for Arm systems:
151153

152-
{{% notice Note %}}
153-
Normally, if our tests show SVE2 or SME2 are slower than NEON, we default to NEON (unless overridden with -DKLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS=OFF or -DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF).
154-
{{% /notice %}}
154+
- KLEIDICV_ENABLE_SVE2 enables Scalable Vector Extension 2 (SVE2) code paths. This is on by default for popular compilers that support SVE2, but off otherwise.
155+
- KLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS limits SVE2 code paths to algorithms where SVE2 is expected to outperform other options. This is on by default. It has no effect if SVE2 is disabled.
156+
- KLEIDICV_BENCHMARK enables building KleidiCV benchmarks. The benchmarks use Google Benchmark, which is downloaded automatically. This is off by default.
157+
- KLEIDICV_ENABLE_SME2 enables Scalable Matrix Extension 2 (SME2) and Streaming SVE code paths. This is off by default while the ACLE SME specification is in beta.
158+
- KLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS limits SME2 code paths to cases where SME2 is expected to provide a benefit. This is on by default. It has no effect if SME2 is disabled.
159+
160+
You can set these options when running `cmake` to customize your build for your hardware and use case.
161+
162+
KleidiCV automatically selects the fastest available code path for your hardware. If the library detects that SVE2 (Scalable Vector Extension 2) or SME2 (Scalable Matrix Extension 2) is slower than NEON for a specific function, it defaults to NEON—unless you explicitly turn off this behavior by setting `-DKLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS=OFF` or `-DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF`.
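
To illustrate how these options might be combined, the sketch below assembles a configure command and prints it as a dry run instead of executing it. The option names come from the list above; the chosen values and the build directory name `build-kleidicv-sme2-unrestricted` are assumptions made for this example and are not the settings used elsewhere in this Learning Path.

```bash
# Assumed workspace location; the earlier step exports WORKSPACE.
WORKSPACE="${WORKSPACE:-$HOME/kleidi}"

# Options named in the list above. The values are illustrative choices:
# enable SME2, lift the per-algorithm SME2 limit, and build the benchmarks.
KLEIDICV_FLAGS="-DKLEIDICV_ENABLE_SME2=ON -DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF -DKLEIDICV_BENCHMARK=ON"

# Hypothetical build directory name, used only for this sketch.
BUILD_DIR="build-kleidicv-sme2-unrestricted"

# Dry run: print the configure command instead of executing it.
echo cmake -S "$WORKSPACE/kleidicv" -B "$BUILD_DIR" $KLEIDICV_FLAGS
```

Remove the leading `echo` to run the configure step once the printed command looks right.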
155163

156164
## Build the KleidiCV standalone
157165

@@ -180,7 +188,7 @@ ls ./build-kleidicv-benchmark-SME/benchmark/kleidicv-benchmark
180188
```
181189
## Build the OpenCV with KleidiCV
182190

183-
The following command can be used to build OpenCV with KleidiCV:
191+
You can use the following command to build OpenCV with KleidiCV:
184192

185193
```bash
186194
cmake -S $WORKSPACE/opencv \
@@ -203,4 +211,8 @@ ls build-opencv-kleidicv-sme/bin/opencv_perf_core
203211
ls build-opencv-kleidicv-sme/bin/opencv_perf_imgproc
204212
```
205213

206-
Continue to the next section to run the benchmarks and learn about SME.
214+
## What you've accomplished and what's next
215+
216+
You've successfully set up your development environment, downloaded the KleidiCV and OpenCV source code, and built both libraries with SME2 support on your Apple Silicon Mac. At this point, you have all the tools you need to explore how KleidiCV optimizes for Arm architectures.
217+
218+
In the next section, you'll run benchmarks to see SME in action and learn how KleidiCV automatically selects the best code paths for your hardware. This will help you understand the performance benefits of Arm's advanced instruction sets for computer vision workloads.

content/learning-paths/laptops-and-desktops/kleidicv-on-mac/run-test-2.md

Lines changed: 42 additions & 12 deletions
@@ -6,11 +6,16 @@ weight: 3
66
layout: learningpathall
77
---
88

9-
## Run the Test
9+
## Run the test
1010

1111
Once the build steps are complete, you can run the KleidiCV and OpenCV tests.
12+
The KleidiCV API test checks the public C++ API and confirms that the build is working as expected. To run the test, use the following command:
1213

13-
The KleidiCV API test verifies the public C++ API. You can run it as shown below. The full test log is not included for brevity:
14+
```bash
15+
./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test
16+
```
17+
18+
You will see output showing the number of tests run and their results. The full test log is omitted here for clarity.
1419

1520
```bash
1621
./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test
@@ -60,16 +65,17 @@ Currently, Apple Xcode is built on Clang 17. Version clang-1700.3.19.1 has an SM
6065
{{% /notice %}}
6166

6267

63-
### Run the OpenCV test
68+
## Run the OpenCV test
69+
70+
After building OpenCV with KleidiCV, you will find the test binaries in the `build-opencv-kleidicv-sme/bin/` directory. The main tool for benchmarking image processing performance is `opencv_perf_imgproc`. This utility measures both execution speed and throughput for the OpenCV `imgproc` module, including KleidiCV-accelerated operations.
6471

65-
Upon completing the build steps for OpenCV with KleidiCV, the test binaries are located in the `build-opencv-kleidicv-sme/bin/` directory. For example, `opencv_perf_imgproc` is OpenCV’s performance benchmark suite for the image processing (`imgproc`) module, which evaluates both execution speed and throughput.
72+
To focus your testing, use the `--gtest_filter` option to select specific tests and `--gtest_param_filter` to set test parameters. For example, you can run the Gaussian blur 5×5 performance test three times on a 1920x1080 grayscale image with replicated borders:
6673

67-
You can customize testing by selecting specific test filters and parameters using the `--gtest_filter` and `--gtest_param_filter` options, respectively. For instance, to run the Gaussian blur 5×5 performance tests three times with the following parameter settings:
6874
- Image size: 1920x1080 (Full HD)
69-
- Image type: 8UC1 (8-bit unsigned, single channel, grayscale)
75+
- Image type: 8UC1 (8-bit unsigned, single channel)
7076
- Border type: BORDER_REPLICATE
7177

72-
Additional test cases are available in [benchmarks.txt](https://gitlab.arm.com/kleidi/kleidicv/-/blob/0.6.0/scripts/benchmark/benchmarks.txt?ref_type=tags).
78+
You can explore additional test cases and parameter combinations in the [benchmarks.txt](https://gitlab.arm.com/kleidi/kleidicv/-/blob/0.6.0/scripts/benchmark/benchmarks.txt?ref_type=tags) file in the KleidiCV repository.
7379

7480
The command for running the test is as follows:
7581

@@ -80,7 +86,7 @@ The command for running the test is as follows:
8086
--gtest_repeat=3
8187
```
8288

83-
The output will appear as follows:
89+
The expected output is:
8490

8591
```output
8692
[ERROR:[email protected]] global persistence.cpp:566 open Can't open file: 'imgproc.xml' in read mode
@@ -422,8 +428,7 @@ kleidicv API:: kleidicv_remap_f32_u8_resolver,NEON backend.
422428
kleidicv API:: kleidicv_remap_f32_u16_resolver,NEON backend.
423429
kleidicv API:: kleidicv_warp_perspective_stripe_u8_resolver,NEON backend.
424430
```
425-
426-
The output is truncated, but you will see performance metrics for all operations at 1280x720 resolution.
431+
The output is truncated for brevity, but you will see detailed performance metrics for each operation at 1280x720 resolution. Look for lines showing the operation name, sample count, mean and median times, and standard deviation. These results help you compare the performance of different backends and confirm that SME or NEON acceleration is active.
427432

428433
## Use lldb to check the SME backend implementation
429434

@@ -446,10 +451,35 @@ lldb ./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test
446451
```
447452

448453
The interactions with the `(lldb)` command line are shown below.
454+
Start by entering the following commands in the `lldb` debugger:
455+
456+
```console
457+
target create "./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test"
458+
b saturating_add_abs_with_threshold
459+
run
460+
```
461+
462+
When the program stops at your breakpoint, enter:
449463

450-
Enter the `target` command followed by the `b` command for the breakpoint, and the `run` command. When the breakpoint is reached enter the `bt` command to see the stack trace followed by the `disassemble` command to display the assembly instructions in SME streaming mode. Use the `quit` command at the end to exit `lldb`.
464+
```console
465+
bt
466+
```
467+
468+
This command displays the stack trace, showing how the function was called.
469+
470+
Next, to view the assembly instructions (including SME streaming mode), enter:
471+
472+
```console
473+
disassemble --frame
474+
```
475+
476+
After you finish inspecting the output, exit `lldb` by typing:
477+
478+
```console
479+
quit
480+
```
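
If you prefer not to type the commands interactively, the same sequence can probably be scripted with `lldb` batch mode. The following is a sketch under that assumption: it reuses the binary path and breakpoint name from the steps above, and it only runs when both `lldb` and the test binary are present.

```bash
# Binary path from the build steps; adjust it if your layout differs.
TEST_BIN="./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test"
# Breakpoint name used in the interactive session.
BREAK_FN="saturating_add_abs_with_threshold"

if command -v lldb >/dev/null 2>&1 && [ -x "$TEST_BIN" ]; then
  # --batch exits after the last command; each -o runs one lldb command in order.
  lldb --batch \
    -o "b $BREAK_FN" \
    -o "run" \
    -o "bt" \
    -o "disassemble --frame" \
    "$TEST_BIN"
else
  echo "lldb or $TEST_BIN not available; nothing to do"
fi
```

Redirecting the output to a file makes it easier to search the disassembly afterwards.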
451481

452-
Some of the paths will be different for you, but you can enter the commands and follow the output.
482+
Note: Your file paths may differ, but the sequence of commands remains the same. Enter each command as shown and review the output at each step.
453483

454484
```console
455485
target create "./build-kleidicv-benchmark-SME/test/api/kleidicv-api-test"
