content/learning-paths/laptops-and-desktops/kleidicv-on-mac/build-1.md
Lines changed: 40 additions & 28 deletions
@@ -7,42 +7,47 @@ layout: learningpathall
7
7
8
8
## Introduction
9
9
10
-
Arm KleidiCV is an open-source library of optimized, performance-critical routines for Arm CPUs. You can integrate it into any Computer Vision (CV) framework to get the best performance for CV workloads on Arm, with no action needed by application developers.
10
+
Arm KleidiCV is an open-source library that provides fast, optimized routines for Arm CPUs. You can use KleidiCV with any computer vision (CV) framework to boost performance for CV workloads on Arm systems.
11
11
12
-
Each KleidiCV function has different implementations targeting Neon, SVE2 (Scalable Vector Extension), or Streaming SVE and SME2 (Scalable Matrix Extension). KleidiCV automatically detects the hardware it is running on and selects the best implementation. You can use KleidiCV as a lightweight standalone image processing library or as part of the OpenCV library.
12
+
KleidiCV includes multiple optimized implementations for each function, targeting Arm Neon, SVE2 (Scalable Vector Extension 2), and SME2 (Scalable Matrix Extension 2) instruction sets. The library automatically detects your hardware and chooses the fastest available code path, so you don't need to adjust your code for different Arm CPUs.
13
13
14
-
Since the Apple M4 family is based on the Armv9.2‑A architecture, it supports the Scalable Matrix Extension (SME) for accelerating matrix computations. In this Learning Path, you will build and test KleidiCV to understand how the backend implementation is called for the KleidiCV functions.
14
+
You can use KleidiCV as a standalone image processing library or integrate it with OpenCV for broader computer vision support. On Apple M4 processors, which use the Armv9.2‑A architecture and support SME, you'll see improved performance for matrix operations. In this Learning Path, you'll build and test KleidiCV to observe how it selects the best backend for your hardware.
15
15
16
-
## Host environment
16
+
## Set up your environment
17
17
18
-
The host machine is a MacBook Pro (Apple Silicon M4), and the operating system version is detailed below.
18
+
To follow this example, you'll need a MacBook Pro with an Apple Silicon M4 processor.
19
19
20
-
You can find this information on your Mac by selecting the **Apple menu ()** in the top-left corner of your screen, then selecting **About This Mac**. Alternatively, run the following command in a terminal:
20
+
To check your operating system version, follow these steps:
21
+
22
+
- Select the **Apple menu ()** in the top-left corner of your screen
23
+
- Select **About This Mac**
24
+
- Alternatively, open a terminal and run:
21
25
22
26
```console
23
27
sw_vers
24
28
```
25
-
26
29
The output is similar to:
27
30
28
31
```output
29
32
ProductName: macOS
30
33
ProductVersion: 15.5
31
34
BuildVersion: 24F74
32
35
```
36
+
### Install CMake
33
37
34
-
If CMake is not already installed on your host machine, you can install it using Homebrew.
38
+
If CMake is not already installed on your host machine, you can install it using Homebrew:
35
39
36
40
```bash
37
41
brew install cmake
38
42
```
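As a minimal sketch, the install step can be guarded so that Homebrew runs only when CMake is missing. The helper names `needs_install` and `ensure_cmake` are hypothetical and exist only for this illustration; the sketch assumes Homebrew itself is already installed.

```bash
# Return success (0) when the named command is NOT found on PATH.
# `needs_install` is a hypothetical helper name used only in this sketch.
needs_install() {
  ! command -v "$1" >/dev/null 2>&1
}

# Install CMake with Homebrew only when it is missing.
# `ensure_cmake` is also hypothetical; it assumes `brew` is available.
ensure_cmake() {
  if needs_install cmake; then
    brew install cmake
  else
    cmake --version
  fi
}
```

Call `ensure_cmake` in your terminal session to run the check. If CMake is already present, the function prints its version instead of reinstalling it.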
39
-
40
-
You can verify the host architecture features as outlined below, confirming that `FEAT_SME` is supported:
43
+
To check which Arm architecture features your Mac supports, run the following command in your terminal:
41
44
42
45
```bash
43
46
sysctl -a | grep hw.optional.arm.FEAT
44
47
```
45
48
49
+
Look for `hw.optional.arm.FEAT_SME: 1` in the output. If you see this line, your system supports SME (Scalable Matrix Extension). If the value is `0`, SME is not available on your hardware.
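The check described above can be sketched as a small filter that reads `sysctl` output and reports the result. The function name `check_sme` is a hypothetical helper, not part of macOS or KleidiCV, and the match assumes the `name: value` line format shown above.

```bash
# Read `sysctl` output on stdin and report whether SME is flagged as available.
# `check_sme` is a hypothetical helper name used only in this sketch.
check_sme() {
  # Match the exact FEAT_SME line so that FEAT_SME2 is not counted by mistake.
  if grep -q '^hw\.optional\.arm\.FEAT_SME: 1$'; then
    echo "SME supported"
  else
    echo "SME not available"
  fi
}
```

To use it on your Mac, pipe the feature list into the function with `sysctl -a | check_sme`.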
If you don't have an M4 Mac you will not see the `FEAT_SME` flags set to 1.
104
+
If your Mac does not have an M4 processor, you won't see the `FEAT_SME` flags set to `1`. In that case, SME (Scalable Matrix Extension) features are not available on your hardware, and KleidiCV will use other optimized code paths instead.
100
105
101
-
## Create a workspace.
106
+
## Create a workspace
102
107
103
-
You can use an environment variable to define your workspace.
108
+
You can use an environment variable to define your workspace:
104
109
105
110
```bash
106
111
export WORKSPACE=<your-workspace-directory>
@@ -113,18 +118,18 @@ mkdir $HOME/kleidi
113
118
export WORKSPACE=$HOME/kleidi
114
119
```
115
120
116
-
## Download the Software
121
+
## Download the software
117
122
118
123
To set up KleidiCV and OpenCV, first download the source code from GitLab.
119
124
120
-
In your $WORKSPACE directory, clone KleidiCV using the v0.6.0 release tag.
125
+
In your $WORKSPACE directory, clone KleidiCV using the v0.6.0 release tag:
KleidiCV provides several CMake options to control which instruction sets and features are enabled during the build.
145
151
146
-
* KLEIDICV_ENABLE_SVE2 - Enable Scalable Vector Extension 2 code paths. This is on by default for some popular compilers known to support SVE2 but otherwise off by default.
147
-
- KLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS - Limit Scalable Vector Extension 2 code paths to cases where it is expected to provide a benefit over other code paths. On by default. Has no effect if KLEIDICV_ENABLE_SVE2 is off.
148
-
* KLEIDICV_BENCHMARK - Enable building KleidiCV benchmarks. The benchmarks use Google Benchmark which will be downloaded automatically. Off by default.
149
-
* KLEIDICV_ENABLE_SME2 - Enable Scalable Matrix Extension 2 and Streaming Scalable Vector Extension code paths. Off by default while the ACLE SME specification is in beta.
150
-
- KLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS - Limit Scalable Matrix Extension 2 code paths to cases where it is expected to provide a benefit over other code paths. On by default. Has no effect if KLEIDICV_ENABLE_SME2 is off.
152
+
Here are the most important options for Arm systems:
151
153
152
-
{{% notice Note %}}
153
-
Normally, if our tests show SVE2 or SME2 are slower than NEON, we default to NEON (unless overridden with -DKLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS=OFF or -DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF).
154
-
{{% /notice %}}
154
+
- KLEIDICV_ENABLE_SVE2 enables Scalable Vector Extension 2 (SVE2) code paths. This is on by default for popular compilers that support SVE2, but off otherwise.
155
+
- KLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS limits SVE2 code paths to algorithms where SVE2 is expected to outperform other options. This is on by default. It has no effect if SVE2 is disabled.
156
+
- KLEIDICV_BENCHMARK enables building KleidiCV benchmarks. The benchmarks use Google Benchmark, which is downloaded automatically. This is off by default.
157
+
- KLEIDICV_ENABLE_SME2 enables Scalable Matrix Extension 2 (SME2) and Streaming SVE code paths. This is off by default while the ACLE SME specification is in beta.
158
+
- KLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS limits SME2 code paths to cases where SME2 is expected to provide a benefit. This is on by default. It has no effect if SME2 is disabled.
159
+
160
+
You can set these options when running `cmake` to customize your build for your hardware and use case.
161
+
162
+
KleidiCV automatically selects the fastest available code path for your hardware. For functions where testing shows that SVE2 (Scalable Vector Extension 2) or SME2 (Scalable Matrix Extension 2) is slower than NEON, the library defaults to NEON, unless you explicitly turn off this behavior by setting `-DKLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS=OFF` or `-DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF`.
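To illustrate how these options reach the build, here is a minimal sketch that assembles the `-D` flags for a configure step. The option names come from the list above; the helper name `kleidicv_flags` is hypothetical and only serves to keep the flags in one place.

```bash
# Print the -D flags for a KleidiCV configure step.
# `kleidicv_flags` is a hypothetical helper used only in this sketch.
kleidicv_flags() {
  sme2="${1:-OFF}"       # KLEIDICV_ENABLE_SME2 is off by default
  benchmark="${2:-OFF}"  # KLEIDICV_BENCHMARK is off by default
  echo "-DKLEIDICV_ENABLE_SME2=${sme2} -DKLEIDICV_BENCHMARK=${benchmark}"
}
```

You could then pass the result to CMake, for example `cmake -S "$WORKSPACE/kleidicv" -B build-kleidicv-benchmark-SME $(kleidicv_flags ON ON)`. The source path `$WORKSPACE/kleidicv` is an assumption about where you cloned the repository, so adjust it to match your setup.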
155
163
156
164
## Build KleidiCV standalone
157
165
@@ -180,7 +188,7 @@ ls ./build-kleidicv-benchmark-SME/benchmark/kleidicv-benchmark
180
188
```
181
189
## Build OpenCV with KleidiCV
182
190
183
-
The following command can be used to build OpenCV with KleidiCV:
191
+
You can use the following command to build OpenCV with KleidiCV:
184
192
185
193
```bash
186
194
cmake -S $WORKSPACE/opencv \
@@ -203,4 +211,8 @@ ls build-opencv-kleidicv-sme/bin/opencv_perf_core
203
211
ls build-opencv-kleidicv-sme/bin/opencv_perf_imgproc
204
212
```
205
213
206
-
Continue to the next section to run the benchmarks and learn about SME.
214
+
## What you've accomplished and what's next
215
+
216
+
You've successfully set up your development environment, downloaded the KleidiCV and OpenCV source code, and built both libraries with SME2 support on your Apple Silicon Mac. At this point, you have all the tools you need to explore how KleidiCV optimizes for Arm architectures.
217
+
218
+
In the next section, you'll run benchmarks to see SME in action and learn how KleidiCV automatically selects the best code paths for your hardware. This will help you understand the performance benefits of Arm's advanced instruction sets for computer vision workloads.
@@ -60,16 +65,17 @@ Currently, Apple Xcode is built on Clang 17. Version clang-1700.3.19.1 has an SM
60
65
{{% /notice %}}
61
66
62
67
63
-
### Run the OpenCV test
68
+
## Run the OpenCV test
69
+
70
+
After building OpenCV with KleidiCV, you will find the test binaries in the `build-opencv-kleidicv-sme/bin/` directory. The main tool for benchmarking image processing performance is `opencv_perf_imgproc`. This utility measures both execution speed and throughput for the OpenCV `imgproc` module, including KleidiCV-accelerated operations.
64
71
65
-
Upon completing the build steps for OpenCV with KleidiCV, the test binaries are located in the `build-opencv-kleidicv-sme/bin/` directory. For example, `opencv_perf_imgproc` is OpenCV’s performance benchmark suite for the image processing (`imgproc`) module, which evaluates both execution speed and throughput.
72
+
To focus your testing, use the `--gtest_filter` option to select specific tests and `--gtest_param_filter` to set test parameters. For example, you can run the Gaussian blur 5×5 performance test three times on a 1920x1080 grayscale image with replicated borders:
66
73
67
-
You can customize testing by selecting specific test filters and parameters using the `--gtest_filter` and `--gtest_param_filter` options, respectively. For instance, to run the Gaussian blur 5×5 performance tests three times with the following parameter settings:
68
74
- Image size: 1920x1080 (Full HD)
69
-
- Image type: 8UC1 (8-bit unsigned, single channel, grayscale)
75
+
- Image type: 8UC1 (8-bit unsigned, single channel)
70
76
- Border type: BORDER_REPLICATE
71
77
72
-
Additional test cases are available in [benchmarks.txt](https://gitlab.arm.com/kleidi/kleidicv/-/blob/0.6.0/scripts/benchmark/benchmarks.txt?ref_type=tags).
78
+
You can explore additional test cases and parameter combinations in the [benchmarks.txt](https://gitlab.arm.com/kleidi/kleidicv/-/blob/0.6.0/scripts/benchmark/benchmarks.txt?ref_type=tags) file in the KleidiCV repository.
73
79
74
80
The command for running the test is as follows:
75
81
@@ -80,7 +86,7 @@ The command for running the test is as follows:
80
86
--gtest_repeat=3
81
87
```
82
88
83
-
The output will appear as follows:
89
+
The expected output is:
84
90
85
91
```output
86
92
[ERROR:[email protected]] global persistence.cpp:566 open Can't open file: 'imgproc.xml' in read mode
The output is truncated, but you will see performance metrics for all operations at 1280x720 resolution.
431
+
The output is truncated for brevity, but you will see detailed performance metrics for each operation at 1280x720 resolution. Look for lines showing the operation name, sample count, mean and median times, and standard deviation. These results help you compare the performance of different backends and confirm that SME or NEON acceleration is active.
427
432
428
433
## Use lldb to check the SME backend implementation
Enter the `target` command followed by the `b` command for the breakpoint, and the `run` command. When the breakpoint is reached enter the `bt` command to see the stack trace followed by the `disassemble` command to display the assembly instructions in SME streaming mode. Use the `quit` command at the end to exit `lldb`.
464
+
```console
465
+
bt
466
+
```
467
+
468
+
This command displays the stack trace, showing how the function was called.
469
+
470
+
Next, to view the assembly instructions (including SME streaming mode), enter:
471
+
472
+
```console
473
+
disassemble --frame
474
+
```
475
+
476
+
After you finish inspecting the output, exit `lldb` by typing:
477
+
478
+
```console
479
+
quit
480
+
```
451
481
452
-
Some of the paths will be different for you, but you can enter the commands and follow the output.
482
+
Note: Your file paths may differ, but the sequence of commands remains the same. Enter each command as shown and review the output at each step.