content/learning-paths/laptops-and-desktops/kleidicv-on-mac/build-1.md
Lines changed: 40 additions & 25 deletions
@@ -7,22 +7,19 @@ layout: learningpathall
7
7
8
8
## Introduction
9
9
10
-
Arm KleidiCV is an open-source library of optimized, performance-critical routines for Arm CPUs. You can integrate it into any Computer Vision (CV) framework to get the best performance for CV workloads on Arm, with no action needed by application developers.
10
+
Arm KleidiCV is an open-source library that provides fast, optimized routines for Arm CPUs. You can use KleidiCV with any computer vision (CV) framework to boost performance for CV workloads on Arm systems.
11
11
12
-
Each KleidiCV function has different implementations targeting Neon, SVE2 (Scalable Vector Extension), or Streaming SVE and SME2 (Scalable Matrix Extension). KleidiCV automatically detects the hardware it is running on and selects the best implementation. You can use KleidiCV as a lightweight standalone image processing library or as part of the OpenCV library.
12
+
KleidiCV includes multiple optimized implementations for each function, targeting Arm Neon, SVE2 (Scalable Vector Extension 2), and SME2 (Scalable Matrix Extension 2) instruction sets. The library automatically detects your hardware and chooses the fastest available code path, so you don't need to adjust your code for different Arm CPUs.
13
13
14
-
Since the Apple M4 family is based on the Armv9.2‑A architecture, it supports the Scalable Matrix Extension (SME) for accelerating matrix computations. In this Learning Path, you will build and test KleidiCV to understand how the backend implementation is called for the KleidiCV functions.
14
+
You can use KleidiCV as a standalone image processing library or integrate it with OpenCV for broader computer vision support. On Apple M4 processors, which use the Armv9.2‑A architecture and support SME, you'll see improved performance for matrix operations. In this Learning Path, you'll build and test KleidiCV to observe how it selects the best backend for your hardware.
15
15
16
-
## Host environment
16
+
## Set up your environment
17
17
18
-
The host machine is a MacBook Pro (Apple Silicon M4), and the operating system version is detailed below.
19
-
20
-
You can find this information on your Mac by selecting the **Apple menu ()** in the top-left corner of your screen, then selecting **About This Mac**. Alternatively, run the following command in a terminal:
18
+
To follow this example you'll need a MacBook Pro with an Apple Silicon M4 processor. To check your operating system version, select the **Apple menu ()** in the top-left corner of your screen and choose **About This Mac**. Alternatively, open a terminal and run:
21
19
22
20
```console
sw_vers
```
25
-
26
23
The output is similar to:
27
24
28
25
```output
@@ -31,18 +28,19 @@ ProductVersion: 15.5
BuildVersion: 24F74
```
33
30
34
-
If CMake is not already installed on your host machine, you can install it using Homebrew.
31
+
You also need CMake. If CMake is not already installed on your host machine, you can install it using Homebrew:
35
32
36
33
```bash
brew install cmake
```
39
-
40
-
You can verify the host architecture features as outlined below, confirming that `FEAT_SME` is supported:
36
+
To check which Arm architecture features your Mac supports, run the following command in your terminal:
41
37
42
38
```bash
sysctl -a | grep hw.optional.arm.FEAT
```
45
41
42
+
Look for `hw.optional.arm.FEAT_SME: 1` in the output. If you see this line, your system supports SME (Scalable Matrix Extension). If the value is `0`, SME is not available on your hardware.
If you don't have an M4 Mac you will not see the `FEAT_SME` flags set to 1.
97
+
If your Mac does not have an M4 processor, you won't see the `FEAT_SME` flags set to `1`. In that case, SME (Scalable Matrix Extension) features are not available on your hardware, and KleidiCV will use other optimized code paths instead.
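If you only want the SME flag rather than the full feature list, a minimal sketch along these lines may help. It assumes the `hw.optional.arm.FEAT_SME` key mentioned above and uses `sysctl -n` to print just its value; the `sme_flag` and `sme_status` variable names are illustrative only.

```bash
# Read a single feature flag. The value stays empty if the key
# (or the sysctl tool itself) is not present on this system.
sme_flag="$(sysctl -n hw.optional.arm.FEAT_SME 2>/dev/null || true)"

if [ "$sme_flag" = "1" ]; then
  sme_status="SME available"
else
  # Covers both a value of 0 and a missing key.
  sme_status="SME not reported"
fi

echo "$sme_status"
```

Treating a missing key the same as `0` keeps the check usable on older Macs, where the key may not exist at all.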
100
98
101
-
## Create a workspace.
99
+
## Create a workspace
102
100
103
-
You can use an environment variable to define your workspace.
101
+
You can use an environment variable to define your workspace:
104
102
105
103
```bash
export WORKSPACE=<your-workspace-directory>
@@ -113,7 +111,7 @@ mkdir $HOME/kleidi
export WORKSPACE=$HOME/kleidi
```
115
113
116
-
## Download the Software
114
+
## Download the software
117
115
118
116
To set up KleidiCV and OpenCV, first download the source code from GitLab.
KleidiCV provides several CMake options to control which instruction sets and features are enabled during the build. Here are the most important options for Arm systems:
145
+
146
+
- **KLEIDICV_ENABLE_SVE2**
147
+
Enables Scalable Vector Extension 2 (SVE2) code paths. This is on by default for popular compilers that support SVE2, but off otherwise.
145
148
146
-
* KLEIDICV_ENABLE_SVE2 - Enable Scalable Vector Extension 2 code paths. This is on by default for some popular compilers known to support SVE2 but otherwise off by default.
147
-
- KLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS - Limit Scalable Vector Extension 2 code paths to cases where it is expected to provide a benefit over other code paths. On by default. Has no effect if KLEIDICV_ENABLE_SVE2 is off.
148
-
* KLEIDICV_BENCHMARK - Enable building KleidiCV benchmarks. The benchmarks use Google Benchmark which will be downloaded automatically. Off by default.
149
-
* KLEIDICV_ENABLE_SME2 - Enable Scalable Matrix Extension 2 and Streaming Scalable Vector Extension code paths. Off by default while the ACLE SME specification is in beta.
150
-
- KLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS - Limit Scalable Matrix Extension 2 code paths to cases where it is expected to provide a benefit over other code paths. On by default. Has no effect if KLEIDICV_ENABLE_SME2 is off.
149
+
- **KLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS**
150
+
Limits SVE2 code paths to algorithms where SVE2 is expected to outperform other options. This is on by default. It has no effect if SVE2 is disabled.
151
+
152
+
- **KLEIDICV_BENCHMARK**
153
+
Enables building KleidiCV benchmarks. The benchmarks use Google Benchmark, which is downloaded automatically. This is off by default.
154
+
155
+
- **KLEIDICV_ENABLE_SME2**
156
+
Enables Scalable Matrix Extension 2 (SME2) and Streaming SVE code paths. This is off by default while the ACLE SME specification is in beta.
157
+
158
+
- **KLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS**
159
+
Limits SME2 code paths to cases where SME2 is expected to provide a benefit. This is on by default. It has no effect if SME2 is disabled.
160
+
161
+
You can set these options when running `cmake` to customize your build for your hardware and use case.
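As a rough illustration of passing these options, the sketch below assembles a `cmake` configure command with SME2 and benchmarks enabled. The option names come from the list above; the source and build directory names (`kleidicv`, `build-kleidicv-example`) and the `SRC_DIR`, `BUILD_DIR`, and `CONFIGURE_CMD` variables are assumptions for illustration, not the paths this Learning Path uses.

```bash
# Hypothetical locations: adjust to match where you cloned KleidiCV.
SRC_DIR="${WORKSPACE:-$HOME/kleidi}/kleidicv"
BUILD_DIR="${WORKSPACE:-$HOME/kleidi}/build-kleidicv-example"

# Collect the configure arguments as positional parameters so that
# paths containing spaces stay intact.
set -- -S "$SRC_DIR" -B "$BUILD_DIR" \
  -DKLEIDICV_ENABLE_SME2=ON \
  -DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=ON \
  -DKLEIDICV_BENCHMARK=ON

# Show the command that would be run.
CONFIGURE_CMD="cmake $*"
echo "$CONFIGURE_CMD"

# Only configure when the source tree and CMake are actually present.
if [ -d "$SRC_DIR" ] && command -v cmake >/dev/null 2>&1; then
  cmake "$@"
fi
```

Printing the command first makes it easy to confirm which options are in effect before the configure step runs.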
151
162
152
163
{{% notice Note %}}
153
-
Normally, if our tests show SVE2 or SME2 are slower than NEON, we default to NEON (unless overridden with -DKLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS=OFF or -DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF).
164
+
By default, KleidiCV uses SVE2 (Scalable Vector Extension 2) or SME2 (Scalable Matrix Extension 2) code paths only for functions where testing has shown them to be faster than Neon; for all other functions it falls back to Neon. To enable the SVE2 or SME2 code paths for every function, set `-DKLEIDICV_LIMIT_SVE2_TO_SELECTED_ALGORITHMS=OFF` or `-DKLEIDICV_LIMIT_SME2_TO_SELECTED_ALGORITHMS=OFF` when you configure the build.
154
165
{{% /notice %}}
155
166
156
167
## Build the KleidiCV standalone
@@ -203,4 +214,8 @@ ls build-opencv-kleidicv-sme/bin/opencv_perf_core
ls build-opencv-kleidicv-sme/bin/opencv_perf_imgproc
```
205
216
206
-
Continue to the next section to run the benchmarks and learn about SME.
217
+
## What you've accomplished and what's next
218
+
219
+
You've successfully set up your development environment, downloaded the KleidiCV and OpenCV source code, and built both libraries with SME2 support on your Apple Silicon Mac. At this point, you have all the tools you need to explore how KleidiCV optimizes for Arm architectures.
220
+
221
+
In the next section, you'll run benchmarks to see SME in action and learn how KleidiCV automatically selects the best code paths for your hardware. This will help you understand the performance benefits of Arm's advanced instruction sets for computer vision workloads.
@@ -60,16 +65,17 @@ Currently, Apple Xcode is built on Clang 17. Version clang-1700.3.19.1 has an SM
60
65
{{% /notice %}}
61
66
62
67
63
-
### Run the OpenCV test
68
+
## Run the OpenCV test
69
+
70
+
After building OpenCV with KleidiCV, you will find the test binaries in the `build-opencv-kleidicv-sme/bin/` directory. The main tool for benchmarking image processing performance is `opencv_perf_imgproc`. This utility measures both execution speed and throughput for the OpenCV `imgproc` module, including KleidiCV-accelerated operations.
64
71
65
-
Upon completing the build steps for OpenCV with KleidiCV, the test binaries are located in the `build-opencv-kleidicv-sme/bin/` directory. For example, `opencv_perf_imgproc` is OpenCV’s performance benchmark suite for the image processing (`imgproc`) module, which evaluates both execution speed and throughput.
72
+
To focus your testing, use the `--gtest_filter` option to select specific tests and `--gtest_param_filter` to set test parameters. For example, you can run the Gaussian blur 5×5 performance test three times on a 1920x1080 grayscale image with replicated borders:
66
73
67
-
You can customize testing by selecting specific test filters and parameters using the `--gtest_filter` and `--gtest_param_filter` options, respectively. For instance, to run the Gaussian blur 5×5 performance tests three times with the following parameter settings:
68
74
- Image size: 1920x1080 (Full HD)
69
-
- Image type: 8UC1 (8-bit unsigned, single channel, grayscale)
75
+
- Image type: 8UC1 (8-bit unsigned, single channel)
70
76
- Border type: BORDER_REPLICATE
71
77
72
-
Additional test cases are available in [benchmarks.txt](https://gitlab.arm.com/kleidi/kleidicv/-/blob/0.6.0/scripts/benchmark/benchmarks.txt?ref_type=tags).
78
+
You can explore additional test cases and parameter combinations in the [benchmarks.txt](https://gitlab.arm.com/kleidi/kleidicv/-/blob/0.6.0/scripts/benchmark/benchmarks.txt?ref_type=tags) file in the KleidiCV repository.
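The filtered run described above could look roughly like the following sketch. The `--gtest_filter` and `--gtest_param_filter` options are the ones named in this section, and `--gtest_repeat` is the standard GoogleTest repeat flag. The exact filter pattern and parameter tuple format are assumptions and may need adjusting for your OpenCV version; the variable names are illustrative.

```bash
# Binary location as named in this section; adjust if your build
# directory differs.
PERF_BIN="build-opencv-kleidicv-sme/bin/opencv_perf_imgproc"

# Assumed filter patterns: the test name and the parameter tuple
# format may differ between OpenCV versions.
TEST_FILTER='*gaussianBlur5x5/*'
PARAM_FILTER='(1920x1080, 8UC1, BORDER_REPLICATE)'
REPEAT=3

if [ -x "$PERF_BIN" ]; then
  "$PERF_BIN" \
    --gtest_filter="$TEST_FILTER" \
    --gtest_param_filter="$PARAM_FILTER" \
    --gtest_repeat="$REPEAT"
else
  echo "Benchmark binary not found: $PERF_BIN"
fi
```

Keeping the filters in variables makes it straightforward to swap in other cases from `benchmarks.txt` without retyping the whole command.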