How to Build
This guide describes the steps to build Android/Windows releases of the QNN backend for llama.cpp.
- Install the latest Docker Engine following the official steps: Install Docker Engine
- Clone the llama-cpp-qnn-builder repository
    git clone https://github.com/chraac/llama-cpp-qnn-builder.git
    cd llama-cpp-qnn-builder
Note: Please update to the latest `main` branch, as we're using NDK r23. Some optimization flags weren't correctly applied in `Release` builds in earlier versions. See: https://github.com/android/ndk/issues/1740
- Navigate to the project root directory and run the build script: `./docker/docker_compose_compile_and_share.sh`
- When the build finishes, the executables will be located in `build_qnn_arm64-v8a/bin/`.

The build script accepts the following parameters:

| Parameter | Short | Description | Default |
|---|---|---|---|
| `--rebuild` | `-r` | Force rebuild of the project | false |
| `--repo-dir` | | Specify llama.cpp repository directory | ../llama.cpp |
| `--debug` | `-d` | Build in Debug mode | Release |
| `--print-build-time` | | Display build and test execution times | false |
| `--asan` | | Enable AddressSanitizer | false |
| `--build-linux-x64` | | Build for Linux x86_64 platform | android arm64-v8a |
| `--perf-log` | | Enable Hexagon performance tracking | false |
| `--enable-hexagon-backend` | | Enable Hexagon backend support | false |
| `--hexagon-npu-only` | | Build Hexagon NPU backend only | false |
| `--disable-hexagon-and-qnn` | | Disable both Hexagon and QNN backends | false |
| `--qnn-only` | | Build QNN backend only | false |
| `--enable-dequant` | | Enable quantized tensor support in Hexagon | false |

Usage examples:

    # Basic build (default: Release mode, QNN + Hexagon backends)
    ./docker/docker_compose_compile_and_share.sh

    # Debug build with Hexagon NPU backend
    ./docker/docker_compose_compile_and_share.sh -d --enable-hexagon-backend

    # Debug build with Hexagon NPU backend only
    ./docker/docker_compose_compile_and_share.sh -d --hexagon-npu-only

    # Debug build with Hexagon NPU backend and quantized tensor support
    ./docker/docker_compose_compile_and_share.sh -d --hexagon-npu-only --enable-dequant

    # QNN-only build with performance logging
    ./docker/docker_compose_compile_and_share.sh --qnn-only --perf-log

    # Force rebuild with debug symbols and build timing
    ./docker/docker_compose_compile_and_share.sh -r -d --print-build-time

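
When the build script is driven from CI, it can be convenient to assemble its flags from environment variables instead of hard-coding each combination. The following is a minimal sketch, not part of the builder repository: the function name `build_flags` and the variables `QNN_BUILD_REBUILD`, `QNN_BUILD_DEBUG`, `QNN_BUILD_TIMING`, and `QNN_BUILD_BACKEND` are hypothetical, and only the emitted flags come from the parameter table.

```shell
#!/bin/sh
# Sketch: translate hypothetical environment variables into flags for
# docker_compose_compile_and_share.sh. Only the flags are taken from the
# parameter table; the variable names are illustrative assumptions.
build_flags() (
    flags=""
    if [ "${QNN_BUILD_REBUILD:-0}" = "1" ]; then flags="$flags -r"; fi
    if [ "${QNN_BUILD_DEBUG:-0}" = "1" ]; then flags="$flags -d"; fi
    if [ "${QNN_BUILD_TIMING:-0}" = "1" ]; then flags="$flags --print-build-time"; fi
    case "${QNN_BUILD_BACKEND:-default}" in
        default) ;;
        qnn)     flags="$flags --qnn-only" ;;
        hexagon) flags="$flags --hexagon-npu-only" ;;
        *)
            echo "unknown QNN_BUILD_BACKEND: ${QNN_BUILD_BACKEND}" >&2
            exit 1
            ;;
    esac
    # Drop the leading space left by the concatenation above.
    printf '%s\n' "${flags# }"
)

# Example (left as a comment because it needs Docker and the repository):
#   ./docker/docker_compose_compile_and_share.sh $(build_flags)
```

The command substitution is deliberately left unquoted in the example so that the shell splits the result into separate arguments.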
- Download Qualcomm AI Engine Direct SDK
  - Get it from the Qualcomm Developer Portal
  - Extract it to a folder (e.g., `C:/ml/qnn_sdk/qairt/2.31.0.250130/`)
- Install Visual Studio 2022
  - Ensure the following components are installed:
    - Clang toolchain for ARM64 compilation
    - CMake tools for Visual Studio
- Open the Project
  - Launch Visual Studio 2022
  - Click Continue without code
  - Go to File→Open→CMake
  - Navigate to the `llama.cpp` root directory and select `CMakeLists.txt`
- Configure CMake Presets

  Edit `llama.cpp/CMakePresets.json` and modify the `arm64-windows-llvm` configuration:

      {
          "name": "arm64-windows-llvm",
          "hidden": true,
          "architecture": { "value": "arm64", "strategy": "external" },
          "toolset": { "value": "host=x64", "strategy": "external" },
          "cacheVariables": {
      -       "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
      +       "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
      +       "GGML_QNN": "ON",
      +       "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
      +       "BUILD_SHARED_LIBS": "OFF"
          }
      },

  Important: Replace `C:/ml/qnn_sdk/qairt/2.31.0.250130/` with your actual QNN SDK path.
- Select Build Configuration
  - In Visual Studio, select the `arm64-windows-llvm-debug` configuration from the dropdown
- Build the Project
  - Go to Build→Build All
  - Output files will be located in `build-arm64-windows-llvm-debug/bin/`

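
The preset edit from the Configure CMake Presets step can be sanity-checked from a POSIX shell (for example Git Bash) before starting a build. This is a rough sketch under assumptions: `check_qnn_preset` is a hypothetical helper, not something llama.cpp provides, and it only does a textual match for the `GGML_QNN` and `GGML_QNN_SDK_PATH` cache variables rather than parsing the JSON.

```shell
#!/bin/sh
# Sketch: confirm that a presets file enables GGML_QNN and names an SDK path.
# check_qnn_preset is a hypothetical helper; it greps the file as text and
# does not validate the JSON structure or which preset the keys belong to.
check_qnn_preset() (
    presets_file="$1"
    if ! grep -Eq '"GGML_QNN"[[:space:]]*:[[:space:]]*"ON"' "$presets_file"; then
        echo "GGML_QNN is not set to ON in $presets_file" >&2
        exit 1
    fi
    # Extract the first configured SDK path, if any.
    sdk_path=$(sed -n 's/.*"GGML_QNN_SDK_PATH"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$presets_file" | head -n 1)
    if [ -z "$sdk_path" ]; then
        echo "GGML_QNN_SDK_PATH is missing from $presets_file" >&2
        exit 1
    fi
    printf '%s\n' "$sdk_path"
)

# Example (left as a comment because it needs the llama.cpp checkout):
#   check_qnn_preset llama.cpp/CMakePresets.json
```

On success the helper prints the configured SDK path, which makes it easy to spot a path that still points at the example location.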
After successful compilation, you'll find the following executables:
- `llama-cli.exe` - Main inference executable
- `llama-bench.exe` - Benchmarking tool
- `test-backend-ops.exe` - Backend operation tests
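
A quick way to confirm that a build produced the executables listed above is to check for each file in the output directory. The sketch below assumes a POSIX shell is available (for example Git Bash on Windows); `check_build_outputs` is a hypothetical helper name, not part of llama.cpp or the builder repository.

```shell
#!/bin/sh
# Sketch: report any expected executable that is missing from a build directory.
# Usage: check_build_outputs <bin_dir> <file>...
# Returns 0 when every file exists, 1 otherwise.
check_build_outputs() (
    bin_dir="$1"
    shift
    missing=0
    for name in "$@"; do
        if [ ! -f "$bin_dir/$name" ]; then
            echo "missing: $bin_dir/$name" >&2
            missing=1
        fi
    done
    exit "$missing"
)

# Example (left as a comment because it needs a finished build):
#   check_build_outputs build-arm64-windows-llvm-debug/bin \
#       llama-cli.exe llama-bench.exe test-backend-ops.exe
```

The same helper could be pointed at `build_qnn_arm64-v8a/bin/` for the Android build, assuming the same tool names without the `.exe` suffix.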