How to Build
This guide describes how to build Android and Windows versions of the QNN backend for llama.cpp, enabling efficient inference on Qualcomm hardware.
- Docker Engine
  - Install following the official Docker guide
  - Ensure Docker Compose is included with your installation
- Source Code
  - Clone the repository:

        git clone https://github.com/chraac/llama-cpp-qnn-builder.git
        cd llama-cpp-qnn-builder

  - Note: Use the latest main branch, as the build uses NDK r27c with important optimization flags for Release builds.
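Before building, it can help to confirm that the prerequisites above are actually on your PATH. The following is a minimal sketch, not part of the builder repository: the helper name `missing_tools` is hypothetical, and the Compose check assumes Compose is available either as the `docker compose` plugin or as a standalone `docker-compose` binary.

```shell
#!/bin/sh
# missing_tools (hypothetical helper): print the name of each tool that is
# not found on PATH, one per line. Prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools git docker)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "git and docker found"
fi

# Compose may be installed as a Docker plugin or as a standalone binary.
if docker compose version >/dev/null 2>&1 || command -v docker-compose >/dev/null 2>&1; then
    echo "Docker Compose found"
else
    echo "Docker Compose not found"
fi
```

The script only reports what it finds; it does not install anything or stop on failure, so it is safe to run repeatedly.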
- Basic Build
  - Navigate to the project root directory and run:

        ./docker/docker_compose_compile.sh

- Build Output
  - Executables will be in build_qnn_arm64-v8a/bin/
  - The console will show build progress and completion status.
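To confirm that a build produced binaries, you can list the executable files in the output directory. This is a small sketch under the assumption that the output lands in build_qnn_arm64-v8a/bin/ relative to the directory you run it from; the function name `list_executables` is made up for illustration.

```shell
#!/bin/sh
# list_executables DIR (hypothetical helper): print the names of regular
# files in DIR that have an executable bit set, sorted alphabetically.
list_executables() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not found: $dir" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -x "$f" ] && basename "$f"
    done | sort
}

# Default output directory named in this guide.
list_executables "build_qnn_arm64-v8a/bin" || echo "Run the build first."
```

The binaries target Android arm64-v8a, so this only checks that they exist; it does not try to run them on the host.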
The build script accepts the following options:

| Parameter | Short | Description | Default |
|---|---|---|---|
| --rebuild | -r | Force rebuild of the project | false |
| --repo-dir | | Specify llama.cpp repository directory | ../llama.cpp |
| --debug | -d | Build in Debug mode | Release |
| --asan | | Enable AddressSanitizer | false |
| --build-linux-x64 | | Build for Linux x86_64 platform | android arm64-v8a |
| --perf-log | | Enable Hexagon performance tracking | false |
| --enable-hexagon-backend | | Enable Hexagon backend support | false |
| --hexagon-npu-only | | Build Hexagon NPU backend only | false |
| --disable-hexagon-and-qnn | | Disable both Hexagon and QNN backends | false |
| --qnn-only | | Build QNN backend only | false |
| --enable-dequant | | Enable quantized tensor support in Hexagon | false |

Example commands:

    # Basic build (default: Release mode, QNN + Hexagon backends)
    ./docker/docker_compose_compile.sh

    # Debug build with Hexagon NPU backend
    ./docker/docker_compose_compile.sh -d --enable-hexagon-backend

    # Debug build with Hexagon NPU backend only
    ./docker/docker_compose_compile.sh -d --hexagon-npu-only

    # Debug build with Hexagon NPU backend and quantized tensor support
    ./docker/docker_compose_compile.sh -d --hexagon-npu-only --enable-dequant

    # QNN-only build with performance logging
    ./docker/docker_compose_compile.sh --qnn-only --perf-log

    # Force rebuild with debug symbols
    ./docker/docker_compose_compile.sh -r -d

To build with Hexagon NPU backend support, you need to create a Docker image that includes the Hexagon SDK.
- Hexagon SDK
  - Option 1: Download the SDK from Hexagon NPU SDK - Getting started (version 6.3.0.0 for Linux)
  - Option 2: Use an existing SDK installation
- Base Docker Image
  - Required image: chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27
  - Contains Android NDK r27c and build tools

If you already have the Hexagon SDK extracted on your machine:
- Create Dockerfile (save as Dockerfile.hexagon_sdk.local):

      FROM chraac/llama-cpp-qnn-builder:2.36.0.250627-ndk-r27

      ENV HEXAGON_SDK_VERSION='6.3.0.0'
      ENV HEXAGON_SDK_BASE=/local/mnt/workspace/Qualcomm/Hexagon_SDK
      ENV HEXAGON_SDK_PATH=${HEXAGON_SDK_BASE}/${HEXAGON_SDK_VERSION}
      ENV ANDROID_NDK_HOME=/android-ndk/android-ndk-r27c
      ENV ANDROID_ROOT_DIR=${ANDROID_NDK_HOME}/

      RUN mkdir -p ${HEXAGON_SDK_PATH}

      # Note: Docker resolves the ADD source relative to the build context,
      # so LOCAL_SDK_PATH must point to a directory inside that context.
      ARG LOCAL_SDK_PATH
      ADD ${LOCAL_SDK_PATH} ${HEXAGON_SDK_PATH}/6.3.0.0

      # Install required dependencies
      RUN apt update && apt install -y \
          python-is-python3 \
          libncurses5 \
          lsb-base \
          lsb-release \
          sqlite3 \
          rsync \
          git \
          build-essential \
          libc++-dev \
          clang \
          cmake

      # Dummy version info for hexagon-sdk
      RUN echo 'VERSION_ID="20.04"' > /etc/os-release

- Create Setup Script (save as docker_compose_hexagon_local.sh):

      #!/bin/bash

      # Check if SDK path is provided
      if [ -z "$1" ]; then
          echo "Usage: $0 /path/to/hexagon/sdk/6.3.0.0"
          exit 1
      fi

      SDK_PATH="$1"

      # Check if SDK path exists
      if [ ! -d "$SDK_PATH" ]; then
          echo "Error: SDK path does not exist: $SDK_PATH"
          exit 1
      fi

      # Build the Docker image with SDK embedded
      docker build -f Dockerfile.hexagon_sdk.local --build-arg LOCAL_SDK_PATH="$SDK_PATH" -t llama-cpp-qnn-hexagon:embedded .

      # Create a Docker Compose configuration file
      cat > docker-compose.hexagon.yml << EOF
      version: '3'
      services:
        hexagon-builder:
          image: llama-cpp-qnn-hexagon:embedded
          volumes:
            - ./:/workspace
          working_dir: /workspace
      EOF

      echo "Setup complete! Use the following command to compile with Hexagon support:"
      echo "./docker/docker_compose_compile.sh --enable-hexagon-backend"

- Run Setup:

      chmod +x docker_compose_hexagon_local.sh
      ./docker_compose_hexagon_local.sh /path/to/your/Hexagon_SDK/6.3.0.0

- Build with Hexagon Support:

      # Enable Hexagon NPU backend
      ./docker/docker_compose_compile.sh --enable-hexagon-backend

      # Or build with Hexagon NPU backend only
      ./docker/docker_compose_compile.sh --hexagon-npu-only

      # Access container shell for manual builds
      docker-compose -f docker-compose.hexagon.yml run --rm hexagon-builder bash

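Since the setup depends on SDK version 6.3.0.0 exactly, a quick check of the SDK directory before running the setup script can catch a wrong path early. The sketch below is illustrative only: `check_sdk_dir` is a hypothetical helper, and it assumes the SDK is extracted into a directory named after its version, as in the example path /path/to/your/Hexagon_SDK/6.3.0.0.

```shell
#!/bin/sh
# check_sdk_dir DIR [VERSION] (hypothetical helper): succeed only if DIR
# exists, is readable, and its last path component equals VERSION.
# VERSION defaults to 6.3.0.0, the version this guide requires.
check_sdk_dir() {
    dir=${1%/}              # drop one trailing slash, if present
    want=${2:-6.3.0.0}
    if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
        echo "SDK directory missing or unreadable: $dir" >&2
        return 1
    fi
    if [ "$(basename "$dir")" != "$want" ]; then
        echo "Expected a directory named $want, got: $(basename "$dir")" >&2
        return 1
    fi
    echo "SDK directory looks OK: $dir"
}

# Example (not executed here):
#   check_sdk_dir /path/to/your/Hexagon_SDK/6.3.0.0 \
#       && ./docker_compose_hexagon_local.sh /path/to/your/Hexagon_SDK/6.3.0.0
```

The check looks only at the directory name and permissions. It does not inspect the SDK contents, so a renamed directory would still pass.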
Windows builds require the following:

- Qualcomm AI Engine Direct SDK
  - Download from Qualcomm Developer Portal
  - Extract to a folder (example: C:/ml/qnn_sdk/qairt/2.31.0.250130/)
- Visual Studio 2022
  - Required components:
    - Clang toolchain for ARM64 compilation
    - CMake tools for Visual Studio
- Hexagon SDK (optional, only for Hexagon NPU backend)
  - Follow Hexagon NPU SDK - Getting started
  - Install Qualcomm Package Manager (QPM) first
  - Use QPM to install the Hexagon SDK
  - Set environment variable HEXAGON_SDK_ROOT to your installation directory
- Open Project
  - Launch Visual Studio 2022
  - Click Continue without code
  - Navigate to File → Open → CMake
  - Select CMakeLists.txt in the llama.cpp root directory
- Configure CMake
  - Edit llama.cpp/CMakePresets.json to modify the arm64-windows-llvm configuration:

          {
              "name": "arm64-windows-llvm",
              "hidden": true,
              "architecture": { "value": "arm64", "strategy": "external" },
              "toolset": { "value": "host=x64", "strategy": "external" },
              "cacheVariables": {
        -         "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
        +         "CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake",
        +         "GGML_QNN": "ON",
        +         "GGML_QNN_SDK_PATH": "C:/ml/qnn_sdk/qairt/2.31.0.250130/",
        +         "BUILD_SHARED_LIBS": "OFF"
              }
          },

  - Important: Replace the QNN SDK path with your actual installation path.
- Select Configuration
  - Choose the arm64-windows-llvm-debug configuration from the dropdown menu
- Build
  - Select Build → Build All
  - Output will be in build-arm64-windows-llvm-debug/bin/

After successful compilation, you'll have these executables:
- llama-cli.exe - Main inference executable
- llama-bench.exe - Benchmarking tool
- test-backend-ops.exe - Backend operation tests

Common issues with the Docker-based build:

- Docker Permission Issues
  - Add your user to the docker group:

        sudo usermod -aG docker $USER
        # Log out and back in for changes to take effect

- Hexagon SDK Compatibility
  - Verify you're using exactly version 6.3.0.0 of the SDK
  - Ensure SDK directory permissions allow Docker container access
- Build Failures
  - Check Docker logs for detailed error messages:

        docker-compose -f docker-compose.hexagon.yml logs
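
After adding your user to the docker group, the change only applies to new login sessions. The following sketch checks whether the current session already has the group. The helper name `in_group` is hypothetical, and it assumes `id -nG` prints a space-separated list of group names, as it does on typical Linux systems.

```shell
#!/bin/sh
# in_group GROUP "LIST" (hypothetical helper): succeed if GROUP appears as a
# whole word in the space-separated LIST (the format printed by `id -nG`).
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

groups_now=$(id -nG)
if in_group docker "$groups_now"; then
    echo "Current session is in the docker group"
else
    echo "Current session is NOT in the docker group; log out and back in after running usermod"
fi
```

Matching on whole words avoids false positives from similarly named groups.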