> [!IMPORTANT]
> This build documentation applies only to RISC-V SpacemiT SoCs.

## Build llama.cpp locally (for riscv64)

1. Prepare the toolchain for RISC-V
~~~
wget https://archive.spacemit.com/toolchain/spacemit-toolchain-linux-glibc-x86_64-v1.1.2.tar.xz
~~~

2. Build
The build script below uses RISC-V vector instructions for acceleration. Make sure the `GGML_CPU_RISCV64_SPACEMIT` option is enabled. The only optimization version currently supported is `RISCV64_SPACEMIT_IME1`, which is selected through the `RISCV64_SPACEMIT_IME_SPEC` option. The compiler configuration is defined in the `riscv64-spacemit-linux-gnu-gcc.cmake` file. Before building, install the RISC-V compiler and set the environment variable with `export RISCV_ROOT_PATH={your_compiler_path}`.
```bash
# Configure a Release cross-build using the SpacemiT toolchain file
cmake -B build \
    -DCMAKE_BUILD_TYPE=Release \
    -DGGML_CPU_RISCV64_SPACEMIT=ON \
    -DLLAMA_CURL=OFF \
    -DGGML_RVV=ON \
    -DGGML_RV_ZFH=ON \
    -DGGML_RV_ZICBOP=ON \
    -DRISCV64_SPACEMIT_IME_SPEC=RISCV64_SPACEMIT_IME1 \
    -DCMAKE_TOOLCHAIN_FILE=${PWD}/cmake/riscv64-spacemit-linux-gnu-gcc.cmake \
    -DCMAKE_INSTALL_PREFIX=build/installed

# Compile with one job per available core
cmake --build build --parallel $(nproc) --config Release

# Install into the prefix configured above
pushd build
make install
popd
```
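
Before configuring, it can help to verify the prerequisites described in the build step. The following is a minimal sketch and not part of the official build: the function name `check_spacemit_env` is hypothetical, and it only checks that `RISCV_ROOT_PATH` points to an existing directory and that the toolchain file is present in the source tree.

```bash
# Hypothetical helper (not part of llama.cpp): verify build prerequisites.
# Usage: check_spacemit_env [llama.cpp source dir, defaults to current dir]
check_spacemit_env() {
    local src_dir="${1:-$PWD}"
    local toolchain_file="${src_dir}/cmake/riscv64-spacemit-linux-gnu-gcc.cmake"

    # RISCV_ROOT_PATH must be exported, as the build instructions require.
    if [ -z "${RISCV_ROOT_PATH:-}" ]; then
        echo "error: RISCV_ROOT_PATH is not set" >&2
        return 1
    fi
    if [ ! -d "${RISCV_ROOT_PATH}" ]; then
        echo "error: RISCV_ROOT_PATH is not a directory: ${RISCV_ROOT_PATH}" >&2
        return 1
    fi
    # This file is passed to cmake via -DCMAKE_TOOLCHAIN_FILE.
    if [ ! -f "${toolchain_file}" ]; then
        echo "error: toolchain file not found: ${toolchain_file}" >&2
        return 1
    fi
    echo "ok: toolchain at ${RISCV_ROOT_PATH}"
}
```

Calling `check_spacemit_env && cmake -B build ...` from the repository root would then stop early with a readable message instead of failing inside CMake.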

## Simulation
You can use QEMU to emulate the target on non-RISC-V hosts.

1. Download QEMU
~~~
wget https://archive.spacemit.com/spacemit-ai/qemu/jdsk-qemu-v0.0.14.tar.gz
~~~

2. Run the simulation
After building llama.cpp, you can run the executable under QEMU, for example:
~~~
export QEMU_ROOT_PATH={your QEMU file path}
export RISCV_ROOT_PATH_IME1={your RISC-V compiler path}

${QEMU_ROOT_PATH}/bin/qemu-riscv64 -L ${RISCV_ROOT_PATH_IME1}/sysroot -cpu max,vlen=256,elen=64,vext_spec=v1.0 ${PWD}/build/bin/llama-cli -m ${PWD}/models/Qwen2.5-0.5B-Instruct-Q4_0.gguf -t 1
~~~
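
Because the QEMU invocation is long, it may be convenient to wrap it. The sketch below reuses the exact `-L` and `-cpu` settings from the example above; the function name `run_riscv64` is hypothetical and not part of llama.cpp or QEMU.

```bash
# Hypothetical wrapper: run a riscv64 binary under QEMU with the CPU
# settings from the example above.
# Usage: run_riscv64 <binary> [args...]
run_riscv64() {
    # Both variables are the ones exported in the simulation example.
    if [ -z "${QEMU_ROOT_PATH:-}" ] || [ -z "${RISCV_ROOT_PATH_IME1:-}" ]; then
        echo "error: set QEMU_ROOT_PATH and RISCV_ROOT_PATH_IME1 first" >&2
        return 1
    fi
    "${QEMU_ROOT_PATH}/bin/qemu-riscv64" \
        -L "${RISCV_ROOT_PATH_IME1}/sysroot" \
        -cpu max,vlen=256,elen=64,vext_spec=v1.0 \
        "$@"
}

# Example (paths as in the section above):
#   run_riscv64 build/bin/llama-cli -m models/Qwen2.5-0.5B-Instruct-Q4_0.gguf -t 1
```

Quoting the paths and forwarding `"$@"` keeps arguments with spaces intact, which the one-line form does not guarantee.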
## Performance
#### Quantization Support For Matrix
~~~
model name      : Spacemit(R) X60
isa             : rv64imafdcv_zicbom_zicboz_zicntr_zicond_zicsr_zifencei_zihintpause_zihpm_zfh_zfhmin_zca_zcd_zba_zbb_zbc_zbs_zkt_zve32f_zve32x_zve64d_zve64f_zve64x_zvfh_zvfhmin_zvkt_sscofpmf_sstc_svinval_svnapot_svpbmt
mmu             : sv39
uarch           : spacemit,x60
mvendorid       : 0x710
marchid         : 0x8000000058000001
~~~

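
The `isa` field above lists the extensions the core reports. As an illustration, the following sketch checks such a string for a given extension, for example `v` or `zfh`, which correspond to the `GGML_RVV` and `GGML_RV_ZFH` build options. The helper name `isa_has_ext` is hypothetical, and the parsing assumes the `rv64<letters>_<ext>_<ext>...` layout shown in the listing.

```bash
# Hypothetical helper: test whether an ISA string (the "isa" field of
# /proc/cpuinfo on RISC-V Linux) contains a given extension.
# Single-letter extensions (e.g. v) sit in the base token after "rv64";
# multi-letter extensions (e.g. zfh) are separated by underscores.
# Usage: isa_has_ext <isa-string> <extension>
isa_has_ext() {
    local isa="$1"
    local ext="$2"
    local base="${isa%%_*}"       # e.g. rv64imafdcv
    local letters="${base#rv64}"  # e.g. imafdcv

    if [ "${#ext}" -eq 1 ]; then
        case "$letters" in
            *"$ext"*) return 0 ;;
        esac
        return 1
    fi
    # Wrap in underscores so that only whole extension names match.
    case "_${isa#*_}_" in
        *"_${ext}_"*) return 0 ;;
    esac
    return 1
}

# Example on the target board:
#   isa=$(awk -F': ' '/^isa/ {print $2; exit}' /proc/cpuinfo)
#   isa_has_ext "$isa" v && isa_has_ext "$isa" zfh && echo "RVV and Zfh reported"
```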
Q4_0

| Model        | Size        | Params   | Backend | Threads | Test  | t/s          |
| ------------ | ----------- | -------- | ------- | ------- | ----- | ------------ |
| Qwen2.5 0.5B | 403.20 MiB  | 630.17 M | cpu     | 4       | pp512 | 64.12 ± 0.26 |
| Qwen2.5 0.5B | 403.20 MiB  | 630.17 M | cpu     | 4       | tg128 | 10.03 ± 0.01 |
| Qwen2.5 1.5B | 1011.16 MiB | 1.78 B   | cpu     | 4       | pp512 | 24.16 ± 0.02 |
| Qwen2.5 1.5B | 1011.16 MiB | 1.78 B   | cpu     | 4       | tg128 | 3.83 ± 0.06  |
| Qwen2.5 3B   | 1.86 GiB    | 3.40 B   | cpu     | 4       | pp512 | 12.08 ± 0.02 |
| Qwen2.5 3B   | 1.86 GiB    | 3.40 B   | cpu     | 4       | tg128 | 2.23 ± 0.02  |

Q4_1

| Model        | Size       | Params   | Backend | Threads | Test  | t/s          |
| ------------ | ---------- | -------- | ------- | ------- | ----- | ------------ |
| Qwen2.5 0.5B | 351.50 MiB | 494.03 M | cpu     | 4       | pp512 | 62.07 ± 0.12 |
| Qwen2.5 0.5B | 351.50 MiB | 494.03 M | cpu     | 4       | tg128 | 9.91 ± 0.01  |
| Qwen2.5 1.5B | 964.06 MiB | 1.54 B   | cpu     | 4       | pp512 | 22.95 ± 0.25 |
| Qwen2.5 1.5B | 964.06 MiB | 1.54 B   | cpu     | 4       | tg128 | 4.01 ± 0.15  |
| Qwen2.5 3B   | 1.85 GiB   | 3.09 B   | cpu     | 4       | pp512 | 11.55 ± 0.16 |
| Qwen2.5 3B   | 1.85 GiB   | 3.09 B   | cpu     | 4       | tg128 | 2.25 ± 0.04  |

Q4_K

| Model        | Size       | Params   | Backend | Threads | Test  | t/s          |
| ------------ | ---------- | -------- | ------- | ------- | ----- | ------------ |
| Qwen2.5 0.5B | 462.96 MiB | 630.17 M | cpu     | 4       | pp512 | 9.29 ± 0.05  |
| Qwen2.5 0.5B | 462.96 MiB | 630.17 M | cpu     | 4       | tg128 | 5.67 ± 0.04  |
| Qwen2.5 1.5B | 1.04 GiB   | 1.78 B   | cpu     | 4       | pp512 | 10.38 ± 0.10 |
| Qwen2.5 1.5B | 1.04 GiB   | 1.78 B   | cpu     | 4       | tg128 | 3.17 ± 0.08  |
| Qwen2.5 3B   | 1.95 GiB   | 3.40 B   | cpu     | 4       | pp512 | 4.23 ± 0.04  |
| Qwen2.5 3B   | 1.95 GiB   | 3.40 B   | cpu     | 4       | tg128 | 1.73 ± 0.00  |
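
The column layout of these tables matches what `llama-bench` prints, so the numbers were presumably collected with that tool, although the source does not say so. Under that assumption, a run with the same settings could be sketched as follows. The helper name `run_bench`, the `LLAMA_BENCH` override variable, and the default binary path `./build/bin/llama-bench` are all hypothetical.

```bash
# Hypothetical helper: run llama-bench with the settings shown in the tables.
# -t 4 matches the "Threads" column; -p 512 and -n 128 correspond to the
# pp512 (prompt processing) and tg128 (text generation) tests.
# Usage: run_bench <model.gguf>
run_bench() {
    local model="$1"
    # Assumption: llama-bench is built next to llama-cli in build/bin.
    local bench="${LLAMA_BENCH:-./build/bin/llama-bench}"

    if [ -z "$model" ]; then
        echo "usage: run_bench <model.gguf>" >&2
        return 1
    fi
    "$bench" -m "$model" -t 4 -p 512 -n 128
}

# Example:
#   run_bench models/Qwen2.5-0.5B-Instruct-Q4_0.gguf
```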