Running an AI example with Walrus project
Zoltan Herczeg edited this page Oct 13, 2025
The llama project ( https://www.llama.com/ ) was chosen for running the model. A few changes were made since wasm is not a regular target for the project:
- a simple build script was written, because the default build system does not support wasm (Emscripten) compilation
- the following features were disabled: llama_curl, multithreading, and ggml_native
- some minor code changes were made: process priority setting was disabled, a few include errors were fixed, mmap was emulated (wasm has no concept of mmap), and a few missing libc functions were implemented
Since Walrus does not support Memory64 yet, the maximum available memory is 4GB, which limits the models that can be used. The following model was selected: lille-130m-instruct-f16.gguf:
https://huggingface.co/Nikity/lille-130m-instruct
The x86 measurements were made on an Intel i7-7700 CPU @ 3.6 GHz with 32GB RAM.
64-bit results:
- Interpreter: 234s
- JIT no-reg-alloc: 73s (3.2 times as fast)
- JIT: 27s (8.6 times as fast)
32-bit results:
- Interpreter: 407s
- JIT no-reg-alloc: 74s (5.5 times as fast)
- JIT: 30s (13.5 times as fast)
The ARM measurements were made on a Raspberry Pi 4 (ARM Cortex-A72 @ 1.8 GHz) with 4GB RAM.
ARM64 mode:
- Interpreter: 1060.2s
- JIT no-reg-alloc: 429.91s (2.46 times as fast)
- JIT: 148.67s (7.1 times as fast)
ARM32 (Thumb-2 instruction set) mode:
- Interpreter: 1439.4s
- JIT no-reg-alloc: 485.06s (2.9 times as fast)
- JIT: 169.95s (8.4 times as fast)