
Running an AI example with Walrus project

Zoltan Herczeg edited this page Oct 13, 2025 · 2 revisions

This page summarizes running a small AI model on Walrus (13/10/2025).

The llama project ( https://www.llama.com/ ) was chosen to run the model. Since wasm is not a regular build target for the project, a few changes were needed:

  • a simple build script was written, because the default build system does not support wasm (Emscripten) compilation
  • the following features were disabled: llama_curl, multithreading, and ggml_native
  • some minor code changes were made: disabling process-priority setting, fixing some include errors, emulating mmap (wasm has no concept of mmap), and implementing a few missing libc functions
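The mmap emulation mentioned above can be as simple as reading the whole model file into a heap buffer. A minimal sketch (the function name and error handling are assumptions for illustration, not the actual patch):

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mmap replacement for wasm: instead of mapping the file,
   read its entire contents into a malloc'd buffer. */
static void *map_file(const char *path, size_t *size_out) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    /* Determine the file size by seeking to the end. */
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size < 0) { fclose(f); return NULL; }

    void *buf = malloc((size_t)size);
    if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);       /* short read: give the memory back */
        buf = NULL;
    }
    fclose(f);
    if (buf && size_out) *size_out = (size_t)size;
    return buf;
}
```

The trade-off is that the whole file must fit in linear memory at once, which is acceptable here because the selected model is well under the 4GB limit.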

Since Walrus does not yet support Memory64, the maximum available memory is 4GB, which limits the models that can be used. The following model was selected: lille-130m-instruct-f16.gguf:

https://huggingface.co/Nikity/lille-130m-instruct

x86 platform results

The x86 measurements were made on an Intel i7-7700 @ 3.6 GHz CPU with 32GB RAM.

64 bit results:

  • Interpreter: 234s
  • JIT no-reg-alloc: 73s (3.2 times as fast)
  • JIT: 27s (8.6 times as fast)

32 bit results:

  • Interpreter: 407s
  • JIT no-reg-alloc: 74s (5.5 times as fast)
  • JIT: 30s (13.5 times as fast)

ARM platform results

The ARM measurements were made on a Raspberry Pi 4 (ARM Cortex-A72 @ 1.8 GHz) with 4GB RAM.

ARM64 mode:

  • Interpreter: 1060.2s
  • JIT no-reg-alloc: 429.91s (2.46 times as fast)
  • JIT: 148.67s (7.1 times as fast)

ARM32 (Thumb2 instruction set) mode:

  • Interpreter: 1439.4s
  • JIT no-reg-alloc: 485.06s (2.9 times as fast)
  • JIT: 169.95s (8.4 times as fast)