| Field | Value |
|---|---|
| Status | Proposed |
| Date | 2026-01-18 |
| Authors | RuvLLM Architecture Team |
| Reviewers | - |
| Supersedes | - |
| Superseded by | - |
Note: The WASM runtime approach described here is complemented by ADR-029. The RVF WASM microkernel (rvf-wasm) provides a <8 KB Cognitum tile target that replaces ad-hoc WASM builds for vector operations.
RuvLLM requires a mechanism for executing user-provided and community-contributed compute kernels in a secure, sandboxed environment. These kernels implement performance-critical operations such as:
- Rotary Position Embeddings (RoPE)
- RMS Normalization (RMSNorm)
- SwiGLU activation functions
- KV cache quantization/dequantization
- LoRA delta application
Without proper isolation, malicious or buggy kernels could:
- Access unauthorized memory regions
- Consume unbounded compute resources
- Compromise the host system
- Corrupt model state
| Requirement | Priority | Rationale |
|---|---|---|
| Sandboxed execution | Critical | Prevent kernel code from accessing host resources |
| Execution budgets | Critical | Prevent runaway code and DoS conditions |
| Low overhead | High | Kernels are in the inference hot path |
| Cross-platform | High | Support x86, ARM, embedded devices |
| Framework agnostic | Medium | Enable ML inference without vendor lock-in |
| Hot-swappable kernels | Medium | Update kernels without service restart |
- Memory: Embedded targets have as little as 256KB RAM
- Latency: Kernel invocation overhead must be <10us for small tensors
- Compatibility: Must support existing Rust/C kernel implementations
- Security: Kernel supply chain must be verifiable
We will adopt WebAssembly (WASM) as the sandboxed execution environment for compute kernels, with the following architecture:
| Device Class | Runtime | Rationale |
|---|---|---|
| Edge servers (x86/ARM64) | Wasmtime | Mature, well-optimized, excellent tooling |
| Embedded/MCU (<1MB RAM) | WAMR | <85KB footprint, AOT compilation support |
| Browser/WASI Preview 2 | wasmtime/browser | Future consideration |
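The selection rule in the table above can be sketched as a small function. This is only an illustration: `DeviceProfile`, `RuntimeKind`, and `select_runtime` are hypothetical names that do not appear elsewhere in this ADR, and the 1 MB threshold is taken directly from the "Embedded/MCU" row.

```rust
/// Runtimes named in the selection table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RuntimeKind {
    Wasmtime,
    Wamr,
}

/// Hypothetical description of a deployment target (illustrative only).
pub struct DeviceProfile {
    /// RAM available to the inference process, in bytes.
    pub ram_bytes: u64,
}

/// Threshold from the table: devices with less than 1 MB of RAM use WAMR.
const EMBEDDED_RAM_LIMIT: u64 = 1024 * 1024;

/// Picks a runtime for the given device according to the table.
pub fn select_runtime(device: &DeviceProfile) -> RuntimeKind {
    if device.ram_bytes < EMBEDDED_RAM_LIMIT {
        RuntimeKind::Wamr
    } else {
        RuntimeKind::Wasmtime
    }
}
```

A 256KB microcontroller would map to `RuntimeKind::Wamr` under this rule, while an edge server maps to `RuntimeKind::Wasmtime`.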
We choose epoch-based interruption over fuel-based metering:
| Aspect | Epoch | Fuel |
|---|---|---|
| Overhead | ~2-5% | ~15-30% |
| Granularity | Coarse (polling points) | Fine (per instruction) |
| Determinism | Non-deterministic | Deterministic |
| Implementation | Store-level epoch counter | Instruction instrumentation |
Rationale: For inference workloads, coarse-grained interruption is acceptable. Avoiding fuel metering saves roughly 10-25 percentage points of overhead (per the table above), which is significant for latency-sensitive operations.
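To make the trade-off concrete, the following is a toy, standard-library-only model of epoch-based interruption. It is not Wasmtime's implementation: `EpochClock`, `ticks_for_budget`, and `run_until_deadline` are hypothetical names, and the closure stands in for a kernel that reaches a polling point after each unit of work.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Number of epoch ticks needed to cover `budget`, rounded up so a kernel
/// is never interrupted before its budget has elapsed.
pub fn ticks_for_budget(budget: Duration, tick_interval: Duration) -> u64 {
    let budget_ns = budget.as_nanos();
    let tick_ns = tick_interval.as_nanos().max(1);
    ((budget_ns + tick_ns - 1) / tick_ns) as u64
}

/// Toy stand-in for an engine-wide epoch counter.
pub struct EpochClock(AtomicU64);

impl EpochClock {
    pub fn new() -> Self {
        EpochClock(AtomicU64::new(0))
    }
    /// Called by the host timer on every tick.
    pub fn increment(&self) {
        self.0.fetch_add(1, Ordering::Relaxed);
    }
    /// A single atomic read, which is why polling is cheap.
    pub fn now(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Runs `step` repeatedly until it returns `true` (work finished) or the
/// epoch deadline passes. The check happens only between steps, which is
/// the coarse granularity described in the table.
pub fn run_until_deadline(
    clock: &EpochClock,
    budget_ticks: u64,
    mut step: impl FnMut() -> bool,
) -> Result<u64, &'static str> {
    let deadline = clock.now() + budget_ticks;
    let mut steps = 0;
    loop {
        if clock.now() >= deadline {
            return Err("epoch deadline exceeded");
        }
        steps += 1;
        if step() {
            return Ok(steps);
        }
    }
}
```

With a 10ms tick interval, a 100ms budget corresponds to `ticks_for_budget(...) == 10` ticks and a 10s budget to 1000 ticks.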
// Epoch configuration example
use wasmtime::{Config, Engine, Store};

let mut config = Config::new();
config.epoch_interruption(true);
let engine = Engine::new(&config)?;
let mut store = Store::new(&engine, ());
// Set epoch deadline in ticks beyond the current epoch
// (100 ticks is a 100ms budget only if the host ticks every 1ms)
store.set_epoch_deadline(100);
// Increment epoch from a host timer thread or async task
engine.increment_epoch();
WASI-NN provides framework-agnostic ML inference capabilities:
+-------------------+
| RuvLLM Host |
+-------------------+
|
v
+-------------------+
| WASI-NN API |
+-------------------+
|
+----+----+
| |
v v
+-------+ +--------+
| ONNX | | Custom |
| RT | | Kernel |
+-------+ +--------+
WASI-NN Backends:
- ONNX Runtime (portable)
- Native kernels (performance-critical paths)
- Custom quantized formats (memory efficiency)
We use raw WASM ABI rather than the Component Model:
| Aspect | Raw ABI | Component Model |
|---|---|---|
| Maturity | Stable | Evolving (Preview 2) |
| Overhead | Minimal | Higher (canonical ABI) |
| Tooling | Excellent | Improving |
| Adoption | Universal | Growing |
Migration Path: Design interfaces to be Component Model-compatible for future migration.
Host Linear Memory
+--------------------------------------------------+
| Tensor A | Tensor B | Output | Scratch |
| (read-only) | (read-only) | (write) | (r/w) |
+--------------------------------------------------+
^ ^ ^ ^
| | | |
offset_a offset_b offset_out offset_scratch
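The layout above implies two host-side invariants before a kernel is invoked: every region must lie inside linear memory, and the regions must not overlap. A minimal sketch of such a check follows; `Region`, `LayoutError`, and `validate_layout` are hypothetical helpers, not part of the ABI defined in this ADR.

```rust
/// A byte range inside WASM linear memory (32-bit offset and size).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Region {
    pub offset: u32,
    pub size: u32,
}

impl Region {
    /// Exclusive end offset, computed in u64 so `offset + size` cannot wrap.
    fn end(&self) -> u64 {
        self.offset as u64 + self.size as u64
    }

    /// Two non-empty regions overlap if each starts before the other ends.
    fn overlaps(&self, other: &Region) -> bool {
        self.size != 0
            && other.size != 0
            && (self.offset as u64) < other.end()
            && (other.offset as u64) < self.end()
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LayoutError {
    /// Region at this index extends past the end of linear memory.
    OutOfBounds(usize),
    /// Regions at these two indices share at least one byte.
    Overlap(usize, usize),
}

/// Checks that all regions fit in `memory_len` bytes and are pairwise disjoint.
pub fn validate_layout(regions: &[Region], memory_len: u64) -> Result<(), LayoutError> {
    for (i, region) in regions.iter().enumerate() {
        if region.end() > memory_len {
            return Err(LayoutError::OutOfBounds(i));
        }
    }
    for i in 0..regions.len() {
        for j in (i + 1)..regions.len() {
            if regions[i].overlaps(&regions[j]) {
                return Err(LayoutError::Overlap(i, j));
            }
        }
    }
    Ok(())
}
```

Empty regions (size 0) never count as overlapping, which accommodates an unused second input whose offset is 0.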
Shared Memory Protocol:
/// Kernel invocation descriptor passed to WASM
#[repr(C)]
pub struct KernelDescriptor {
/// Input tensor A offset in linear memory
pub input_a_offset: u32,
/// Input tensor A size in bytes
pub input_a_size: u32,
/// Input tensor B offset (0 if unused)
pub input_b_offset: u32,
/// Input tensor B size in bytes
pub input_b_size: u32,
/// Output tensor offset
pub output_offset: u32,
/// Output tensor size in bytes
pub output_size: u32,
/// Scratch space offset
pub scratch_offset: u32,
/// Scratch space size in bytes
pub scratch_size: u32,
/// Kernel-specific parameters offset
pub params_offset: u32,
/// Kernel-specific parameters size
pub params_size: u32,
}
WASM traps are handled as non-fatal errors:
pub enum KernelError {
/// Execution budget exceeded
EpochDeadline,
/// Out of bounds memory access
MemoryAccessViolation {
offset: u32,
size: u32,
},
/// Integer overflow/underflow
IntegerOverflow,
/// Unreachable code executed
Unreachable,
/// Stack overflow
StackOverflow,
/// Invalid function call
IndirectCallTypeMismatch,
/// Custom trap from kernel
KernelTrap {
code: u32,
message: Option<String>,
},
}
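impl From<wasmtime::Trap> for KernelError {
    fn from(trap: wasmtime::Trap) -> Self {
        use wasmtime::Trap;
        // In current Wasmtime releases `Trap` is itself an enum of trap codes
        match trap {
            Trap::Interrupt => KernelError::EpochDeadline,
            Trap::MemoryOutOfBounds => KernelError::MemoryAccessViolation {
                offset: 0, // Not carried by the trap; fill in from host-side context
                size: 0,
            },
            Trap::IntegerOverflow => KernelError::IntegerOverflow,
            Trap::UnreachableCodeReached => KernelError::Unreachable,
            Trap::StackOverflow => KernelError::StackOverflow,
            Trap::BadSignature => KernelError::IndirectCallTypeMismatch,
            // Any other trap is surfaced with its message
            other => KernelError::KernelTrap { code: 0, message: Some(other.to_string()) },
        }
    }
}
Recovery Strategy: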
- Log trap with full context
- Release kernel resources
- Fall back to reference implementation (if available)
- Report degraded performance to metrics
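The recovery steps above can be sketched as a small wrapper around a kernel call. This is an illustrative outline only: `Outcome`, `run_with_fallback`, and the plain counter used for metrics are assumptions, and logging is reduced to `eprintln!` to keep the example dependency-free.

```rust
use std::fmt::Debug;

/// Records which path produced the result.
#[derive(Debug, PartialEq)]
pub enum Outcome<T> {
    Sandboxed(T),
    Fallback(T),
}

/// Runs `kernel`; on failure, logs the error, then uses `reference` if one
/// exists and bumps `degraded_count` so metrics can report the degradation.
pub fn run_with_fallback<T, E, K, R>(
    kernel: K,
    reference: Option<R>,
    degraded_count: &mut u64,
) -> Result<Outcome<T>, E>
where
    E: Debug,
    K: FnOnce() -> Result<T, E>,
    R: FnOnce() -> T,
{
    match kernel() {
        Ok(value) => Ok(Outcome::Sandboxed(value)),
        Err(err) => {
            // Step 1: log the trap with whatever context is available.
            eprintln!("kernel failed: {:?}", err);
            // Step 2: kernel resources are released here; the `kernel`
            // closure has been consumed, so anything it owned is dropped.
            match reference {
                // Steps 3 and 4: fall back and record degraded performance.
                Some(fallback) => {
                    *degraded_count += 1;
                    Ok(Outcome::Fallback(fallback()))
                }
                // No reference implementation: surface the original error.
                None => Err(err),
            }
        }
    }
}
```

Returning `Outcome` rather than a bare value lets callers distinguish sandboxed results from fallback results without consulting the metrics.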
kernel-pack-v1.0.0/
├── kernels.json # Manifest
├── kernels.json.sig # Ed25519 signature
├── rope/
│ ├── rope_f32.wasm
│ ├── rope_f16.wasm
│ └── rope_q8.wasm
├── rmsnorm/
│ ├── rmsnorm_f32.wasm
│ └── rmsnorm_f16.wasm
├── swiglu/
│ ├── swiglu_f32.wasm
│ └── swiglu_f16.wasm
├── kv/
│ ├── kv_pack_q4.wasm
│ ├── kv_pack_q8.wasm
│ ├── kv_unpack_q4.wasm
│ └── kv_unpack_q8.wasm
└── lora/
├── lora_apply_f32.wasm
└── lora_apply_f16.wasm
{
"$schema": "https://ruvllm.dev/schemas/kernel-pack-v1.json",
"version": "1.0.0",
"name": "ruvllm-core-kernels",
"description": "Core compute kernels for RuvLLM inference",
"min_runtime_version": "0.5.0",
"max_runtime_version": "1.0.0",
"created_at": "2026-01-18T00:00:00Z",
"author": {
"name": "RuvLLM Team",
"email": "kernels@ruvllm.dev",
"signing_key": "ed25519:AAAA..."
},
"kernels": [
{
"id": "rope_f32",
"name": "Rotary Position Embedding (FP32)",
"category": "positional_encoding",
"path": "rope/rope_f32.wasm",
"hash": "sha256:abc123...",
"entry_point": "rope_forward",
"inputs": [
{
"name": "x",
"dtype": "f32",
"shape": ["batch", "seq", "heads", "dim"]
},
{
"name": "freqs",
"dtype": "f32",
"shape": ["seq", "dim_half"]
}
],
"outputs": [
{
"name": "y",
"dtype": "f32",
"shape": ["batch", "seq", "heads", "dim"]
}
],
"params": {
"theta": {
"type": "f32",
"default": 10000.0
}
},
"resource_limits": {
"max_memory_pages": 256,
"max_epoch_ticks": 1000,
"max_table_elements": 1024
},
"platforms": {
"wasmtime": {
"min_version": "15.0.0",
"features": ["simd", "bulk-memory"]
},
"wamr": {
"min_version": "1.3.0",
"aot_available": true
}
},
"benchmarks": {
"seq_512_dim_128": {
"latency_us": 45,
"throughput_gflops": 2.1
}
}
}
],
"fallbacks": {
"rope_f32": "rope_reference",
"rmsnorm_f32": "rmsnorm_reference"
}
}
| Category | Kernels | Notes |
|---|---|---|
| Positional | RoPE (f32, f16, q8) | Rotary embeddings |
| Normalization | RMSNorm (f32, f16) | Pre-attention normalization |
| Activation | SwiGLU (f32, f16) | Gated activation |
| KV Cache | pack_q4, pack_q8, unpack_q4, unpack_q8 | Quantize/dequantize |
| Adapter | LoRA apply (f32, f16) | Delta weight application |
Attention Note: Attention kernels remain native initially due to:
- Complex memory access patterns
- Heavy reliance on hardware-specific optimizations (Flash Attention, xformers)
- Significant overhead from WASM boundary crossing for large tensors
use ed25519_dalek::{Signature, VerifyingKey, Verifier};
pub struct KernelPackVerifier {
trusted_keys: Vec<VerifyingKey>,
}
impl KernelPackVerifier {
/// Verify kernel pack signature
pub fn verify(&self, manifest: &[u8], signature: &[u8]) -> Result<(), VerifyError> {
let sig = Signature::try_from(signature)?;
for key in &self.trusted_keys {
if key.verify(manifest, &sig).is_ok() {
return Ok(());
}
}
Err(VerifyError::NoTrustedKey)
}
/// Verify individual kernel hash
pub fn verify_kernel(&self, kernel_bytes: &[u8], expected_hash: &str) -> Result<(), VerifyError> {
use sha2::{Sha256, Digest};
let mut hasher = Sha256::new();
hasher.update(kernel_bytes);
let hash = format!("sha256:{:x}", hasher.finalize());
if hash == expected_hash {
Ok(())
} else {
Err(VerifyError::HashMismatch {
expected: expected_hash.to_string(),
actual: hash,
})
}
}
}
pub struct CompatibilityChecker {
runtime_version: Version,
}
impl CompatibilityChecker {
pub fn check(&self, manifest: &KernelManifest) -> CompatibilityResult {
// Check runtime version bounds
if self.runtime_version < manifest.min_runtime_version {
return CompatibilityResult::RuntimeTooOld {
required: manifest.min_runtime_version.clone(),
actual: self.runtime_version.clone(),
};
}
if self.runtime_version > manifest.max_runtime_version {
return CompatibilityResult::RuntimeTooNew {
max_supported: manifest.max_runtime_version.clone(),
actual: self.runtime_version.clone(),
};
}
// Check WASM feature requirements
for kernel in &manifest.kernels {
if let Some(platform) = kernel.platforms.get("wasmtime") {
for feature in &platform.features {
if !self.has_feature(feature) {
return CompatibilityResult::MissingFeature {
kernel: kernel.id.clone(),
feature: feature.clone(),
};
}
}
}
}
CompatibilityResult::Compatible
}
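}
pub struct KernelManager {
    active_pack: Arc<RwLock<KernelPack>>,
    previous_pack: Arc<RwLock<Option<KernelPack>>>,
    // Used by `upgrade` to validate a pack before it is swapped in
    verifier: KernelPackVerifier,
    compatibility: CompatibilityChecker,
    metrics: KernelMetrics,
}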
impl KernelManager {
/// Upgrade to new kernel pack with automatic rollback on failure
pub async fn upgrade(&self, new_pack: KernelPack) -> Result<(), UpgradeError> {
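// Step 1: Verify new pack
// (assumes KernelPack exposes the raw manifest bytes and detached signature,
// and that UpgradeError converts from VerifyError and has an Incompatible variant)
self.verifier.verify(&new_pack.manifest_bytes, &new_pack.signature)?;
match self.compatibility.check(&new_pack.manifest) {
    CompatibilityResult::Compatible => {}
    other => return Err(UpgradeError::Incompatible(other)),
}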
// Step 2: Compile kernels (AOT if supported)
let compiled = self.compile_pack(&new_pack).await?;
// Step 3: Atomic swap with rollback capability
{
let mut active = self.active_pack.write().await;
let mut previous = self.previous_pack.write().await;
// Store current as rollback target
*previous = Some(std::mem::replace(&mut *active, compiled));
}
// Step 4: Health check with new kernels
if let Err(e) = self.health_check().await {
tracing::error!("Kernel health check failed: {}", e);
self.rollback().await?;
return Err(UpgradeError::HealthCheckFailed(e));
}
// Step 5: Clear rollback after grace period
tokio::spawn({
let previous = self.previous_pack.clone();
async move {
tokio::time::sleep(Duration::from_secs(300)).await;
*previous.write().await = None;
}
});
Ok(())
}
/// Rollback to previous kernel pack
pub async fn rollback(&self) -> Result<(), RollbackError> {
let mut active = self.active_pack.write().await;
let mut previous = self.previous_pack.write().await;
if let Some(prev) = previous.take() {
*active = prev;
tracing::info!("Rolled back to previous kernel pack");
Ok(())
} else {
Err(RollbackError::NoPreviousPack)
}
}
}
pub fn create_server_runtime() -> Result<WasmRuntime, RuntimeError> {
let mut config = Config::new();
// Performance optimizations
config.cranelift_opt_level(OptLevel::Speed);
config.cranelift_nan_canonicalization(false);
config.parallel_compilation(true);
// SIMD support for vectorized operations
config.wasm_simd(true);
config.wasm_bulk_memory(true);
config.wasm_multi_value(true);
// Memory configuration
config.static_memory_maximum_size(1 << 32); // 4GB max
config.dynamic_memory_guard_size(1 << 16); // 64KB guard
// Epoch-based interruption
config.epoch_interruption(true);
let engine = Engine::new(&config)?;
Ok(WasmRuntime {
engine,
epoch_tick_interval: Duration::from_millis(10),
default_epoch_budget: 1000, // 10 seconds max
})
}
pub fn create_embedded_runtime() -> Result<WamrRuntime, RuntimeError> {
let mut config = WamrConfig::new();
// Minimal footprint configuration
config.set_stack_size(32 * 1024); // 32KB stack
config.set_heap_size(128 * 1024); // 128KB heap
config.enable_aot(true); // Pre-compiled modules
config.enable_simd(false); // Often unavailable on MCU
config.enable_bulk_memory(true);
// Interpreter fallback for debugging
config.enable_interp(cfg!(debug_assertions));
// Execution limits
config.set_exec_timeout_ms(100); // 100ms max per invocation
Ok(WamrRuntime::new(config)?)
}
For platforms supporting WASI threads:
pub fn create_threaded_runtime() -> Result<WasmRuntime, RuntimeError> {
    let mut config = Config::new();
    // Enable the threads proposal (shared memories and atomics)
    config.wasm_threads(true);
    // Async host calls; Wasmtime's Config has no thread-count setting,
    // so the host-side pool (thread_pool_size below) bounds parallelism
    config.async_support(true);
    let engine = Engine::new(&config)?;
    Ok(WasmRuntime {
        engine,
        thread_pool_size: 4,
    })
}
Platform Support Matrix:
| Platform | WASI Threads | Notes |
|---|---|---|
| Linux x86_64 | Yes | Full support |
| Linux ARM64 | Yes | Full support |
| macOS | Yes | Full support |
| Windows | Yes | Full support |
| WAMR | No | Single-threaded only |
| Browser | Yes | Via SharedArrayBuffer |
| Operation | Latency | Notes |
|---|---|---|
| Kernel lookup | ~100ns | Hash table lookup |
| Instance creation | ~1us | Pre-compiled module |
| Memory setup | ~500ns | Shared memory mapping |
| Epoch check | ~2ns | Single atomic read |
| Return value | ~100ns | Register transfer |
| Total | ~2us | Per invocation |
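As a quick arithmetic check, the per-step figures in the table can be summed and compared against the <10us invocation constraint. The constants below are copied from the table; the names `OVERHEAD_NS`, `LATENCY_BUDGET_NS`, and the two helper functions are illustrative only.

```rust
/// (operation, nanoseconds) pairs copied from the overhead table.
pub const OVERHEAD_NS: [(&str, u64); 5] = [
    ("kernel lookup", 100),
    ("instance creation", 1_000),
    ("memory setup", 500),
    ("epoch check", 2),
    ("return value", 100),
];

/// The <10us invocation-overhead constraint, in nanoseconds.
pub const LATENCY_BUDGET_NS: u64 = 10_000;

/// Sum of all per-invocation overhead components.
pub fn total_overhead_ns() -> u64 {
    OVERHEAD_NS.iter().map(|(_, ns)| ns).sum()
}

/// True when the summed overhead stays under the latency constraint.
pub fn within_budget() -> bool {
    total_overhead_ns() < LATENCY_BUDGET_NS
}
```

The components sum to 1702ns, which the table rounds to ~2us and which leaves most of the 10us budget for the kernel itself.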
- Module Caching: Pre-compile and cache WASM modules
- Instance Pooling: Reuse instances across invocations
- Memory Sharing: Map host tensors directly into WASM linear memory
- Batch Invocations: Process multiple requests per kernel call
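Instance pooling, one of the strategies listed above, can be sketched with a simple free list. `InstancePool` is a hypothetical type that is generic over the instance so the example stays runtime-independent; a real pool would also need to reset instance state (memory, globals) before reuse.

```rust
/// Minimal pool that hands out idle instances before creating new ones.
pub struct InstancePool<T> {
    idle: Vec<T>,
    max_idle: usize,
    created: usize,
}

impl<T> InstancePool<T> {
    pub fn new(max_idle: usize) -> Self {
        InstancePool { idle: Vec::new(), max_idle, created: 0 }
    }

    /// Returns an idle instance if one exists; otherwise calls `create`.
    pub fn acquire(&mut self, create: impl FnOnce() -> T) -> T {
        match self.idle.pop() {
            Some(instance) => instance,
            None => {
                self.created += 1;
                create()
            }
        }
    }

    /// Returns an instance to the pool; surplus instances are dropped so
    /// the pool never holds more than `max_idle`.
    pub fn release(&mut self, instance: T) {
        if self.idle.len() < self.max_idle {
            self.idle.push(instance);
        }
    }

    /// Number of instances created so far (useful as a metric).
    pub fn created(&self) -> usize {
        self.created
    }

    /// Number of instances currently waiting for reuse.
    pub fn idle_len(&self) -> usize {
        self.idle.len()
    }
}
```

Reusing an instance this way avoids repeating the instance-creation cost on every invocation, at the price of holding up to `max_idle` instances in memory.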
WASM sandboxing should be bypassed (with explicit opt-in) for:
- Attention kernels (complex memory patterns)
- Large matrix multiplications (>1000x1000)
- Operations with <1ms latency requirements
- Trusted, verified native kernels
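The bypass criteria above can be expressed as a single predicate. This is a sketch under stated assumptions: `KernelRequest` and `may_bypass_sandbox` are hypothetical, and ">1000x1000" is interpreted here as more than 1,000,000 elements in the operand.

```rust
use std::time::Duration;

/// Hypothetical description of a kernel invocation request.
#[derive(Debug, Clone, Default)]
pub struct KernelRequest {
    pub is_attention: bool,
    /// (rows, cols) of the operand if this is a matrix multiplication.
    pub matmul_shape: Option<(u64, u64)>,
    /// End-to-end latency requirement for the operation, if any.
    pub latency_requirement: Option<Duration>,
    /// Kernel is a trusted, verified native implementation.
    pub trusted_native: bool,
    /// Explicit opt-in; without it the sandbox is never bypassed.
    pub bypass_opt_in: bool,
}

/// Returns true only when the caller opted in AND at least one of the
/// listed criteria applies.
pub fn may_bypass_sandbox(req: &KernelRequest) -> bool {
    if !req.bypass_opt_in {
        return false;
    }
    let large_matmul = matches!(
        req.matmul_shape,
        Some((rows, cols)) if rows.saturating_mul(cols) > 1_000_000
    );
    let tight_latency = matches!(
        req.latency_requirement,
        Some(limit) if limit < Duration::from_millis(1)
    );
    req.is_attention || large_matmul || tight_latency || req.trusted_native
}
```

Checking the opt-in first keeps sandboxing as the default: no combination of workload characteristics can disable it implicitly.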
| Aspect | eBPF | WASM |
|---|---|---|
| Platform | Linux only | Cross-platform |
| Verification | Static, strict | Dynamic, flexible |
| Memory model | Constrained | Linear memory |
| Tooling | Improving | Mature |
Decision: WASM chosen for cross-platform support.
| Aspect | Lua | WASM |
|---|---|---|
| Performance | Good (JIT) | Excellent (AOT) |
| Sandboxing | Manual effort | Built-in |
| Type safety | Dynamic | Static |
| Ecosystem | Large | Growing |
Decision: WASM chosen for type safety and native compilation.
| Aspect | seccomp | WASM |
|---|---|---|
| Isolation | Process-level | In-process |
| Overhead | IPC cost | Minimal |
| Portability | Linux only | Cross-platform |
| Complexity | High | Moderate |
Decision: WASM chosen for in-process efficiency and portability.
- Security: Strong isolation prevents kernel code from compromising host
- Portability: Same kernels run on servers and embedded devices
- Hot Updates: Kernels can be updated without service restart
- Ecosystem: Large WASM toolchain and community support
- Auditability: WASM modules can be inspected and verified
- Overhead: ~2us per invocation vs. native direct call
- Complexity: Additional abstraction layer to maintain
- Tooling: WASM debugging tools less mature than native
- Learning Curve: Team needs WASM expertise
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Performance regression | Medium | High | Benchmark suite, native fallbacks |
| WASI-NN instability | Low | Medium | Abstract behind internal API |
| Supply chain attack | Low | Critical | Signature verification, trusted keys |
| Epoch timing variability | Medium | Low | Generous budgets, monitoring |
- Set up Wasmtime integration
- Implement kernel descriptor ABI
- Create basic kernel loader
- Implement RoPE kernel
- Implement RMSNorm kernel
- Implement SwiGLU kernel
- Implement quantization kernels
- Implement dequantization kernels
- Integration with cache manager
- Implement signature verification
- Create version compatibility checker
- Build rollback system
- WAMR integration
- AOT compilation pipeline
- Resource-constrained testing
- Wasmtime Documentation
- WAMR Documentation
- WASI-NN Specification
- WebAssembly Security Model
- Component Model Proposal
/// Standard kernel interface: signatures of the functions every kernel module exports.
/// In kernel source, each is defined as `#[no_mangle] pub extern "C" fn ...`;
/// they are listed as declarations here for brevity.
extern "C" {
    /// Initialize kernel with parameters
    fn kernel_init(params_ptr: *const u8, params_len: u32) -> i32;
    /// Execute kernel forward pass
    fn kernel_forward(desc_ptr: *const KernelDescriptor) -> i32;
    /// Execute kernel backward pass (optional)
    fn kernel_backward(desc_ptr: *const KernelDescriptor) -> i32;
    /// Get kernel metadata
    fn kernel_info(info_ptr: *mut KernelInfo) -> i32;
    /// Cleanup kernel resources
    fn kernel_cleanup() -> i32;
}
| Code | Name | Description |
|---|---|---|
| 0 | OK | Success |
| 1 | INVALID_INPUT | Invalid input tensor |
| 2 | INVALID_OUTPUT | Invalid output tensor |
| 3 | INVALID_PARAMS | Invalid kernel parameters |
| 4 | OUT_OF_MEMORY | Insufficient memory |
| 5 | NOT_IMPLEMENTED | Operation not supported |
| 6 | INTERNAL_ERROR | Internal kernel error |
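On the host side, the integer codes in the table above can be mapped to a typed status. The sketch below is one possible shape; `KernelStatus`, `from_code`, and `check` are hypothetical names, and treating unknown codes as `InternalError` is an assumption rather than something this ADR specifies.

```rust
/// Typed view of the kernel return codes listed in the table.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KernelStatus {
    Ok = 0,
    InvalidInput = 1,
    InvalidOutput = 2,
    InvalidParams = 3,
    OutOfMemory = 4,
    NotImplemented = 5,
    InternalError = 6,
}

impl KernelStatus {
    /// Converts a raw return code; `None` for codes outside the table.
    pub fn from_code(code: i32) -> Option<Self> {
        match code {
            0 => Some(KernelStatus::Ok),
            1 => Some(KernelStatus::InvalidInput),
            2 => Some(KernelStatus::InvalidOutput),
            3 => Some(KernelStatus::InvalidParams),
            4 => Some(KernelStatus::OutOfMemory),
            5 => Some(KernelStatus::NotImplemented),
            6 => Some(KernelStatus::InternalError),
            _ => None,
        }
    }
}

/// Turns a raw return code into a `Result`. Unknown codes are treated as
/// `InternalError` (an assumption made for this sketch).
pub fn check(code: i32) -> Result<(), KernelStatus> {
    match KernelStatus::from_code(code) {
        Some(KernelStatus::Ok) => Ok(()),
        Some(status) => Err(status),
        None => Err(KernelStatus::InternalError),
    }
}
```

A host wrapper could call `check(ret)?` immediately after each kernel export returns, so that non-zero codes propagate as errors instead of being silently ignored.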
// benches/rope.rs
// Criterion benchmarks are separate bench targets (with `harness = false`
// in Cargo.toml) rather than items inside a `#[cfg(test)]` module.
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_rope_f32(c: &mut Criterion) {
    let runtime = create_server_runtime().unwrap();
    let kernel = runtime.load_kernel("rope_f32").unwrap();
    // Shapes match the manifest: x is [batch, seq, heads, dim], freqs is [seq, dim_half]
    let input = Tensor::random([1, 512, 32, 128], DType::F32);
    let freqs = Tensor::random([512, 64], DType::F32);
    c.bench_function("rope_f32_seq512", |b| {
        b.iter(|| {
            kernel.forward(&input, &freqs).unwrap()
        })
    });
}

criterion_group!(benches, bench_rope_f32);
criterion_main!(benches);
- ADR-001: Ruvector Core Architecture
- ADR-002: RuvLLM Integration
- ADR-003: SIMD Optimization Strategy
- ADR-007: Security Review & Technical Debt
| Component | Status | Notes |
|---|---|---|
| SharedArrayBuffer | ✅ Secure | Safety documentation for race conditions |
| WASM Memory | ✅ Secure | Bounds checking via WASM sandbox |
| Kernel Loading | Pending | Signature verification pending |
Fixes Applied:
- Added comprehensive safety comments documenting race condition prevention in shared.rs
- JavaScript/WASM coordination patterns documented
Outstanding Items:
- TD-007 (P2): Embedded JavaScript should be extracted to separate files
See ADR-007 for full security audit trail.
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-18 | RuVector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added security status, related decisions |