Status: Proposed
Date: 2026-02-06
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board
ruVector targets browsers, embedded/edge runtimes, and IoT devices via
WebAssembly. The quantum simulation engine must compile to
wasm32-unknown-unknown and run correctly in these constrained environments.
WASM introduces fundamental constraints that differ significantly from native
execution and must be addressed at the architectural level rather than
worked around at runtime.
| Constraint | Detail | Impact on Quantum Simulation |
|---|---|---|
| 32-bit address space | ~4 GB theoretical max, ~2 GB practical | Hard ceiling on state vector size |
| Memory model | Linear memory, grows in 64 KB pages | Allocation must be page-aware |
| No native threads | Web Workers required for parallelism | Requires SharedArrayBuffer + COOP/COEP headers |
| No direct GPU | WebGPU is separate API, not WASM-native | GPU acceleration unavailable in WASM path |
| No OS syscalls | Sandboxed execution, no file/network | All I/O must go through host bindings |
| JIT compilation | V8/SpiderMonkey JIT, not AOT | ~1.5-3x slower than native, variable warmup |
| SIMD support | 128-bit SIMD proposal (widely supported since 2021) | 4 f32 or 2 f64 lanes per vector |
| Stack size | Default ~1 MB, configurable | Deep recursion limited |
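To make the "page-aware" row concrete, the following is a minimal sketch of the page arithmetic. The 64 KB page size and the 65,536-page ceiling for a 32-bit memory come from the WebAssembly specification; the function names `pages_for` and `pages_to_grow` are hypothetical and not part of ruqu-wasm. The sketch conservatively assumes none of the existing memory is free.

```rust
/// Size of one WebAssembly linear-memory page (fixed by the spec).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;
/// Maximum number of pages a 32-bit linear memory can hold (4 GiB total).
pub const WASM32_MAX_PAGES: u64 = 65_536;

/// Number of 64 KB pages needed to hold `bytes`, rounded up.
/// (Hypothetical helper, for illustration only.)
pub fn pages_for(bytes: u64) -> u64 {
    bytes.div_ceil(WASM_PAGE_BYTES)
}

/// Pages to request from `memory.grow` so that `extra_bytes` more fit,
/// assuming none of the existing `current_pages` is free.
/// Returns `None` when the grown memory would exceed the wasm32 ceiling.
pub fn pages_to_grow(current_pages: u64, extra_bytes: u64) -> Option<u64> {
    let needed = pages_for(extra_bytes);
    let total = current_pages.checked_add(needed)?;
    (total <= WASM32_MAX_PAGES).then_some(needed)
}
```

For example, a 512 MB state vector occupies exactly 8,192 pages, so the request fails once roughly 57,000 pages are already in use.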
The critical constraint is WASM's 32-bit address space. With a practical usable limit of approximately 2 GB (due to browser memory allocation behavior and address space fragmentation), the maximum feasible state vector size is bounded:
Available WASM Memory Budget:
Total addressable: 4,294,967,296 bytes (4 GB theoretical)
Practical usable: ~2,147,483,648 bytes (2 GB, browser-dependent)
WASM overhead: ~100,000,000 bytes (module, stack, heap metadata)
Application overhead: ~50,000,000 bytes (circuit data, scratch buffers)
-------------------------------------------------
Available for state: ~2,000,000,000 bytes (1.86 GB)
State vector sizes:
24 qubits: 268,435,456 bytes (256 MB) -- comfortable
25 qubits: 536,870,912 bytes (512 MB) -- feasible
25 + scratch: ~1,073,741,824 bytes -- tight but within budget
26 qubits: 1,073,741,824 bytes (1 GB) -- state alone, no scratch room
27 qubits: 2,147,483,648 bytes (2 GB) -- exceeds practical limit
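The figures above follow from 16 bytes per complex f64 amplitude and 2^N amplitudes for N qubits. A small sketch of that arithmetic is shown here; the helper names are hypothetical, and the `copies` parameter (1 for state only, 2 for state plus an equally sized scratch buffer) is an assumption about how scratch space is counted.

```rust
/// Bytes needed for an N-qubit state vector, or `None` on u64 overflow.
/// (Hypothetical helper, for illustration only.)
pub fn state_vector_bytes(qubits: u32, bytes_per_amplitude: u64) -> Option<u64> {
    1u64.checked_shl(qubits)?.checked_mul(bytes_per_amplitude)
}

/// Largest qubit count whose state vector, replicated `copies` times,
/// fits within `budget_bytes`. Returns `None` if not even one qubit fits.
pub fn max_qubits_for_budget(
    budget_bytes: u64,
    bytes_per_amplitude: u64,
    copies: u64,
) -> Option<u32> {
    (1..64u32)
        .take_while(|&n| {
            state_vector_bytes(n, bytes_per_amplitude)
                .and_then(|bytes| bytes.checked_mul(copies))
                .is_some_and(|total| total <= budget_bytes)
        })
        .last()
}
```

With the ~2,000,000,000-byte budget, this yields 26 qubits for the state alone and 25 qubits once a scratch buffer of equal size is included, matching the table above.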
The ruvector-router-wasm crate establishes conventions for WASM compilation:
- wasm-pack build as the compilation tool
- wasm-bindgen for JavaScript interop
- TypeScript definition generation
- Feature-flag controlled inclusion/exclusion of capabilities
- Dedicated test suites using wasm-bindgen-test
Target triple: wasm32-unknown-unknown
Build toolchain: wasm-pack with wasm-bindgen
# Development build
wasm-pack build crates/ruqu-wasm --target web --dev
# Release build with size optimization
wasm-pack build crates/ruqu-wasm --target web --release
# Node.js target (for server-side WASM)
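wasm-pack build crates/ruqu-wasm --target nodejs --release

Cargo profile for WASM release (wasm-pack's --release flag selects Cargo's built-in release profile, so these settings apply only when the build is run with this profile or when they are placed under [profile.release]):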
[profile.wasm-release]
inherits = "release"
opt-level = "z" # Optimize for binary size
lto = true # Link-time optimization
codegen-units = 1 # Single codegen unit for maximum optimization
strip = true # Strip debug symbols
panic = "abort" # Smaller panic handling

ruqu-wasm enforces qubit limits before any allocation occurs. This is a hard
gate, not a soft warning.
Enforcement strategy:
    User requests N qubits
            |
            v
       [N <= 25?] ---NO---> Return WasmLimitError {
            |                   requested: N,
           YES                  maximum: 25,
            |                   estimated_memory: 16 * 2^N,
            v                   suggestion: "Use native build for >25 qubits"
     [Estimate total        }
      memory needed]
            |
            v
      [< 1.5 GB?] ---NO---> Return WasmLimitError::InsufficientMemory
            |
           YES
            |
            v
    Proceed with allocation
Qubit limits by precision:
| Precision | Max Qubits (WASM) | State Size | With Scratch |
|---|---|---|---|
| Complex f64 (default) | 25 | 512 MB | ~1.07 GB |
| Complex f32 (optional) | 26 | 512 MB | ~1.07 GB |
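The two-stage gate described above (qubit ceiling first, then a memory estimate) can be sketched as a pure function that runs before any allocation. This is an illustration under stated assumptions, not the ruqu-wasm implementation: the `Precision` and `LimitError` types and `check_request` are hypothetical names, "total memory" is assumed to be the state plus an equally sized scratch buffer plus caller-supplied overhead, and 1.5 GB is interpreted as 1.5 GiB.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Precision {
    F64,
    F32,
}

impl Precision {
    pub fn bytes_per_amplitude(self) -> u64 {
        match self {
            Precision::F64 => 16,
            Precision::F32 => 8,
        }
    }
    /// Ceilings taken from the table above.
    pub fn max_qubits(self) -> u32 {
        match self {
            Precision::F64 => 25,
            Precision::F32 => 26,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum LimitError {
    QubitOverflow { requested: u32, maximum: u32 },
    InsufficientMemory { required_bytes: u64, limit_bytes: u64 },
}

/// 1.5 GB ceiling from the flow chart, read here as 1.5 GiB (assumption).
pub const MEMORY_CEILING_BYTES: u64 = 3 * (1 << 29);

/// Returns the estimated byte requirement if the request passes both gates.
pub fn check_request(
    requested: u32,
    precision: Precision,
    overhead_bytes: u64,
) -> Result<u64, LimitError> {
    // Gate 1: hard qubit ceiling, checked before any size arithmetic.
    let maximum = precision.max_qubits();
    if requested > maximum {
        return Err(LimitError::QubitOverflow { requested, maximum });
    }
    // Gate 2: state + scratch copy + overhead must stay under the ceiling.
    // `requested` is at most 26 here, so the shift cannot overflow u64.
    let state = precision.bytes_per_amplitude() << requested;
    let required = (2 * state).saturating_add(overhead_bytes);
    if required < MEMORY_CEILING_BYTES {
        Ok(required)
    } else {
        Err(LimitError::InsufficientMemory {
            required_bytes: required,
            limit_bytes: MEMORY_CEILING_BYTES,
        })
    }
}
```

Keeping the check free of allocation and of wasm-bindgen types means the same function can be unit-tested on native targets.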
Error reporting:
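use wasm_bindgen::prelude::*;

// getter_with_clone is required because `message` is a non-Copy public field.
#[wasm_bindgen(getter_with_clone)]
#[derive(Debug)]
pub struct WasmLimitError {
    pub requested_qubits: usize,
    pub maximum_qubits: usize,
    /// Saturates at usize::MAX when the estimate does not fit (usize is 32 bits on wasm32).
    pub estimated_bytes: usize,
    pub message: String,
}

impl WasmLimitError {
    pub fn qubit_overflow(requested: usize) -> Self {
        let max = if cfg!(feature = "f32") { 26 } else { 25 };
        let bytes_per_amplitude: u64 = if cfg!(feature = "f32") { 8 } else { 16 };
        // Compute in u64 with checked arithmetic: on wasm32, 16 * (1usize << N)
        // overflows for N >= 28, which is exactly the range this error reports.
        let estimated = u32::try_from(requested)
            .ok()
            .and_then(|n| 1u64.checked_shl(n))
            .and_then(|amplitudes| amplitudes.checked_mul(bytes_per_amplitude))
            .unwrap_or(u64::MAX);
        Self {
            requested_qubits: requested,
            maximum_qubits: max,
            estimated_bytes: usize::try_from(estimated).unwrap_or(usize::MAX),
            message: format!(
                "Cannot simulate {} qubits in WASM: requires {} bytes, \
                 exceeds WASM address space. Maximum: {} qubits. \
                 Use native build for larger simulations.",
                requested, estimated, max
            ),
        }
    }
}

WASM multi-threading requires SharedArrayBuffer, which in turn requires specific HTTP security headers (Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy). Not all deployment environments support these.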
Strategy: Optional multi-threading with graceful fallback.
ruqu-wasm execution
|
v
[SharedArrayBuffer
available?]
/ \
YES NO
/ \
[wasm-bindgen-rayon] [single-threaded
parallel execution] execution]
| |
Split state vector Sequential gate
across Web Workers application
| |
v v
Fast (N cores) Slower (1 core)
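The decision in the diagram, and the "split state vector" step, can be sketched in plain Rust. The names `ExecutionMode`, `select_mode` and `chunk_ranges` are hypothetical; the sketch only shows mode selection and an even partition of amplitude indices. A real gate kernel would also need to keep paired amplitudes (indices differing in the target qubit's bit) within reach of the same worker, which this partition does not address.

```rust
use std::ops::Range;

#[derive(Debug, PartialEq, Eq)]
pub enum ExecutionMode {
    SingleThreaded,
    Parallel { workers: usize },
}

/// Choose parallel execution only when the `threads` feature was compiled in,
/// SharedArrayBuffer is available at runtime, and more than one core exists.
pub fn select_mode(
    threads_compiled_in: bool,
    shared_memory_available: bool,
    cores: usize,
) -> ExecutionMode {
    if threads_compiled_in && shared_memory_available && cores > 1 {
        ExecutionMode::Parallel { workers: cores }
    } else {
        ExecutionMode::SingleThreaded
    }
}

/// Split `len` amplitudes into `workers` contiguous, nearly equal ranges.
/// A worker count of zero is treated as one.
pub fn chunk_ranges(len: usize, workers: usize) -> Vec<Range<usize>> {
    let workers = workers.max(1);
    let base = len / workers;
    let extra = len % workers;
    let mut start = 0;
    (0..workers)
        .map(|i| {
            // The first `extra` workers take one additional element.
            let size = base + usize::from(i < extra);
            let range = start..start + size;
            start += size;
            range
        })
        .collect()
}
```

Because state vector lengths are powers of two, the split is exact whenever the worker count is also a power of two.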
Compile-time configuration:
# In ruqu-wasm/Cargo.toml
[features]
default = []
threads = ["wasm-bindgen-rayon", "ruqu-core/parallel"]

Runtime detection:
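#[wasm_bindgen]
pub fn threading_available() -> bool {
    // Check whether the global object exposes SharedArrayBuffer.
    // Reflect::has avoids eval, which a Content-Security-Policy
    // without 'unsafe-eval' blocks (silently disabling threading).
    js_sys::Reflect::has(
        &js_sys::global(),
        &wasm_bindgen::JsValue::from_str("SharedArrayBuffer"),
    )
    .unwrap_or(false)
}

Required HTTP headers for threading: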
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
The WASM SIMD proposal (128-bit vectors) is widely supported in modern browsers and runtimes. The quantum engine uses SIMD for amplitude manipulation when available.
WASM SIMD capabilities:
| Operation | WASM SIMD Instruction | Use in Quantum Sim |
|---|---|---|
| f64x2 multiply | f64x2.mul | Complex multiplication (real part) |
| f64x2 add | f64x2.add | Amplitude accumulation |
| f64x2 sub | f64x2.sub | Complex multiplication (cross terms) |
| f64x2 shuffle | i8x16.shuffle (Rust intrinsic i64x2_shuffle) | Swapping real/imaginary parts |
| f32x4 multiply | f32x4.mul | f32 mode complex multiply |
| f32x4 fma | emulated | Fused multiply-add for accuracy |
Conditional compilation:
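// In ruqu-core, WASM SIMD path
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod wasm_simd {
    use core::arch::wasm32::*;

    /// Multiply the complex scalar (u_re + i*u_im) by an amplitude packed as [re, im].
    #[inline(always)]
    fn complex_mul(u_re: f64, u_im: f64, x: v128) -> v128 {
        // Swap lanes: [x_im, x_re]
        let x_swapped = i64x2_shuffle::<1, 0>(x, x);
        // [u_re*x_re - u_im*x_im, u_re*x_im + u_im*x_re]
        f64x2_add(
            f64x2_mul(f64x2_splat(u_re), x),
            f64x2_mul(f64x2(-u_im, u_im), x_swapped),
        )
    }

    /// Apply 2x2 unitary to a pair of amplitudes using WASM SIMD.
    /// Returns (c0_re, c0_im, c1_re, c1_im).
    #[inline(always)]
    pub fn apply_gate_2x2_simd(
        a_re: f64, a_im: f64,
        b_re: f64, b_im: f64,
        u00_re: f64, u00_im: f64,
        u01_re: f64, u01_im: f64,
        u10_re: f64, u10_im: f64,
        u11_re: f64, u11_im: f64,
    ) -> (f64, f64, f64, f64) {
        // Pack amplitude pair into SIMD lanes
        let a = f64x2(a_re, a_im);
        let b = f64x2(b_re, b_im);
        // Complex multiply-accumulate for output amplitudes
        // c0 = u00*a + u01*b
        // c1 = u10*a + u11*b
        let c0 = f64x2_add(complex_mul(u00_re, u00_im, a), complex_mul(u01_re, u01_im, b));
        let c1 = f64x2_add(complex_mul(u10_re, u10_im, a), complex_mul(u11_re, u11_im, b));
        (
            f64x2_extract_lane::<0>(c0),
            f64x2_extract_lane::<1>(c0),
            f64x2_extract_lane::<0>(c1),
            f64x2_extract_lane::<1>(c1),
        )
    }
}

// Fallback scalar path
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod scalar {
    // Pure scalar complex arithmetic
}

Comparison of SIMD widths across targets: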
Native (AVX-512): 512-bit = 8 f64 = 4 complex f64 per instruction
Native (AVX2): 256-bit = 4 f64 = 2 complex f64 per instruction
Native (NEON): 128-bit = 2 f64 = 1 complex f64 per instruction
WASM SIMD: 128-bit = 2 f64 = 1 complex f64 per instruction
WASM SIMD matches ARM NEON width but is slower due to JIT overhead. The engine uses the same algorithmic structure as the NEON path, adapted for WASM SIMD intrinsics.
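For reference, the arithmetic that both the SIMD path and the scalar fallback have to implement is the complex 2x2 matrix-vector product c0 = u00*a + u01*b, c1 = u10*a + u11*b. A portable scalar sketch follows; the tuple-based signature and the names `cmul` and `apply_gate_2x2_scalar` are illustrative assumptions, not the ruqu-core API.

```rust
/// Multiply two complex numbers given as (re, im) pairs.
#[inline]
fn cmul(u: (f64, f64), x: (f64, f64)) -> (f64, f64) {
    (u.0 * x.0 - u.1 * x.1, u.0 * x.1 + u.1 * x.0)
}

/// Apply a 2x2 complex matrix `u` (row-major) to the amplitude pair (a, b).
/// Returns (c0, c1) with c0 = u00*a + u01*b and c1 = u10*a + u11*b.
pub fn apply_gate_2x2_scalar(
    a: (f64, f64),
    b: (f64, f64),
    u: [[(f64, f64); 2]; 2],
) -> ((f64, f64), (f64, f64)) {
    let (p, q) = (cmul(u[0][0], a), cmul(u[0][1], b));
    let (r, s) = (cmul(u[1][0], a), cmul(u[1][1], b));
    ((p.0 + q.0, p.1 + q.1), (r.0 + s.0, r.1 + s.1))
}
```

Applying a Hadamard matrix to the pair (1, 0) yields two equal real amplitudes of 1/sqrt(2), which makes a convenient sanity check for any SIMD variant.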
GPU acceleration is exclusively available in native builds. The WASM path uses CPU-only simulation.
Rationale:
- WebGPU is a separate browser API, not accessible from WASM linear memory
- Bridging WASM to WebGPU would require complex JavaScript glue code
- WebGPU compute shader support varies across browsers
- The performance benefit is uncertain for the 25-qubit WASM ceiling
Future consideration: If WebGPU stabilizes and WASM-WebGPU interop matures,
a ruqu-webgpu crate could provide browser-side GPU acceleration. This is out
of scope for the initial release.
ruqu-wasm exposes an API that is functionally identical to ruqu-core native.
The same circuit description produces the same measurement results (within
floating-point tolerance). Only performance and capacity differ.
Parity guarantee:
Same Circuit
|
+------------+------------+
| |
ruqu-core (native) ruqu-wasm (browser)
| |
- 30+ qubits - 25 qubits max
- AVX2/AVX-512 SIMD - WASM SIMD128
- Rayon threading - Optional Web Workers
- Optional GPU - CPU only
- ~17.5M gates/sec - ~5-12M gates/sec
| |
+------------+------------+
|
Same Results
(within fp tolerance)
Verified by: Shared test suite that runs against both native and WASM targets, comparing outputs bitwise (for deterministic operations) or statistically (for measurement sampling).
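The comparison modes named above could look roughly like the following helpers. This is a sketch under assumptions: amplitudes are taken to be flat slices of f64 components, measurement results are taken to be per-outcome counts, and total variation distance is one possible statistical measure (the source does not say which statistic the shared suite uses). All function names are hypothetical.

```rust
/// Bitwise comparison of two amplitude buffers (distinguishes 0.0 from -0.0).
pub fn bitwise_equal(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}

/// Largest absolute component difference, or `None` if lengths differ.
/// Useful for tolerance-based checks.
pub fn max_abs_diff(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f64::max))
}

/// Total variation distance between two measurement histograms over the
/// same outcomes: 0.0 means identical distributions, 1.0 means disjoint.
pub fn total_variation_distance(counts_a: &[u64], counts_b: &[u64]) -> Option<f64> {
    let shots_a: u64 = counts_a.iter().sum();
    let shots_b: u64 = counts_b.iter().sum();
    if counts_a.len() != counts_b.len() || shots_a == 0 || shots_b == 0 {
        return None;
    }
    let sum: f64 = counts_a
        .iter()
        .zip(counts_b)
        .map(|(&x, &y)| (x as f64 / shots_a as f64 - y as f64 / shots_b as f64).abs())
        .sum();
    Some(0.5 * sum)
}
```

A test would then assert, for example, that the distance between native and WASM histograms stays below a threshold chosen for the shot count.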
Target .wasm binary size: < 2 MB for the default feature set.
Size budget:
| Component | Estimated Size |
|---|---|
| Core simulation engine | ~800 KB |
| Gate implementations | ~200 KB |
| Measurement and sampling | ~100 KB |
| wasm-bindgen glue | ~50 KB |
| Circuit optimization | ~150 KB |
| Error handling and validation | ~50 KB |
| Total (default features) | ~1.35 MB |
| + noise-model feature | +200 KB |
| + tensor-network feature | +400 KB |
| Total (all features) | ~1.95 MB |
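A budget like this is easiest to keep when a check enforces it. The sketch below shows one way a size gate might be written as a small Rust helper, for instance in a build script or a test; `check_wasm_size` and `MAX_WASM_BYTES` are hypothetical names, and 2 MB is read as 2 MiB.

```rust
use std::fs;
use std::path::Path;

/// Size budget for the default feature set (2 MB, read here as 2 MiB).
pub const MAX_WASM_BYTES: u64 = 2 * 1024 * 1024;

/// Returns the file size if it is within `limit_bytes`, otherwise an error
/// message suitable for failing a CI step.
pub fn check_wasm_size(path: &Path, limit_bytes: u64) -> Result<u64, String> {
    let size = fs::metadata(path)
        .map_err(|e| format!("cannot stat {}: {e}", path.display()))?
        .len();
    if size > limit_bytes {
        Err(format!(
            "{} is {size} bytes, over the {limit_bytes}-byte budget",
            path.display()
        ))
    } else {
        Ok(size)
    }
}
```

Called with the path to the generated .wasm file after the build, an `Err` result can be turned into a non-zero exit code.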
Size reduction techniques:
- opt-level = "z" for size-optimized compilation
- LTO (Link-Time Optimization) for dead code elimination
- wasm-opt post-processing pass (binaryen)
- Feature flags to exclude unused capabilities
- panic = "abort" to eliminate unwinding machinery
- Avoid format! and std::fmt where possible in hot paths
Build pipeline:
# Build with wasm-pack
wasm-pack build crates/ruqu-wasm --target web --release
# Post-process with wasm-opt for additional size reduction
wasm-opt -Oz --enable-simd \
crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm \
-o crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Verify size
ls -lh crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Expected: < 2 MB

The WebAssembly Memory64 proposal extends the address space to 64 bits, removing the 4 GB limitation. When this proposal reaches broad runtime support:
- Recompile ruqu-wasm targeting wasm64-unknown-unknown
- Lift the 25-qubit ceiling to match native limits
- Maintain backward compatibility with wasm32 via conditional compilation
Current status: Memory64 is at Phase 4 (standardized) in the WASM specification process. Browser support is emerging but not yet universal.
Migration path:
# Future Cargo.toml
[features]
wasm64 = [] # Enable when targeting wasm64
# In code
#[cfg(feature = "wasm64")]
const MAX_QUBITS_WASM: usize = 30;
#[cfg(not(feature = "wasm64"))]
const MAX_QUBITS_WASM: usize = 25;

| Trade-off | Accepted Limitation | Justification |
|---|---|---|
| Performance | ~1.5-3x slower than native | Universal deployment outweighs raw speed |
| Qubit ceiling | 25 qubits in WASM vs 30+ native | Sufficient for most educational and research workloads |
| Threading | Requires specific browser headers | Graceful fallback ensures always-works baseline |
| No GPU | CPU-only in browser | GPU simulation at 25 qubits shows minimal benefit |
| Binary size | ~1.35 MB module | Acceptable for a quantum simulation library |
- Universal deployment: Any modern browser or WASM runtime can execute quantum simulations without installation
- Security sandboxing: WASM's memory isolation prevents quantum simulation code from accessing host resources
- Edge-aligned: Matches ruVector's philosophy of computation at the edge
- Testable: WASM builds can be tested in CI via headless browsers and wasm-bindgen-test
- Progressive enhancement: Single-threaded baseline with optional threading ensures broad compatibility
- Performance ceiling: JIT overhead and narrower SIMD limit throughput
- Memory limits: 25-qubit hard ceiling until wasm64 adoption
- Threading complexity: SharedArrayBuffer requirement adds deployment configuration burden
- Debugging difficulty: WASM debugging tools are less mature than native debuggers
| Issue | Mitigation |
|---|---|
| Performance gap | Document native vs WASM trade-offs; recommend native for >20 qubits |
| Memory exhaustion | Hard limit enforcement with informative error messages |
| Threading failures | Automatic fallback to single-threaded; no silent degradation |
| Debug difficulty | Source maps via wasm-pack; comprehensive logging to console |
| Binary size creep | CI size gate: fail build if .wasm exceeds 2 MB |
- ADR-QE-001: Quantum Engine Core Architecture
- ADR-QE-002: Crate Structure & Integration
- ADR-QE-004: Performance Optimization & Benchmarks
- ADR-005: WASM Runtime Integration
- ruvector-router-wasm crate
- WebAssembly SIMD Proposal
- WebAssembly Memory64 Proposal
- wasm-bindgen-rayon
- Cross-Origin Isolation Guide (MDN)