ADR-QE-003: WebAssembly Compilation Strategy

Status: Proposed
Date: 2026-02-06
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board

Context

Problem Statement

ruVector targets browsers, embedded/edge runtimes, and IoT devices via WebAssembly. The quantum simulation engine must compile to wasm32-unknown-unknown and run correctly in these constrained environments. WASM introduces fundamental constraints that differ significantly from native execution and must be addressed at the architectural level rather than worked around at runtime.

WASM Execution Environment Constraints

Constraint	Detail	Impact on Quantum Simulation
32-bit address space	~4 GB theoretical max, ~2 GB practical	Hard ceiling on state vector size
Memory model	Linear memory, grows in 64 KB pages	Allocation must be page-aware
No native threads	Web Workers required for parallelism	Requires SharedArrayBuffer + COOP/COEP headers
No direct GPU	WebGPU is separate API, not WASM-native	GPU acceleration unavailable in WASM path
No OS syscalls	Sandboxed execution, no file/network	All I/O must go through host bindings
JIT compilation	V8/SpiderMonkey JIT, not AOT	~1.5-3x slower than native, variable warmup
SIMD support	128-bit SIMD proposal (widely supported since 2021)	4 f32 or 2 f64 lanes per vector
Stack size	Default ~1 MB, configurable	Deep recursion limited

Memory Budget Analysis for Quantum Simulation

The critical constraint is WASM's 32-bit address space. With a practical usable limit of approximately 2 GB (due to browser memory allocation behavior and address space fragmentation), the maximum feasible state vector size is bounded:

Available WASM Memory Budget:

  Total addressable:     4,294,967,296 bytes  (4 GB theoretical)
  Practical usable:     ~2,147,483,648 bytes  (2 GB, browser-dependent)
  WASM overhead:          ~100,000,000 bytes  (module, stack, heap metadata)
  Application overhead:    ~50,000,000 bytes  (circuit data, scratch buffers)
  -------------------------------------------------
  Available for state:  ~2,000,000,000 bytes  (1.86 GB)

  State vector sizes:
    24 qubits:  268,435,456 bytes (256 MB)  -- comfortable
    25 qubits:  536,870,912 bytes (512 MB)  -- feasible
    25 + scratch: ~1,073,741,824 bytes       -- tight but within budget
    26 qubits: 1,073,741,824 bytes (1 GB)   -- state alone, no scratch room
    27 qubits: 2,147,483,648 bytes (2 GB)   -- exceeds practical limit
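
The budget arithmetic above can be reproduced in a few lines of Rust. This is a minimal sketch, not code from ruqu-wasm: the names state_bytes and max_qubits are hypothetical, and the "state + scratch" case assumes one scratch buffer the same size as the state vector.

```rust
/// Size in bytes of a dense state vector over `qubits` qubits (qubits < 64).
/// Computed in u64 so the arithmetic does not depend on the target's usize.
pub fn state_bytes(qubits: u32, bytes_per_amplitude: u64) -> u64 {
    (1u64 << qubits).saturating_mul(bytes_per_amplitude)
}

/// Largest qubit count for which `copies` buffers of state-vector size fit
/// in `budget` bytes. The search is capped at 47 qubits, far beyond any
/// wasm32 budget. Returns 0 if not even a single amplitude fits.
pub fn max_qubits(budget: u64, bytes_per_amplitude: u64, copies: u64) -> u32 {
    (0..48u32)
        .take_while(|&n| state_bytes(n, bytes_per_amplitude).saturating_mul(copies) <= budget)
        .last()
        .unwrap_or(0)
}
```

With the ~2,000,000,000-byte budget and 16 bytes per complex f64 amplitude, max_qubits gives 25 when a scratch copy is required (copies = 2) and 26 for the state alone (copies = 1), matching the table.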

Existing WASM Patterns in ruVector

The ruvector-router-wasm crate establishes conventions for WASM compilation:

wasm-pack build as the compilation tool
wasm-bindgen for JavaScript interop
TypeScript definition generation
Feature-flag controlled inclusion/exclusion of capabilities
Dedicated test suites using wasm-bindgen-test

Decision

1. Target and Toolchain

Target triple: wasm32-unknown-unknown

Build toolchain: wasm-pack with wasm-bindgen

# Development build
wasm-pack build crates/ruqu-wasm --target web --dev

# Release build with size optimization
wasm-pack build crates/ruqu-wasm --target web --release

# Node.js target (for server-side WASM)
wasm-pack build crates/ruqu-wasm --target nodejs --release

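Cargo profile for WASM release (Cargo applies a custom profile only when the build selects it by name; --release on its own selects profile.release):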

[profile.wasm-release]
inherits = "release"
opt-level = "z"          # Optimize for binary size
lto = true               # Link-time optimization
codegen-units = 1        # Single codegen unit for maximum optimization
strip = true             # Strip debug symbols
panic = "abort"          # Smaller panic handling

2. Memory Limit Enforcement

ruqu-wasm enforces qubit limits before any allocation occurs. This is a hard gate, not a soft warning.

Enforcement strategy:

User requests N qubits
        |
        v
  [N <= 25?] ---NO---> Return WasmLimitError {
        |                 requested: N,
       YES                maximum: 25,
        |                 estimated_memory: 16 * 2^N,
        v                 suggestion: "Use native build for >25 qubits"
  [Estimate total       }
   memory needed]
        |
        v
  [< 1.5 GB?] ---NO---> Return WasmLimitError::InsufficientMemory
        |
       YES
        |
        v
  Proceed with allocation

Qubit limits by precision:

Precision	Max Qubits (WASM)	State Size	With Scratch
Complex f64 (default)	25	512 MB	~1.07 GB
Complex f32 (optional)	26	512 MB	~1.07 GB
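
The two gates in the flow, combined with the per-precision limits, can be sketched as a single pre-allocation check. Everything here is illustrative: Precision, LimitError, check_request, and the extra_bytes parameter are hypothetical names, and the estimate assumes one scratch buffer equal in size to the state vector, as the "With Scratch" column suggests.

```rust
/// Numeric precision of the stored amplitudes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Precision {
    ComplexF64,
    ComplexF32,
}

/// Hypothetical error type mirroring the two rejection branches of the flow.
#[derive(Debug, PartialEq)]
pub enum LimitError {
    QubitOverflow { requested: u32, maximum: u32 },
    InsufficientMemory { estimated_bytes: u64 },
}

/// Total memory ceiling from the flow (1.5 GB, decimal).
pub const MEMORY_CEILING_BYTES: u64 = 1_500_000_000;

/// Validate a request before any allocation happens.
/// `extra_bytes` stands in for circuit data and other overhead (an assumption).
pub fn check_request(
    qubits: u32,
    precision: Precision,
    extra_bytes: u64,
) -> Result<u64, LimitError> {
    let (maximum, bytes_per_amplitude) = match precision {
        Precision::ComplexF64 => (25, 16u64),
        Precision::ComplexF32 => (26, 8u64),
    };
    // Gate 1: hard qubit ceiling. Checked first, so the shift below cannot overflow.
    if qubits > maximum {
        return Err(LimitError::QubitOverflow { requested: qubits, maximum });
    }
    // Gate 2: state vector, an equally sized scratch buffer, and overhead.
    let state = (1u64 << qubits) * bytes_per_amplitude;
    let estimated = (2 * state).saturating_add(extra_bytes);
    if estimated >= MEMORY_CEILING_BYTES {
        return Err(LimitError::InsufficientMemory { estimated_bytes: estimated });
    }
    Ok(estimated)
}
```

Because the qubit ceiling is checked before the size estimate, an oversized request is rejected without ever computing a shift that could overflow.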

Error reporting:

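use wasm_bindgen::prelude::*;

// getter_with_clone: String is not Copy, so generated getters must clone it
#[wasm_bindgen(getter_with_clone)]
#[derive(Debug, Clone)]
pub struct WasmLimitError {
    pub requested_qubits: usize,
    pub maximum_qubits: usize,
    /// u64 (a BigInt in JS): the estimate can exceed usize::MAX on wasm32
    pub estimated_bytes: u64,
    pub message: String,
}

impl WasmLimitError {
    pub fn qubit_overflow(requested: usize) -> Self {
        let max = if cfg!(feature = "f32") { 26 } else { 25 };
        let bytes_per_amplitude: u64 = if cfg!(feature = "f32") { 8 } else { 16 };
        // Compute in u64 and saturate: 16 * 2^N overflows a 32-bit usize for N >= 28
        let estimated_bytes = u32::try_from(requested)
            .ok()
            .and_then(|n| 1u64.checked_shl(n))
            .and_then(|amplitudes| amplitudes.checked_mul(bytes_per_amplitude))
            .unwrap_or(u64::MAX);
        Self {
            requested_qubits: requested,
            maximum_qubits: max,
            estimated_bytes,
            message: format!(
                "Cannot simulate {} qubits in WASM: requires {} bytes, \
                 exceeds the WASM memory budget. Maximum: {} qubits. \
                 Use native build for larger simulations.",
                requested, estimated_bytes, max
            ),
        }
    }
}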

3. Threading Strategy

WASM multi-threading requires SharedArrayBuffer, which in turn requires specific HTTP security headers (Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy). Not all deployment environments support these.

Strategy: Optional multi-threading with graceful fallback.

                  ruqu-wasm execution
                        |
                        v
              [SharedArrayBuffer
               available?]
                /           \
              YES            NO
              /               \
    [wasm-bindgen-rayon]    [single-threaded
     parallel execution]     execution]
              |                    |
     Split state vector      Sequential gate
     across Web Workers      application
              |                    |
              v                    v
         Fast (N cores)     Slower (1 core)

Compile-time configuration:

# In ruqu-wasm/Cargo.toml
[features]
default = []
threads = ["wasm-bindgen-rayon", "ruqu-core/parallel"]

Runtime detection:

use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn threading_available() -> bool {
    // Look up SharedArrayBuffer on the global object. Unlike eval, this also
    // works under a Content-Security-Policy that does not allow 'unsafe-eval'.
    js_sys::Reflect::has(&js_sys::global(), &JsValue::from_str("SharedArrayBuffer"))
        .unwrap_or(false)
}

Required HTTP headers for threading:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
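
The two branches of the strategy can be illustrated with a small native sketch. It uses std::thread::scope as a stand-in for Web Workers; in the WASM build that role would presumably fall to wasm-bindgen-rayon, which is an assumption here. The names Amp, apply_phase_chunk, and apply_phase are hypothetical, and a diagonal phase gate is used because each amplitude can then be updated independently.

```rust
use std::thread;

/// A complex amplitude as (re, im); a stand-in for the engine's own type.
pub type Amp = (f64, f64);

/// Multiply every amplitude whose `target` bit is set by `phase`.
/// `base` is the global index of `chunk[0]`.
fn apply_phase_chunk(chunk: &mut [Amp], base: usize, target: u32, phase: Amp) {
    for (offset, amp) in chunk.iter_mut().enumerate() {
        if ((base + offset) >> target) & 1 == 1 {
            let (re, im) = *amp;
            *amp = (re * phase.0 - im * phase.1, re * phase.1 + im * phase.0);
        }
    }
}

/// Apply a diagonal phase gate. With `workers > 1` the state vector is split
/// into contiguous chunks, one per thread; otherwise it runs sequentially.
/// Both paths produce the same result.
pub fn apply_phase(state: &mut [Amp], target: u32, phase: Amp, workers: usize) {
    if workers <= 1 || state.len() < workers {
        apply_phase_chunk(state, 0, target, phase);
        return;
    }
    // Ceiling division so every amplitude lands in exactly one chunk.
    let chunk_len = (state.len() + workers - 1) / workers;
    thread::scope(|scope| {
        for (i, chunk) in state.chunks_mut(chunk_len).enumerate() {
            scope.spawn(move || apply_phase_chunk(chunk, i * chunk_len, target, phase));
        }
    });
}
```

A caller would pass workers = 1 when threading_available() reports false. Non-diagonal gates additionally need each amplitude pair to fall within a single chunk, which this sketch does not address.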

4. SIMD Utilization

The WASM SIMD proposal (128-bit vectors) is widely supported in modern browsers and runtimes. The quantum engine uses SIMD for amplitude manipulation when available.

WASM SIMD capabilities:

Operation	WASM SIMD Instruction	Use in Quantum Sim
f64x2 multiply	`f64x2.mul`	Complex multiplication (real part)
f64x2 add	`f64x2.add`	Amplitude accumulation
f64x2 sub	`f64x2.sub`	Complex multiplication (cross terms)
f64x2 shuffle	`i8x16.shuffle` (Rust intrinsic: i64x2_shuffle)	Swapping real/imaginary parts
f32x4 multiply	`f32x4.mul`	f32 mode complex multiply
f32x4 fma	emulated (separate mul + add; baseline SIMD128 has no fused instruction)	Multiply-accumulate in f32 mode

Conditional compilation:

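// In ruqu-core, WASM SIMD path
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
mod wasm_simd {
    use core::arch::wasm32::*;

    /// Complex product u * z, with z packed as (re, im) in one v128:
    /// u*z = u_re * (z_re, z_im) + u_im * (-z_im, z_re)
    #[inline(always)]
    fn cmul(u_re: f64, u_im: f64, z: v128) -> v128 {
        // Swap lanes to get (z_im, z_re), then negate lane 0
        let swapped = i64x2_shuffle::<1, 0>(z, z);
        let rotated = f64x2_mul(swapped, f64x2(-1.0, 1.0));
        f64x2_add(
            f64x2_mul(f64x2_splat(u_re), z),
            f64x2_mul(f64x2_splat(u_im), rotated),
        )
    }

    /// Apply 2x2 unitary to a pair of amplitudes using WASM SIMD.
    /// Returns (c0_re, c0_im, c1_re, c1_im).
    #[inline(always)]
    pub fn apply_gate_2x2_simd(
        a_re: f64, a_im: f64,
        b_re: f64, b_im: f64,
        u00_re: f64, u00_im: f64,
        u01_re: f64, u01_im: f64,
        u10_re: f64, u10_im: f64,
        u11_re: f64, u11_im: f64,
    ) -> (f64, f64, f64, f64) {
        // Pack amplitude pair into SIMD lanes
        let a = f64x2(a_re, a_im);
        let b = f64x2(b_re, b_im);

        // Complex multiply-accumulate for output amplitudes
        // c0 = u00*a + u01*b
        // c1 = u10*a + u11*b
        let c0 = f64x2_add(cmul(u00_re, u00_im, a), cmul(u01_re, u01_im, b));
        let c1 = f64x2_add(cmul(u10_re, u10_im, a), cmul(u11_re, u11_im, b));

        (
            f64x2_extract_lane::<0>(c0), f64x2_extract_lane::<1>(c0),
            f64x2_extract_lane::<0>(c1), f64x2_extract_lane::<1>(c1),
        )
    }
}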

// Fallback scalar path
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
mod scalar {
    // Pure scalar complex arithmetic
}
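
For reference, the scalar fallback computes the same two output amplitudes with plain complex arithmetic. The following is a minimal sketch under assumed names (C64, cmul, cadd, apply_gate_2x2_scalar); the actual ruqu-core fallback may be organized differently.

```rust
/// Complex number as (re, im); a stand-in for the engine's amplitude type.
pub type C64 = (f64, f64);

/// Complex product x * y.
#[inline]
fn cmul(x: C64, y: C64) -> C64 {
    (x.0 * y.0 - x.1 * y.1, x.0 * y.1 + x.1 * y.0)
}

/// Complex sum x + y.
#[inline]
fn cadd(x: C64, y: C64) -> C64 {
    (x.0 + y.0, x.1 + y.1)
}

/// Apply the 2x2 unitary `u` (row-major) to the amplitude pair (a, b):
///   c0 = u00*a + u01*b
///   c1 = u10*a + u11*b
pub fn apply_gate_2x2_scalar(a: C64, b: C64, u: [[C64; 2]; 2]) -> (C64, C64) {
    let c0 = cadd(cmul(u[0][0], a), cmul(u[0][1], b));
    let c1 = cadd(cmul(u[1][0], a), cmul(u[1][1], b));
    (c0, c1)
}
```

Applying Pauli-X, u = [[0, 1], [1, 0]], to the pair (1, 0) swaps the amplitudes and yields (0, 1), which makes a convenient smoke test for either path.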

Comparison of SIMD widths across targets:

Native (AVX-512):  512-bit  =  8 f64  =  4 complex f64 per instruction
Native (AVX2):     256-bit  =  4 f64  =  2 complex f64 per instruction
Native (NEON):     128-bit  =  2 f64  =  1 complex f64 per instruction
WASM SIMD:         128-bit  =  2 f64  =  1 complex f64 per instruction

WASM SIMD matches ARM NEON width but is slower due to JIT overhead. The engine uses the same algorithmic structure as the NEON path, adapted for WASM SIMD intrinsics.

5. No GPU in WASM

GPU acceleration is exclusively available in native builds. The WASM path uses CPU-only simulation.

Rationale:

WebGPU is a separate browser API that WASM code cannot call directly; every call and buffer transfer goes through host bindings
Bridging WASM to WebGPU would require complex JavaScript glue code
WebGPU compute shader support varies across browsers
The performance benefit is uncertain for the 25-qubit WASM ceiling

Future consideration: If WebGPU stabilizes and WASM-WebGPU interop matures, a ruqu-webgpu crate could provide browser-side GPU acceleration. This is out of scope for the initial release.

6. API Parity

ruqu-wasm exposes an API that is functionally identical to ruqu-core native. The same circuit description produces the same measurement results (within floating-point tolerance). Only performance and capacity differ.

Parity guarantee:

                    Same Circuit
                        |
           +------------+------------+
           |                         |
     ruqu-core (native)       ruqu-wasm (browser)
           |                         |
    - 30+ qubits              - 25 qubits max
    - AVX2/AVX-512 SIMD       - WASM SIMD128
    - Rayon threading          - Optional Web Workers
    - Optional GPU             - CPU only
    - ~17.5M gates/sec         - ~5-12M gates/sec
           |                         |
           +------------+------------+
                        |
                  Same Results
              (within fp tolerance)

Verified by: Shared test suite that runs against both native and WASM targets, comparing outputs within floating-point tolerance (for deterministic operations) or statistically (for measurement sampling).
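
The two comparison modes could look roughly like the helpers below. This is a sketch under assumptions: states_match and total_variation are hypothetical names, the tolerance is an absolute per-component bound, and total variation distance is one possible statistic for comparing sampled histograms, not necessarily the one the test suite uses.

```rust
/// Complex amplitude as (re, im).
pub type Amp = (f64, f64);

/// True when two state vectors have the same length and agree
/// component-wise within the absolute tolerance `tol`.
pub fn states_match(native: &[Amp], wasm: &[Amp], tol: f64) -> bool {
    native.len() == wasm.len()
        && native
            .iter()
            .zip(wasm)
            .all(|(x, y)| (x.0 - y.0).abs() <= tol && (x.1 - y.1).abs() <= tol)
}

/// Total variation distance between two measurement histograms
/// (shot counts per basis state). 0.0 means identical empirical
/// distributions, 1.0 means disjoint support.
pub fn total_variation(counts_a: &[u64], counts_b: &[u64]) -> f64 {
    assert_eq!(counts_a.len(), counts_b.len(), "histograms must cover the same basis states");
    let shots_a: u64 = counts_a.iter().sum();
    let shots_b: u64 = counts_b.iter().sum();
    assert!(shots_a > 0 && shots_b > 0, "each histogram needs at least one shot");
    counts_a
        .iter()
        .zip(counts_b)
        .map(|(&a, &b)| (a as f64 / shots_a as f64 - b as f64 / shots_b as f64).abs())
        .sum::<f64>()
        / 2.0
}
```

A parity test would then assert states_match for state-vector outputs and require total_variation to stay below a threshold chosen for the number of shots.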

7. Module Size Target

Target .wasm binary size: < 2 MB for the default feature set.

Size budget:

Component	Estimated Size
Core simulation engine	~800 KB
Gate implementations	~200 KB
Measurement and sampling	~100 KB
wasm-bindgen glue	~50 KB
Circuit optimization	~150 KB
Error handling and validation	~50 KB
Total (default features)	~1.35 MB
+ noise-model feature	+200 KB
+ tensor-network feature	+400 KB
Total (all features)	~1.95 MB

Size reduction techniques:

opt-level = "z" for size-optimized compilation
LTO (Link-Time Optimization) for dead code elimination
wasm-opt post-processing pass (binaryen)
Feature flags to exclude unused capabilities
panic = "abort" to eliminate unwinding machinery
Avoid format! and std::fmt where possible in hot paths

Build pipeline:

# Build with wasm-pack
wasm-pack build crates/ruqu-wasm --target web --release

# Post-process with wasm-opt for additional size reduction
wasm-opt -Oz --enable-simd \
    crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm \
    -o crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm

# Verify size
ls -lh crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm
# Expected: < 2 MB
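
The manual size check can also be automated so that CI fails when the module grows past the target. The sketch below is one way to do it in Rust; check_wasm_size and MAX_WASM_BYTES are hypothetical names, and reading "2 MB" as 2 * 1024 * 1024 bytes is an assumption.

```rust
use std::fs;
use std::path::Path;

/// Ceiling from the module size target, taken here as 2 MiB (an assumption;
/// the target is stated as "2 MB" without fixing the unit convention).
pub const MAX_WASM_BYTES: u64 = 2 * 1024 * 1024;

/// Return the artifact size in bytes, or an error message if the file
/// cannot be read or is larger than `limit`.
pub fn check_wasm_size(path: &Path, limit: u64) -> Result<u64, String> {
    let size = fs::metadata(path)
        .map_err(|e| format!("cannot stat {}: {e}", path.display()))?
        .len();
    if size > limit {
        return Err(format!(
            "{} is {size} bytes, over the {limit}-byte limit",
            path.display()
        ));
    }
    Ok(size)
}
```

A CI step could call check_wasm_size on crates/ruqu-wasm/pkg/ruqu_wasm_bg.wasm with MAX_WASM_BYTES and exit non-zero on Err.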

8. Future: wasm64 (Memory64 Proposal)

The WebAssembly Memory64 proposal extends the address space to 64 bits, removing the 4 GB limitation. When this proposal reaches broad runtime support:

Recompile ruqu-wasm targeting wasm64-unknown-unknown
Lift the 25-qubit ceiling to match native limits
Maintain backward compatibility with wasm32 via conditional compilation

Current status: Memory64 is at Phase 4 (standardized) in the WASM specification process. Browser support is emerging but not yet universal.

Migration path:

# Future Cargo.toml
[features]
wasm64 = []  # Enable when targeting wasm64

# In code
#[cfg(feature = "wasm64")]
const MAX_QUBITS_WASM: usize = 30;

#[cfg(not(feature = "wasm64"))]
const MAX_QUBITS_WASM: usize = 25;

Trade-offs Accepted

Trade-off	Accepted Limitation	Justification
Performance	~1.5-3x slower than native	Universal deployment outweighs raw speed
Qubit ceiling	25 qubits in WASM vs 30+ native	Sufficient for most educational and research workloads
Threading	Requires specific browser headers	Graceful fallback ensures always-works baseline
No GPU	CPU-only in browser	GPU simulation at 25 qubits shows minimal benefit
Binary size	~1.35 MB module	Acceptable for a quantum simulation library

Consequences

Positive

Universal deployment: Any modern browser or WASM runtime can execute quantum simulations without installation
Security sandboxing: WASM's memory isolation prevents quantum simulation code from accessing host resources
Edge-aligned: Matches ruVector's philosophy of computation at the edge
Testable: WASM builds can be tested in CI via headless browsers and wasm-bindgen-test
Progressive enhancement: Single-threaded baseline with optional threading ensures broad compatibility

Negative

Performance ceiling: JIT overhead and narrower SIMD limit throughput
Memory limits: 25-qubit hard ceiling until wasm64 adoption
Threading complexity: SharedArrayBuffer requirement adds deployment configuration burden
Debugging difficulty: WASM debugging tools are less mature than native debuggers

Mitigations

Issue	Mitigation
Performance gap	Document native vs WASM trade-offs; recommend native for >20 qubits
Memory exhaustion	Hard limit enforcement with informative error messages
Threading failures	Automatic fallback to single-threaded execution; the fallback is reported rather than applied silently
Debug difficulty	Source maps via wasm-pack; comprehensive logging to console
Binary size creep	CI size gate: fail build if .wasm exceeds 2 MB
