Status: Proposed
Date: 2026-02-06
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board
ruVector is designed to operate within the Cognitum computing paradigm: a tile-based architecture with 256 low-power processor cores, event-driven activation, and aggressive power gating. Agents (software components) remain fully dormant until an event triggers their activation. Once their work completes, they release all resources and return to dormancy.
The quantum simulation engine must adhere to this model:
- Zero idle footprint: When no simulation is running, the engine consumes zero CPU cycles and zero heap memory beyond its compiled code and static data.
- Rapid activation: The engine must be ready to execute a simulation within microseconds of receiving a request.
- Prompt resource release: Upon simulation completion (or failure), all allocated memory is immediately freed.
- Predictable memory: Callers must be able to determine exact memory requirements before committing to a simulation.
The state vector for n qubits requires 2^n complex amplitudes, each consuming 16 bytes (two f64 values):
| Qubits | Amplitudes | Memory | Notes |
|---|---|---|---|
| 10 | 1,024 | 16 KiB | Trivial |
| 15 | 32,768 | 512 KiB | Small |
| 20 | 1,048,576 | 16 MiB | Moderate |
| 25 | 33,554,432 | 512 MiB | Large |
| 28 | 268,435,456 | 4 GiB | Needs dedicated memory |
| 30 | 1,073,741,824 | 16 GiB | Workstation-class |
| 32 | 4,294,967,296 | 64 GiB | Server-class |
| 35 | 34,359,738,368 | 512 GiB | HPC |
| 40 | 1,099,511,627,776 | 16 TiB | Infeasible (state vector) |
Each additional qubit doubles the memory requirement. This exponential scaling makes memory the primary resource constraint to manage.
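The doubling rule behind the table can be captured in a few lines. A minimal stdlib-only sketch (the helper name is illustrative, not part of the engine API):

```rust
/// Bytes for an n-qubit state vector: 2^n amplitudes, 16 bytes each.
/// Returns None when the shift would overflow u64.
fn state_vector_bytes(n_qubits: u32) -> Option<u64> {
    1u64.checked_shl(n_qubits)?.checked_mul(16)
}

fn main() {
    assert_eq!(state_vector_bytes(10), Some(16 * 1024));        // 16 KiB
    assert_eq!(state_vector_bytes(20), Some(16 * 1024 * 1024)); // 16 MiB
    assert_eq!(state_vector_bytes(30), Some(16u64 << 30));      // 16 GiB
    // One more qubit doubles the requirement:
    assert_eq!(state_vector_bytes(31), state_vector_bytes(30).map(|b| b * 2));
    println!("ok");
}
```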
On edge devices (embedded ruVector nodes, IoT gateways, mobile processors), memory is severely limited:
| Platform | Typical RAM | Max qubits (state vector) |
|---|---|---|
| Cognitum tile (single) | 256 MiB | 23 |
| Cognitum tile cluster (4) | 1 GiB | 25 |
| Raspberry Pi 4 | 8 GiB | 28 |
| Mobile device | 4-6 GiB | 27-28 (with other apps) |
| Laptop | 16-64 GiB | 29-31 |
| Server | 256-512 GiB | 33-34 |
WebAssembly uses a linear memory that can grow but cannot shrink. Once a large simulation allocates pages, those pages remain mapped until the WASM instance is destroyed. This is a fundamental platform limitation that must be documented and accounted for.
The quantum engine is implemented as a pure library with no runtime overhead:
// The engine is a collection of functions and types.
// No background threads, no event loops, no persistent state.
// When not called, it consumes exactly zero CPU and zero heap.
pub struct QuantumEngine; // Zero-sized type; purely a namespace
impl QuantumEngine {
/// Execute a simulation. All resources are allocated on entry
/// and freed on exit (or on error).
pub fn execute(
circuit: &QuantumCircuit,
shots: usize,
config: &SimulationConfig,
) -> Result<SimulationResult, SimulationError> {
// 1. Estimate and validate memory
let required = Self::estimate_memory(circuit.num_qubits());
Self::validate_memory_available(required)?;
// 2. Allocate state vector (the big allocation)
let mut state = Self::allocate_state(circuit.num_qubits())?;
// 3. Execute gates (all computation happens here)
Self::apply_gates(circuit, &mut state, config)?;
// 4. Measure (if requested)
let measurements = Self::measure(&state, shots)?;
// 5. Build result (copies out what we need)
let result = SimulationResult::from_state_and_measurements(
&state, measurements, circuit,
);
// 6. state is dropped here -- Vec<Complex<f64>> deallocated
// No cleanup needed. No finalizers. Just drop.
Ok(result)
}
// state goes out of scope and is deallocated by Rust's ownership system
}
Key properties:
- No `new()` or `init()` methods that create persistent state.
- No `Drop` impl with complex cleanup logic.
- No `Arc`, `Mutex`, or shared state between calls.
- Each call is fully independent and self-contained.
State vectors are allocated at simulation start and freed at simulation end:
fn allocate_state(n_qubits: u32) -> Result<StateVector, SimulationError> {
let num_amplitudes = 1_usize.checked_shl(n_qubits)
.ok_or(SimulationError::QubitLimitExceeded {
requested: n_qubits,
maximum: (usize::BITS - 1) as u32,
estimated_memory_bytes: u64::MAX,
available_memory_bytes: estimate_available_memory() as u64,
})?;
let required_bytes = num_amplitudes
.checked_mul(std::mem::size_of::<Complex<f64>>())
.ok_or(SimulationError::MemoryAllocationFailed {
requested_bytes: u64::MAX,
qubit_count: n_qubits,
suggestion: "Qubit count exceeds addressable memory",
})?;
    // Attempt the allocation fallibly. A plain Vec::with_capacity would
    // abort the process on failure (or invite the OS OOM killer);
    // try_reserve_exact instead reports failure as a recoverable error.
let mut amplitudes = Vec::new();
amplitudes.try_reserve_exact(num_amplitudes)
.map_err(|_| SimulationError::MemoryAllocationFailed {
requested_bytes: required_bytes as u64,
qubit_count: n_qubits,
suggestion: "Reduce qubit count or use tensor-network backend",
})?;
// Initialize to |00...0> state
amplitudes.resize(num_amplitudes, Complex::new(0.0, 0.0));
amplitudes[0] = Complex::new(1.0, 0.0);
Ok(StateVector { amplitudes, n_qubits })
}
The allocation sequence:
IDLE (zero memory)
|
v
estimate_memory(n) --> returns bytes needed
|
v
validate_memory_available(bytes) --> checks against OS/platform limits
| returns Err if insufficient
v
Vec::try_reserve_exact(2^n) --> attempts allocation
| returns Err on failure (no panic)
v
ALLOCATED (2^n * 16 bytes on heap)
|
v
[... simulation runs ...]
|
v
Vec::drop() --> automatic deallocation
|
v
IDLE (zero memory)
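The fallible-allocation step can be exercised standalone. A hedged sketch using a plain `(f64, f64)` tuple in place of `Complex<f64>` so it compiles with the standard library alone (the engine's actual StateVector and error types are richer):

```rust
/// Fallible state allocation: reserve, then initialize to |00...0>.
/// Stand-in sketch, not the engine's actual code.
fn try_alloc_state(n_qubits: u32) -> Result<Vec<(f64, f64)>, String> {
    let n = 1usize
        .checked_shl(n_qubits)
        .ok_or("qubit count exceeds addressable memory")?;
    let mut amps: Vec<(f64, f64)> = Vec::new();
    // try_reserve_exact reports failure instead of aborting the process.
    amps.try_reserve_exact(n).map_err(|e| e.to_string())?;
    amps.resize(n, (0.0, 0.0));
    amps[0] = (1.0, 0.0); // amplitude 1 on |00...0>
    Ok(amps) // dropped by the caller => freed, back to zero heap
}

fn main() {
    let state = try_alloc_state(10).expect("16 KiB should fit anywhere");
    assert_eq!(state.len(), 1024);
    assert_eq!(state[0], (1.0, 0.0));
    // An absurd request fails with an error, not an abort or OOM kill.
    assert!(try_alloc_state(62).is_err());
    println!("ok");
}
```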
Callers can query exact memory requirements before committing:
/// Returns the number of bytes required to simulate n_qubits.
/// This accounts for the state vector plus working memory for
/// gate application (temporary buffers, measurement arrays, etc.).
///
/// # Returns
/// - `Ok(MemoryEstimate)` if the qubit count is representable
/// - `Err(...)` if 2^n_qubits overflows usize
pub fn estimate_memory(n_qubits: u32) -> Result<MemoryEstimate, SimulationError> {
let num_amplitudes = 1_usize.checked_shl(n_qubits)
.ok_or(SimulationError::QubitLimitExceeded {
requested: n_qubits,
maximum: (usize::BITS - 1) as u32,
estimated_memory_bytes: u64::MAX,
available_memory_bytes: 0,
})?;
    let state_vector_bytes = num_amplitudes
        .checked_mul(std::mem::size_of::<Complex<f64>>())
        .ok_or(SimulationError::MemoryAllocationFailed {
            requested_bytes: u64::MAX,
            qubit_count: n_qubits,
            suggestion: "Qubit count exceeds addressable memory",
        })?;
    // Working memory: temporary buffer for gate application plus
    // measurement result storage, budgeted at 25% of the state vector.
    let working_bytes = state_vector_bytes / 4;
// Thread-local scratch space (per Rayon thread)
let thread_count = rayon::current_num_threads();
let scratch_per_thread = 64 * 1024; // 64 KiB per thread for local buffers
let thread_scratch = thread_count * scratch_per_thread;
Ok(MemoryEstimate {
state_vector_bytes: state_vector_bytes as u64,
working_bytes: working_bytes as u64,
thread_scratch_bytes: thread_scratch as u64,
total_bytes: (state_vector_bytes + working_bytes + thread_scratch) as u64,
num_amplitudes: num_amplitudes as u64,
})
}
#[derive(Debug, Clone)]
pub struct MemoryEstimate {
/// Bytes for the state vector (dominant cost).
pub state_vector_bytes: u64,
/// Bytes for gate-application working memory.
pub working_bytes: u64,
/// Bytes for thread-local scratch space.
pub thread_scratch_bytes: u64,
/// Total estimated bytes.
pub total_bytes: u64,
/// Number of complex amplitudes.
pub num_amplitudes: u64,
}
impl MemoryEstimate {
/// Returns true if the estimate fits within the given byte budget.
pub fn fits_in(&self, available_bytes: u64) -> bool {
self.total_bytes <= available_bytes
}
/// Suggest the maximum qubits for a given memory budget.
pub fn max_qubits_for(available_bytes: u64) -> u32 {
// Each qubit doubles memory; find largest n where 20 * 2^n <= available
// Factor of 20 accounts for 16-byte amplitudes + 25% working memory
let effective = available_bytes / 20;
if effective == 0 { return 0; }
(effective.ilog2()) as u32
}
}
The engine never panics on allocation failure. All paths return structured errors:
// Pattern: every allocation is fallible and returns a descriptive error.
// State vector allocation failure:
SimulationError::MemoryAllocationFailed {
requested_bytes: 17_179_869_184, // 16 GiB
qubit_count: 30,
suggestion: "Reduce qubit count by 2 (to 28, ~4 GiB) or enable tensor-network backend",
}
// Integer overflow (qubit count too large):
SimulationError::QubitLimitExceeded {
requested: 64,
maximum: 33, // based on available memory
estimated_memory_bytes: u64::MAX,
available_memory_bytes: 68_719_476_736, // 64 GiB
}
Decision tree on allocation failure:
Memory allocation failed
|
+-- Is tensor-network feature enabled?
| |
| +-- YES: Suggest tensor-network backend
| | (may work if circuit has low treewidth)
| |
| +-- NO: Suggest reducing qubit count
| Calculate: max_qubits = floor(log2(available / 20))
| Suggest: "Reduce to {max_qubits} qubits ({memory} bytes)"
|
+-- Is the request wildly over budget (>100x)?
| |
| +-- YES: "Circuit requires {X} GiB but only {Y} MiB available"
| |
| +-- NO: "Circuit requires {X} GiB, {Y} GiB available.
| Reducing by {delta} qubits would fit."
|
+-- Return SimulationError (no panic, no abort)
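A minimal sketch of how such structured errors might render as messages (the variant fields follow this ADR; the Display wording is illustrative, not the engine's exact output):

```rust
use std::fmt;

/// Illustrative subset of SimulationError with human-readable messages.
#[derive(Debug)]
enum SimulationError {
    MemoryAllocationFailed {
        requested_bytes: u64,
        qubit_count: u32,
        suggestion: &'static str,
    },
    QubitLimitExceeded {
        requested: u32,
        maximum: u32,
    },
}

impl fmt::Display for SimulationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MemoryAllocationFailed { requested_bytes, qubit_count, suggestion } => write!(
                f,
                "{qubit_count} qubits need {requested_bytes} bytes: {suggestion}"
            ),
            Self::QubitLimitExceeded { requested, maximum } => write!(
                f,
                "requested {requested} qubits, maximum supported is {maximum}"
            ),
        }
    }
}

fn main() {
    let err = SimulationError::QubitLimitExceeded { requested: 64, maximum: 33 };
    assert_eq!(err.to_string(), "requested 64 qubits, maximum supported is 33");
    println!("ok");
}
```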
For simulations estimated to exceed 100ms, the engine can optionally yield between gate batches to allow the OS scheduler to manage power states:
pub struct YieldConfig {
/// Enable cooperative yielding between gate batches.
/// Default: false (maximum throughput).
pub enabled: bool,
/// Number of gates to apply before yielding.
/// Default: 1000.
pub gates_per_slice: usize,
/// Yield mechanism.
/// Default: ThreadYield (std::thread::yield_now).
pub yield_strategy: YieldStrategy,
}
pub enum YieldStrategy {
/// Call std::thread::yield_now() between slices.
ThreadYield,
/// Sleep for specified duration between slices.
Sleep(Duration),
/// Call a user-provided callback between slices.
Callback(Box<dyn Fn(SliceProgress) + Send>),
}
pub struct SliceProgress {
pub gates_completed: u64,
pub gates_remaining: u64,
pub elapsed: Duration,
pub estimated_remaining: Duration,
}
// Usage in gate application loop:
fn apply_gates_with_yield(
circuit: &QuantumCircuit,
state: &mut StateVector,
yield_config: &YieldConfig,
) -> Result<(), SimulationError> {
    let start = std::time::Instant::now();
    let gates = circuit.gates();
for (i, gate) in gates.iter().enumerate() {
apply_single_gate(gate, state)?;
if yield_config.enabled && (i + 1) % yield_config.gates_per_slice == 0 {
match &yield_config.yield_strategy {
YieldStrategy::ThreadYield => std::thread::yield_now(),
YieldStrategy::Sleep(d) => std::thread::sleep(*d),
YieldStrategy::Callback(cb) => cb(SliceProgress {
gates_completed: (i + 1) as u64,
gates_remaining: (gates.len() - i - 1) as u64,
elapsed: start.elapsed(),
estimated_remaining: estimate_remaining(i, gates.len(), start),
}),
}
}
}
Ok(())
}
Yield is disabled by default to maximize throughput. It is primarily intended for:
- Edge devices where power management is critical.
- Interactive applications where UI responsiveness matters.
- Long-running simulations (>1 second) where progress reporting is needed.
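The slice accounting can be checked in isolation. A stub sketch with gate application replaced by a no-op and the callback signature simplified from SliceProgress:

```rust
/// Apply `total_gates` no-op "gates", invoking `on_slice(done, remaining)`
/// and yielding after every `gates_per_slice` gates. Stub for the real loop.
fn run_sliced(total_gates: usize, gates_per_slice: usize, on_slice: &mut dyn FnMut(u64, u64)) {
    for i in 0..total_gates {
        // apply_single_gate(gate, state)? would run here
        if (i + 1) % gates_per_slice == 0 {
            on_slice((i + 1) as u64, (total_gates - i - 1) as u64);
            std::thread::yield_now(); // the ThreadYield strategy
        }
    }
}

fn main() {
    let mut reports = Vec::new();
    run_sliced(3500, 1000, &mut |done, left| reports.push((done, left)));
    // Three yields for 3500 gates at 1000 gates per slice; the final
    // partial slice runs to completion without a trailing yield.
    assert_eq!(reports, vec![(1000, 2500), (2000, 1500), (3000, 500)]);
    println!("ok");
}
```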
The quantum engine does not create or manage its own threads:
+-----------------------------------------------+
| Global Rayon Thread Pool |
| (shared by all ruVector subsystems) |
| |
| [Thread 0] [Thread 1] ... [Thread N-1] |
| ^ ^ ^ |
| | | | |
| +--+---+ +--+---+ +---+--+ |
| | ruQu | | ruQu | | idle | |
| | gate | | gate | | | |
| | apply | | apply| | | |
| +-------+ +------+ +------+ |
| |
| During simulation: threads work on gates |
| After simulation: threads return to pool |
| Pool idle: OS can power-gate cores |
+-----------------------------------------------+
Key properties:
- Rayon's global thread pool is initialized once by `ruvector-core` at startup.
- The quantum engine calls `rayon::par_iter()` and related APIs, borrowing threads temporarily.
- When simulation completes, all threads are returned to the global pool.
- If no ruVector work is pending, Rayon threads park (blocking on a condvar), consuming zero CPU. The OS can then power-gate the underlying cores.
WebAssembly linear memory has a specific behavior that affects resource management:
WASM Memory Layout
+------------------+------------------+
| Initial pages | Grown pages |
| (compiled size) | (runtime alloc) |
+------------------+------------------+
0 initial_size current_size
Growth: memory.grow(delta_pages) -> adds pages to the end
Shrink: NOT SUPPORTED in WASM spec
After 25-qubit simulation:
+------------------+----------------------------------+
| Initial (1 MiB) | Grown for state vec (512 MiB) | <- HIGH WATER MARK
+------------------+----------------------------------+
After simulation completes:
+------------------+----------------------------------+
| Initial (1 MiB) | FREED internally but pages |
| | still mapped (512 MiB virtual) |
+------------------+----------------------------------+
The Rust allocator returns memory to its free list,
but WASM pages are not returned to the host.
Implications and mitigations:
- Document the behavior: Users must understand that WASM memory is a high-water mark. A 25-qubit simulation permanently increases the WASM instance's memory footprint to ~512 MiB.
- Instance recycling: For applications that run multiple simulations, create a new WASM instance periodically to reset the memory high-water mark.
- Memory budget enforcement: The WASM host can set `WebAssembly.Memory` with a `maximum` parameter to cap growth:
const memory = new WebAssembly.Memory({
  initial: 16,    // 1 MiB
  maximum: 8192,  // 512 MiB cap
});
- Pre-check in WASM: The engine's `estimate_memory()` function works in WASM and should be called before simulation to verify the allocation will succeed.
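In any embedding, WASM included, the pre-check reduces to comparing an estimate against a budget. A hedged stdlib-only sketch of that arithmetic, using the 25% working-memory factor from this ADR:

```rust
/// Estimated total bytes: state vector (16 bytes per amplitude)
/// plus 25% working memory. Returns None on overflow. Illustrative only.
fn total_estimate_bytes(n_qubits: u32) -> Option<u64> {
    let state = 1u64.checked_shl(n_qubits)?.checked_mul(16)?;
    state.checked_add(state / 4)
}

fn main() {
    // A WebAssembly.Memory maximum of 8192 pages caps the instance at 512 MiB.
    let budget: u64 = 512 * 1024 * 1024;
    // 24 qubits: 256 MiB + 64 MiB = 320 MiB -- fits under the cap.
    assert!(total_estimate_bytes(24).unwrap() <= budget);
    // 25 qubits: 512 MiB + 128 MiB = 640 MiB -- would exceed it, so fail fast.
    assert!(total_estimate_bytes(25).unwrap() > budget);
    println!("ok");
}
```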
On Cognitum's tile-based architecture, the quantum engine maps to tiles as follows:
Cognitum Processor (256 tiles)
+--------+--------+--------+--------+
| Tile 0 | Tile 1 | Tile 2 | Tile 3 | <- Assigned to quantum sim
| ACTIVE | ACTIVE | ACTIVE | ACTIVE |
+--------+--------+--------+--------+
| Tile 4 | Tile 5 | Tile 6 | Tile 7 | <- Other ruVector work (or sleeping)
| sleep | vecDB | sleep | sleep |
+--------+--------+--------+--------+
| ... | ... | ... | ... |
| sleep | sleep | sleep | sleep | <- Power gated (zero consumption)
+--------+--------+--------+--------+
Power state diagram for a quantum simulation lifecycle:
State: ALL_TILES_IDLE
|
| Simulation request arrives
v
State: ALLOCATING
Action: Wake tiles 0-3 (or however many are needed)
Action: Allocate state vector across tile-local memory
Power: Tiles 0-3 ACTIVE, rest SLEEP
|
v
State: SIMULATING
Action: Apply gates in parallel across active tiles
Power: Tiles 0-3 at full clock rate
Duration: microseconds to seconds depending on circuit
|
v
State: MEASURING
Action: Sample measurement outcomes
Power: Tile 0 only (measurement is sequential)
|
v
State: DEALLOCATING
Action: Free state vector
Action: Return tiles to idle pool
|
v
State: ALL_TILES_IDLE
Power: Tiles 0-3 back to SLEEP
Memory: Zero heap allocation
Tile assignment policy:
- Small simulations (n <= 20): 1 tile sufficient.
- Medium simulations (20 < n <= 25): 2-4 tiles for parallel gate application.
- Large simulations (25 < n <= 30): All available tiles.
- The tile scheduler (part of Cognitum runtime) handles assignment. The quantum engine simply uses Rayon parallelism; the runtime maps Rayon threads to tiles.
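The assignment policy above could look roughly like this (hypothetical helper for illustration; the real assignment lives in the Cognitum tile scheduler, not the engine):

```rust
/// Hypothetical tile-count policy mirroring the list above.
fn tiles_for(n_qubits: u32, available_tiles: u32) -> u32 {
    let wanted = if n_qubits <= 20 {
        1 // small: a single tile suffices
    } else if n_qubits <= 25 {
        4 // medium: up to 4 tiles for parallel gate application
    } else {
        available_tiles // large: everything we can get
    };
    wanted.min(available_tiles).max(1)
}

fn main() {
    assert_eq!(tiles_for(18, 256), 1);
    assert_eq!(tiles_for(23, 256), 4);
    assert_eq!(tiles_for(28, 256), 256);
    assert_eq!(tiles_for(23, 2), 2); // never more than are available
    println!("ok");
}
```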
Quick reference for capacity planning:
| Qubits | State Vector | Working Memory | Total | Platform Fit |
|---|---|---|---|---|
| 10 | 16 KiB | 4 KiB | 20 KiB | Any |
| 12 | 64 KiB | 16 KiB | 80 KiB | Any |
| 14 | 256 KiB | 64 KiB | 320 KiB | Any |
| 16 | 1 MiB | 256 KiB | 1.3 MiB | Any |
| 18 | 4 MiB | 1 MiB | 5 MiB | Any |
| 20 | 16 MiB | 4 MiB | 20 MiB | Any |
| 22 | 64 MiB | 16 MiB | 80 MiB | Cognitum single tile |
| 24 | 256 MiB | 64 MiB | 320 MiB | Cognitum 2+ tiles |
| 26 | 1 GiB | 256 MiB | 1.3 GiB | Cognitum cluster |
| 28 | 4 GiB | 1 GiB | 5 GiB | Laptop / RPi 8GB |
| 30 | 16 GiB | 4 GiB | 20 GiB | Workstation |
| 32 | 64 GiB | 16 GiB | 80 GiB | Server |
| 34 | 256 GiB | 64 GiB | 320 GiB | Large server |
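The `max_qubits_for` budget rule (20 bytes per amplitude: 16 for the state vector plus 25% working memory) can be cross-checked against this table standalone:

```rust
/// Largest n with 20 * 2^n <= available_bytes (rule from MemoryEstimate above).
fn max_qubits_for(available_bytes: u64) -> u32 {
    let effective = available_bytes / 20;
    if effective == 0 {
        return 0;
    }
    effective.ilog2()
}

fn main() {
    // 256 MiB (single Cognitum tile) -> 23 qubits, matching the platform table.
    assert_eq!(max_qubits_for(256 << 20), 23);
    // 16 GiB laptop -> 29 qubits.
    assert_eq!(max_qubits_for(16u64 << 30), 29);
    // 512 GiB server -> 34 qubits.
    assert_eq!(max_qubits_for(512u64 << 30), 34);
    println!("ok");
}
```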
Caller Engine OS/Allocator
| | |
| execute(circuit) | |
|-------------------->| |
| | |
| | estimate_memory(n) |
| | validate_available() |
| | |
| | try_reserve_exact(2^n) |
| |------------------------>|
| | |
| | Ok(ptr) or Err |
| |<------------------------|
| | |
| | [if Err: return |
| | SimulationError] |
| | |
| | initialize |00...0> |
| | apply gates |
| | measure |
| | |
| | build result |
| | (copies measurements, |
| | expectation values) |
| | |
| | drop(state_vector) |
| |------------------------>|
| | | free(ptr, 2^n * 16)
| | |
| Ok(result) | |
|<--------------------| |
| | |
| [Engine holds ZERO | |
| heap memory now] | |
- True zero-idle cost: No background resource consumption. Perfectly aligned with Cognitum's event-driven architecture and power gating.
- Predictable memory: `estimate_memory()` gives exact requirements before committing, preventing OOM surprises.
- Graceful degradation: Allocation failures return structured errors with actionable suggestions; the engine never panics.
- Platform portable: The same allocation strategy works on native (Linux, macOS, Windows), WASM, and embedded (Cognitum tiles).
- No resource leaks: Rust's ownership system guarantees deallocation on all exit paths (success, error, panic).
- No state caching: Each simulation allocates and deallocates independently, so repeated simulations at the same qubit count pay the allocation cost each time. Mitigation: allocation is O(2^n) but cheap relative to the O(G * 2^n) cost of applying G gates.
- WASM memory high-water mark: Cannot reclaim WASM linear memory pages. Documented as a platform limitation with instance-recycling workaround.
- No memory pooling: Could theoretically amortize allocation across simulations, but this conflicts with the zero-idle-footprint requirement.
- Yield overhead: When enabled, cooperative yielding adds per-slice overhead. Mitigated by making it opt-in and configurable.
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| OOM despite estimate_memory check | Low | Crash | Check returns conservative estimate including working memory |
| WASM instance runs out of address space | Medium | Failure | Set WebAssembly.Memory maximum; document limitation |
| Allocation latency spike (OS page faults) | Medium | Slow start | Consider madvise / mlock hints for large allocations |
| Rayon thread pool contention | Medium | Degraded perf | Quantum engine yields between slices; Rayon work-stealing handles contention |
- Cognitum Architecture Specification: event-driven tile-based computing
- Rust `Vec::try_reserve_exact`: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.try_reserve_exact
- WebAssembly Memory: https://webassembly.github.io/spec/core/syntax/modules.html#memories
- Rayon thread pool: https://docs.rs/rayon
- ADR-QE-001: Core Engine Architecture (zero-overhead design principle)
- ADR-QE-005: WASM Compilation Target (WASM constraints)
- ADR-QE-009: Tensor Network Evaluation Mode (alternative for large circuits)
- ADR-QE-010: Observability & Monitoring (memory metrics reporting)