Status: Proposed
Date: 2026-02-06
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board
Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding O(2^n) memory and O(G * 2^n) time for G gates. At n=30 this is 16 GiB; at n=40 it is 16 TiB. Many practically interesting circuits, however, contain limited entanglement:
| Circuit family | Entanglement structure | Treewidth |
|---|---|---|
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |
For the first four families, tensor network (TN) methods can trade increased computation for drastically reduced memory by representing each gate as a tensor and contracting the resulting network in an optimized order. The contraction cost scales exponentially in the treewidth of the circuit's line graph rather than in the total qubit count.
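For scale, the state-vector memory wall quoted above is easy to verify with a throwaway helper (illustrative only, not part of the engine):

```rust
/// Memory needed to hold all 2^n amplitudes as complex f64 (16 bytes each).
fn state_vector_bytes(n: u32) -> u128 {
    16u128 << n
}

fn main() {
    const GIB: u128 = 1 << 30;
    const TIB: u128 = 1 << 40;
    assert_eq!(state_vector_bytes(30) / GIB, 16); // 16 GiB at n = 30
    assert_eq!(state_vector_bytes(40) / TIB, 16); // 16 TiB at n = 40
    println!(
        "n=30: {} GiB, n=40: {} TiB",
        state_vector_bytes(30) / GIB,
        state_vector_bytes(40) / TIB
    );
}
```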
QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network
contraction for circuits up to 60 qubits on commodity hardware when treewidth
remained below ~25. ruVector's existing ruvector-mincut crate already solves graph
partitioning problems that are structurally identical to contraction-order
optimization, providing a natural integration point.
The ruQu engine needs this capability to support:
- Surface code simulations at distance d >= 7 (49+ data qubits) for decoder validation, where the syndrome extraction circuit is shallow and geometrically local.
- Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
- Hybrid workflows where part of the circuit is simulated via state vector and part via tensor contraction.
Tensor network evaluation is implemented as an optional backend behind the
tensor-network feature flag in ruqu-core:
```toml
# ruqu-core/Cargo.toml
[features]
default = ["state-vector"]
state-vector = []
tensor-network = ["dep:ndarray", "dep:petgraph"]
all-backends = ["state-vector", "tensor-network"]
```

When both backends are compiled in, the engine selects the backend at runtime based on circuit analysis (see Section 4 below).
Every gate becomes a tensor connecting the qubit wire indices it acts on:
| Gate type | Tensor rank | Shape | Example |
|---|---|---|---|
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three input -> three output |
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
| Initial state \|0> | 1 | [2] | One open wire index |
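To make the shape convention concrete, here is how a CNOT unitary reshapes into a rank-4 tensor (a sketch using real f64 entries for brevity; the engine would store complex amplitudes):

```rust
/// Build CNOT as a rank-4 tensor t[out_ctrl][out_tgt][in_ctrl][in_tgt]
/// by reshaping its 4x4 unitary (row = output basis state, column = input).
fn cnot_tensor() -> [[[[f64; 2]; 2]; 2]; 2] {
    // CNOT matrix in the computational basis |ctrl, tgt>.
    let u = [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0, 0.0],
    ];
    let mut t = [[[[0.0; 2]; 2]; 2]; 2];
    for oc in 0..2 {
        for ot in 0..2 {
            for ic in 0..2 {
                for it in 0..2 {
                    // Flatten the wire pairs back into matrix row/column.
                    t[oc][ot][ic][it] = u[oc * 2 + ot][ic * 2 + it];
                }
            }
        }
    }
    t
}

fn main() {
    let t = cnot_tensor();
    // |10> maps to |11>: amplitude <11|CNOT|10> = 1.
    assert_eq!(t[1][1][1][0], 1.0);
    // |00> is left unchanged.
    assert_eq!(t[0][0][0][0], 1.0);
    println!("CNOT rank-4 tensor built");
}
```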
The circuit is converted into a tensor network graph where:
- Each tensor is a node.
- Each shared index (qubit wire between consecutive gates) is an edge.
- Open indices represent initial states and final measurement outcomes.
```text
|0>---[H]---[CNOT_ctrl]---[Rz]---<meas>
                 |
|0>---------[CNOT_tgt]-----------<meas>
```

Becomes:
```text
Node: init_0 (rank 1)
        |
Node: H_0 (rank 2)
        |
Node: CNOT_01 (rank 4)
       /      \
      |        Node: Rz_0 (rank 2)
      |               |
      |        Node: meas_0 (rank 2)
      |
Node: init_1 (rank 1)
  ... (connected via CNOT shared index)
Node: meas_1 (rank 2)
```
Contraction order determines whether the computation is tractable. The cost of contracting two tensors is the product of the dimensions of all indices involved. Finding the optimal contraction order is NP-hard (equivalent to finding minimum treewidth), so we use heuristics.
```text
function find_contraction_path(tensor_network: TN) -> ContractionPath:
    // Phase 1: Simplify the network
    apply_trivial_contractions(tensor_network)  // rank-1 tensors, diagonal pairs

    // Phase 2: Detect community structure
    communities = detect_communities(tensor_network.graph)

    // Phase 3: Contract within communities first (small subproblems)
    intra_paths = []
    for community in communities:
        subgraph = tensor_network.subgraph(community)
        if subgraph.num_tensors <= 20:
            // Exact dynamic programming for small subgraphs
            path = optimal_einsum_dp(subgraph)
        else:
            // Greedy with lookahead for larger subgraphs
            path = greedy_with_lookahead(subgraph, lookahead=2)
        intra_paths.append(path)

    // Phase 4: Contract inter-community edges
    // Each community is now a single large tensor
    meta_graph = contract_communities(tensor_network, intra_paths)
    inter_path = greedy_with_lookahead(meta_graph, lookahead=3)

    // Phase 5: Compose the full path
    return compose_paths(intra_paths, inter_path)
```
```text
function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
    path = []
    remaining = tn.clone()
    while remaining.num_tensors > 1:
        best_cost = INFINITY
        best_pair = None
        // Evaluate all candidate contractions
        for (i, j) in remaining.candidate_pairs():
            cost = contraction_cost(remaining, i, j)
            // Lookahead: estimate cost of subsequent contractions
            if lookahead > 0:
                simulated = remaining.simulate_contraction(i, j)
                future_cost = estimate_future_cost(simulated, lookahead - 1)
                cost += future_cost * DISCOUNT_FACTOR
            if cost < best_cost:
                best_cost = cost
                best_pair = (i, j)
        path.append(best_pair)
        remaining.contract(best_pair)
    return path
```
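To make the cost model concrete, here is a compact runnable version of the greedy step (no lookahead, all bond dimensions fixed at 2; the types and names are illustrative, not the ruqu-core API):

```rust
use std::collections::BTreeMap;

/// A tensor is represented only by its index labels; every index here has
/// dimension 2 (a qubit wire), which keeps the cost arithmetic simple.
type Tensor = Vec<&'static str>;

/// Cost of contracting two tensors: the product of the dimensions of all
/// indices involved (shared indices counted once), as defined in the text.
fn contraction_cost(a: &Tensor, b: &Tensor) -> u64 {
    let mut indices: Vec<&str> = a.iter().chain(b.iter()).cloned().collect();
    indices.sort();
    indices.dedup();
    1u64 << indices.len() // each index contributes a factor of 2
}

/// Resulting tensor: the symmetric difference of the index sets
/// (shared indices are summed over and disappear).
fn contract(a: &Tensor, b: &Tensor) -> Tensor {
    let mut counts: BTreeMap<&'static str, usize> = BTreeMap::new();
    for &idx in a.iter().chain(b.iter()) {
        *counts.entry(idx).or_insert(0) += 1;
    }
    counts.into_iter().filter(|&(_, c)| c == 1).map(|(i, _)| i).collect()
}

/// Pure greedy path: repeatedly contract the cheapest available pair.
fn greedy_path(mut tensors: Vec<Tensor>) -> Vec<(usize, usize)> {
    let mut path = Vec::new();
    while tensors.len() > 1 {
        let (mut best, mut best_cost) = ((0, 1), u64::MAX);
        for i in 0..tensors.len() {
            for j in (i + 1)..tensors.len() {
                let c = contraction_cost(&tensors[i], &tensors[j]);
                if c < best_cost {
                    best_cost = c;
                    best = (i, j);
                }
            }
        }
        let merged = contract(&tensors[best.0], &tensors[best.1]);
        tensors.remove(best.1);
        tensors.remove(best.0);
        tensors.push(merged);
        path.push(best);
    }
    path
}

fn main() {
    // Chain A(a,b) - B(b,c) - C(c,d): greedy contracts a cheap end pair first.
    let tn = vec![vec!["a", "b"], vec!["b", "c"], vec!["c", "d"]];
    let path = greedy_path(tn);
    assert_eq!(path.len(), 2);
    println!("contraction path: {:?}", path);
}
```

The full optimizer differs mainly in the lookahead term and the community-level decomposition; the pair-selection loop is the same shape.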
The ruvector-mincut crate provides graph partitioning that is directly applicable
to contraction ordering:
```rust
use ruvector_mincut::{partition, PartitionConfig};

fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<TensorId>> {
    let graph = tn.to_adjacency_graph();
    let config = PartitionConfig {
        num_partitions: estimate_optimal_partitions(tn),
        balance_factor: 1.1,          // Allow 10% imbalance
        minimize: Objective::EdgeCut, // Minimize inter-partition wires
    };
    partition(&graph, &config)
}
```

The edge cut directly corresponds to the bond dimension of the inter-community contraction, so minimizing edge cut minimizes the most expensive contraction step.
For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix Product State representation is more efficient than general tensor contraction.
```text
A[1] -- A[2] -- A[3] -- ... -- A[n]
 |       |       |              |
phys_1  phys_2  phys_3        phys_n
```
Each site tensor A[i] has shape [bond_left, physical, bond_right] where:
- `physical` = 2 (qubit dimension)
- `bond_left`, `bond_right` = bond dimension chi
| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation |
|---|---|---|---|
| 1 | 32 bytes | 32n bytes | Product state only |
| 16 | 8 KiB | 8n KiB | Low entanglement |
| 64 | 128 KiB | 128n KiB | Moderate entanglement |
| 256 | 2 MiB | 2n MiB | High entanglement |
| 1024 | 32 MiB | 32n MiB | Near exact for many circuits |

(Each interior site holds chi * 2 * chi complex amplitudes at 16 bytes each.)
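The per-site figures follow directly from the [bond_left, physical, bond_right] shape; a quick illustrative check (not engine code):

```rust
/// Memory for one interior MPS site tensor of shape [chi, 2, chi],
/// with 16-byte complex entries.
fn mps_site_bytes(chi: u64) -> u64 {
    chi * 2 * chi * 16
}

fn main() {
    assert_eq!(mps_site_bytes(1), 32); // product state: two amplitudes
    assert_eq!(mps_site_bytes(256), 2 * 1024 * 1024); // 2 MiB per site
    // Even at chi = 1024, a 50-site chain stays far below the 16 PiB a
    // 50-qubit state vector would require.
    println!("chi=1024 site: {} MiB", mps_site_bytes(1024) >> 20);
}
```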
Truncation policy: after each two-qubit gate, perform an SVD on the updated bond. If the bond dimension exceeds chi_max, truncate the smallest singular values. The total discarded weight (the sum of squared discarded singular values) is tracked as a running fidelity estimate.
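A minimal sketch of that policy as a standalone helper (the real engine would apply it to SVD output and fold the discarded weight into its fidelity estimate):

```rust
/// Truncate a list of singular values to at most chi_max entries, dropping
/// anything below cutoff relative to the largest value. Returns the kept
/// values and the discarded weight (sum of squared discarded values).
fn truncate(mut svals: Vec<f64>, chi_max: usize, cutoff: f64) -> (Vec<f64>, f64) {
    svals.sort_by(|a, b| b.partial_cmp(a).unwrap()); // descending
    let largest = svals.first().copied().unwrap_or(0.0);
    let keep = svals
        .iter()
        .take(chi_max)
        .filter(|&&s| s >= cutoff * largest)
        .count();
    let discarded_weight: f64 = svals[keep..].iter().map(|s| s * s).sum();
    svals.truncate(keep);
    (svals, discarded_weight)
}

fn main() {
    // chi_max = 2 forces two values to be dropped.
    let (kept, w) = truncate(vec![0.9, 0.4, 0.1, 0.01], 2, 1e-12);
    assert_eq!(kept.len(), 2);
    // Discarded weight: 0.1^2 + 0.01^2 = 0.0101.
    assert!((w - 0.0101).abs() < 1e-12);
    println!("discarded weight: {w}");
}
```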
These knobs are exposed in the MPS configuration:

```rust
pub struct MpsConfig {
    /// Maximum bond dimension. Truncation occurs above this.
    pub chi_max: usize,
    /// Minimum singular value to retain (relative to largest).
    pub svd_cutoff: f64,
    /// Accumulated truncation error (updated during simulation).
    pub fidelity_estimate: f64,
}

impl Default for MpsConfig {
    fn default() -> Self {
        Self {
            chi_max: 256,
            svd_cutoff: 1e-12,
            fidelity_estimate: 1.0,
        }
    }
}
```

The engine analyzes the circuit before execution to recommend a backend:
```rust
pub enum RecommendedBackend {
    StateVector { reason: &'static str },
    TensorNetwork { estimated_treewidth: usize, reason: &'static str },
    Mps { estimated_max_bond: usize, reason: &'static str },
}

pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
    let n = circuit.num_qubits();
    let depth = circuit.depth();
    let connectivity = circuit.connectivity_graph();

    // Rule 1: Small circuits always use state vector.
    if n <= 20 {
        return RecommendedBackend::StateVector {
            reason: "Small circuit; state vector is fastest below 20 qubits",
        };
    }

    // Rule 2: Check for 1D connectivity (MPS candidate).
    if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
        let estimated_bond = 2_usize.pow(depth.min(20) as u32);
        return RecommendedBackend::Mps {
            estimated_max_bond: estimated_bond,
            reason: "1D nearest-neighbor connectivity detected",
        };
    }

    // Rule 3: Estimate treewidth for general TN.
    let estimated_tw = estimate_treewidth(&connectivity, depth);
    if estimated_tw < 25 && n > 25 {
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "Low treewidth relative to qubit count",
        };
    }

    // Rule 4: Check memory feasibility for state vector
    // (16 bytes per amplitude; clamp the shift to avoid overflow at large n).
    let sv_memory = 16usize.saturating_mul(1_usize << n.min(57));
    let available = estimate_available_memory();
    if sv_memory > available {
        // Force TN even if treewidth is high -- at least it has a chance.
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "State vector exceeds available memory; TN is only option",
        };
    }

    RecommendedBackend::StateVector {
        reason: "High treewidth circuit; state vector is more efficient",
    }
}
```

Tensor networks win when:
| Scenario | Why TN wins | Example |
|---|---|---|
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on 3-regular graph |
| Separate registers | Independent contractions | n/2 Bell pairs |
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
| Amplitude computation | Contract to single output, not full state | Sampling one bitstring |
Tensor networks lose when:
| Scenario | Why TN loses | Fallback |
|---|---|---|
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
| All-to-all connectivity | No structure to exploit | State vector |
| Full state tomography needed | Must contract once per amplitude | State vector |
| Very small circuits (n < 20) | Overhead exceeds state vector | State vector |
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |
Consider QAOA depth p=1 on a 50-node 3-regular graph:
Circuit structure:
- 50 qubits, initialized to |+>
- 75 ZZ gates (one per edge), parameterized by gamma
- 50 Rx gates, parameterized by beta
- Total: 125 + 50 = 175 gates
- Circuit depth: 4 logical layers (H, ZZ, Rx, measure); the ZZ layer itself schedules into ~3 sublayers via edge coloring
Graph treewidth of 3-regular graph: typically 8-15
Tensor network contraction:
- Community detection finds ~5-8 communities of 6-10 nodes
- Intra-community contraction: O(2^10) ~ 1024 per community
- Inter-community bonds: ~15 edges cut
- Effective contraction complexity: O(2^15) = 32768
- Compare to state vector: O(2^50) = 1.1 * 10^15
Memory comparison:
- State vector: 2^50 * 16 bytes = 16 PiB (impossible)
- Tensor network: ~100 MiB working memory
- Speedup factor: practically infinite (feasible vs infeasible)
Contraction Diagram (simplified):

```text
Community A          Community B          Community C
 [q0-q9]              [q10-q19]            [q20-q29]
    |                     |                    |
    +---- bond=2^3 -------+----- bond=2^4 -----+
                          |
Community D          Community E
 [q30-q39]            [q40-q49]
    |                     |
    +---- bond=2^3 -------+
```

Peak intermediate tensor: 2^15 elements = 512 KiB
Both backends implement the same trait:
```rust
pub trait SimulationBackend {
    /// Execute the circuit and return measurement results.
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError>;

    /// Compute expectation value of an observable.
    fn expectation_value(
        &self,
        circuit: &QuantumCircuit,
        observable: &Observable,
        config: &SimulationConfig,
    ) -> Result<f64, SimulationError>;

    /// Return the backend name for logging.
    fn name(&self) -> &'static str;
}
```

Users interact through QuantumCircuit and never need to know which backend is active:
```rust
let circuit = QuantumCircuit::new(50)
    .h_all()
    .append_qaoa_layer(graph, gamma, beta)
    .measure_all();

// Automatic backend selection
let result = ruqu::execute(&circuit, 1000)?;
// -> Internally selects TensorNetwork backend due to n=50, low treewidth

// Or explicit backend override
let result = ruqu::execute_with_backend(
    &circuit,
    1000,
    Backend::TensorNetwork(TnConfig::default()),
)?;
```

The ruvector-mincut crate currently solves balanced graph partitioning for vector index sharding. The same algorithm directly applies to tensor network contraction ordering via the following correspondence:
| Graph partitioning concept | TN contraction concept |
|---|---|
| Vertex | Tensor |
| Edge weight | Bond dimension (log2) |
| Partition | Contraction subtree |
| Edge cut | Inter-partition bond cost |
| Balanced partition | Balanced contraction tree |
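The log2 edge-weight convention in the table makes cut costs additive: the sum of cut weights equals the log2 size of the combined inter-partition bond. A sketch of that bookkeeping (hypothetical glue code, independent of the actual ruvector-mincut API):

```rust
/// Edge weight for the partitioner: log2 of the bond dimension, so that
/// summing the weights of cut edges gives the log2 of the total
/// inter-partition bond dimension.
fn bond_to_weight(bond_dim: u64) -> u32 {
    debug_assert!(bond_dim.is_power_of_two());
    bond_dim.trailing_zeros()
}

fn main() {
    // Cutting bonds of dimension 4 and 8 costs log2(4) + log2(8) = 5,
    // i.e. the combined inter-partition bond has dimension 2^5 = 32.
    let cut: u32 = [4u64, 8].iter().map(|&b| bond_to_weight(b)).sum();
    assert_eq!(cut, 5);
    assert_eq!(1u64 << cut, 32);
    println!("total cut weight: {cut}");
}
```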
Phase 1 (this ADR): Use ruvector-mincut for community detection in contraction
path optimization.
Phase 2 (future): Extend ruvector-mincut with hypergraph partitioning for
multi-index tensor contractions, enabling handling of higher-order tensor networks
(e.g., PEPS for 2D circuits).
- Dramatically expanded qubit range: Shallow circuits on 40-60 qubits become tractable on commodity hardware.
- Surface code simulation: Distance-7 surface codes (49 data + 48 ancilla = 97 qubits) can be simulated for decoder validation using MPS (the circuit is geometrically local).
- Unified interface: Users write circuits once; backend selection is automatic.
- Synergy with ruvector-mincut: Leverages existing graph partitioning investment.
- Complementary to state vector: Each backend covers the other's weakness.
- Implementation complexity: Tensor contraction, SVD truncation, and path optimization are non-trivial to implement correctly and efficiently.
- Approximation risk: MPS truncation introduces controlled but nonzero error. Users must understand fidelity estimates.
- Compilation time: The `ndarray` and `petgraph` dependencies add to compile time when the feature is enabled.
- Testing surface: Two backends double the testing matrix for correctness validation.
- Performance unpredictability: Contraction cost depends on circuit structure in ways that are hard to predict without running the path optimizer.
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |
- QuantRS2 tensor network implementation: internal reference
- Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
- Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
- Schollwock, "The density-matrix renormalization group in the age of matrix product states" (2011)
- ADR-QE-001: Core Engine Architecture (state vector backend)
- ADR-QE-005: WASM Compilation Target
- `ruvector-mincut` crate documentation
- ADR-014: Coherence Engine (graph partitioning reuse)