
ADR-QE-009: Tensor Network Evaluation Mode

Status: Proposed
Date: 2026-02-06
Authors: ruv.io, RuVector Team
Deciders: Architecture Review Board


Context

Full state-vector simulation stores all 2^n complex amplitudes explicitly, yielding O(2^n) memory and O(G * 2^n) time for G gates. At n=30 this is 16 GiB; at n=40 it reaches 16 TiB. Many practically interesting circuits, however, contain limited entanglement:

| Circuit family | Entanglement structure | Treewidth |
|---|---|---|
| Shallow QAOA on sparse graphs | Bounded by graph degree | Low (often < 20) |
| Separate-register circuits | Disjoint qubit subsets | Sum of sub-widths |
| Near-Clifford circuits | Stabilizer + few T gates | Depends on T count |
| 1D brickwork (finite depth) | Area-law entanglement | O(depth) |
| Random deep circuits (all-to-all) | Volume-law entanglement | O(n) -- no gain |

For the first four families, tensor network (TN) methods can trade increased computation for drastically reduced memory by representing each gate as a tensor and contracting the resulting network in an optimized order. The contraction cost scales exponentially in the treewidth of the circuit's line graph rather than in the total qubit count.
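To make the scaling concrete, here is a back-of-envelope cost model; the helper names are illustrative sketches, not part of any crate:

// Back-of-envelope cost comparison (illustrative helpers, not engine code).

/// State-vector time cost: O(G * 2^n) amplitude updates for G gates.
fn state_vector_ops(num_qubits: u32, num_gates: u64) -> u128 {
    num_gates as u128 * (1u128 << num_qubits)
}

/// Dominant tensor-network contraction cost: exponential in treewidth,
/// not in total qubit count.
fn tn_contraction_ops(treewidth: u32) -> u128 {
    1u128 << treewidth
}

fn main() {
    // 50 qubits, 175 gates, treewidth ~15 (cf. the QAOA example in Section 7):
    assert_eq!(tn_contraction_ops(15), 32_768);
    println!("SV: {} ops", state_vector_ops(50, 175)); // ~2 * 10^17
}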

QuantRS2 (the Rust quantum simulation reference) demonstrated tensor network contraction for circuits up to 60 qubits on commodity hardware when treewidth remained below ~25. ruVector's existing ruvector-mincut crate already solves graph partitioning problems that are structurally identical to contraction-order optimization, providing a natural integration point.

The ruQu engine needs this capability to support:

  1. Surface code simulations at distance d >= 7 (49+ data qubits) for decoder validation, where the syndrome extraction circuit is shallow and geometrically local.
  2. Variational algorithm prototyping (VQE, QAOA) on graphs larger than 30 nodes.
  3. Hybrid workflows where part of the circuit is simulated via state vector and part via tensor contraction.

Decision

1. Feature-Gated Backend

Tensor network evaluation is implemented as an optional backend behind the tensor-network feature flag in ruqu-core:

# ruqu-core/Cargo.toml
[features]
default = ["state-vector"]
state-vector = []
tensor-network = ["dep:ndarray", "dep:petgraph"]
all-backends = ["state-vector", "tensor-network"]

When both backends are compiled in, the engine selects the backend at runtime based on circuit analysis (see Section 5 below).
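At the source level, the gate might look like this minimal sketch (module and type names are illustrative, not the actual ruqu-core layout):

// Conditional compilation keeps the TN code out of default builds.
#[cfg(feature = "tensor-network")]
pub mod tensor_network {
    /// Only compiled when the tensor-network feature is enabled.
    pub struct TensorNetworkBackend;
}

#[cfg(feature = "state-vector")]
pub mod state_vector {
    pub struct StateVectorBackend;
}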

2. Tensor Representation

Every gate becomes a tensor connecting the qubit wire indices it acts on:

| Gate type | Tensor rank | Shape | Example |
|---|---|---|---|
| Single-qubit (H, X, Rz, ...) | 2 | [2, 2] | Input wire -> output wire |
| Two-qubit (CNOT, CZ, ...) | 4 | [2, 2, 2, 2] | Two input wires -> two output wires |
| Three-qubit (Toffoli) | 6 | [2, 2, 2, 2, 2, 2] | Three input -> three output |
| Measurement projector | 2 | [2, 2] | Diagonal in computational basis |
| Initial state \|0> | 1 | [2] | |

The circuit is converted into a tensor network graph where:

  • Each tensor is a node.
  • Each shared index (qubit wire between consecutive gates) is an edge.
  • Open indices represent initial states and final measurement outcomes.

For example, the two-qubit circuit

  |0>---[H]---[CNOT_ctrl]---[Rz]---<meas>
                  |
  |0>-----------[CNOT_tgt]---------<meas>

Becomes:

  Node: init_0 (rank 1)
    |
  Node: H_0 (rank 2)
    |
  Node: CNOT_01 (rank 4)
   / \
  |   Node: Rz_0 (rank 2)
  |     |
  |   Node: meas_0 (rank 2)
  |
  Node: init_1 (rank 1)
    ... (connected via CNOT shared index)
  Node: meas_1 (rank 2)
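A sketch of this conversion for the two-qubit circuit above (Rz omitted), using the petgraph dependency declared in Section 1. TensorKind and the graph layout are illustrative, not the engine's actual representation:

use petgraph::graph::UnGraph;

/// Node payload: what kind of tensor sits at this vertex (illustrative).
#[derive(Debug)]
enum TensorKind {
    Init,                // rank-1 |0> tensor
    Gate(&'static str),  // gate tensor, e.g. "H" or "CNOT"
    Meas,                // rank-2 measurement projector
}

/// Edge payload: dimension of the shared index (2 for a qubit wire).
type BondDim = usize;

fn bell_pair_network() -> UnGraph<TensorKind, BondDim> {
    let mut g = UnGraph::new_undirected();
    let init0 = g.add_node(TensorKind::Init);
    let init1 = g.add_node(TensorKind::Init);
    let h0 = g.add_node(TensorKind::Gate("H"));
    let cnot = g.add_node(TensorKind::Gate("CNOT"));
    let meas0 = g.add_node(TensorKind::Meas);
    let meas1 = g.add_node(TensorKind::Meas);
    // Every qubit wire between consecutive gates is a dimension-2 edge.
    g.add_edge(init0, h0, 2);
    g.add_edge(h0, cnot, 2);
    g.add_edge(init1, cnot, 2);
    g.add_edge(cnot, meas0, 2);
    g.add_edge(cnot, meas1, 2);
    g
}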

3. Contraction Strategy

Contraction order determines whether the computation is tractable. The cost of contracting two tensors is the product of the dimensions of all indices involved. Finding the optimal contraction order is NP-hard (equivalent to finding minimum treewidth), so we use heuristics.
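The pairwise cost function that the heuristics below rely on can be sketched as follows, with an index set represented as a label-to-dimension map (illustrative, not the engine's internal form):

use std::collections::HashMap;

/// Index set of one tensor: label -> dimension.
type Indices = HashMap<String, usize>;

/// Cost of contracting two tensors: product of the dimensions of the
/// union of their indices (shared indices counted once).
fn contraction_cost(a: &Indices, b: &Indices) -> u128 {
    let mut union = a.clone();
    for (label, dim) in b {
        union.insert(label.clone(), *dim);
    }
    union.values().map(|&d| d as u128).product()
}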

Contraction Path Optimization Pseudocode

function find_contraction_path(tensor_network: TN) -> ContractionPath:
    // Phase 1: Simplify the network
    apply_trivial_contractions(tensor_network)  // rank-1 tensors, diagonal pairs

    // Phase 2: Detect community structure
    communities = detect_communities(tensor_network.graph)

    // Phase 3: Contract within communities first (small subproblems)
    intra_paths = []
    for community in communities:
        subgraph = tensor_network.subgraph(community)
        if subgraph.num_tensors <= 20:
            // Exact dynamic programming for small subgraphs
            path = optimal_einsum_dp(subgraph)
        else:
            // Greedy with lookahead for larger subgraphs
            path = greedy_with_lookahead(subgraph, lookahead=2)
        intra_paths.append(path)

    // Phase 4: Contract inter-community edges
    // Each community is now a single large tensor
    meta_graph = contract_communities(tensor_network, intra_paths)
    inter_path = greedy_with_lookahead(meta_graph, lookahead=3)

    // Phase 5: Compose the full path
    return compose_paths(intra_paths, inter_path)


function greedy_with_lookahead(tn: TN, lookahead: int) -> Path:
    path = []
    remaining = tn.clone()

    while remaining.num_tensors > 1:
        best_cost = INFINITY
        best_pair = None

        // Evaluate all candidate contractions
        for (i, j) in remaining.candidate_pairs():
            cost = contraction_cost(remaining, i, j)

            // Lookahead: estimate the cost of subsequent contractions.
            // DISCOUNT_FACTOR in (0, 1) down-weights the uncertain estimate.
            if lookahead > 0:
                simulated = remaining.simulate_contraction(i, j)
                future_cost = estimate_future_cost(simulated, lookahead - 1)
                cost += future_cost * DISCOUNT_FACTOR

            if cost < best_cost:
                best_cost = cost
                best_pair = (i, j)

        path.append(best_pair)
        remaining.contract(best_pair)

    return path

Community Detection via ruvector-mincut

The ruvector-mincut crate provides graph partitioning that is directly applicable to contraction ordering:

use ruvector_mincut::{partition, Objective, PartitionConfig};

fn partition_tensor_network(tn: &TensorNetwork) -> Vec<Vec<TensorId>> {
    let graph = tn.to_adjacency_graph();
    let config = PartitionConfig {
        num_partitions: estimate_optimal_partitions(tn),
        balance_factor: 1.1,  // Allow 10% imbalance
        minimize: Objective::EdgeCut,  // Minimize inter-partition wires
    };
    partition(&graph, &config)
}

Each cut qubit wire contributes a factor of 2 to the bond dimension of the inter-community contraction, so a cut of k wires yields an effective bond dimension of 2^k. Minimizing the edge cut therefore minimizes the most expensive contraction step.

4. MPS (Matrix Product State) Mode

For circuits with 1D-like connectivity (nearest-neighbor gates on a line), a Matrix Product State representation is more efficient than general tensor contraction.

    A[1] -- A[2] -- A[3] -- ... -- A[n]
     |       |       |               |
   phys_1  phys_2  phys_3         phys_n

Each site tensor A[i] has shape [bond_left, physical, bond_right] where:

  • physical = 2 (qubit dimension)
  • bond_left, bond_right = bond dimension chi

| Bond dimension (chi) | Memory per site | Total memory (n qubits) | Approximation quality |
|---|---|---|---|
| 1 | 16 bytes | 16n bytes | Product state only |
| 16 | 4 KiB | 4n KiB | Low entanglement |
| 64 | 64 KiB | 64n KiB | Moderate entanglement |
| 256 | 1 MiB | n MiB | High entanglement |
| 1024 | 16 MiB | 16n MiB | Near exact for many circuits |

Memory figures assume 8-byte complex64 entries in a [chi, 2, chi] site tensor; double them for complex128.
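Encoded as an illustrative helper, the per-site figure is just element count times element size:

/// Bytes for one [chi, 2, chi] site tensor with 8-byte complex64 entries
/// (illustrative; double the element size for complex128).
fn mps_site_bytes(chi: usize) -> usize {
    chi * 2 * chi * 8
}
// mps_site_bytes(256) == 1_048_576, i.e. the 1 MiB row above.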

Truncation policy: After each two-qubit gate, perform SVD on the updated bond. If the bond dimension exceeds chi_max, truncate the smallest singular values. Track the total discarded weight (sum of squared discarded singular values) as a fidelity estimate:

pub struct MpsConfig {
    /// Maximum bond dimension. Truncation occurs above this.
    pub chi_max: usize,
    /// Minimum singular value to retain (relative to largest).
    pub svd_cutoff: f64,
    /// Accumulated truncation error (updated during simulation).
    pub fidelity_estimate: f64,
}

impl Default for MpsConfig {
    fn default() -> Self {
        Self {
            chi_max: 256,
            svd_cutoff: 1e-12,
            fidelity_estimate: 1.0,
        }
    }
}
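A minimal sketch of that truncation rule, applied to a descending-sorted singular-value spectrum. The SVD itself is assumed to come from a linear-algebra backend; truncate_spectrum is illustrative, not an existing ruqu-core function:

/// Decide how many singular values to keep under an MpsConfig, and report
/// the discarded weight (sum of squared truncated values).
/// Assumes `spectrum` is sorted in descending order.
fn truncate_spectrum(spectrum: &[f64], cfg: &MpsConfig) -> (usize, f64) {
    let largest = spectrum.first().copied().unwrap_or(0.0);
    let keep = spectrum
        .iter()
        .take(cfg.chi_max)
        .take_while(|&&s| s >= cfg.svd_cutoff * largest)
        .count();
    let discarded: f64 = spectrum[keep..].iter().map(|s| s * s).sum();
    (keep, discarded)
}

The caller would then renormalize the kept spectrum and multiply fidelity_estimate by 1.0 - discarded (for a normalized state).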

5. Automatic Mode Selection

The engine analyzes the circuit before execution to recommend a backend:

pub enum RecommendedBackend {
    StateVector { reason: &'static str },
    TensorNetwork { estimated_treewidth: usize, reason: &'static str },
    Mps { estimated_max_bond: usize, reason: &'static str },
}

pub fn recommend_backend(circuit: &QuantumCircuit) -> RecommendedBackend {
    let n = circuit.num_qubits();
    let depth = circuit.depth();
    let connectivity = circuit.connectivity_graph();

    // Rule 1: Small circuits always use state vector
    if n <= 20 {
        return RecommendedBackend::StateVector {
            reason: "Small circuit; state vector is fastest below 20 qubits",
        };
    }

    // Rule 2: Check for 1D connectivity (MPS candidate)
    if connectivity.max_degree() <= 2 && connectivity.is_path_graph() {
        let estimated_bond = 2_usize.pow(depth.min(20) as u32);
        return RecommendedBackend::Mps {
            estimated_max_bond: estimated_bond,
            reason: "1D nearest-neighbor connectivity detected",
        };
    }

    // Rule 3: Estimate treewidth for general TN
    let estimated_tw = estimate_treewidth(&connectivity, depth);
    if estimated_tw < 25 && n > 25 {
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "Low treewidth relative to qubit count",
        };
    }

    // Rule 4: Check memory feasibility for state vector
    // State-vector memory: 16 bytes per complex128 amplitude.
    // Computed in u128 so the shift cannot overflow for large n.
    let sv_memory: u128 = 16u128 << n.min(120);
    let available = estimate_available_memory() as u128;
    if sv_memory > available {
        // Force TN even if treewidth is high -- at least it has a chance
        return RecommendedBackend::TensorNetwork {
            estimated_treewidth: estimated_tw,
            reason: "State vector exceeds available memory; TN is only option",
        };
    }

    RecommendedBackend::StateVector {
        reason: "High treewidth circuit; state vector is more efficient",
    }
}
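Hypothetical caller-side dispatch on the recommendation (a sketch; the builder calls mirror Section 8):

let circuit = QuantumCircuit::new(50).h_all();
match recommend_backend(&circuit) {
    RecommendedBackend::StateVector { reason } => {
        println!("state vector: {reason}");
    }
    RecommendedBackend::TensorNetwork { estimated_treewidth, reason } => {
        println!("tensor network (tw ~ {estimated_treewidth}): {reason}");
    }
    RecommendedBackend::Mps { estimated_max_bond, reason } => {
        println!("MPS (chi ~ {estimated_max_bond}): {reason}");
    }
}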

6. When Tensor Networks Win vs Lose

Tensor networks win when:

| Scenario | Why TN wins | Example |
|---|---|---|
| Shallow circuits on many qubits | Treewidth ~ depth, not n | 50-qubit depth-4 QAOA |
| Sparse graph connectivity | Low treewidth from graph structure | MaxCut on a 3-regular graph |
| Separate registers | Independent contractions | n/2 Bell pairs |
| Near-Clifford | Stabilizer + few non-Clifford gates | Clifford + 5 T gates |
| Amplitude computation | Contract to a single output, not the full state | Sampling one bitstring |

Tensor networks lose when:

| Scenario | Why TN loses | Fallback |
|---|---|---|
| Deep random circuits | Treewidth ~ n | State vector (if n <= 30) |
| All-to-all connectivity | No structure to exploit | State vector |
| Full state tomography needed | Must contract once per amplitude | State vector |
| Very small circuits (n < 20) | Overhead exceeds state vector | State vector |
| High-fidelity MPS needed | Bond dimension grows exponentially | State vector or exact TN |

7. Example: 50-Qubit Shallow QAOA

Consider QAOA depth p=1 on a 50-node 3-regular graph:

Circuit structure:
  - 50 qubits, initialized to |+>
  - 75 ZZ gates (one per edge), parameterized by gamma
  - 50 Rx gates, parameterized by beta
  - Total: 50 + 75 + 50 = 175 gates
  - Circuit depth: 4 logical layers (H, ZZ (3-colorable edge set), Rx, measure)

Graph treewidth of 3-regular graph: typically 8-15

Tensor network contraction:
  - Community detection finds ~5-8 communities of 6-10 nodes
  - Intra-community contraction: O(2^10) ~ 1024 per community
  - Inter-community bonds: ~15 edges cut
  - Effective contraction complexity: O(2^15) = 32768
  - Compare to state vector: O(2^50) = 1.1 * 10^15

Memory comparison:
  - State vector: 2^50 * 16 bytes = 16 PiB (impossible)
  - Tensor network: ~100 MiB working memory
  - Speedup factor: practically infinite (feasible vs infeasible)

Contraction Diagram (simplified):

  Community A        Community B        Community C
  [q0-q9]           [q10-q19]          [q20-q29]
     |                  |                   |
     +--- bond=2^3 ----+---- bond=2^4 -----+
                        |
  Community D        Community E
  [q30-q39]          [q40-q49]
     |                  |
     +--- bond=2^3 ----+

  Peak intermediate tensor: 2^15 elements = 512 KiB

8. Integration with State Vector Backend

Both backends implement the same trait:

pub trait SimulationBackend {
    /// Execute the circuit and return measurement results.
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError>;

    /// Compute expectation value of an observable.
    fn expectation_value(
        &self,
        circuit: &QuantumCircuit,
        observable: &Observable,
        config: &SimulationConfig,
    ) -> Result<f64, SimulationError>;

    /// Return the backend name for logging.
    fn name(&self) -> &'static str;
}
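A skeleton of how the tensor network backend might satisfy this trait; the struct and its internals are illustrative, not the actual implementation:

pub struct TensorNetworkBackend {
    config: TnConfig,
}

impl SimulationBackend for TensorNetworkBackend {
    fn execute(
        &self,
        circuit: &QuantumCircuit,
        shots: usize,
        config: &SimulationConfig,
    ) -> Result<SimulationResult, SimulationError> {
        // 1. Convert the circuit to a tensor network (Section 2).
        // 2. Optimize the contraction path (Section 3).
        // 3. Contract and sample `shots` measurement outcomes.
        todo!("contraction-based execution")
    }

    fn expectation_value(
        &self,
        circuit: &QuantumCircuit,
        observable: &Observable,
        config: &SimulationConfig,
    ) -> Result<f64, SimulationError> {
        // Contract <psi|O|psi> as a closed network; no full state is built.
        todo!("closed-network contraction")
    }

    fn name(&self) -> &'static str {
        "tensor-network"
    }
}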

Users interact through QuantumCircuit and never need to know which backend is active:

let circuit = QuantumCircuit::new(50)
    .h_all()
    .append_qaoa_layer(graph, gamma, beta)
    .measure_all();

// Automatic backend selection
let result = ruqu::execute(&circuit, 1000)?;
// -> Internally selects TensorNetwork backend due to n=50, low treewidth

// Or explicit backend override
let result = ruqu::execute_with_backend(
    &circuit,
    1000,
    Backend::TensorNetwork(TnConfig::default()),
)?;

9. Future: ruvector-mincut Integration for Contraction Ordering

The ruvector-mincut crate currently solves balanced graph partitioning for vector index sharding. The same algorithm directly applies to tensor network contraction ordering via the following correspondence:

| Graph partitioning concept | TN contraction concept |
|---|---|
| Vertex | Tensor |
| Edge weight | Bond dimension (log2) |
| Partition | Contraction subtree |
| Edge cut | Inter-partition bond cost |
| Balanced partition | Balanced contraction tree |

Phase 1 (this ADR): Use ruvector-mincut for community detection in contraction path optimization.

Phase 2 (future): Extend ruvector-mincut with hypergraph partitioning for multi-index tensor contractions, enabling handling of higher-order tensor networks (e.g., PEPS for 2D circuits).

Consequences

Positive

  1. Dramatically expanded qubit range: Shallow circuits on 40-60 qubits become tractable on commodity hardware.
  2. Surface code simulation: Distance-7 surface codes (49 data + 48 ancilla = 97 qubits) can be simulated for decoder validation using MPS (the circuit is geometrically local).
  3. Unified interface: Users write circuits once; backend selection is automatic.
  4. Synergy with ruvector-mincut: Leverages existing graph partitioning investment.
  5. Complementary to state vector: Each backend covers the other's weakness.

Negative

  1. Implementation complexity: Tensor contraction, SVD truncation, and path optimization are non-trivial to implement correctly and efficiently.
  2. Approximation risk: MPS truncation introduces controlled but nonzero error. Users must understand fidelity estimates.
  3. Compilation time: The ndarray and petgraph dependencies add to compile time when the feature is enabled.
  4. Testing surface: Supporting two backends doubles the testing matrix for correctness validation.
  5. Performance unpredictability: Contraction cost depends on circuit structure in ways that are hard to predict without running the path optimizer.

Risks and Mitigations

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Path optimizer finds poor ordering | Medium | High cost | Multiple heuristics + timeout fallback to greedy |
| MPS fidelity silently degrades | Medium | Incorrect results | Track discarded weight; warn if fidelity < 0.99 |
| Feature interaction bugs | Low | Incorrect results | Shared test suite: both backends must agree on small circuits |
| Memory spike during contraction | Medium | OOM | Pre-estimate peak intermediate tensor size; abort if too large |

References

  • QuantRS2 tensor network implementation: internal reference
  • Markov & Shi, "Simulating Quantum Computation by Contracting Tensor Networks" (2008)
  • Gray & Kourtis, "Hyper-optimized tensor network contraction" (2021) -- cotengra
  • Schollwöck, "The density-matrix renormalization group in the age of matrix product states" (2011)
  • ADR-QE-001: Core Engine Architecture (state vector backend)
  • ADR-QE-005: WASM Compilation Target
  • ruvector-mincut crate documentation
  • ADR-014: Coherence Engine (graph partitioning reuse)