
ADR-016: RuVector Integration for Training Pipeline

Status

Accepted

Context

The wifi-densepose-train crate (ADR-015) was initially implemented using standard crates (petgraph, ndarray, custom signal processing). The ruvector ecosystem provides published Rust crates with subpolynomial algorithms that directly replace several components with superior implementations.

The ruvector crates are published on crates.io (confirmed at v2.0.4, except ruvector-core at v2.0.5) and their source is available at https://github.com/ruvnet/ruvector.

Available ruvector crates (published on crates.io)

| Crate | Version | Description | Default Features |
|---|---|---|---|
| ruvector-mincut | 2.0.4 | World's first subpolynomial dynamic min-cut | exact, approximate |
| ruvector-attn-mincut | 2.0.4 | Min-cut gating attention (graph-based alternative to softmax) | all modules |
| ruvector-attention | 2.0.4 | Geometric, graph, and sparse attention mechanisms | all modules |
| ruvector-temporal-tensor | 2.0.4 | Temporal tensor compression with tiered quantization | all modules |
| ruvector-solver | 2.0.4 | Sublinear-time sparse linear solvers, O(log n) to O(√n) | neumann, cg, forward-push |
| ruvector-core | 2.0.5 | HNSW-indexed vector database core | — |
| ruvector-math | 2.0.4 | Optimal transport, information geometry | — |

Verified API Details (from source inspection of github.com/ruvnet/ruvector)

ruvector-mincut

```rust
use ruvector_mincut::{MinCutBuilder, DynamicMinCut, MinCutResult, VertexId, Weight};

// Build a dynamic min-cut structure
let mut mincut = MinCutBuilder::new()
    .exact()                          // or .approximate(0.1)
    .with_edges(vec![(u, v, w)])      // (VertexId, VertexId, Weight) = (u64, u64, f64) tuples
    .build()
    .expect("Failed to build");

// Subpolynomial O(n^{o(1)}) amortized dynamic updates
mincut.insert_edge(u, v, weight) -> Result<f64>   // returns new cut value
mincut.delete_edge(u, v) -> Result<f64>           // returns new cut value

// Queries
mincut.min_cut_value() -> f64
mincut.min_cut() -> MinCutResult                       // includes partition
mincut.partition() -> (Vec<VertexId>, Vec<VertexId>)   // S and T sets
mincut.cut_edges() -> Vec<Edge>                        // edges crossing the cut
// Note: VertexId = u64; Edge has fields { source: u64, target: u64, weight: f64 }
```

MinCutResult contains:

  • value: f64 — minimum cut weight
  • is_exact: bool
  • approximation_ratio: f64
  • partition: Option<(Vec<VertexId>, Vec<VertexId>)> — S and T node sets

ruvector-attn-mincut

```rust
use ruvector_attn_mincut::{attn_mincut, attn_softmax, AttentionOutput, MinCutConfig};

// Min-cut gated attention (drop-in for softmax attention)
// Q, K, V are all flat &[f32] with shape [seq_len, d]
let output: AttentionOutput = attn_mincut(
    q,        // queries: flat &[f32], [seq_len * d]
    k,        // keys:    flat &[f32], [seq_len * d]
    v,        // values:  flat &[f32], [seq_len * d]
    d,        // feature dimension (usize)
    seq_len,  // number of tokens / antenna paths (usize)
    lambda,   // min-cut threshold, f32 (larger = more pruning)
    tau,      // temporal hysteresis window (usize)
    eps,      // numerical epsilon (f32)
);

// AttentionOutput
pub struct AttentionOutput {
    pub output: Vec<f32>,      // attended values [seq_len * d]
    pub gating: GatingResult,  // which edges were kept/pruned
}

// Baseline softmax attention for comparison
let output: Vec<f32> = attn_softmax(q, k, v, d, seq_len);
```

Use case in wifi-densepose-train: In ModalityTranslator, treat the T * n_tx * n_rx antenna×time paths as seq_len tokens and the n_sc subcarriers as feature dimension d. Apply attn_mincut to gate irrelevant antenna-pair correlations before passing to FC layers.
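The token layout described above can be sketched in plain Rust (both helpers are hypothetical, for illustration only, and not part of the crate's API):

```rust
/// Interpret a row-major CSI window [t][tx][rx][sc] as attention tokens:
/// seq_len = t * n_tx * n_rx antenna×time paths, each a token of d = n_sc
/// subcarriers. Hypothetical helper, not part of ruvector-attn-mincut.
fn csi_token_shape(t: usize, n_tx: usize, n_rx: usize, n_sc: usize) -> (usize, usize) {
    (t * n_tx * n_rx, n_sc) // (seq_len, d)
}

/// Borrow the subcarrier vector (one token) for a single antenna×time path.
fn path_token(csi: &[f32], n_tx: usize, n_rx: usize, n_sc: usize,
              t: usize, tx: usize, rx: usize) -> &[f32] {
    let token = (t * n_tx + tx) * n_rx + rx; // token index in [0, seq_len)
    &csi[token * n_sc..(token + 1) * n_sc]
}
```

Because the row-major [t, tx, rx, sc] layout is already contiguous with subcarriers innermost, the same flat buffer can be passed as q, k, and v without copying.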

ruvector-solver (NeumannSolver)

```rust
use ruvector_solver::neumann::NeumannSolver;
use ruvector_solver::types::CsrMatrix;
use ruvector_solver::traits::SolverEngine;

// Build sparse matrix from COO entries: (row, col, value) triples
let matrix = CsrMatrix::<f32>::from_coo(rows, cols, vec![
    (row, col, val), // (usize, usize, f32)
    // ...
]);

// Solve Ax = b in O(√n) for sparse systems
let solver = NeumannSolver::new(tolerance, max_iterations);  // (f64, usize)
let result = solver.solve(&matrix, &rhs)?;  // Result<SolverResult, SolverError>

// SolverResult fields
result.solution        // Vec<f32>: solution vector x
result.residual_norm   // f64: ||b - Ax||
result.iterations      // usize: number of iterations used
```

Use case in wifi-densepose-train: In subcarrier.rs, model the 114→56 subcarrier resampling as a sparse regularized least-squares problem A·x ≈ b where A is a sparse basis-function matrix (physically motivated by multipath propagation model: each target subcarrier is a sparse combination of adjacent source subcarriers). Gives O(√n) vs O(n) for n=114 subcarriers.
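The Neumann-series principle behind the solver can be illustrated with a tiny dense sketch (the crate itself operates on sparse CSR matrices; this is not its API). The iteration x ← x + (b − A·x) sums the series Σ (I − A)^k b, which converges when the spectral radius of I − A is below 1, e.g. for a suitably scaled diagonally dominant A:

```rust
/// Minimal Neumann-series iteration for A·x = b (dense, illustration only).
/// Converges when the spectral radius of I - A is below 1.
fn neumann_solve(a: &[Vec<f32>], b: &[f32], tol: f32, max_iter: usize) -> Vec<f32> {
    let n = b.len();
    let mut x = vec![0.0f32; n];
    for _ in 0..max_iter {
        let mut next = vec![0.0f32; n];
        let mut max_step = 0.0f32;
        for i in 0..n {
            let ax: f32 = (0..n).map(|j| a[i][j] * x[j]).sum();
            next[i] = x[i] + (b[i] - ax); // x + (b - A x) = b + (I - A) x
            max_step = max_step.max((next[i] - x[i]).abs());
        }
        x = next;
        if max_step < tol { break; }
    }
    x
}
```

For the interpolation system above, each basis row is normalized first so the regularized matrix satisfies the convergence condition.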

ruvector-temporal-tensor

```rust
use ruvector_temporal_tensor::{TemporalTensorCompressor, TierPolicy};
use ruvector_temporal_tensor::segment;

// Create compressor for `element_count` f32 elements per frame
let mut comp = TemporalTensorCompressor::new(
    TierPolicy::default(),  // configures hot/warm/cold thresholds
    element_count,          // usize: n_tx * n_rx * n_sc (elements per CSI frame)
    id,                     // u64: tensor identity (0 for amplitude, 1 for phase)
);

// Mark access recency (drives tier selection):
//   hot  = accessed within last few timestamps → 8-bit      (~4x compression)
//   warm = moderately recent                   → 5 or 7-bit (~4.6–6.4x)
//   cold = rarely accessed                     → 3-bit      (~10.67x)
comp.set_access(timestamp, tensor_id);  // (u64, u64)

// Compress frames into a byte segment
let mut segment_buf: Vec<u8> = Vec::new();
comp.push_frame(&frame, timestamp, &mut segment_buf);  // frame: &[f32]
comp.flush(&mut segment_buf);  // flush current partial segment

// Decompress
let mut decoded: Vec<f32> = Vec::new();
segment::decode(&segment_buf, &mut decoded);              // all frames
segment::decode_single_frame(&segment_buf, frame_index);  // -> Option<Vec<f32>>
segment::compression_ratio(&segment_buf);                 // -> f64
```

Use case in wifi-densepose-train: In dataset.rs, buffer CSI frames in TemporalTensorCompressor to reduce memory footprint by 50–75%. The CSI window contains window_frames (default 100) frames per sample; hot frames (recent) stay near f32 fidelity (8-bit quantization), while cold frames (older) are aggressively quantized (down to 3-bit).
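The per-tier quantization can be illustrated with a minimal uniform quantizer (a sketch of the idea only; the crate's actual codec and bit layouts differ):

```rust
/// Uniform n-bit quantization of one frame: map each value to an integer
/// code in [0, 2^bits - 1]. Returns (codes, min, scale) for reconstruction.
/// Sketch of the per-tier idea only, not the crate's codec.
fn quantize(frame: &[f32], bits: u32) -> (Vec<u32>, f32, f32) {
    let min = frame.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = frame.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let levels = (1u32 << bits) - 1;
    let scale = if max > min { (max - min) / levels as f32 } else { 1.0 };
    let codes = frame.iter().map(|&v| ((v - min) / scale).round() as u32).collect();
    (codes, min, scale)
}

/// Reconstruct approximate values from codes; error is at most ~scale/2.
fn dequantize(codes: &[u32], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}
```

At 8 bits the reconstruction error per element is bounded by half a quantization step, which is why hot frames remain near f32 fidelity while still taking a quarter of the space.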

ruvector-attention

```rust
use ruvector_attention::{
    attention::ScaledDotProductAttention,
    traits::Attention,
};

let attention = ScaledDotProductAttention::new(d);  // d: usize, feature dim

// Compute attention: query is [d]; keys and values are n_nodes slices of [d]
let output: Vec<f32> = attention.compute(
    &query,   // &[f32], length d
    &keys,    // &[&[f32]], n_nodes × [d]
    &values,  // &[&[f32]], n_nodes × [d]
)?;  // Result<Vec<f32>>
```

Use case in wifi-densepose-train: In model.rs spatial decoder, replace the standard ConvTranspose2D upsampling pass with graph-based spatial attention, where nodes represent spatial grid points and edges connect neighboring antenna footprints.
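For reference, the computation a scaled dot-product attention step performs can be sketched in plain Rust (this mirrors the semantics of `compute` for a single query, not the crate's internal implementation, which also validates shapes and returns a `Result`):

```rust
/// Scaled dot-product attention for one query over n keys/values:
/// output = Σ_i softmax((q·k_i) / sqrt(d))_i · v_i
fn sdpa(query: &[f32], keys: &[&[f32]], values: &[&[f32]]) -> Vec<f32> {
    let d = query.len();
    let scale = 1.0 / (d as f32).sqrt();
    // scores_i = (q · k_i) / sqrt(d)
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| k.iter().zip(query.iter()).map(|(a, b)| a * b).sum::<f32>() * scale)
        .collect();
    // numerically stable softmax over the scores
    let m = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - m).exp()).collect();
    let z: f32 = exps.iter().sum();
    // weighted sum of values
    let mut out = vec![0.0f32; values[0].len()];
    for (e, v) in exps.iter().zip(values.iter()) {
        let weight = e / z;
        for (o, x) in out.iter_mut().zip(v.iter()) {
            *o += weight * x;
        }
    }
    out
}
```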


Decision

Integrate ruvector crates into wifi-densepose-train at five integration points:

1. ruvector-mincut → metrics.rs (replaces petgraph Hungarian for multi-frame)

Before: O(n³) Kuhn-Munkres via DFS augmenting paths using petgraph::DiGraph, single-frame only (no state across frames).

After: DynamicPersonMatcher struct wrapping ruvector_mincut::DynamicMinCut. Maintains the bipartite assignment graph across frames using subpolynomial updates:

  • insert_edge(pred_id, gt_id, oks_cost) when new person detected
  • delete_edge(pred_id, gt_id) when person leaves scene
  • partition() returns S/T split → cut_edges() returns the matched pred→gt pairs

Performance: O(n^{1.5} log n) amortized update vs O(n³) rebuild per frame. Critical for >3 person scenarios and video tracking (frame-to-frame updates).

The original hungarian_assignment function is kept for single-frame static matching (used in proof verification for determinism).
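The matcher lifecycle described above can be sketched as a plain-Rust skeleton. Here a greedy lowest-cost-first assignment stands in for the crate's min-cut machinery; the struct name follows the description above, but this implementation is illustrative only:

```rust
use std::collections::HashMap;

/// Skeleton of the DynamicPersonMatcher interface: maintain (pred, gt) cost
/// edges across frames. The real version delegates matching to
/// ruvector_mincut::DynamicMinCut; greedy matching is a stand-in here.
struct DynamicPersonMatcher {
    edges: HashMap<(u64, u64), f64>, // (pred_id, gt_id) -> OKS cost
}

impl DynamicPersonMatcher {
    fn new() -> Self { Self { edges: HashMap::new() } }
    fn insert_edge(&mut self, pred: u64, gt: u64, cost: f64) {
        self.edges.insert((pred, gt), cost); // new person detected
    }
    fn delete_edge(&mut self, pred: u64, gt: u64) {
        self.edges.remove(&(pred, gt)); // person leaves scene
    }
    /// Greedy lowest-cost-first assignment (stand-in for cut_edges()).
    fn matches(&self) -> Vec<(u64, u64)> {
        let mut es: Vec<_> = self.edges.iter().map(|(&(p, g), &c)| (c, p, g)).collect();
        es.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap());
        let (mut used_p, mut used_g, mut out) = (Vec::new(), Vec::new(), Vec::new());
        for (_, p, g) in es {
            if !used_p.contains(&p) && !used_g.contains(&g) {
                used_p.push(p);
                used_g.push(g);
                out.push((p, g));
            }
        }
        out
    }
}
```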

2. ruvector-attn-mincut → model.rs (replaces flat MLP fusion in ModalityTranslator)

Before: Amplitude/phase FC encoders → concatenate [B, 512] → fuse Linear → ReLU.

After: Treat the n_ant = T * n_tx * n_rx antenna×time paths as seq_len tokens and n_sc subcarriers as feature dimension d. Apply attn_mincut to gate irrelevant antenna-pair correlations:

```rust
// In ModalityTranslator::forward_t:
// amp/ph tensors: [B, n_ant, n_sc] → convert to Vec<f32>
// Apply attn_mincut with seq_len = n_ant, d = n_sc, lambda = 0.3
// → attended output [B, n_ant, n_sc] → flatten → FC layers
```

Benefit: Automatic antenna-path selection without explicit learned masks; min-cut gating adds no learned parameters and has a clear graph-theoretic interpretation, unlike learned gates.

3. ruvector-temporal-tensor → dataset.rs (CSI temporal compression)

Before: Raw CSI windows stored as full f32 Array4<f32> in memory.

After: CompressedCsiBuffer struct backed by TemporalTensorCompressor. Tiered quantization based on frame access recency:

  • Hot frames (last 10): near-f32 fidelity via 8-bit quantization (≈ 4× smaller than f32)
  • Warm frames (11–50): 5/7-bit quantization
  • Cold frames (>50): 3-bit (10.67× smaller)

Encode on push_frame, decode on get(idx) for transparent access.

Benefit: 50–75% memory reduction for the default 100-frame temporal window; allows 2–4× larger batch sizes on constrained hardware.
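A back-of-envelope check of this claim, assuming 10 hot frames at 8 bits, 40 warm frames at an average of 6 bits, and 50 cold frames at 3 bits per element (tier sizes taken from the bullets above; real segments also carry header and metadata overhead, so this is an upper bound on savings):

```rust
/// Raw f32 size vs tier-compressed size, in bits, for a 100-frame CSI window.
/// Bit widths approximate the hot/warm/cold tiers described above;
/// codec metadata overhead is ignored.
fn window_bits(elements_per_frame: usize) -> (usize, usize) {
    let raw = 100 * elements_per_frame * 32;                      // f32 baseline
    let tiered = elements_per_frame * (10 * 8 + 40 * 6 + 50 * 3); // hot + warm + cold
    (raw, tiered)
}
```

For a 3×3×56 frame this comfortably clears the 50% floor quoted above; the 50–75% figure observed in practice reflects codec overhead and however many frames are resident in the hot tier at a given moment.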

4. ruvector-solver → subcarrier.rs (sparse subcarrier interpolation)

Before: Linear interpolation across subcarriers using precomputed (i0, i1, frac) tuples.

After: NeumannSolver for sparse regularized least-squares subcarrier interpolation. The CSI spectrum is modeled as a sparse combination of Fourier basis functions (physically motivated by multipath propagation):

```rust
// A = sparse basis matrix [target_sc, src_sc] (Gaussian or sinc basis)
// b = source CSI values [src_sc]
// Solve A·x ≈ b via NeumannSolver (tolerance = 1e-5, max_iter = 500)
// x = interpolated values at target subcarrier positions
```

Benefit: O(√n) vs O(n) for n=114 source subcarriers; more accurate at subcarrier boundaries than linear interpolation.
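One row of such a basis matrix A can be sketched as normalized Gaussian weights over the source subcarrier positions (sigma is a free design parameter here; the solver crate is only used to solve the resulting system):

```rust
/// One row of a Gaussian basis interpolation matrix: weights mapping the
/// source subcarrier positions onto one target position, normalized to sum
/// to 1. Illustrative sketch of the basis construction described above.
fn gaussian_row(src_pos: &[f32], target: f32, sigma: f32) -> Vec<f32> {
    let w: Vec<f32> = src_pos
        .iter()
        .map(|&p| (-(p - target).powi(2) / (2.0 * sigma * sigma)).exp())
        .collect();
    let z: f32 = w.iter().sum(); // normalize so each target is a convex combination
    w.iter().map(|x| x / z).collect()
}
```

In practice weights far from the target are negligible and can be dropped, which is what makes A sparse.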

5. ruvector-attention → model.rs (spatial decoder)

Before: Standard ConvTranspose2D upsampling in KeypointHead and DensePoseHead.

After: ScaledDotProductAttention applied to spatial feature nodes. Each spatial location [H×W] becomes a token; attention captures long-range spatial dependencies between antenna footprint regions:

```rust
// feature map: [B, C, H, W] → flatten to [B, H*W, C]
// For each batch element: compute attention among H*W spatial nodes
// → reshape back to [B, C, H, W]
```

Benefit: Captures long-range spatial dependencies missed by local convolutions; important for multi-person scenarios.
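The reshape around the attention pass is just a channel transpose; a minimal sketch for one batch element (helper names are illustrative):

```rust
/// Row-major [C, H, W] feature map → [H*W, C] spatial tokens.
fn chw_to_tokens(f: &[f32], c: usize, h: usize, w: usize) -> Vec<f32> {
    let hw = h * w;
    let mut out = vec![0.0f32; f.len()];
    for ci in 0..c {
        for s in 0..hw {
            out[s * c + ci] = f[ci * hw + s]; // token s gets channel ci
        }
    }
    out
}

/// Inverse: [H*W, C] tokens → row-major [C, H, W].
fn tokens_to_chw(t: &[f32], c: usize, h: usize, w: usize) -> Vec<f32> {
    let hw = h * w;
    let mut out = vec![0.0f32; t.len()];
    for ci in 0..c {
        for s in 0..hw {
            out[ci * hw + s] = t[s * c + ci];
        }
    }
    out
}
```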


Implementation Plan

Files modified

| File | Change |
|---|---|
| Cargo.toml (workspace + crate) | Add ruvector-mincut, ruvector-attn-mincut, ruvector-temporal-tensor, ruvector-solver, ruvector-attention = "2.0.4" |
| metrics.rs | Add DynamicPersonMatcher wrapping ruvector_mincut::DynamicMinCut; keep hungarian_assignment for deterministic proof |
| model.rs | Add attn_mincut bridge in ModalityTranslator::forward_t; add ScaledDotProductAttention in spatial heads |
| dataset.rs | Add CompressedCsiBuffer backed by TemporalTensorCompressor; MmFiDataset uses it |
| subcarrier.rs | Add interpolate_subcarriers_sparse using NeumannSolver; keep interpolate_subcarriers as fallback |

Files unchanged

config.rs, losses.rs, trainer.rs, proof.rs, error.rs — no change needed.

Feature gating

All ruvector integrations are always-on (not feature-gated). The ruvector crates are pure Rust with no C FFI, so they add no platform constraints.


Implementation Status

| Phase | Status |
|---|---|
| Cargo.toml (workspace + crate) | Complete |
| ADR-016 documentation | Complete |
| ruvector-mincut in metrics.rs | Complete |
| ruvector-attn-mincut in model.rs | Complete |
| ruvector-temporal-tensor in dataset.rs | Complete |
| ruvector-solver in subcarrier.rs | Complete |
| ruvector-attention in model.rs spatial decoder | Complete |

Consequences

Positive:

  • O(n^{1.5} log n) amortized dynamic min-cut updates for multi-person tracking
  • Min-cut gated attention is physically motivated for CSI antenna arrays
  • 50–75% memory reduction from temporal quantization
  • Sparse least-squares interpolation is physically principled vs linear
  • All ruvector crates are pure Rust (no C FFI, no platform restrictions)

Negative:

  • Additional compile-time dependencies (ruvector crates)
  • attn_mincut requires tensor↔Vec conversion overhead per batch element
  • TemporalTensorCompressor adds compression/decompression latency on dataset load
  • NeumannSolver requires diagonally dominant matrices; a sparse Tikhonov regularization term (λI) is added to ensure convergence

References