ADR-023: Trained DensePose Model with RuVector Signal Intelligence Pipeline

| Field | Value |
| --- | --- |
| Status | Proposed |
| Date | 2026-02-28 |
| Deciders | ruv |
| Relates to | ADR-003 (RVF Cognitive Containers), ADR-005 (SONA Self-Learning), ADR-015 (Public Dataset Strategy), ADR-016 (RuVector Integration), ADR-017 (RuVector-Signal-MAT), ADR-020 (Rust AI Migration), ADR-021 (Vital Sign Detection) |

Context

The Gap Between Sensing and DensePose

The WiFi-DensePose system currently operates in two distinct modes:

  1. WiFi CSI sensing (working): ESP32 streams CSI frames → Rust aggregator → feature extraction → presence/motion classification. 41 tests passing, verified at ~20 Hz with real hardware.

  2. Heuristic pose derivation (working but approximate): The Rust sensing server generates 17 COCO keypoints from WiFi signal properties using hand-crafted rules (derive_pose_from_sensing() in sensing-server/src/main.rs). This is not a trained model — keypoint positions are derived from signal amplitude, phase variance, and motion metrics rather than learned from labeled data.

Neither mode produces DensePose-quality body surface estimation. The CMU "DensePose From WiFi" paper (arXiv:2301.00250) demonstrated that a neural network trained on paired WiFi CSI + camera pose data can produce dense body surface UV coordinates from WiFi alone. However, that approach requires:

  • Environment-specific training: The model must be trained or fine-tuned for each deployment environment because CSI multipath patterns are environment-dependent.
  • Paired training data: Simultaneous WiFi CSI captures + ground-truth pose annotations (or a camera-based teacher model generating pseudo-labels).
  • Substantial compute: Training a modality translation network + DensePose head requires GPU time (hours to days depending on dataset size).

What Exists in the Codebase

The Rust workspace already has the complete model architecture ready for training:

| Component | Crate | File | Status |
| --- | --- | --- | --- |
| WiFiDensePoseModel | wifi-densepose-train | model.rs | Implemented (random weights) |
| ModalityTranslator | wifi-densepose-train | model.rs | Implemented with RuVector attention |
| KeypointHead | wifi-densepose-train | model.rs | Implemented (17 COCO heatmaps) |
| DensePoseHead | wifi-densepose-nn | densepose.rs | Implemented (25 parts + 48 UV) |
| WiFiDensePoseLoss | wifi-densepose-train | losses.rs | Implemented (keypoint + part + UV + transfer) |
| MmFiDataset loader | wifi-densepose-train | dataset.rs | Planned (ADR-015) |
| WiFiDensePosePipeline | wifi-densepose-nn | inference.rs | Implemented (generic over Backend) |
| Training proof verification | wifi-densepose-train | proof.rs | Implemented (deterministic hash) |
| Subcarrier resampling (114→56) | wifi-densepose-train | subcarrier.rs | Planned (ADR-016) |

RuVector Crates Available

The vendor/ruvector/ subtree provides 90+ crates. The following are directly relevant to a trained DensePose pipeline:

Already integrated (5 crates, ADR-016):

| Crate | Algorithm | Current Use |
| --- | --- | --- |
| ruvector-mincut | Subpolynomial dynamic min-cut, O(n^{o(1)}) | Multi-person assignment in metrics.rs |
| ruvector-attn-mincut | Attention-gated min-cut | Noise-suppressed spectrogram in model.rs |
| ruvector-attention | Scaled dot-product + geometric attention | Spatial decoder in model.rs |
| ruvector-solver | Sparse Neumann solver, O(√n) | Subcarrier resampling in subcarrier.rs |
| ruvector-temporal-tensor | Tiered temporal compression | CSI frame buffering in dataset.rs |

Newly proposed for DensePose pipeline (6 additional crates):

| Crate | Description | Proposed Use |
| --- | --- | --- |
| ruvector-gnn | Graph neural network on HNSW topology | Spatial body-graph reasoning |
| ruvector-graph-transformer | Proof-gated graph transformer (8 modules) | CSI-to-pose cross-attention |
| ruvector-sparse-inference | PowerInfer-style sparse inference engine | Edge deployment with neuron activation sparsity |
| ruvector-sona | Self-Optimizing Neural Architecture (LoRA + EWC++) | Online environment adaptation |
| ruvector-fpga-transformer | FPGA-optimized transformer | Hardware-accelerated inference path |
| ruvector-math | Optimal transport, information geometry | Domain adaptation loss functions |

RVF Container Format

The RuVector Format (RVF) is a segment-based binary container format designed to package intelligence artifacts — embeddings, HNSW indexes, quantized weights, WASM runtimes, witness proofs, and metadata — into a single self-contained file. Key properties:

  • 64-byte segment headers (SegmentHeader, magic 0x52564653 "RVFS") with type discriminator, content hash, compression, and timestamp
  • Progressive loading: Layer A (entry points, <5ms) → Layer B (hot adjacency, 100ms–1s) → Layer C (full graph, seconds)
  • 20+ segment types: Vec (embeddings), Index (HNSW), Overlay (min-cut witnesses), Quant (codebooks), Witness (proof-of-computation), Wasm (self-bootstrapping runtime), Dashboard (embedded UI), AggregateWeights (federated SONA deltas), Crypto (Ed25519 signatures), and more
  • Temperature-tiered quantization (rvf-quant): f32 / f16 / u8 / binary per-segment, with SIMD-accelerated distance computation
  • AGI Cognitive Container (agi_container.rs): packages kernel + WASM + world model + orchestrator + evaluation harness + witness chains into a single deployable file

The trained DensePose model will be packaged as an .rvf container, making it a single self-contained artifact that includes model weights, HNSW-indexed embedding tables, min-cut graph overlays, quantization codebooks, SONA adaptation deltas, and the WASM inference runtime — deployable to any host without external dependencies.
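For orientation, here is a sketch of what the 64-byte SegmentHeader could look like given the fields named above (magic, type discriminator, content hash, compression, timestamp). Field order, widths, and the payload-length field are illustrative assumptions; the authoritative layout lives in the rvf-types crate.

```rust
/// Illustrative 64-byte RVF segment header. The field set follows the
/// description above; exact order and widths are assumptions, not the
/// rvf-types definition.
#[repr(C)]
pub struct SegmentHeader {
    magic: u32,             // 0x52564653 ("RVFS")
    segment_type: u8,       // discriminator: 0x01 Vec, 0x02 Index, ...
    compression: u8,        // compression codec id
    _reserved: [u8; 2],     // alignment padding
    payload_len: u64,       // payload length in bytes (assumed field)
    content_hash: [u8; 32], // SHA-256 of the payload
    timestamp: u64,         // unix epoch seconds
    _pad: [u8; 8],          // pad the struct to exactly 64 bytes
}

// Compile-time check that the layout really is 64 bytes.
const _: () = assert!(core::mem::size_of::<SegmentHeader>() == 64);
```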

Decision

Implement a fully trained DensePose model using RuVector signal intelligence as the backbone signal processing layer, packaged in the RVF container format. The pipeline has three stages: (1) offline training on public datasets, (2) teacher-student distillation for DensePose UV labels, and (3) online SONA adaptation for environment-specific fine-tuning. The trained model, its embeddings, indexes, and adaptation state are serialized into a single .rvf file.

Architecture Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                    TRAINED DENSEPOSE PIPELINE                                │
│                                                                             │
│  ┌─────────────┐    ┌──────────────────────┐    ┌──────────────────────┐   │
│  │ ESP32 CSI    │    │  RuVector Signal      │    │  Trained Neural      │   │
│  │ Raw I/Q      │───▶│  Intelligence Layer   │───▶│  Network             │   │
│  │ [ant×sub×T]  │    │  (preprocessing)      │    │  (inference)         │   │
│  └─────────────┘    └──────────────────────┘    └──────────────────────┘   │
│                              │                           │                   │
│                    ┌─────────┴─────────┐       ┌────────┴────────┐         │
│                    │ 5 RuVector crates  │       │ 6 RuVector      │         │
│                    │ (signal processing)│       │ crates (neural) │         │
│                    └───────────────────┘       └─────────────────┘         │
│                                                        │                    │
│                              ┌──────────────────────────┘                   │
│                              ▼                                              │
│                    ┌──────────────────────────────────────┐                 │
│                    │              Outputs                   │                 │
│                    │  • 17 COCO keypoints [B,17,H,W]       │                 │
│                    │  • 25 body parts     [B,25,H,W]       │                 │
│                    │  • 48 UV coords      [B,48,H,W]       │                 │
│                    │  • Confidence scores                   │                 │
│                    └──────────────────────────────────────┘                 │
└─────────────────────────────────────────────────────────────────────────────┘

Stage 1: RuVector Signal Preprocessing Layer

Raw CSI frames from ESP32 (56–192 subcarriers × N antennas × T time frames) are processed through the RuVector signal intelligence stack before entering the neural network. This replaces hand-crafted feature extraction with learned, graph-aware preprocessing.

Raw CSI [ant, sub, T]
    │
    ▼
┌─────────────────────────────────────────────────────┐
│  1. ruvector-attn-mincut: gate_spectrogram()        │
│     Input:  Q=amplitude, K=phase, V=combined        │
│     Effect: Suppress multipath noise, keep motion-  │
│             relevant subcarrier paths                │
│     Output: Gated spectrogram [ant, sub', T]        │
├─────────────────────────────────────────────────────┤
│  2. ruvector-mincut: mincut_subcarrier_partition()   │
│     Input:  Subcarrier coherence graph               │
│     Effect: Partition into sensitive (motion-         │
│             responsive) vs insensitive (static)      │
│     Output: Partition mask + per-subcarrier weights   │
├─────────────────────────────────────────────────────┤
│  3. ruvector-attention: attention_weighted_bvp()     │
│     Input:  Gated spectrogram + partition weights    │
│     Effect: Compute body velocity profile with       │
│             sensitivity-weighted attention            │
│     Output: BVP feature vector [D_bvp]               │
├─────────────────────────────────────────────────────┤
│  4. ruvector-solver: solve_fresnel_geometry()        │
│     Input:  Amplitude + known TX/RX positions        │
│     Effect: Estimate TX-body-RX ellipsoid distances  │
│     Output: Fresnel geometry features [D_fresnel]    │
├─────────────────────────────────────────────────────┤
│  5. ruvector-temporal-tensor: compress + buffer      │
│     Input:  Temporal CSI window (100 frames)         │
│     Effect: Tiered quantization (hot/warm/cold)      │
│     Output: Compressed tensor, 50-75% memory saving  │
└─────────────────────────────────────────────────────┘
    │
    ▼
Feature tensor [B, T*tx*rx, sub] (preprocessed, noise-suppressed)
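To make steps 2-3 concrete, here is a plain-Rust stand-in: score each subcarrier by its temporal amplitude variance (a crude proxy for the min-cut "motion-sensitive" partition) and compute a sensitivity-weighted motion profile from frame-to-frame amplitude changes. The real pipeline uses ruvector-mincut and ruvector-attention rather than this heuristic.

```rust
/// Variance-weighted body-velocity-profile sketch. `amp[t][s]` is the
/// amplitude of subcarrier `s` at frame `t`; the window is assumed
/// non-empty. Returns one motion value per frame transition.
fn weighted_bvp(amp: &[Vec<f32>]) -> Vec<f32> {
    let (frames, subs) = (amp.len(), amp[0].len());
    // Per-subcarrier temporal variance as the sensitivity weight
    // (stand-in for the min-cut partition weights of step 2).
    let mut weight = vec![0.0f32; subs];
    for s in 0..subs {
        let mean = amp.iter().map(|f| f[s]).sum::<f32>() / frames as f32;
        weight[s] = amp.iter().map(|f| (f[s] - mean).powi(2)).sum::<f32>() / frames as f32;
    }
    let norm = weight.iter().sum::<f32>().max(f32::EPSILON);
    // Weighted mean absolute change per frame gap (stand-in for step 3).
    (1..frames)
        .map(|t| {
            (0..subs)
                .map(|s| weight[s] * (amp[t][s] - amp[t - 1][s]).abs())
                .sum::<f32>()
                / norm
        })
        .collect()
}
```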

Stage 2: Neural Network Architecture

The neural network follows the CMU teacher-student architecture with RuVector enhancements at three critical points.

2a. ModalityTranslator (CSI → Visual Feature Space)

CSI features [B, T*tx*rx, sub]
    │
    ├──amplitude──┐
    │              ├─► Encoder (Conv1D stack, 64→128→256)
    └──phase──────┘         │
                            ▼
              ┌──────────────────────────────┐
              │  ruvector-graph-transformer   │
              │                              │
              │  Treat antenna-pair×time as  │
              │  graph nodes. Edges connect  │
              │  spatially adjacent antenna  │
              │  pairs and temporally        │
              │  adjacent frames.            │
              │                              │
              │  Proof-gated attention:      │
              │  Each layer verifies that    │
              │  attention weights satisfy   │
              │  physical constraints        │
              │  (Fresnel ellipsoid bounds)  │
              └──────────────────────────────┘
                            │
                            ▼
              Decoder (ConvTranspose2d stack, 256→128→64→3)
                            │
                            ▼
              Visual features [B, 3, 48, 48]

RuVector enhancement: Replace standard multi-head self-attention in the bottleneck with ruvector-graph-transformer. The graph structure encodes the physical antenna topology — nodes that are closer in space (adjacent ESP32 nodes in the mesh) or time (consecutive frames) have stronger edge weights. This injects domain-specific inductive bias that standard attention lacks.
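A sketch of how that graph could be assembled before it is handed to the graph transformer: one node per (antenna pair, frame), spatial edges weighted by exponential distance decay, and unit-weight temporal edges between consecutive frames. The weighting constants are illustrative assumptions, not values from the implementation.

```rust
/// Node (p, t) is antenna pair `p` at frame `t`, indexed as t*pairs + p.
struct Edge {
    a: usize,
    b: usize,
    w: f32,
}

/// Build the antenna-pair × time graph. `pair_xy` holds the planar
/// position of each antenna pair in the mesh (assumed known).
fn antenna_time_graph(pair_xy: &[(f32, f32)], frames: usize) -> Vec<Edge> {
    let pairs = pair_xy.len();
    let node = |p: usize, t: usize| t * pairs + p;
    let mut edges = Vec::new();
    for t in 0..frames {
        // Spatial edges within a frame: weight decays with distance.
        for i in 0..pairs {
            for j in (i + 1)..pairs {
                let (dx, dy) = (pair_xy[i].0 - pair_xy[j].0, pair_xy[i].1 - pair_xy[j].1);
                let w = (-(dx * dx + dy * dy).sqrt()).exp();
                edges.push(Edge { a: node(i, t), b: node(j, t), w });
            }
        }
        // Temporal edges: same pair, consecutive frames.
        if t + 1 < frames {
            for p in 0..pairs {
                edges.push(Edge { a: node(p, t), b: node(p, t + 1), w: 1.0 });
            }
        }
    }
    edges
}
```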

2b. GNN Body Graph Reasoning

Visual features [B, 3, 48, 48]
    │
    ▼
ResNet18 backbone → feature maps [B, 256, 12, 12]
    │
    ▼
┌─────────────────────────────────────────┐
│  ruvector-gnn: Body Graph Network       │
│                                         │
│  17 COCO keypoints as graph nodes       │
│  Edges: anatomical connections          │
│  (shoulder→elbow, hip→knee, etc.)       │
│                                         │
│  GNN message passing (3 rounds):        │
│  h_i^{l+1} = σ(W·h_i^l + Σ_j α_ij·h_j)│
│  α_ij = attention(h_i, h_j, edge_ij)   │
│                                         │
│  Enforces anatomical constraints:       │
│  - Limb length ratios                   │
│  - Joint angle limits                   │
│  - Left-right symmetry priors           │
└─────────────────────────────────────────┘
    │
    ├──────────────────┬──────────────────┐
    ▼                  ▼                  ▼
KeypointHead      DensePoseHead     ConfidenceHead
[B,17,H,W]       [B,25+48,H,W]     [B,1]
heatmaps          parts + UV         quality score

RuVector enhancement: ruvector-gnn replaces the flat spatial decoder with a graph neural network that operates on the human body graph. WiFi CSI is inherently noisy — GNN message passing between anatomically connected joints enforces that predicted keypoints maintain plausible body structure even when individual joint predictions are uncertain.
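For reference, one round of the update from the diagram, h_i^{l+1} = σ(W·h_i^l + Σ_j α_ij·h_j), written out in plain Rust with σ = ReLU and α_ij a softmax over neighbor dot-product scores. ruvector-gnn supplies the learned, edge-conditioned version; this only demonstrates the update rule.

```rust
/// One message-passing round over an anatomical edge list such as the
/// 16-edge COCO skeleton. `h` is the per-joint feature matrix [17][d],
/// `w` a learned d×d weight matrix.
fn message_pass(h: &[Vec<f32>], w: &[Vec<f32>], edges: &[(usize, usize)]) -> Vec<Vec<f32>> {
    let d = h[0].len();
    // Undirected adjacency from the anatomical edges.
    let mut nbrs = vec![Vec::new(); h.len()];
    for &(a, b) in edges {
        nbrs[a].push(b);
        nbrs[b].push(a);
    }
    let dot = |x: &[f32], y: &[f32]| x.iter().zip(y).map(|(a, b)| a * b).sum::<f32>();
    h.iter()
        .enumerate()
        .map(|(i, hi)| {
            // α_ij: softmax over dot-product scores with neighbors.
            let scores: Vec<f32> = nbrs[i].iter().map(|&j| dot(hi, &h[j]).exp()).collect();
            let z = scores.iter().sum::<f32>().max(f32::EPSILON);
            (0..d)
                .map(|k| {
                    let own: f32 = (0..d).map(|m| w[k][m] * hi[m]).sum(); // W·h_i
                    let msg: f32 = nbrs[i]
                        .iter()
                        .zip(&scores)
                        .map(|(&j, s)| s / z * h[j][k])
                        .sum(); // Σ_j α_ij·h_j
                    (own + msg).max(0.0) // σ = ReLU
                })
                .collect()
        })
        .collect()
}
```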

2c. Sparse Inference for Edge Deployment

Trained model weights (full precision)
    │
    ▼
┌─────────────────────────────────────────────┐
│  ruvector-sparse-inference                   │
│                                              │
│  PowerInfer-style activation sparsity:       │
│  - Profile neuron activation frequency       │
│  - Partition into hot (always active, 20%)   │
│    and cold (conditionally active, 80%)      │
│  - Hot neurons: GPU/SIMD fast path           │
│  - Cold neurons: sparse lookup on demand     │
│                                              │
│  Quantization:                               │
│  - Backbone: INT8 (4x memory reduction)      │
│  - DensePose head: FP16 (2x reduction)       │
│  - ModalityTranslator: FP16                  │
│                                              │
│  Target: <50ms inference on ESP32-S3         │
│          <10ms on x86 with AVX2              │
└─────────────────────────────────────────────┘
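The frequency-based core of the hot/cold split can be sketched in a few lines: count activations per neuron on a profiling set, then mark the most frequently active fraction as hot. The real partitioner in ruvector-sparse-inference works per layer with a predictor; this only illustrates the idea.

```rust
/// Partition neuron indices into (hot, cold) by activation frequency.
/// `hot_fraction = 0.2` reproduces the 20/80 split in the diagram.
fn partition_hot_cold(activation_counts: &[u32], hot_fraction: f32) -> (Vec<usize>, Vec<usize>) {
    let mut idx: Vec<usize> = (0..activation_counts.len()).collect();
    // Most frequently active neurons first.
    idx.sort_by(|&a, &b| activation_counts[b].cmp(&activation_counts[a]));
    let n_hot = (((idx.len() as f32) * hot_fraction).ceil() as usize).min(idx.len());
    let cold = idx.split_off(n_hot);
    (idx, cold)
}
```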

Stage 3: Training Pipeline

3a. Dataset Loading and Preprocessing

Primary dataset: MM-Fi (NeurIPS 2023) — 40 subjects, 27 actions, 114 subcarriers, 3 RX antennas, 17 COCO keypoints + DensePose UV annotations.

Secondary dataset: Wi-Pose — 12 subjects, 12 actions, 30 subcarriers, 3×3 antenna array, 18 keypoints.

Data Loading Pipeline:

MM-Fi .npy   ──► Resample 114→56 subcarriers ──┐
                 (ruvector-solver NeumannSolver)│
                                               ├──► Batch [B, T*ant, sub]
Wi-Pose .mat ──► Zero-pad 30→56 subcarriers ───┘

Phase sanitize ──► Hampel filter ──► unwrap
(wifi-densepose-signal::phase_sanitizer)

Temporal buffer ──► ruvector-temporal-tensor
(100 frames/sample, tiered quantization)
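As a stand-in for the resampling step (the pipeline routes this through ruvector-solver::NeumannSolver per ADR-016), plain linear interpolation over the subcarrier axis illustrates the 114→56 shape change; e.g. resample_subcarriers(&mmfi_frame, 56).

```rust
/// Resample one CSI frame from `frame.len()` subcarriers to `target`
/// subcarriers by linear interpolation. Assumes target >= 2 and a
/// non-empty frame. Illustration only; not the NeumannSolver path.
fn resample_subcarriers(frame: &[f32], target: usize) -> Vec<f32> {
    let src = frame.len();
    (0..target)
        .map(|i| {
            // Fractional position of target index i on the source axis.
            let pos = i as f32 * (src - 1) as f32 / (target - 1) as f32;
            let (lo, frac) = (pos.floor() as usize, pos.fract());
            let hi = (lo + 1).min(src - 1);
            frame[lo] * (1.0 - frac) + frame[hi] * frac
        })
        .collect()
}
```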

3b. Teacher-Student DensePose Labels

For samples with 3D keypoints but no DensePose UV maps:

  1. Run Detectron2 DensePose R-CNN on paired RGB frames (one-time preprocessing step on GPU workstation)
  2. Generate (part_labels [H,W], u_coords [H,W], v_coords [H,W]) pseudo-labels
  3. Cache as .npy alongside original data
  4. Teacher model is discarded after label generation — inference uses WiFi only

3c. Loss Function

L_total = λ_kp    · L_keypoint   // MSE on predicted vs GT heatmaps
        + λ_part  · L_part       // Cross-entropy on 25-class body part segmentation
        + λ_uv    · L_uv         // Smooth L1 on UV coordinate regression
        + λ_xfer  · L_transfer   // MSE between CSI features and teacher visual features
        + λ_ot    · L_ot         // Optimal transport regularization (ruvector-math)
        + λ_graph · L_graph      // GNN edge consistency loss (ruvector-gnn)

RuVector enhancement: ruvector-math provides optimal transport (Wasserstein distance) as a regularization term. This penalizes predicted body part distributions that are far from the ground truth in the Wasserstein metric, which is more geometrically meaningful than pixel-wise cross-entropy for spatial body part segmentation.
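Of the six terms, L_uv is the simplest to write out. A sketch of the smooth-L1 (Huber) UV regression term, assuming β = 1.0 and a mean over annotated pixels:

```rust
/// Smooth-L1 loss between predicted and ground-truth UV coordinates,
/// flattened to parallel slices. β = 1.0 is an assumed setting.
fn smooth_l1_uv(pred: &[f32], gt: &[f32]) -> f32 {
    const BETA: f32 = 1.0;
    let n = pred.len().max(1) as f32;
    pred.iter()
        .zip(gt)
        .map(|(p, g)| {
            let d = (p - g).abs();
            // Quadratic near zero, linear beyond β: robust to outliers.
            if d < BETA { 0.5 * d * d / BETA } else { d - 0.5 * BETA }
        })
        .sum::<f32>()
        / n
}
```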

3d. Training Configuration

| Parameter | Value | Rationale |
| --- | --- | --- |
| Optimizer | AdamW | Weight decay regularization |
| Learning rate | 1e-3, cosine decay to 1e-5 | Standard for modality translation |
| Batch size | 32 | Fits in 24 GB GPU VRAM |
| Epochs | 100 | With early stopping (patience = 15) |
| Warmup | 5 epochs | Linear LR warmup |
| Train/val split | Subjects 1-32 / 33-40 | Subject-disjoint for generalization |
| Augmentation | Time-shift ±5 frames, amplitude noise ±2 dB, antenna dropout 10% | CSI-domain augmentations |
| Hardware | Single RTX 3090 or A100 | ~8 hours on A100 |
| Checkpoint | Every epoch, keep best-by-validation-PCK | Deterministic seed |
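The learning-rate and warmup rows combine into one schedule. A sketch matching the table: five epochs of linear warmup to the 1e-3 peak, then cosine decay to the 1e-5 floor.

```rust
/// Per-epoch learning rate for the schedule in the table above.
fn learning_rate(epoch: usize, total_epochs: usize) -> f32 {
    const WARMUP: usize = 5;
    const PEAK: f32 = 1e-3;
    const FLOOR: f32 = 1e-5;
    if epoch < WARMUP {
        // Linear warmup, reaching PEAK at the end of epoch WARMUP-1.
        PEAK * (epoch + 1) as f32 / WARMUP as f32
    } else {
        // Cosine decay from PEAK to FLOOR over the remaining epochs.
        let t = (epoch - WARMUP) as f32 / total_epochs.saturating_sub(WARMUP).max(1) as f32;
        FLOOR + 0.5 * (PEAK - FLOOR) * (1.0 + (std::f32::consts::PI * t).cos())
    }
}
```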

3e. Metrics

| Metric | Target | Description |
| --- | --- | --- |
| PCK@0.2 | >70% on MM-Fi val | Percentage of correct keypoints (threshold = 0.2 × torso diameter) |
| OKS mAP | >0.50 on MM-Fi val | Object Keypoint Similarity, COCO-standard |
| DensePose GPS | >0.30 on MM-Fi val | Geodesic Point Similarity for UV accuracy |
| Inference latency | <50 ms per frame | On x86 with ONNX Runtime |
| Model size | <25 MB (FP16) | Suitable for edge deployment |
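PCK@0.2 as defined in the table, sketched in Rust: a keypoint counts as correct when its distance to ground truth is within 0.2 × the torso diameter (call as pck(&pred, &gt, torso, 0.2)).

```rust
/// Fraction of keypoints within `alpha` × torso diameter of ground
/// truth. Keypoints are (x, y) in pixels; visibility masking omitted.
fn pck(pred: &[(f32, f32)], gt: &[(f32, f32)], torso_diameter: f32, alpha: f32) -> f32 {
    let thresh = alpha * torso_diameter;
    let correct = pred
        .iter()
        .zip(gt)
        .filter(|(p, g)| ((p.0 - g.0).powi(2) + (p.1 - g.1).powi(2)).sqrt() <= thresh)
        .count();
    correct as f32 / gt.len().max(1) as f32
}
```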

Stage 4: Online Adaptation with SONA

After offline training produces a base model, SONA enables continuous adaptation to new environments without retraining from scratch.

┌──────────────────────────────────────────────────────────┐
│  SONA Online Adaptation Loop                              │
│                                                          │
│  Base model (frozen weights W)                           │
│       │                                                  │
│       ▼                                                  │
│  ┌──────────────────────────────────┐                    │
│  │  LoRA Adaptation Matrices        │                    │
│  │  W_effective = W + α · A·B       │                    │
│  │                                  │                    │
│  │  Rank r=4 for translator layers  │                    │
│  │  Rank r=2 for backbone layers    │                    │
│  │  Rank r=8 for DensePose head     │                    │
│  │                                  │                    │
│  │  Total trainable params: ~50K    │                    │
│  │  (vs ~5M frozen base)            │                    │
│  └──────────────────────────────────┘                    │
│       │                                                  │
│       ▼                                                  │
│  ┌──────────────────────────────────┐                    │
│  │  EWC++ Regularizer               │                    │
│  │  L = L_task + λ·Σ F_i(θ-θ*)²    │                    │
│  │                                  │                    │
│  │  Prevents forgetting base model  │                    │
│  │  knowledge when adapting to new  │                    │
│  │  environment                     │                    │
│  └──────────────────────────────────┘                    │
│       │                                                  │
│       ▼                                                  │
│  Adaptation triggers:                                    │
│  • First deployment in new room                          │
│  • PCK drops below threshold (drift detection)           │
│  • User manually initiates calibration                   │
│  • Furniture/layout change detected (CSI baseline shift) │
│                                                          │
│  Adaptation data:                                        │
│  • Self-supervised: temporal consistency loss             │
│    (pose at t should be similar to t-1 for slow motion)  │
│  • Semi-supervised: user confirmation of presence/count  │
│  • Optional: brief camera calibration session (5 min)    │
│                                                          │
│  Convergence: 10-50 gradient steps, <5 seconds on CPU    │
└──────────────────────────────────────────────────────────┘
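The two core pieces of the loop, sketched in plain Rust: folding a LoRA update into an effective weight matrix (W_eff = W + α·A·B) and computing the EWC++ penalty λ·Σ F_i(θ_i - θ*_i)². Matrix shapes are stated in the comments; both functions are illustrations, not the ruvector-sona API.

```rust
/// W_eff = W + α·A·B. Row-major: `w` is d_out×d_in, `a` is d_out×r,
/// `b` is r×d_in.
fn apply_lora(w: &[f32], a: &[f32], b: &[f32], d_out: usize, d_in: usize, r: usize, alpha: f32) -> Vec<f32> {
    let mut w_eff = w.to_vec();
    for i in 0..d_out {
        for j in 0..d_in {
            // (A·B)[i][j], computed without materializing the product.
            let ab: f32 = (0..r).map(|k| a[i * r + k] * b[k * d_in + j]).sum();
            w_eff[i * d_in + j] += alpha * ab;
        }
    }
    w_eff
}

/// EWC++ penalty: λ·Σ F_i(θ_i - θ*_i)², with `fisher` the diagonal
/// Fisher information and `theta_star` the base-model reference.
fn ewc_penalty(theta: &[f32], theta_star: &[f32], fisher: &[f32], lambda: f32) -> f32 {
    lambda
        * theta
            .iter()
            .zip(theta_star)
            .zip(fisher)
            .map(|((t, ts), f)| f * (t - ts).powi(2))
            .sum::<f32>()
}
```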

Stage 5: Inference Pipeline (Production)

ESP32 CSI (UDP :5005)
    │
    ▼
Rust Axum server (port 8080)
    │
    ├─► RuVector signal preprocessing (Stage 1)
    │       5 crates, ~2ms per frame
    │
    ├─► ONNX Runtime inference (Stage 2)
    │       Quantized model, ~10ms per frame
    │       OR ruvector-sparse-inference, ~8ms per frame
    │
    ├─► GNN post-processing (ruvector-gnn)
    │       Anatomical constraint enforcement, ~1ms
    │
    ├─► SONA adaptation check (Stage 4)
    │       <0.05ms per frame (gradient accumulation only)
    │
    └─► Output: DensePose results
            │
            ├──► /api/v1/stream/pose (WebSocket, 17 keypoints)
            ├──► /api/v1/pose/current (REST, full DensePose)
            └──► /ws/sensing (WebSocket, raw + processed)

Total inference budget: <15ms per frame at 20 Hz on x86, <50ms on ESP32-S3 (with sparse inference).

Stage 6: RVF Model Container Format

The trained model is packaged as a single .rvf file that contains everything needed for inference — no external weight files, no ONNX runtime, no Python dependencies.

RVF DensePose Container Layout

wifi-densepose-v1.rvf (single file, ~15-30 MB)
┌───────────────────────────────────────────────────────────────┐
│  SEGMENT 0: Manifest (0x05)                                   │
│  ├── Model ID: "wifi-densepose-v1.0"                          │
│  ├── Training dataset: "mmfi-v1+wipose-v1"                    │
│  ├── Training config hash: SHA-256                            │
│  ├── Target hardware: x86_64, aarch64, wasm32                 │
│  ├── Segment directory (offsets to all segments)               │
│  └── Level-1 TLV manifest with metadata tags                  │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 1: Vec (0x01) — Model Weight Embeddings              │
│  ├── ModalityTranslator weights [64→128→256→3, Conv1D+ConvT]  │
│  ├── ResNet18 backbone weights [3→64→128→256, residual blocks] │
│  ├── KeypointHead weights [256→17, deconv layers]             │
│  ├── DensePoseHead weights [256→25+48, deconv layers]         │
│  ├── GNN body graph weights [3 message-passing rounds]        │
│  └── Graph transformer attention weights [proof-gated layers] │
│  Format: flat f32 vectors, 768-dim per weight tensor          │
│  Total: ~5M parameters → ~20MB f32, ~10MB f16, ~5MB INT8     │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 2: Index (0x02) — HNSW Embedding Index               │
│  ├── Layer A: Entry points + coarse routing centroids          │
│  │   (loaded first, <5ms, enables approximate search)         │
│  ├── Layer B: Hot region adjacency for frequently             │
│  │   accessed weight clusters (100ms load)                    │
│  └── Layer C: Full adjacency graph for exact nearest          │
│      neighbor lookup across all weight partitions             │
│  Use: Fast weight lookup for sparse inference —               │
│  only load hot neurons, skip cold neurons via HNSW routing    │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 3: Overlay (0x03) — Dynamic Min-Cut Graph            │
│  ├── Subcarrier partition graph (sensitive vs insensitive)     │
│  ├── Min-cut witnesses from ruvector-mincut                   │
│  ├── Antenna topology graph (ESP32 mesh spatial layout)       │
│  └── Body skeleton graph (17 COCO joints, 16 edges)           │
│  Use: Pre-computed graph structures loaded at init time.       │
│  Dynamic updates via ruvector-mincut insert/delete_edge       │
│  as environment changes (furniture moves, new obstacles)      │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 4: Quant (0x06) — Quantization Codebooks             │
│  ├── INT8 codebook for backbone (4x memory reduction)         │
│  ├── FP16 scale factors for translator + heads                │
│  ├── Binary quantization tables for SIMD distance compute     │
│  └── Per-layer calibration statistics (min, max, zero-point)  │
│  Use: rvf-quant temperature-tiered quantization —             │
│  hot layers stay f16, warm layers u8, cold layers binary      │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 5: Witness (0x0A) — Training Proof Chain             │
│  ├── Deterministic training proof (seed, loss curve, hash)    │
│  ├── Dataset provenance (MM-Fi commit hash, download URL)     │
│  ├── Validation metrics (PCK@0.2, OKS mAP, GPS scores)       │
│  ├── Ed25519 signature over weight hash                       │
│  └── Attestation: training hardware, duration, config         │
│  Use: Verifiable proof that model weights match a specific    │
│  training run. Anyone can re-run training with same seed      │
│  and verify the weight hash matches the witness.              │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 6: Meta (0x07) — Model Metadata                      │
│  ├── COCO keypoint names and skeleton connectivity            │
│  ├── DensePose body part labels (24 parts + background)       │
│  ├── UV coordinate range and resolution                       │
│  ├── Input normalization statistics (mean, std per subcarrier)│
│  ├── RuVector crate versions used during training             │
│  └── Environment calibration profiles (named, per-room)       │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 7: AggregateWeights (0x36) — SONA LoRA Deltas        │
│  ├── Per-environment LoRA adaptation matrices (A, B per layer)│
│  ├── EWC++ Fisher information diagonal                        │
│  ├── Optimal θ* reference parameters                          │
│  ├── Adaptation round count and convergence metrics           │
│  └── Named profiles: "lab-a", "living-room", "office-3f"     │
│  Use: Multiple environment adaptations stored in one file.    │
│  Server loads the matching profile or creates a new one.      │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 8: Profile (0x0B) — RVDNA Domain Profile             │
│  ├── Domain: "wifi-csi-densepose"                             │
│  ├── Input spec: [B, T*ant, sub] CSI tensor format            │
│  ├── Output spec: keypoints [B,17,H,W], parts [B,25,H,W],    │
│  │   UV [B,48,H,W], confidence [B,1]                         │
│  ├── Hardware requirements: min RAM, recommended GPU          │
│  └── Supported data sources: esp32, wifi-rssi, simulation    │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 9: Crypto (0x0C) — Signature and Keys                │
│  ├── Ed25519 public key for model publisher                   │
│  ├── Signature over all segment content hashes                │
│  └── Certificate chain (optional, for enterprise deployment)  │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 10: Wasm (0x10) — Self-Bootstrapping Runtime         │
│  ├── Compiled WASM inference engine                           │
│  │   (ruvector-sparse-inference-wasm)                         │
│  ├── WASM microkernel for RVF segment parsing                 │
│  └── Browser-compatible: load .rvf → run inference in-browser │
│  Use: The .rvf file is fully self-contained — a WASM host     │
│  can execute inference without any external dependencies.     │
├───────────────────────────────────────────────────────────────┤
│  SEGMENT 11: Dashboard (0x11) — Embedded Visualization        │
│  ├── Three.js-based pose visualization (HTML/JS/CSS)          │
│  ├── Gaussian splat renderer for signal field                 │
│  └── Served at http://localhost:8080/ when model is loaded    │
│  Use: Open the .rvf file → get a working UI with no install  │
└───────────────────────────────────────────────────────────────┘

RVF Loading Sequence

1. Read tail → find_latest_manifest() → SegmentDirectory
2. Load Manifest (seg 0) → validate magic, version, model ID
3. Load Profile (seg 8) → verify input/output spec compatibility
4. Load Crypto (seg 9) → verify Ed25519 signature chain
5. Load Quant (seg 4) → prepare quantization codebooks
6. Load Index Layer A (seg 2) → entry points ready (<5ms)
       ↓ (inference available at reduced accuracy)
7. Load Vec (seg 1) → hot weight partitions via Layer A routing
8. Load Index Layer B (seg 2) → hot adjacency ready (100ms)
       ↓ (inference at full accuracy for common poses)
9. Load Overlay (seg 3) → min-cut graphs, body skeleton
10. Load AggregateWeights (seg 7) → apply matching SONA profile
11. Load Index Layer C (seg 2) → complete graph loaded
       ↓ (full inference with all weight partitions)
12. Load Wasm (seg 10) → WASM runtime available (optional)
13. Load Dashboard (seg 11) → UI served (optional)

Progressive availability: Inference begins after step 6 (~5ms) with approximate results. Full accuracy is reached by step 9 (~500ms). This enables instant startup with gradually improving quality — critical for real-time applications.

RVF Build Pipeline

After training completes, the model is packaged into an .rvf file:

```bash
# Build the RVF container from trained checkpoint
cargo run -p wifi-densepose-train --bin build-rvf -- \
    --checkpoint checkpoints/best-pck.pt \
    --quantize int8,fp16 \
    --hnsw-build \
    --sign --key model-signing-key.pem \
    --include-wasm \
    --include-dashboard ../../ui \
    --output wifi-densepose-v1.rvf

# Verify the built container
cargo run -p wifi-densepose-train --bin verify-rvf -- \
    --input wifi-densepose-v1.rvf \
    --verify-signature \
    --verify-witness \
    --benchmark-inference
```

RVF Runtime Integration

The sensing server loads the .rvf container at startup:

```bash
# Load model from RVF container
./target/release/sensing-server \
    --model wifi-densepose-v1.rvf \
    --source auto \
    --ui-from-rvf  # serve Dashboard segment instead of --ui-path
```

```rust
// In sensing-server/src/main.rs
use std::sync::Arc;

use rvf_runtime::RvfContainer;
use rvf_index::layers::IndexLayer;
use rvf_quant::QuantizedVec;

let container = Arc::new(RvfContainer::open("wifi-densepose-v1.rvf")?);

// Progressive load: Layer A first for instant startup
let index = container.load_index(IndexLayer::A)?;
let weights = container.load_vec_hot(&index)?; // hot partitions only

// Full load in background; clone the shared handle so the foreground
// path keeps using the container while the rest streams in.
let bg = Arc::clone(&container);
tokio::spawn(async move {
    // Error handling elided in this sketch; real code would log failures.
    bg.load_index(IndexLayer::B).await.expect("index layer B");
    bg.load_index(IndexLayer::C).await.expect("index layer C");
    bg.load_vec_cold().await.expect("cold weight partitions");
});

// SONA environment adaptation
let sona_deltas = container.load_aggregate_weights("office-3f")?;
model.apply_lora_deltas(&sona_deltas);

// Serve embedded dashboard
let dashboard = container.load_dashboard()?;
// Mount at /ui/* routes in Axum
```

Implementation Plan

Phase 1: Dataset Loaders (2 weeks)

  • Implement MmFiDataset in wifi-densepose-train/src/dataset.rs
  • Read MM-Fi .npy files with antenna correction (1TX/3RX → 3×3 zero-padding)
  • Subcarrier resampling 114→56 via ruvector-solver::NeumannSolver
  • Phase sanitization via wifi-densepose-signal::phase_sanitizer
  • Implement WiPoseDataset for secondary dataset
  • Temporal windowing with ruvector-temporal-tensor
  • Deliverable: cargo test -p wifi-densepose-train with dataset loading tests

Phase 2: Graph Transformer Integration (2 weeks)

  • Add ruvector-graph-transformer dependency to wifi-densepose-train
  • Replace bottleneck self-attention in ModalityTranslator with proof-gated graph transformer
  • Build antenna topology graph (nodes = antenna pairs, edges = spatial/temporal proximity)
  • Add ruvector-gnn dependency for body graph reasoning
  • Build COCO body skeleton graph (17 nodes, 16 anatomical edges)
  • Implement GNN message passing in spatial decoder
  • Deliverable: Model forward pass produces correct output shapes with graph layers

Phase 3: Teacher-Student Label Generation (1 week)

  • Python script using Detectron2 DensePose to generate UV pseudo-labels from MM-Fi RGB frames
  • Cache labels as .npy for Rust loader consumption
  • Validate label quality on a random subset (visual inspection)
  • Deliverable: Complete UV label set for MM-Fi training split

Phase 4: Training Loop (3 weeks)

  • Implement WiFiDensePoseTrainer with full loss function (6 terms)
  • Add ruvector-math optimal transport loss term
  • Integrate GNN edge consistency loss
  • Training loop with cosine LR schedule, early stopping, checkpointing
  • Validation metrics: PCK@0.2, OKS mAP, DensePose GPS
  • Deterministic proof verification (proof.rs) with weight hash
  • Deliverable: Trained model checkpoint achieving PCK@0.2 >70% on MM-Fi validation

Phase 5: SONA Online Adaptation (2 weeks)

  • Integrate ruvector-sona into inference pipeline
  • Implement LoRA injection at translator, backbone, and DensePose head layers
  • Implement EWC++ Fisher information computation and regularization
  • Self-supervised temporal consistency loss for unsupervised adaptation
  • Calibration mode: 5-minute camera session for supervised fine-tuning
  • Drift detection: monitor rolling PCK on temporal consistency proxy
  • Deliverable: Adaptation converges in <50 gradient steps, PCK recovers within 10% of base

Phase 6: Sparse Inference and Edge Deployment (2 weeks)

  • Profile neuron activation frequencies on validation set
  • Apply ruvector-sparse-inference hot/cold neuron partitioning
  • INT8 quantization for backbone, FP16 for heads
  • ONNX export with quantized weights
  • Benchmark on x86 (target: <10ms) and ARM (target: <50ms)
  • WASM export via ruvector-sparse-inference-wasm for browser inference
  • Deliverable: Quantized ONNX model, benchmark results, WASM binary

Phase 7: RVF Container Build Pipeline (2 weeks)

  • Implement build-rvf binary in wifi-densepose-train
  • Serialize trained weights into Vec segment (SegmentType::Vec, 0x01)
  • Build HNSW index over weight partitions for sparse inference (SegmentType::Index, 0x02)
  • Serialize min-cut graph overlays: subcarrier partition, antenna topology, body skeleton (SegmentType::Overlay, 0x03)
  • Generate quantization codebooks via rvf-quant (SegmentType::Quant, 0x06)
  • Write training proof witness with Ed25519 signature (SegmentType::Witness, 0x0A)
  • Store model metadata, COCO keypoint schema, normalization stats (SegmentType::Meta, 0x07)
  • Store SONA LoRA adaptation deltas per environment (SegmentType::AggregateWeights, 0x36)
  • Write RVDNA domain profile for WiFi CSI DensePose (SegmentType::Profile, 0x0B)
  • Optionally embed WASM inference runtime (SegmentType::Wasm, 0x10)
  • Optionally embed Three.js dashboard (SegmentType::Dashboard, 0x11)
  • Build Level-1 manifest and segment directory (SegmentType::Manifest, 0x05)
  • Implement verify-rvf binary for container validation
  • Deliverable: wifi-densepose-v1.rvf single-file container, verifiable and self-contained

Phase 8: Integration with Sensing Server (1 week)

  • Load .rvf container in wifi-densepose-sensing-server via rvf-runtime
  • Progressive loading: Layer A first for instant startup, full graph in background
  • Replace derive_pose_from_sensing() heuristic with trained model inference
  • Add --model CLI flag accepting .rvf path (or legacy .onnx)
  • Apply SONA LoRA deltas from AggregateWeights segment based on --env flag
  • Serve embedded Dashboard segment at /ui/* when --ui-from-rvf is set
  • Graceful fallback to heuristic when no model file present
  • Update WebSocket protocol to include DensePose UV data
  • Deliverable: Sensing server serves trained model from single .rvf file

File Changes

New Files

| File | Purpose |
| --- | --- |
| rust-port/.../wifi-densepose-train/src/dataset_mmfi.rs | MM-Fi dataset loader with subcarrier resampling |
| rust-port/.../wifi-densepose-train/src/dataset_wipose.rs | Wi-Pose dataset loader |
| rust-port/.../wifi-densepose-train/src/graph_transformer.rs | Graph transformer integration |
| rust-port/.../wifi-densepose-train/src/body_gnn.rs | GNN body graph reasoning |
| rust-port/.../wifi-densepose-train/src/adaptation.rs | SONA LoRA + EWC++ adaptation |
| rust-port/.../wifi-densepose-train/src/trainer.rs | Training loop with multi-term loss |
| scripts/generate_densepose_labels.py | Teacher-student UV label generation |
| scripts/benchmark_inference.py | Inference latency benchmarking |
| rust-port/.../wifi-densepose-train/src/rvf_builder.rs | RVF container build pipeline |
| rust-port/.../wifi-densepose-train/src/bin/build_rvf.rs | CLI binary for building .rvf containers |
| rust-port/.../wifi-densepose-train/src/bin/verify_rvf.rs | CLI binary for verifying .rvf containers |

Modified Files

| File | Change |
| --- | --- |
| rust-port/.../wifi-densepose-train/Cargo.toml | Add ruvector-gnn, graph-transformer, sona, sparse-inference, math, rvf-types, rvf-wire, rvf-manifest, rvf-index, rvf-quant, rvf-crypto, rvf-runtime deps |
| rust-port/.../wifi-densepose-train/src/model.rs | Integrate graph transformer + GNN layers |
| rust-port/.../wifi-densepose-train/src/losses.rs | Add optimal transport + GNN edge consistency loss terms |
| rust-port/.../wifi-densepose-train/src/config.rs | Add training hyperparameters for new components |
| rust-port/.../sensing-server/Cargo.toml | Add rvf-runtime, rvf-types, rvf-index, rvf-quant deps |
| rust-port/.../sensing-server/src/main.rs | Add --model flag, load .rvf container, progressive startup, serve embedded dashboard |

Consequences

Positive

  • Trained model produces accurate DensePose: Moves from heuristic keypoints to learned body surface estimation backed by public dataset evaluation
  • RuVector signal intelligence is a differentiator: Graph transformers on antenna topology and GNN body reasoning are novel — no prior WiFi pose system uses these techniques
  • SONA enables rapid environment adaptation: New environments don't require full retraining — LoRA adaptation converges in under 50 gradient steps, i.e. within seconds
  • Sparse inference enables edge deployment: PowerInfer-style neuron partitioning brings DensePose inference to ESP32-class hardware
  • Graceful degradation: Server falls back to heuristic pose when no model file is present — existing functionality is preserved
  • Single-file deployment via RVF: Trained model, embeddings, HNSW index, quantization codebooks, SONA adaptation profiles, WASM runtime, and dashboard UI packaged in one .rvf file — deploy by copying a single file
  • Progressive loading: RVF Layer A loads in <5ms for instant startup; full accuracy reached in ~500ms as remaining segments load
  • Verifiable provenance: RVF Witness segment contains deterministic training proof with Ed25519 signature — anyone can re-run training and verify weight hash
  • Self-bootstrapping: RVF Wasm segment enables browser-based inference with no server-side dependencies
  • Open evaluation: PCK, OKS, GPS metrics on public MM-Fi dataset provide reproducible, comparable results

Negative

  • Training requires GPU: Initial model training needs RTX 3090 or better (~8 hours on A100). Not all developers will have access.
  • Teacher-student label generation requires Detectron2: One-time Python + CUDA dependency for generating UV pseudo-labels from RGB frames
  • MM-Fi CC BY-NC license: Weights trained on MM-Fi cannot be used commercially without collecting proprietary data
  • Environment-specific adaptation still required: SONA reduces the burden but a brief calibration session in each new environment is still recommended for best accuracy
  • 6 additional RuVector crate dependencies: Increases compile time and binary size. Mitigated by feature flags (e.g., --features trained-model).
  • Model size on disk: ~25MB (FP16) or ~12MB (INT8). Acceptable for server deployment, may need further pruning for WASM.

Risks and Mitigations

| Risk | Mitigation |
| --- | --- |
| MM-Fi 114→56 interpolation loses accuracy | Train at native 114 as an alternative; the ESP32 mesh can collect 56-subcarrier data natively |
| GNN overfits to training body types | Augment with diverse body proportions; Wi-Pose adds subject diversity |
| SONA adaptation diverges in adversarial environments | EWC++ regularization caps parameter drift; roll back to base weights on detection |
| Sparse inference degrades accuracy | Benchmark INT8 vs FP16 vs FP32; fall back to full precision if quality drops |
| Training proof hash changes with RuVector version updates | Pin ruvector crate versions in Cargo.toml; regenerate hash on version bumps |

References

  • Geng et al., "DensePose From WiFi" (CMU, arXiv:2301.00250, 2023)
  • Yang et al., "MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset" (NeurIPS 2023, arXiv:2305.10345)
  • Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (ICLR 2022)
  • Kirkpatrick et al., "Overcoming Catastrophic Forgetting in Neural Networks" (PNAS, 2017)
  • Song et al., "PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU" (2024)
  • ADR-005: SONA Self-Learning for Pose Estimation
  • ADR-015: Public Dataset Strategy for Trained Pose Estimation Model
  • ADR-016: RuVector Integration for Training Pipeline
  • ADR-020: Migrate AI/Model Inference to Rust with RuVector and ONNX Runtime

Appendix A: RuQu Consideration

ruQu ("Classical nervous system for quantum machines") provides real-time coherence assessment via dynamic min-cut. While primarily designed for quantum error correction (syndrome decoding, surface code arbitration), its core primitive — the CoherenceGate — is architecturally relevant to WiFi CSI processing:

  • CoherenceGate uses ruvector-mincut to make real-time gate/pass decisions on signal streams based on structural coherence thresholds. In quantum computing, this gates qubit syndrome streams. For WiFi CSI, the same mechanism could gate CSI subcarrier streams — passing only subcarriers whose coherence (phase stability across antennas) exceeds a dynamic threshold.

  • Syndrome filtering (filters.rs) implements Kalman-like adaptive filters that could be repurposed for CSI noise filtering — treating each subcarrier's amplitude drift as a "syndrome" stream.

  • Min-cut gated transformer integration (optional feature) provides coherence-optimized attention with 50% FLOP reduction — directly applicable to the ModalityTranslator bottleneck.

Decision: ruQu is not included in the initial pipeline (Phase 1-8) but is marked as a Phase 9 exploration candidate for coherence-gated CSI filtering. The CoherenceGate primitive maps naturally to subcarrier quality assessment, and the integration path is clean since ruQu already depends on ruvector-mincut.

Appendix B: Training Data Strategy

The pipeline supports three data sources for training, used in combination:

| Source | Subcarriers | Pose Labels | Volume | Cost | When |
| --- | --- | --- | --- | --- | --- |
| MM-Fi (public) | 114 → 56 (interpolated) | 17 COCO + DensePose UV | 40 subjects, 320K frames | Free (CC BY-NC) | Phase 1 — bootstrap |
| Wi-Pose (public) | 30 → 56 (zero-padded) | 18 keypoints | 12 subjects, 166K packets | Free (research) | Phase 1 — diversity |
| ESP32 self-collected | 56 (native) | Teacher-student from camera | Unlimited, environment-specific | Hardware only ($54) | Phase 4+ — fine-tuning |

Recommended approach: combine the public datasets with ESP32 self-collected data.

  1. Pre-train on MM-Fi + Wi-Pose (public data, Phase 1-4): Provides the base model with diverse subjects and actions. The 114→56 subcarrier interpolation is acceptable for learning general CSI-to-pose mappings.

  2. Fine-tune on ESP32 self-collected data (Phase 5+, SONA adaptation): Collect 5-30 minutes of paired ESP32 CSI + camera data in each target environment. The camera serves as the teacher model (Detectron2 generates pseudo-labels). SONA LoRA adaptation takes <50 gradient steps to converge.

  3. Continuous adaptation (runtime): SONA's self-supervised temporal consistency loss refines the model without any camera, using the assumption that poses change smoothly over short time windows.

This three-tier strategy provides:

  • A working model from day one (public data)
  • Environment-specific accuracy (ESP32 fine-tuning)
  • Ongoing drift correction (SONA runtime adaptation)