torrust
diff --git a/Cargo.lock b/Cargo.lock
Lines changed: 672 additions & 35 deletions
diff --git a/Cargo.toml b/Cargo.toml
Lines changed: 1 addition & 1 deletion
diff --git a/packages/sentinel/Cargo.toml b/packages/sentinel/Cargo.toml
Lines changed: 39 additions & 0 deletions
diff --git a/packages/sentinel/README.md b/packages/sentinel/README.md
Lines changed: 281 additions & 0 deletions
diff --git a/packages/sentinel/adr/001-measures-not-opinions.md b/packages/sentinel/adr/001-measures-not-opinions.md
Lines changed: 54 additions & 0 deletions
@@ -1,5 +1,5 @@
 [workspace]
-members = [".", "packages/render-text-as-image", "packages/mudlark"]
+members = [".", "packages/render-text-as-image", "packages/mudlark", "packages/sentinel"]
 
 [package]
 default-run = "torrust-index"
 
@@ -0,0 +1,39 @@
+[package]
+categories = ["algorithms", "network-programming"]
+description = "Hierarchical online subspace anomaly detection for positionally structured observation streams."
+keywords = ["anomaly-detection", "online-learning", "spectral", "streaming", "subspace"]
+name = "torrust-sentinel"
+readme = "README.md"
+version = "0.1.0"
+
+authors.workspace = true
+documentation.workspace = true
+edition.workspace = true
+homepage.workspace = true
+license.workspace = true
+publish.workspace = true
+repository.workspace = true
+rust-version.workspace = true
+
+[lints]
+workspace = true
+
+[features]
+serde = ["dep:serde", "torrust-mudlark/serde"]
+
+[dependencies]
+faer = "0"
+rand = "0.10"
+rand_distr = "0.6"
+serde = { version = "1", features = ["derive"], optional = true }
+torrust-mudlark = { path = "../mudlark", default-features = false, features = ["dynamic-contour-tracking"] }
+tracing = "0"
+
+[dev-dependencies]
+criterion = { version = "0", features = ["html_reports"] }
+serde_json = "1"
+tracing-subscriber = { version = "0.3", features = ["registry", "env-filter"] }
+
+[[bench]]
+harness = false
+name = "sentinel"
@@ -0,0 +1,281 @@
+# Spectral Sentinel
+
+Hierarchical online subspace anomaly spectrometer for positionally structured observation streams, generic over coordinate type, accumulator type, and domain bit-width.
+
+The sentinel maintains low-rank linear subspace models of "normal" traffic per analysis cell and scores each incoming batch against those models using streaming thin SVD with exponential forgetting. It produces raw statistical measurements — never opinions, threat levels, or recommended actions.
+
+**The sentinel measures; the host decides.**
+
+## Use Case
+
+The sentinel analyses **hierarchical positional structure** in coordinate values — leading bits define coarse groupings and successive bits refine them. The default instantiation (`Sentinel128`) targets 128-bit domains such as IPv6 addresses; `Sentinel64` covers 64-bit domains.
+
+Pseudo-random values (cryptographic hashes, UUIDs, nonces) have no exploitable bit-positional structure and will not produce meaningful results.
+
+## Quick Start
+
+```rust
+use torrust_sentinel::Sentinel128;
+use torrust_sentinel::config::SentinelConfig;
+
+// Configure — small analysis budget for a demo
+let cfg = SentinelConfig::<u64> {
+    analysis_k: 4,
+    ..SentinelConfig::default()
+};
+
+// Create — the root tracker is automatically warmed with synthetic noise
+let mut sentinel = Sentinel128::new(cfg).unwrap();
+
+// Ingest observations
+let values: Vec<u128> = vec![
+    0xF000_0000_0000_0000_0000_0000_0000_0001,
+    0xF000_0000_0000_0000_0000_0000_0000_0002,
+    0x1000_0000_0000_0000_0000_0000_0000_0003,
+];
+let report = sentinel.ingest(&values);
+
+// Inspect results — root cell is always an ancestor
+for cell in report.cell_reports.iter().chain(report.ancestor_reports.iter()) {
+    let s = &cell.scores;
+    println!(
+        "cell depth {} [{:#034x}, {:#034x}) — novelty z={:.2}, displacement z={:.2}",
+        cell.depth, cell.start, cell.end,
+        s.novelty.max_z_score, s.displacement.max_z_score,
+    );
+}
+```
+
+## Architecture
+
+The sentinel implements a three-layer adaptive architecture backed by the G-V Graph spatial substrate (see [algorithm.md](docs/algorithm.md) for the full specification and [implementation.md](docs/implementation.md) for implementation status).
+
+### Three-Layer Design (§ALGO S-1.1)
+
+```
+Layer 1: G-V Graph  ──  pure volume tracking, Δ = 1 always
+         Adaptive spatial partitioning of [0, 2^N)
+         Competitive ranking by traffic volume
+              │
+              │ V-Tree depth ≤ cutoff → top-K selection
+              ▼
+Layer 2: Analysis Selector  ──  picks competitive cells, closes under G-ancestry
+              │
+              │ suffix bit vectors at every ancestor depth
+              ▼
+Layer 3: Analysis Engine  ──  SubspaceTrackers at competitive + ancestor cells
+              │                Hierarchical coordination (G-tree bottom-up)
+              ▼
+         BatchReport<C> → host
+```
+
+**G-V Graph spatial substrate.** The sentinel owns a `GvGraph<C, V, N>` that adaptively partitions the full `[0, 2^N)` domain. Each observation feeds the graph with Δ = 1 (pure volume counting — no feedback from anomaly scores). The graph splits, evicts, and rebalances cells autonomously. The default instantiation (`Sentinel128`) uses `GvGraph<u128, u64, 128>`; see [ADR-S-018](adr/018-generic-domain-parameters.md) for the generic parameter design.
+
+**Analysis selector (Layer 2).** After each observation pass, the analysis set is recomputed: the top `analysis_k` V-Tree entries by importance (with V-depth ≤ `analysis_depth_cutoff`) are selected as competitive cells, then closed under G-tree ancestry. Each cell in the full analysis set gets a `SubspaceTracker` at suffix width `w = N − depth`.
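+
+As a rough illustration of the suffix width `w = N − depth`, the sketch below extracts the suffix of a 128-bit coordinate as a bit vector. The helper name `suffix_bits` and the 0.0 / 1.0 encoding are assumptions made for this example; the crate's actual encoding may differ.
+
+```rust
+/// Hypothetical helper: the `w = N - depth` low-order bits of a 128-bit
+/// coordinate, most significant first, encoded as 0.0 / 1.0.
+/// The 0/1 encoding is an assumption made for illustration.
+fn suffix_bits(value: u128, depth: u32) -> Vec<f64> {
+    const N: u32 = 128;
+    let w = N.saturating_sub(depth); // suffix width; 0 once depth reaches N
+    (0..w)
+        .rev()
+        .map(|i| ((value >> i) & 1) as f64)
+        .collect()
+}
+```
+
+For example, `suffix_bits(0b101, 125)` yields `[1.0, 0.0, 1.0]`: at depth 125 only the last three bits remain for the tracker to model.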
+
+**Coordination.** After per-cell scoring, hierarchical coordination contexts at G-tree internal nodes analyse cross-cell score patterns bottom-up to detect spatially coordinated anomalies that no single cell would flag. Coordination fires at a G-node when both its left and right subtrees contribute competitive cell scores.
+
+### Core Loop (Per Tracker)
+
+Each `SubspaceTracker` processes batches in five strictly ordered phases — **scoring precedes evolution** so the batch is always measured against the prior model:
+
+1. **Score** — project onto learned subspace, compute four anomaly axes
+2. **Evolve Subspace** — streaming thin SVD with exponential forgetting (λ)
+3. **Evolve Latent Distribution** — EWMA mean, variance, second-moment matrix
+4. **Update Baselines & CUSUM** — fast/slow EWMA per axis, one-sided Page's test
+5. **Adapt Rank** — energy-threshold rank selection, ±1 step per evaluation
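+
+Phase 5 can be sketched as follows. This is a minimal illustration, assuming that "energy" means the sum of squared singular values and that the singular values arrive sorted in descending order; `target_rank` and `adapt_rank` are hypothetical names, not the crate's API.
+
+```rust
+/// Smallest rank whose leading singular values capture at least
+/// `energy_threshold` of the total squared-singular-value energy.
+/// Assumes `singular_values` is sorted in descending order.
+fn target_rank(singular_values: &[f64], energy_threshold: f64) -> usize {
+    let total: f64 = singular_values.iter().map(|s| s * s).sum();
+    if total <= 0.0 {
+        return 1; // no energy observed yet: fall back to the minimum rank
+    }
+    let mut cumulative = 0.0;
+    for (i, s) in singular_values.iter().enumerate() {
+        cumulative += s * s;
+        if cumulative / total >= energy_threshold {
+            return i + 1;
+        }
+    }
+    singular_values.len()
+}
+
+/// Move the current rank at most one step toward the target, within [1, max_rank].
+fn adapt_rank(current: usize, singular_values: &[f64], energy_threshold: f64, max_rank: usize) -> usize {
+    let target = target_rank(singular_values, energy_threshold).min(max_rank).max(1);
+    match current.cmp(&target) {
+        std::cmp::Ordering::Less => current + 1,
+        std::cmp::Ordering::Greater => current - 1,
+        std::cmp::Ordering::Equal => current,
+    }
+}
+```
+
+With singular values `[3.0, 2.0, 1.0, 0.5]` and a 0.90 threshold, the first two values carry 13 / 14.25 ≈ 91% of the energy, so the target rank is 2 and a tracker currently at rank 4 steps down to 3.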
+
+## The Four Scoring Axes
+
+All axes satisfy a **polarity invariant**: higher values = more anomalous.
+
+| Axis | Formula | Measures | Range |
+|------|---------|----------|-------|
+| **Novelty** | ‖residual‖² / (w − k) | Unexplained structure outside the learned subspace | [0, ∞) |
+| **Displacement** | ‖z‖² / (k + ‖z‖²) | Distance from the subspace centroid | [0, 1) |
+| **Surprise** | mean diagonal Mahalanobis | Per-dimension magnitude deviation | [0, ∞) |
+| **Coherence** | mean squared cross-product deviation | Unusual pairwise co-activation patterns | [0, ∞) |
+
+Together they decompose the full covariance structure (total energy, diagonal, off-diagonal) without assembling or inverting a dense matrix.
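+
+The first two formulas in the table can be worked through on plain slices. The sketch below is illustrative only: it assumes `z` is the raw projection of the observation onto an orthonormal basis (the crate may centre or scale it first), and `novelty_and_displacement` is a hypothetical name.
+
+```rust
+/// Novelty and displacement for one observation `x` of width `w`, given an
+/// orthonormal basis stored as `k` vectors of length `w`.
+/// Assumes 0 < k < w and that `z` is the raw projection of `x` onto the basis.
+fn novelty_and_displacement(x: &[f64], basis: &[Vec<f64>]) -> (f64, f64) {
+    let w = x.len();
+    let k = basis.len();
+    // z = U^T x: one coefficient per basis vector.
+    let z: Vec<f64> = basis
+        .iter()
+        .map(|u| u.iter().zip(x).map(|(a, b)| a * b).sum::<f64>())
+        .collect();
+    // residual = x - U z: the part of x outside the subspace.
+    let mut residual = x.to_vec();
+    for (u, zi) in basis.iter().zip(&z) {
+        for (r, a) in residual.iter_mut().zip(u) {
+            *r -= zi * a;
+        }
+    }
+    let residual_sq: f64 = residual.iter().map(|r| r * r).sum();
+    let z_sq: f64 = z.iter().map(|v| v * v).sum();
+    let novelty = residual_sq / (w - k) as f64; // in [0, inf)
+    let displacement = z_sq / (k as f64 + z_sq); // in [0, 1)
+    (novelty, displacement)
+}
+```
+
+For `x = [1, 2, 3, 4]` and a basis made of the first two standard unit vectors, `z = [1, 2]` and the residual is `[0, 0, 3, 4]`, giving novelty 25 / 2 = 12.5 and displacement 5 / 7.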
+
+## Baseline Tracking & Drift Detection
+
+Each scoring axis maintains three components:
+
+- **Fast EWMA** (decay λ) — running mean and variance for instantaneous z-scores, with upper-tail outlier filtering to resist baseline poisoning
+- **Slow EWMA** (decay λ_s > λ) — long-memory reference for CUSUM
+- **CUSUM accumulator** — one-sided Page's test detecting sustained upward drift of batch means from the slow baseline
+
+The dual-EWMA design avoids frozen checkpoints that would require manual resets after legitimate regime changes. The slow baseline adapts automatically, but slowly enough that a sustained attack is detected before the baseline absorbs it.
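+
+A minimal sketch of the three components for a single axis is shown below. The struct, its field names and the exact update order are assumptions for illustration, and the upper-tail outlier filtering is omitted for brevity.
+
+```rust
+/// Hypothetical per-axis baseline state; names are illustrative only.
+struct AxisBaseline {
+    fast_mean: f64,
+    fast_var: f64,
+    slow_mean: f64,
+    cusum: f64,
+}
+
+impl AxisBaseline {
+    fn new(initial: f64) -> Self {
+        Self { fast_mean: initial, fast_var: 0.0, slow_mean: initial, cusum: 0.0 }
+    }
+
+    /// Feeds one batch mean `x` and returns its z-score against the fast
+    /// baseline as it stood before this update.
+    fn update(&mut self, x: f64, lambda: f64, lambda_s: f64, allowance_sigmas: f64, eps: f64) -> f64 {
+        let sigma = self.fast_var.sqrt();
+        let z = (x - self.fast_mean) / (sigma + eps);
+        // One-sided Page's test: accumulate only upward drift from the slow
+        // baseline beyond the noise allowance, never dropping below zero.
+        let allowance = allowance_sigmas * sigma;
+        self.cusum = (self.cusum + (x - self.slow_mean - allowance)).max(0.0);
+        // EWMA updates: the fast baseline forgets quickly, the slow one slowly.
+        let d = x - self.fast_mean;
+        self.fast_mean = lambda * self.fast_mean + (1.0 - lambda) * x;
+        self.fast_var = lambda * self.fast_var + (1.0 - lambda) * d * d;
+        self.slow_mean = lambda_s * self.slow_mean + (1.0 - lambda_s) * x;
+        z
+    }
+}
+```
+
+Under a steady input the accumulator stays at zero; once the input shifts upward and stays there, the accumulator grows on every batch because the slow mean lags far behind the new level.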
+
+## Configuration
+
+```rust
+use torrust_sentinel::config::{NoiseSchedule, SentinelConfig};
+
+SentinelConfig {
+    max_rank: 16,             // rank ceiling per tracker
+    forgetting_factor: 0.99,  // EWMA λ — half-life ~69 batches
+    rank_update_interval: 100,// batches between rank adaptation
+    analysis_k: 1024,         // max competitive analysis cells (§ALGO S-13.2)
+    analysis_depth_cutoff: 6, // V-Tree depth cutoff for eligibility (§ALGO S-13.2)
+    energy_threshold: 0.90,   // cumulative variance target
+    eps: 1e-6,                // numerical stability
+    cusum_slow_decay: 0.999,  // slow EWMA λ_s — half-life ~693 batches
+    cusum_coord_slow_decay: 0.999,
+    cusum_allowance_sigmas: 0.5, // CUSUM noise tolerance (κ_σ)
+    clip_sigmas: 3.0,         // outlier clip width in σ units
+    clip_pressure_decay: 0.95, // clip-pressure EWMA decay (λ_ρ, §ALGO S-6.4)
+    per_sample_scores: false, // per-observation detail (expensive)
+    split_threshold: 100,     // G-V Graph: min observations before cell splits
+    d_create: 3,              // G-V Graph: max V-Tree depth for new splits
+    d_evict: 6,               // G-V Graph: min V-Tree depth for eviction
+    budget: 100_000,          // G-V Graph: hard ceiling on live G-node count
+    noise_schedule: NoiseSchedule::default(), // depth-tiered noise rounds (ADR-S-015)
+    noise_batch_size: 16,     // samples per synthetic noise batch
+    noise_seed: Some(42),     // deterministic RNG seed (None = system entropy)
+    background_warming: false, // warm cells on a background thread (ADR-S-017)
+    svd_strategy: Default::default(), // Brand's incremental SVD (ADR-S-016)
+}
+```
+
+Key tuning knobs:
+
+| Parameter | Effect |
+|-----------|--------|
+| `analysis_k` | Resource ceiling for analysis tier. Total trackers bounded by `2 × analysis_k` (Steiner bound). |
+| `analysis_depth_cutoff` | Only V-entries at V-Tree depth ≤ this cutoff (entries that have risen far enough towards the root) are eligible. Prevents ephemeral cells from entering the analysis set. |
+| `forgetting_factor` | Memory length. Lower = faster adaptation, shorter memory. |
+| `max_rank` | Model expressiveness ceiling. Higher = richer model, more memory. |
+| `energy_threshold` | Fraction of cumulative variance the rank must capture. Higher → larger rank. |
+| `split_threshold` | G-V Graph cell split sensitivity. Lower → finer spatial resolution faster. |
+| `d_create` / `d_evict` | G-V Graph depth gates. Control tree growth and eviction eligibility. |
+| `budget` | G-V Graph hard ceiling on live nodes. Prevents unbounded spatial growth. |
+| `noise_schedule` | Depth-tiered noise injection schedule. `Geometric { root, decay, min }` or `Explicit(vec![...])`. Replaces the former flat `noise_rounds` (ADR-S-015). |
+| `noise_batch_size` | Samples per synthetic noise batch. |
+| `noise_seed` | Deterministic RNG seed for reproducibility (`None` = system entropy). |
+| `clip_pressure_decay` | Clip-pressure EWMA decay factor (λ_ρ). Controls how quickly per-axis clip-pressure adapts. Default 0.95 (§ALGO S-6.4). |
+| `background_warming` | Warm new cells on a background thread (`true`) or synchronously during `ingest()` (`false`, default). Production deployments should enable. |
+| `svd_strategy` | `Brand` (default, ~2–3× faster) or `Naive` (dense thin SVD). In debug builds both run as an oracle test (ADR-S-016). |
+
+## Automatic Noise Warm-Up
+
+Every newly created tracker is **automatically warmed** with synthetic noise before receiving real observations (§ALGO S-11.2, [ADR-S-007](adr/007-automatic-noise-injection.md)). No manual injection API exists — the sentinel owns the injection lifecycle entirely.
+
+- **Root tracker** — warmed at construction.
+- **New analysis cells** — enqueued into a staging area and warmed via a deferred pipeline ([ADR-S-017](adr/017-deferred-cell-warm-up.md)). Warm-up runs synchronously by default or on a background thread when `background_warming` is enabled.
+- **Coordination contexts** — warmed with Gamma-sampled synthetic score vectors (§ALGO S-9.8) when first activated.
+
+Noise parameters are configured via `SentinelConfig`:
+
+```rust
+use torrust_sentinel::config::{NoiseSchedule, SentinelConfig};
+
+let cfg = SentinelConfig::<u64> {
+    // Depth-tiered: root gets 450 rounds, deeper cells taper via 0.5× decay, min 50.
+    noise_schedule: NoiseSchedule::default(),
+    noise_batch_size: 16,   // samples per synthetic batch (default)
+    noise_seed: Some(42),   // deterministic RNG seed (default)
+    ..SentinelConfig::default()
+};
+```
+
+The `NoiseSchedule` enum supports two variants:
+- `Geometric { root, decay, min }` — `rounds(d) = max(min, root × decay^d)`. Default: `{ root: 450, decay: 0.5, min: 50 }`.
+- `Explicit(Vec<u32>)` — per-depth round counts; last entry repeats for deeper cells.
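+
+The round-count rule for both variants can be sketched with a stand-in enum. `Schedule` below is a hypothetical mirror of `NoiseSchedule`, written only to make the arithmetic concrete; rounding toward zero and the behaviour for an empty explicit list are assumptions.
+
+```rust
+/// Hypothetical stand-in for the crate's `NoiseSchedule`; only the
+/// round-count rule is modelled here.
+enum Schedule {
+    Geometric { root: u32, decay: f64, min: u32 },
+    Explicit(Vec<u32>),
+}
+
+impl Schedule {
+    /// Noise rounds for a cell at the given depth.
+    fn rounds(&self, depth: u32) -> u32 {
+        match self {
+            // rounds(d) = max(min, root * decay^d); truncation is an assumption.
+            Schedule::Geometric { root, decay, min } => {
+                let tapered = (f64::from(*root) * decay.powi(depth as i32)) as u32;
+                tapered.max(*min)
+            }
+            // The last entry repeats for deeper cells; an empty list yields 0 here.
+            Schedule::Explicit(per_depth) => per_depth
+                .get(depth as usize)
+                .or(per_depth.last())
+                .copied()
+                .unwrap_or(0),
+        }
+    }
+}
+```
+
+With the default parameters `{ root: 450, decay: 0.5, min: 50 }` this gives 450, 225, 112 and 56 rounds at depths 0 to 3, and the floor of 50 from depth 4 onwards.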
+
+Each tracker reports a `noise_influence` (η) value that decays exponentially with real observations: η = λⁿ after n real batches. The host should treat scores from trackers with η > 0.5 as preliminary.
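+
+The decay η = λⁿ makes it straightforward to estimate how many real batches a tracker needs before its scores stop being preliminary. The helpers below are a standalone sketch with hypothetical names, assuming 0 < λ < 1.
+
+```rust
+/// Noise influence after `n` real batches: eta = lambda^n.
+fn noise_influence(lambda: f64, n: u32) -> f64 {
+    lambda.powi(n as i32)
+}
+
+/// Smallest `n` with lambda^n <= eta_target,
+/// assuming 0 < lambda < 1 and 0 < eta_target < 1.
+fn batches_until(lambda: f64, eta_target: f64) -> u32 {
+    (eta_target.ln() / lambda.ln()).ceil() as u32
+}
+```
+
+At the default `forgetting_factor` of 0.99, η falls below 0.5 after 69 real batches (matching the half-life noted in the configuration comments) and below 0.1 after 230.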
+
+## Report Structure
+
+`ingest()` returns a `BatchReport`:
+
+```
+BatchReport
+├── cell_reports: [CellReport]               // competitive cells only
+│   ├── gnode_id, start, end, depth, analysis_width
+│   ├── is_competitive (true), sample_count
+│   ├── rank, energy_ratio, top_singular_value
+│   ├── scores: AnomalyScores                // four axes with z-scores, baselines, CUSUM
+│   ├── maturity: TrackerMaturity             // η, observation counts
+│   ├── geometry: ScoringGeometry             // which axes are structurally active
+│   └── per_sample: Option<[SampleScore]>     // if per_sample_scores enabled
+│
+├── ancestor_reports: [CellReport]           // ancestor-only cells (incl. root)
+│
+├── coordination_reports: [CoordinationReport]  // hierarchical cross-cell analysis
+│   ├── gnode_id, start, end, depth, cells_reporting
+│   ├── rank, energy_ratio, top_singular_value
+│   ├── scores: AnomalyScores
+│   ├── maturity: TrackerMaturity
+│   ├── geometry: ScoringGeometry
+│   └── per_member: Option<[MemberScore]>    // per-cell identity + scores (if per_sample_scores)
+│
+├── contour: ContourSnapshot                 // spatial structure summary
+│   ├── plateau_count, cell_count
+│   └── total_importance
+│
+├── health: HealthReport                     // inline health snapshot
+│   ├── total_g_nodes, semi_internal_count, active_trackers
+│   ├── active_competitive_trackers, active_ancestor_trackers
+│   ├── active_coordination_contexts
+│   ├── investment_set_size, warming_trackers, warming_competitive_targets
+│   ├── lifetime_observations, cells_tracked
+│   ├── rank_distribution, maturity_distribution
+│   ├── geometry_distribution, coordination_health
+│   └── clip_pressure_distribution
+│
+└── analysis_set_summary: AnalysisSetSummary  // analysis set overview
+    ├── competitive_size, full_size, investment_set_size
+    ├── depth_range, importance_range, v_depth_range
+    └── degenerate_cells_skipped
+```
+
+Each `AnomalyScores` contains per-axis `ScoreDistribution` with min/max/mean raw scores, z-scores against the fast baseline, baseline snapshots, and CUSUM accumulator state.
+
+## Public API
+
+| Method | Description |
+|--------|-------------|
+| `SpectralSentinel::new(config)` | Create and validate a new sentinel |
+| `.ingest(&[u128])` | Process a batch, return `BatchReport` |
+| `.health()` | Snapshot of tracker counts, rank distributions, maturity |
+| `.inspect_cell(gnode)` | Deep inspection of a single cell's tracker |
+| `.cell_gnodes()` | List all cell `GNodeId`s in the full analysis set |
+| `.cells_tracked()` | Number of cells in the full analysis set |
+| `.lifetime_observations()` | Total real observations processed across the sentinel's lifetime |
+| `.degenerate_cells_skipped()` | Number of cells excluded for suffix width below `MIN_TRACKER_DIM` (ADR-S-011) |
+| `.config()` | Read-only access to the configuration |
+| `.analysis_set()` | Read-only access to the current analysis set |
+| `.graph()` | Read-only access to the G-V Graph spatial substrate |
+| `.decay(attenuation, q)` | Apply temporal decay to the entire G-V Graph |
+| `.decay_subtree(gnode, att, q)` | Apply temporal decay to a subtree |
+| `.reset()` | Destroy all state, return to fresh |
+
+## Features
+
+| Feature | Effect |
+|---------|--------|
+| `serde` | Enables `Serialize`/`Deserialize` on all config and report types |
+
+## Resource Considerations
+
+The total tracker count is bounded by `2 × analysis_k` (competitive + ancestor cells). At the default `analysis_k = 1024`, this is at most 2,048 trackers. Each tracker uses ~14–24 KB depending on suffix width.
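+
+The bound above translates into a simple memory estimate. The helpers below are hypothetical and use only the figures quoted in this section (at most `2 × analysis_k` trackers, roughly 14 to 24 KB each).
+
+```rust
+/// Upper bound on tracker count: competitive cells plus ancestors.
+fn max_trackers(analysis_k: usize) -> usize {
+    2 * analysis_k
+}
+
+/// (low, high) tracker memory estimate in KB, using ~14 to 24 KB per tracker.
+fn tracker_memory_kb(analysis_k: usize) -> (usize, usize) {
+    let n = max_trackers(analysis_k);
+    (n * 14, n * 24)
+}
+```
+
+For the default `analysis_k = 1024` this gives 28,672 to 49,152 KB, so roughly 28 to 48 MB of tracker state in the worst case.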
+
+The G-V Graph's `budget` parameter caps the total number of live G-nodes. The `analysis_k` and `analysis_depth_cutoff` parameters control how many of those nodes get analysis trackers.
+
+## Documentation
+
+- [docs/algorithm.md](docs/algorithm.md) — full algorithm specification
+- [docs/implementation.md](docs/implementation.md) — implementation guide (source layout, architecture, design decisions)
+
+## Known Divergences from Spec
+
+The implementation conforms to the full specification. All architectural
+phases and the Chapter 14 report structure are complete.
+
+The deferred cell warm-up (ADR-S-017) is implemented: the staging area
+infrastructure and background warming thread are complete, and
+the depth-tiered `NoiseSchedule` (ADR-S-015) is in use. Timing
+protection (S2 adaptive pad, S3 equalization) is out of scope
+(§ALGO S-18.5).
@@ -0,0 +1,54 @@
+# ADR-S-001: Measures Not Opinions
+
+**Status:** Implemented
+**Date:** 2026-03-08
+**Spec:** §ALGO S-1.2 (layer responsibilities — "sentinel measures; host decides")
+
+## Context
+
+An anomaly detector can either:
+
+- **A)** Output raw statistical measurements and let the consumer
+  decide what they mean (library approach).
+- **B)** Output verdicts — threat levels, recommended actions,
+  block/allow decisions (appliance approach).
+
+The sentinel is a library embedded inside the Torrust Index. Different
+hosts have different risk tolerances, different action vocabularies
+(ban, throttle, flag, ignore), and different false-positive
+consequences. Baking policy into the sentinel would force every host
+into one policy model.
+
+## Decision
+
+**The sentinel outputs only raw statistical measurements. It never
+outputs opinions, threat levels, or recommended actions.**
+
+Concretely:
+
+- `BatchReport` contains `AnomalyScores` (per-axis mean, max, z-score,
+  CUSUM), `ScoringGeometry` (ADR-S-008), `TrackerMaturity`, rank,
+  energy ratios.
+- No field is named "threat", "risk", "anomaly_level", or "action".
+- No method returns a boolean "is anomalous" verdict.
+- No internal threshold triggers automatic remediation.
+
+The host reads the report and applies its own policy:
+
+```rust
+// Host policy, not sentinel code (`throttle` is the host's own function):
+for cell in &report.cell_reports {
+    if cell.scores.novelty.max_z_score > 4.0 && cell.maturity.noise_influence < 0.1 {
+        throttle(cell.gnode_id);
+    }
+}
+```
+
+## Consequences
+
+- The sentinel has no policy parameters (no "alert threshold", no
+  "sensitivity level").
+- Report types carry more fields than an appliance would expose, but
+  each field has a precise statistical definition.
+- Integration tests assert statistical properties, not verdicts.
+- "Higher value = more anomalous" is a uniform polarity convention across
+  all four scoring axes (§ALGO S-6), ensuring the host can apply a
+  single threshold logic to any axis.