diff --git a/website/docs/proposals/hallucination-mitigation-milestone.md b/website/docs/proposals/hallucination-mitigation-milestone.md
new file mode 100644
index 000000000..d2ce7ce4c
--- /dev/null
+++ b/website/docs/proposals/hallucination-mitigation-milestone.md
@@ -0,0 +1,1382 @@
+# TruthLens: A Cognitive Immune System for Real-Time Hallucination Detection and Mitigation in Large Language Models
+
+**Version:** 1.0
+**Authors:** vLLM Semantic Router Team
+**Date:** December 2025
+
+---
+
+## Abstract
+
+Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their tendency to generate hallucinations—fluent but factually incorrect or ungrounded content—remains a critical barrier to enterprise AI adoption. Industry surveys consistently show that hallucination risks are among the top concerns preventing organizations from deploying LLM-powered applications in production environments, particularly in high-stakes domains such as healthcare, finance, and legal services.
+
+We propose **TruthLens**, a real-time hallucination detection and mitigation framework integrated into the vLLM Semantic Router. By positioning hallucination control at the inference gateway layer, TruthLens provides a model-agnostic, centralized solution that addresses the "accuracy-latency-cost" triangle through configurable mitigation strategies. Users can select from three operational modes based on their tolerance for cost and accuracy trade-offs: (1) **Lightweight Mode**—single-round detection with warning injection, (2) **Standard Mode**—iterative self-refinement with the same model, and (3) **Premium Mode**—multi-model cross-verification and collaborative correction. This design enables organizations to deploy trustworthy AI systems while maintaining control over operational costs and response latency.
+
+---
+
+## 1. Introduction: The Hallucination Crisis in Enterprise AI
+
+### 1.1 The Core Problem
+
+Hallucinations represent the most significant barrier to enterprise AI adoption today. Unlike traditional software bugs, LLM hallucinations are:
+
+- **Unpredictable**: They occur randomly across different queries and contexts
+- **Convincing**: Hallucinated content often appears fluent, confident, and plausible
+- **High-stakes**: A single hallucination in medical, legal, or financial domains can cause irreversible harm
+- **Invisible**: Without specialized detection, users cannot distinguish hallucinations from accurate responses
+
+**Industry Impact by Domain:**
+
+| Domain | Hallucination Risk Tolerance | Typical Mitigation Approach |
+|--------|------------------------------|----------------------------|
+| Healthcare | Near-zero (life-critical) | Mandatory human verification, liability concerns |
+| Financial Services | Very low (regulatory) | Compliance-driven review processes |
+| Legal | Very low (liability) | Restricted to internal research and drafting |
+| Customer Support | Moderate | Escalation protocols for uncertain responses |
+| Creative/Marketing | High tolerance | Minimal intervention required |
+
+*Note: Based on enterprise deployment patterns observed across industry surveys (McKinsey 2024, Gartner 2024, Menlo Ventures 2024).*
+
+### 1.2 Why Existing Solutions Fall Short
+
+Current approaches to hallucination mitigation operate at the wrong layer of the AI stack:
+
+```mermaid
+flowchart TB
+ subgraph "Current Approaches (Fragmented)"
+ direction TB
+ A[RAG/Grounding] -->|Pre-generation| B[Reduces but doesn't eliminate]
+ C[Fine-tuning] -->|Training time| D[Expensive, model-specific]
+ E[Prompt Engineering] -->|Per-application| F[Inconsistent, no guarantees]
+ G[Offline Evaluation] -->|Post-hoc| H[Cannot prevent real-time harm]
+ end
+
+ subgraph "The Gap"
+ I[❌ No real-time detection at inference]
+ J[❌ No centralized control point]
+ K[❌ No cost-aware mitigation options]
+ end
+```
+
+### 1.3 Why vLLM Semantic Router is the Ideal Solution Point
+
+The vLLM Semantic Router occupies a unique position in the AI infrastructure stack that makes it ideally suited for hallucination mitigation:
+
+```mermaid
+flowchart LR
+ subgraph "Client Applications"
+ APP1[App 1]
+ APP2[App 2]
+ APP3[App N]
+ end
+
+ subgraph "vLLM Semantic Router"
+ direction TB
+ GW[Unified Gateway]
+
+ subgraph "Existing Capabilities"
+        SEC[Security Layer<br/>PII, Jailbreak]
+        ROUTE[Intelligent Routing<br/>Model Selection]
+        CACHE[Semantic Cache<br/>Cost Optimization]
+ end
+
+ subgraph "NEW: TruthLens"
+        HALL[Hallucination<br/>Detection & Mitigation]
+ end
+ end
+
+ subgraph "LLM Backends"
+ LLM1[GPT-4]
+ LLM2[Claude]
+ LLM3[Llama]
+ LLM4[Mistral]
+ end
+
+ APP1 --> GW
+ APP2 --> GW
+ APP3 --> GW
+
+ GW --> SEC
+ SEC --> ROUTE
+ ROUTE --> CACHE
+ CACHE --> HALL
+
+ HALL --> LLM1
+ HALL --> LLM2
+ HALL --> LLM3
+ HALL --> LLM4
+```
+
+**Key Advantages of Gateway-Level Hallucination Control:**
+
+| Advantage | Description |
+|-----------|-------------|
+| **Model-Agnostic** | Works with any LLM backend without modification |
+| **Centralized Policy** | Single configuration point for all applications |
+| **Cost Control** | Organization-wide visibility into accuracy vs. cost trade-offs |
+| **Incremental Adoption** | Enable per-decision, per-domain policies |
+| **Observability** | Unified metrics, logging, and alerting for hallucination events |
+| **Defense in Depth** | Complements (not replaces) RAG and prompt engineering |
+
+### 1.4 Formal Problem Definition
+
+We formalize hallucination detection in Retrieval-Augmented Generation (RAG) systems as a **token-level sequence labeling** problem.
+
+**Definition 1 (RAG Context).** Let a RAG interaction be defined as a tuple *(C, Q, R)* where:
+
+- *C = \{c₁, c₂, ..., cₘ\}* is the retrieved context (set of documents/passages)
+- *Q* is the user query
+- *R = (r₁, r₂, ..., rₙ)* is the generated response as a sequence of *n* tokens
+
+**Definition 2 (Grounded vs. Hallucinated Tokens).** A token *rᵢ* in response *R* is:
+
+- **Grounded** if there exists evidence in *C* that supports the claim containing *rᵢ*
+- **Hallucinated** if *rᵢ* contributes to a claim that:
+ - (a) Contradicts information in *C* (contradiction hallucination), or
+ - (b) Cannot be verified from *C* and is not common knowledge (ungrounded hallucination)
+
+**Definition 3 (Hallucination Detection Function).** The detection task is to learn a function:
+
+*f: (C, Q, R) → Y*
+
+where *Y = (y₁, y₂, ..., yₙ)* and *yᵢ ∈ \{0, 1\}* indicates whether token *rᵢ* is hallucinated.
+
+**Definition 4 (Hallucination Score).** Given predictions *Y* and confidence scores *P = (p₁, ..., pₙ)* where *pᵢ = P(yᵢ = 1)*, we define:
+
+- **Token-level score**: *s_token(rᵢ) = pᵢ*
+- **Span-level score**: For a contiguous span *S = (rᵢ, ..., rⱼ)*, *s_span(S) = max(pᵢ, ..., pⱼ)*
+- **Response-level score**: *s_response(R) = 1 - ∏(1 - pᵢ)* for all *i* where *pᵢ > τ_token*
+
+**Definition 5 (Mitigation Decision).** Given threshold *τ*, the system takes action:
+
+```text
+Action(R) =
+ PASS if s_response(R) < τ
+ MITIGATE if s_response(R) ≥ τ
+```
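+
+As a concrete illustration of Definitions 4 and 5, the following sketch aggregates per-token probabilities into span- and response-level scores and applies the mitigation decision. It assumes token probabilities are already available from a detector such as the one described in Section 3.1; the function names and example values are illustrative only.
+
+```python
+from math import prod
+
+def span_score(probs: list[float]) -> float:
+    """Span-level score: maximum token probability within the span."""
+    return max(probs)
+
+def response_score(token_probs: list[float], tau_token: float = 0.1) -> float:
+    """Noisy-OR aggregation over tokens whose probability exceeds tau_token."""
+    flagged = [p for p in token_probs if p > tau_token]
+    return 1.0 - prod(1.0 - p for p in flagged)
+
+def decide(token_probs: list[float], tau: float = 0.6) -> str:
+    """Definition 5: PASS below the threshold, MITIGATE at or above it."""
+    return "MITIGATE" if response_score(token_probs) >= tau else "PASS"
+
+# A single confident hallucinated token dominates the response-level score.
+print(decide([0.02, 0.05, 0.91, 0.03]))  # -> MITIGATE
+```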
+
+---
+
+## 2. Related Work: State-of-the-Art in Hallucination Mitigation
+
+### 2.1 Taxonomy of Hallucination Types
+
+Before reviewing detection methods, we establish a taxonomy of hallucination types:
+
+**Type 1: Intrinsic Hallucination** — Generated content contradicts the provided context.
+
+*Example*: Context says "The meeting is on Tuesday." Response says "The meeting is scheduled for Wednesday."
+
+**Type 2: Extrinsic Hallucination** — Generated content cannot be verified from the context and is not common knowledge.
+
+*Example*: Context discusses a company's Q3 earnings. Response includes Q4 projections not mentioned anywhere.
+
+**Type 3: Fabrication** — Entirely invented entities, citations, or facts.
+
+*Example*: "According to Smith et al. (2023)..." where no such paper exists.
+
+| Type | Detection Difficulty | Mitigation Approach |
+|------|---------------------|---------------------|
+| Intrinsic | Easier (direct contradiction) | Context re-grounding |
+| Extrinsic | Medium (requires knowledge boundary) | Uncertainty expression |
+| Fabrication | Harder (requires external verification) | Cross-reference checking |
+
+### 2.2 Detection Methods
+
+| Category | Representative Work | Mechanism | Accuracy | Latency | Cost |
+|----------|---------------------|-----------|----------|---------|------|
+| **Encoder-Based** | LettuceDetect (2025), Luna (2025) | Token classification with ModernBERT/DeBERTa | F1: 75-79% | 15-35ms | Low |
+| **Self-Consistency** | SelfCheckGPT (2023) | Multiple sampling + consistency check | Varies | Nx base | High |
+| **Cross-Model** | Finch-Zk (2025) | Multi-model response comparison | F1: +6-39% | 2-3x base | High |
+| **Internal States** | MIND (ACL 2024) | Hidden layer activation analysis | High | \<10ms | Requires instrumentation |
+
+#### 2.2.1 Encoder-Based Detection (Deep Dive)
+
+**LettuceDetect** (Kovács et al., 2025) frames hallucination detection as **token-level sequence labeling**:
+
+- **Architecture**: ModernBERT-large (395M parameters) with classification head
+- **Input**: Concatenated [Context, Query, Response] with special tokens
+- **Output**: Per-token probability of hallucination
+- **Training**: Fine-tuned on RAGTruth dataset (18K examples)
+- **Key Innovation**: Long-context handling (8K tokens) enables full RAG context inclusion
+
+**Performance on RAGTruth Benchmark**:
+
+| Model | Token F1 | Example F1 | Latency |
+|-------|----------|------------|---------|
+| LettuceDetect-large | 79.22% | 74.8% | ~30ms |
+| LettuceDetect-base | 76.5% | 71.2% | ~15ms |
+| Luna (DeBERTa) | 73.1% | 68.9% | ~25ms |
+| GPT-4 (zero-shot) | 61.2% | 58.4% | ~2s |
+
+**Why Encoder-Based for TruthLens**: The combination of high accuracy, low latency, and fixed cost makes encoder-based detection ideal for gateway-level deployment.
+
+#### 2.2.2 Self-Consistency Methods
+
+**SelfCheckGPT** (Manakul et al., 2023) exploits the observation that hallucinations are inconsistent across samples:
+
+- **Mechanism**: Generate N responses, measure consistency
+- **Intuition**: Factual content is reproducible; hallucinations vary
+- **Limitation**: Requires N LLM calls (typically N=5-10)
+
+**Theoretical Basis**: Content the model assigns high probability to reappears in most samples, whereas a hallucination drawn with low per-sample probability rarely repeats across independent samples.
+
+#### 2.2.3 Cross-Model Verification
+
+**Finch-Zk** (2025) leverages model diversity:
+
+- **Mechanism**: Compare responses from different model families
+- **Key Insight**: Different models hallucinate differently
+- **Segment-Level Correction**: Replace inconsistent segments with higher-confidence version
+
+### 2.3 Mitigation Strategies
+
+| Strategy | Representative Work | Mechanism | Effectiveness | Overhead |
+|----------|---------------------|-----------|---------------|----------|
+| **Self-Refinement** | Self-Refine (NeurIPS 2023) | Iterative feedback loop | 40-60% reduction | 2-4x latency |
+| **Chain-of-Verification** | CoVe (ACL 2024) | Generate verification questions | 50-70% reduction | 3-5x latency |
+| **Multi-Agent Debate** | MAD (2024) | Multiple agents argue and converge | 60-80% reduction | 5-10x latency |
+| **Cross-Model Correction** | Finch-Zk (2025) | Targeted segment replacement | Up to 9% accuracy gain | 3x latency |
+
+#### 2.3.1 Self-Refinement (Deep Dive)
+
+**Self-Refine** (Madaan et al., NeurIPS 2023) demonstrates that LLMs can improve their own outputs:
+
+```text
+Loop:
+ 1. Generate initial response R₀
+ 2. Generate feedback F on R₀ (same model)
+ 3. Generate refined response R₁ using F
+ 4. Repeat until convergence or max iterations
+```
+
+**Key Findings**:
+
+- Works best when feedback is **specific** (not just "improve this")
+- Diminishing returns after 2-3 iterations
+- Requires the model to have the knowledge to correct itself
+
+**Limitation for Hallucination**: If the model lacks the correct knowledge, self-refinement may not help or may introduce new errors.
+
+#### 2.3.2 Chain-of-Verification (CoVe)
+
+**CoVe** (Dhuliawala et al., ACL 2024) generates verification questions:
+
+```text
+1. Generate response R
+2. Extract factual claims from R
+3. For each claim, generate verification question
+4. Answer verification questions using context
+5. Revise R based on verification results
+```
+
+**Advantage**: Explicit verification step catches subtle errors.
+**Disadvantage**: High latency (3-5x) due to multi-step process.
+
+#### 2.3.3 Multi-Agent Debate
+
+**Multi-Agent Debate** (Du et al., 2024) uses multiple LLM instances:
+
+```text
+1. Multiple agents generate responses
+2. Agents critique each other's responses
+3. Agents revise based on critiques
+4. Repeat for N rounds
+5. Synthesize final response
+```
+
+**Theoretical Advantage**: Diverse perspectives catch blind spots.
+**Practical Challenge**: High cost (5-10x) and latency.
+
+### 2.4 The Accuracy-Latency-Cost Triangle
+
+Research consistently shows a fundamental trade-off:
+
+```mermaid
+graph TD
+ subgraph "The Trade-off Triangle"
+ ACC[🎯 Accuracy]
+ LAT[⚡ Latency]
+ COST[💰 Cost]
+
+ ACC ---|Trade-off| LAT
+ LAT ---|Trade-off| COST
+ COST ---|Trade-off| ACC
+ end
+
+ subgraph "Strategy Positioning"
+        L[Lightweight Mode<br/>⚡💰 Fast & Cheap<br/>🎯 Moderate Accuracy]
+        S[Standard Mode<br/>⚡🎯 Balanced<br/>💰 Moderate Cost]
+        P[Premium Mode<br/>🎯💰 High Accuracy<br/>⚡ Higher Latency]
+ end
+
+ L --> LAT
+ L --> COST
+ S --> ACC
+ S --> LAT
+ P --> ACC
+```
+
+**Key Insight**: No single approach optimizes all three dimensions. TruthLens addresses this by offering **user-selectable operational modes** that let organizations choose their position on this trade-off triangle.
+
+---
+
+## 3. Theoretical Foundations
+
+This section establishes the theoretical basis for TruthLens's three-mode architecture, drawing from sequence labeling, iterative optimization, ensemble learning, and multi-agent systems theory.
+
+### 3.1 Hallucination Detection as Sequence Labeling
+
+#### 3.1.1 Token Classification Architecture
+
+Modern hallucination detection leverages transformer-based encoders fine-tuned for token classification. Given input sequence *X = [CLS] C [SEP] Q [SEP] R [SEP]*, the encoder produces contextualized representations:
+
+*H = Encoder(X) ∈ ℝ^(L×d)*
+
+where *L* is sequence length and *d* is hidden dimension. For each token *rᵢ* in the response, we compute:
+
+*P(yᵢ = 1|X) = σ(W · hᵢ + b)*
+
+where *W ∈ ℝ^d*, *b ∈ ℝ* are learned parameters and *σ* is the sigmoid function.
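+
+A minimal sketch of this formulation, assuming the detector checkpoint configured in Section 6.1 loads as a standard token-classification model; the exact segment packing, special tokens, and label index are detector-specific assumptions rather than anything fixed by this proposal:
+
+```python
+import torch
+from transformers import AutoModelForTokenClassification, AutoTokenizer
+
+MODEL_ID = "models/lettucedetect-large-modernbert-en-v1"  # model_id from Section 6.1
+tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
+
+def token_hallucination_probs(context: str, query: str, response: str) -> list[float]:
+    """Approximate P(y_i = 1 | X) per token, with X ~ [CLS] C [SEP] Q [SEP] R."""
+    # Pair encoding approximates the [CLS] C [SEP] Q [SEP] R layout.
+    inputs = tokenizer(f"{context}\n\n{query}", response,
+                       return_tensors="pt", truncation=True, max_length=8192)
+    with torch.no_grad():
+        logits = model(**inputs).logits              # shape: (1, L, num_labels)
+    # Assumes label index 1 corresponds to the "hallucinated" class.
+    return torch.softmax(logits, dim=-1)[0, :, 1].tolist()
+```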
+
+#### 3.1.2 Why ModernBERT for Detection
+
+The choice of encoder architecture significantly impacts detection quality. We adopt ModernBERT (Warner et al., 2024) for the following theoretical advantages:
+
+| Property | ModernBERT | Traditional BERT | Impact on Detection |
+|----------|------------|------------------|---------------------|
+| **Context Length** | 8,192 tokens | 512 tokens | Handles full RAG context without truncation |
+| **Attention** | Rotary Position Embeddings (RoPE) | Absolute positional | Better long-range dependency modeling |
+| **Architecture** | GeGLU activations, no biases | GELU, with biases | Improved gradient flow for fine-grained classification |
+| **Efficiency** | Flash Attention, Unpadding | Standard attention | 2x inference speedup enables real-time detection |
+
+#### 3.1.3 Scoring Function Design
+
+The aggregation from token-level to response-level scores requires careful design. We propose a **noisy-OR** aggregation model:
+
+*s_response(R) = 1 - ∏ᵢ(1 - pᵢ · 𝟙[pᵢ > τ_token])*
+
+**Theoretical Justification**: The noisy-OR model assumes independence between hallucination events at different tokens. While this is an approximation, it provides:
+
+1. **Monotonicity**: Adding a hallucinated token never decreases the response score
+2. **Sensitivity**: Single high-confidence hallucination triggers detection
+3. **Calibration**: Score approximates *P(∃ hallucination in R)*
+
+**Alternative: Span-Based Aggregation**
+
+For correlated hallucinations (common in fabricated entities), we first group contiguous hallucinated tokens into spans, then aggregate:
+
+*s_response(R) = max\{s_span(S₁), s_span(S₂), ..., s_span(Sₖ)\}*
+
+This reduces sensitivity to tokenization artifacts and focuses on semantic units.
+
+#### 3.1.4 Threshold Selection Theory
+
+The detection threshold *τ* controls the precision-recall trade-off. From decision theory:
+
+**Proposition 1 (Optimal Threshold).** *Given cost ratio λ = C_FN / C_FP (cost of false negative vs. false positive), the optimal threshold satisfies:*
+
+*τ\* = 1 / (1 + λ · (1-π)/π)*
+
+*where π is the prior probability of hallucination.*
+
+**Practical Implications**:
+
+| Domain | λ (Cost Ratio) | Recommended τ | Rationale |
+|--------|----------------|---------------|-----------|
+| Medical | 10-100 | 0.3-0.5 | Missing hallucination is catastrophic |
+| Financial | 5-20 | 0.4-0.6 | Regulatory risk from false information |
+| Customer Support | 1-2 | 0.6-0.7 | Balance user experience and accuracy |
+| Creative | 0.1-0.5 | 0.8-0.9 | Over-flagging harms creativity |
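+
+A small sketch of Proposition 1, useful for sanity-checking threshold choices. The recommended ranges in the table above additionally fold in calibration quality and operational tuning, so the formula should be read as a directional guide rather than a prescription:
+
+```python
+def optimal_threshold(cost_ratio: float, prior: float) -> float:
+    """tau* = 1 / (1 + lambda * (1 - pi) / pi), per Proposition 1."""
+    return 1.0 / (1.0 + cost_ratio * (1.0 - prior) / prior)
+
+# Higher cost ratios (missed hallucinations are expensive) push the threshold down.
+print(round(optimal_threshold(cost_ratio=1, prior=0.5), 2))   # 0.5
+print(round(optimal_threshold(cost_ratio=10, prior=0.5), 2))  # 0.09
+```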
+
+### 3.2 Self-Refinement Theory
+
+#### 3.2.1 Iterative Refinement as Fixed-Point Iteration
+
+Standard Mode employs iterative self-refinement, which can be formalized as seeking a fixed point of a refinement operator.
+
+**Definition 6 (Refinement Operator).** Let *T: R → R* be the refinement operator where:
+
+*T(Rₜ) = LLM(Prompt_refine(C, Q, Rₜ, Detect(Rₜ)))*
+
+The iteration proceeds as: *R₀ → R₁ → R₂ → ... → R\**
+
+**Theorem 1 (Convergence Conditions).** *The refinement sequence \{Rₜ\} converges to a fixed point R\* if:*
+
+1. *The hallucination score sequence \{s(Rₜ)\} is monotonically non-increasing*
+2. *The score is bounded below (s(R) ≥ 0)*
+3. *The LLM exhibits consistency: similar prompts yield similar outputs*
+
+**Proof Sketch**: Conditions 1 and 2 ensure the score sequence converges by the Monotone Convergence Theorem. Condition 3 (LLM consistency) ensures the response sequence itself converges, not just the scores.
+
+#### 3.2.2 Convergence Rate Analysis
+
+**Empirical Observation**: Self-refinement typically exhibits **sublinear convergence**:
+
+*s(Rₜ) - s(R\*) ≤ O(1/t)*
+
+This is because:
+
+1. **Easy hallucinations** (explicit contradictions) are corrected in early iterations
+2. **Hard hallucinations** (subtle ungrounded claims) may persist or oscillate
+3. **Diminishing returns** after 2-3 iterations in practice
+
+```mermaid
+graph LR
+ subgraph "Convergence Pattern"
+        R0[R₀<br/>s=0.8] -->|Iteration 1| R1[R₁<br/>s=0.5]
+        R1 -->|Iteration 2| R2[R₂<br/>s=0.35]
+        R2 -->|Iteration 3| R3[R₃<br/>s=0.3]
+        R3 -.->|Diminishing returns| R4[R₄<br/>s=0.28]
+ end
+```
+
+#### 3.2.3 Prompt Engineering Principles for Correction
+
+Effective refinement prompts must satisfy several theoretical properties:
+
+**Principle 1 (Specificity)**: The prompt must identify *which* spans are hallucinated, not just that hallucination exists.
+
+**Principle 2 (Grounding)**: The prompt must provide the original context *C* to enable fact-checking.
+
+**Principle 3 (Preservation)**: The prompt must instruct the model to preserve accurate content.
+
+**Principle 4 (Uncertainty)**: When correction is not possible, the model should express uncertainty rather than fabricate alternatives.
+
+**Refinement Prompt Template Structure**:
+
+```text
+Given:
+- Context: [Retrieved passages C]
+- Query: [User question Q]
+- Response: [Current response Rₜ with hallucinated spans marked]
+
+The following spans may be hallucinated: [List of (span, confidence)]
+
+Instructions:
+1. For each flagged span, verify against the context
+2. If contradicted: correct using context evidence
+3. If unverifiable and not common knowledge: remove or qualify with uncertainty
+4. Preserve all accurate, well-grounded content
+5. Maintain coherent narrative flow
+```
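+
+The loop itself is straightforward once detection and generation are available as callables. The sketch below assumes two injected functions, `detect(context, query, response)` returning a `(score, spans)` pair and `llm(prompt)` returning a completion; both are placeholders for the router's detector and backend client, not existing APIs:
+
+```python
+def refine_response(context, query, response, detect, llm,
+                    threshold=0.6, convergence=0.4, max_iterations=3):
+    score, spans = detect(context, query, response)
+    if score < threshold:                    # initial response passes as-is
+        return response
+    for _ in range(max_iterations):
+        prompt = (
+            f"Context:\n{context}\n\nQuery: {query}\n\nResponse:\n{response}\n\n"
+            f"The following spans may be hallucinated: {spans}\n"
+            "Correct spans contradicted by the context, qualify claims you cannot "
+            "verify, and preserve all well-grounded content."
+        )
+        response = llm(prompt)
+        score, spans = detect(context, query, response)
+        if score < convergence:              # converged: hallucination resolved
+            return response
+    return response                          # max iterations reached: router attaches a disclaimer
+```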
+
+### 3.3 Multi-Model Collaboration Theory
+
+Premium Mode leverages multiple LLMs for cross-verification. We ground this in ensemble learning and multi-agent debate theory.
+
+#### 3.3.1 Ensemble Learning Perspective
+
+**Theorem 2 (Diversity-Accuracy Trade-off).** *For an ensemble of M models with individual error rate ε and pairwise correlation ρ, the ensemble error rate under majority voting is:*
+
+*ε_ensemble ≈ ε · (1 + (M-1)ρ) / M* *when ε < 0.5*
+
+**Corollary**: Ensemble error approaches zero as M → ∞ only if ρ < 1 (models are diverse).
+
+**Implications for TruthLens**:
+
+| Model Combination | Expected Diversity (1-ρ) | Error Reduction |
+|-------------------|--------------------------|-----------------|
+| Same model family (GPT-4 variants) | Low (0.2-0.4) | 10-20% |
+| Different families (GPT-4 + Claude) | Medium (0.4-0.6) | 30-50% |
+| Different architectures (Transformer + other) | High (0.6-0.8) | 50-70% |
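+
+A quick numeric reading of Theorem 2, holding the per-model error at 20% for a three-model ensemble; the values are for intuition only:
+
+```python
+def ensemble_error(eps: float, rho: float, m: int) -> float:
+    """epsilon_ensemble ~= eps * (1 + (M - 1) * rho) / M, valid for eps < 0.5."""
+    return eps * (1 + (m - 1) * rho) / m
+
+for rho in (0.8, 0.5, 0.2):   # same family -> different families -> different architectures
+    print(rho, round(ensemble_error(0.20, rho, m=3), 3))
+# 0.8 -> 0.173, 0.5 -> 0.133, 0.2 -> 0.093
+```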
+
+#### 3.3.2 Multi-Agent Debate Framework
+
+Beyond simple voting, multi-agent debate enables models to **argue** about factual claims and converge on truth.
+
+**Definition 7 (Argumentation Framework).** An argumentation framework is a pair *AF = (A, →)* where:
+
+- *A* is a set of arguments (factual claims from each model)
+- *→ ⊆ A × A* is an attack relation (contradictions between claims)
+
+**Definition 8 (Grounded Extension).** The grounded extension *E* of AF is the maximal conflict-free set of arguments that defends itself against all attacks.
+
+**Multi-Agent Debate Protocol**:
+
+```mermaid
+sequenceDiagram
+ participant Q as Query
+    participant M1 as Model A (Proponent)
+    participant M2 as Model B (Critic)
+ participant J as Judge Model
+
+ Q->>M1: Generate response R₁
+ Q->>M2: Generate response R₂
+
+ M1->>M2: "Claim X is supported by context passage P"
+ M2->>M1: "Claim X contradicts passage Q"
+
+ loop Debate Rounds (max 3)
+ M1->>M2: Refine argument with evidence
+ M2->>M1: Counter-argument or concession
+ end
+
+ M1->>J: Final position + evidence
+ M2->>J: Final position + evidence
+ J->>Q: Synthesized response (grounded extension)
+```
+
+#### 3.3.3 Consensus Mechanisms
+
+**Mechanism 1: Majority Voting**
+
+*y_final(token) = argmax_y |\{m : f_m(token) = y\}|*
+
+- Simple, fast
+- Requires odd number of models
+- Does not account for model confidence
+
+**Mechanism 2: Weighted Confidence Aggregation**
+
+*p_final(token) = Σₘ wₘ · pₘ(token) / Σₘ wₘ*
+
+where *wₘ* is model m's calibrated reliability weight.
+
+- Accounts for varying model expertise
+- Requires calibrated confidence scores
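+
+A sketch of Mechanisms 1 and 2 for a single token or claim, assuming each model supplies a binary label or a calibrated probability plus a reliability weight; all values below are illustrative:
+
+```python
+def majority_vote(labels: list[int]) -> int:
+    """Mechanism 1: flag if most models label the token as hallucinated."""
+    return int(sum(labels) > len(labels) / 2)
+
+def weighted_probability(probs: list[float], weights: list[float]) -> float:
+    """Mechanism 2: reliability-weighted average of per-model probabilities."""
+    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)
+
+print(majority_vote([1, 0, 1]))                                           # 1 (flagged)
+print(round(weighted_probability([0.9, 0.2, 0.4], [0.5, 0.3, 0.2]), 2))   # 0.59
+```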
+
+**Mechanism 3: Segment-Level Replacement (Finch-Zk)**
+
+For each claim segment *S* in response *R₁*:
+
+1. Check if *S* appears (semantically) in *R₂*
+2. If consistent: keep *S*
+3. If inconsistent: replace with version from more reliable model
+4. If only in *R₁*: flag as potentially hallucinated
+
+This mechanism achieves fine-grained correction without full response regeneration.
+
+### 3.4 Theoretical Justification for Three-Mode Architecture
+
+#### 3.4.1 Pareto Frontier Analysis
+
+The Accuracy-Latency-Cost space admits a Pareto frontier: points where improving one dimension requires sacrificing another.
+
+**Proposition 2 (Three Operating Points).** *The Pareto frontier in the A-L-C space has three natural "knee points" corresponding to:*
+
+1. **Cost-dominated regime** (Lightweight): Minimal intervention, detection-only
+2. **Balanced regime** (Standard): Moderate refinement, single-model
+3. **Accuracy-dominated regime** (Premium): Maximum verification, multi-model
+
+```mermaid
+graph TD
+ subgraph "Pareto Frontier Visualization"
+ direction LR
+
+ A[Accuracy] ---|Trade-off| L[Latency]
+ L ---|Trade-off| C[Cost]
+ C ---|Trade-off| A
+
+ subgraph "Operating Points"
+            L1[🟢 Lightweight<br/>Low A, Low L, Low C]
+            S1[🟡 Standard<br/>Med A, Med L, Med C]
+            P1[🔴 Premium<br/>High A, High L, High C]
+ end
+ end
+```
+
+#### 3.4.2 Why Not Continuous Control?
+
+One might ask: why discrete modes rather than continuous parameters?
+
+**Argument 1 (Cognitive Load)**: Users cannot effectively reason about continuous trade-offs. Three discrete modes map to intuitive concepts: "fast/cheap," "balanced," "best quality."
+
+**Argument 2 (Operational Complexity)**: Each mode involves qualitatively different mechanisms (detection-only vs. iteration vs. multi-model). Intermediate points would require complex interpolation.
+
+**Argument 3 (Empirical Gaps)**: The Pareto frontier is not smooth—there are natural gaps where intermediate configurations offer little benefit over the nearest discrete mode.
+
+#### 3.4.3 Mode Selection as Online Learning
+
+In production, mode selection can be formulated as a **multi-armed bandit** problem:
+
+- **Arms**: \{Lightweight, Standard, Premium\}
+- **Reward**: User satisfaction (proxy: no negative feedback)
+- **Cost**: Latency + API costs
+
+**Thompson Sampling** approach: Maintain Beta distributions over success probability for each mode, sample and select, update based on outcome. This enables adaptive mode selection per query type.
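+
+A minimal sketch of that Thompson Sampling loop, assuming a binary reward (1 = no negative user feedback); the reward proxy, the uniform Beta priors, and the class shape are illustrative choices rather than part of the proposal:
+
+```python
+import random
+
+class ModeSelector:
+    """Thompson Sampling over the three TruthLens modes."""
+
+    def __init__(self, modes=("lightweight", "standard", "premium")):
+        self.params = {m: [1.0, 1.0] for m in modes}   # Beta(alpha, beta) per mode
+
+    def choose(self) -> str:
+        samples = {m: random.betavariate(a, b) for m, (a, b) in self.params.items()}
+        return max(samples, key=samples.get)
+
+    def update(self, mode: str, success: bool) -> None:
+        alpha, beta = self.params[mode]
+        self.params[mode] = [alpha + int(success), beta + int(not success)]
+
+selector = ModeSelector()
+mode = selector.choose()              # pick a mode for the next query
+selector.update(mode, success=True)   # record the observed outcome
+```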
+
+---
+
+## 4. System Architecture
+
+### 4.1 High-Level Architecture
+
+TruthLens integrates into the vLLM Semantic Router's ExtProc pipeline, creating a comprehensive request-response security boundary:
+
+```mermaid
+flowchart TB
+ subgraph "Client Layer"
+ C1[Enterprise App]
+ C2[Chatbot]
+ C3[RAG System]
+ end
+
+ subgraph "vLLM Semantic Router"
+ direction TB
+
+ subgraph "Request Phase (Existing)"
+ REQ[Request Processing]
+      SEC[Security Checks<br/>PII Detection<br/>Jailbreak Detection]
+      ROUTE[Intent Classification<br/>Model Selection]
+      CACHE_R[Semantic Cache<br/>Lookup]
+ end
+
+ subgraph "LLM Inference"
+ LLM1[Primary Model]
+ LLM2[Secondary Model]
+ LLM3[Verification Model]
+ end
+
+ subgraph "Response Phase (NEW: TruthLens)"
+      DET[Hallucination<br/>Detection]
+      EVAL[Strategy<br/>Evaluation]
+
+ subgraph "Mitigation Modes"
+        M1[Lightweight<br/>Warning Only]
+        M2[Standard<br/>Self-Refinement]
+        M3[Premium<br/>Multi-Model]
+ end
+
+      FINAL[Response<br/>Finalization]
+ end
+ end
+
+ subgraph "Observability"
+    METRICS[Metrics<br/>Prometheus]
+    TRACE[Tracing<br/>OpenTelemetry]
+    LOG[Logging<br/>Structured]
+ end
+
+ C1 --> REQ
+ C2 --> REQ
+ C3 --> REQ
+
+ REQ --> SEC --> ROUTE --> CACHE_R
+ CACHE_R -->|Miss| LLM1
+ CACHE_R -->|Hit| DET
+
+ LLM1 --> DET
+ DET --> EVAL
+
+ EVAL -->|Lightweight| M1
+ EVAL -->|Standard| M2
+ EVAL -->|Premium| M3
+
+ M2 -->|Refine| LLM1
+ M3 -->|Cross-verify| LLM2
+ M3 -->|Cross-verify| LLM3
+
+ M1 --> FINAL
+ M2 --> FINAL
+ M3 --> FINAL
+
+ FINAL --> C1
+ FINAL --> C2
+ FINAL --> C3
+
+ DET -.-> METRICS
+ DET -.-> TRACE
+ DET -.-> LOG
+```
+
+### 4.2 Detection Flow
+
+The hallucination detection process operates on the complete context-query-response triple:
+
+```mermaid
+flowchart LR
+ subgraph "Input Assembly"
+    SYS[System Prompt + RAG Context]
+    HIST[Conversation<br/>History]
+ QUERY[User Query]
+ RESP[LLM Response]
+ end
+
+ subgraph "Detection Engine"
+    ENCODE[Encoder Model<br/>ModernBERT]
+    TOKEN[Token-Level<br/>Classification]
+    AGG[Score<br/>Aggregation]
+ end
+
+ subgraph "Output"
+    SCORE[Hallucination<br/>Score: 0.0-1.0]
+    SPANS[Hallucinated<br/>Spans]
+    META[Detection<br/>Metadata]
+ end
+
+ SYS --> ENCODE
+ HIST --> ENCODE
+ QUERY --> ENCODE
+ RESP --> ENCODE
+
+ ENCODE --> TOKEN --> AGG
+
+ AGG --> SCORE
+ AGG --> SPANS
+ AGG --> META
+```
+
+---
+
+## 5. User Strategy Options: The Cost-Accuracy Spectrum
+
+TruthLens provides three operational modes that allow organizations to position themselves on the cost-accuracy trade-off spectrum based on their specific requirements.
+
+### 5.1 Strategy Overview
+
+```mermaid
+flowchart TB
+ subgraph "User Selection"
+    USER[Organization<br/>Requirements]
+ end
+
+ subgraph "Mode Selection"
+ direction LR
+    L[🟢 Lightweight Mode<br/>Cost Priority]
+    S[🟡 Standard Mode<br/>Balanced]
+    P[🔴 Premium Mode<br/>Accuracy Priority]
+ end
+
+ subgraph "Lightweight Mode"
+ L1[Single Detection Pass]
+ L2[Warning Injection Only]
+ L3[No Additional LLM Calls]
+ end
+
+ subgraph "Standard Mode"
+ S1[Detection + Self-Refinement]
+ S2[Same Model Iteration]
+ S3[Max 3-5 Iterations]
+ end
+
+ subgraph "Premium Mode"
+ P1[Multi-Model Detection]
+ P2[Cross-Verification]
+ P3[Collaborative Correction]
+ end
+
+ USER --> L
+ USER --> S
+ USER --> P
+
+ L --> L1 --> L2 --> L3
+ S --> S1 --> S2 --> S3
+ P --> P1 --> P2 --> P3
+```
+
+### 5.2 Mode Comparison Matrix
+
+| Dimension | 🟢 Lightweight | 🟡 Standard | 🔴 Premium |
+|-----------|---------------|-------------|------------|
+| **Primary Goal** | Cost efficiency | Balanced | Maximum accuracy |
+| **Detection Method** | Single encoder pass | Encoder + self-check | Multi-model cross-verification |
+| **Mitigation Action** | Warning injection | Iterative self-refinement | Multi-model collaborative correction |
+| **Latency Overhead** | +15-35ms | +200-500ms (2-4x) | +1-3s (5-10x) |
+| **Cost Multiplier** | 1.0x (detection only) | 1.5-2.5x | 3-5x |
+| **Hallucination Reduction** | Awareness only | 40-60% | 70-85% |
+| **Best For** | Internal tools, chatbots | Business applications | Medical, legal, financial |
+
+### 5.3 Lightweight Mode: Cost-Optimized Detection
+
+**Philosophy**: Minimize operational cost while providing hallucination awareness. This mode treats hallucination detection as an **information service** rather than an intervention system.
+
+#### 5.3.1 Theoretical Basis
+
+Lightweight Mode is grounded in **Bounded Rationality Theory** (Simon, 1955): when optimization costs exceed benefits, satisficing (accepting "good enough") is rational.
+
+**Cost-Benefit Analysis**:
+
+Let *C_detect* = cost of detection, *C_mitigate* = cost of mitigation, *p* = probability of hallucination, *L* = expected loss from undetected hallucination.
+
+Lightweight Mode is optimal when: *C_detect < p · L* but *C_detect + C_mitigate > p · L*
+
+In other words: detection is worth the cost, but full mitigation is not.
+
+#### 5.3.2 Mechanism
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant R as Router
+    participant D as Detector (ModernBERT)
+ participant L as LLM Backend
+
+ C->>R: Request
+ R->>L: Forward Request
+ L->>R: Response
+ R->>D: Detect(context, query, response)
+ D->>R: Score + Spans
+
+ alt Score >= Threshold
+ R->>R: Inject Warning Banner
+ R->>R: Add Metadata Headers
+ end
+
+ R->>C: Response (with warning if detected)
+```
+
+**Characteristics**:
+
+- **No additional LLM calls** after initial generation
+- **Fixed detection cost** regardless of response length
+- **User-facing warning** empowers human verification
+- **Rich metadata** for downstream analytics
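+
+In code, the entire mitigation step reduces to prepending the warning and attaching metadata. A sketch follows, using the warning template from Section 6.1 and the response headers from Section 6.3; the dict-based shape is illustrative, since the router actually applies ExtProc body and header mutations:
+
+```python
+WARNING = (
+    "⚠️ **Notice**: This response may contain information that could not be "
+    "fully verified against the provided context. Please verify critical facts "
+    "before taking action.\n\n"
+)
+
+def lightweight_mitigate(content: str, score: float, threshold: float = 0.7):
+    detected = score >= threshold            # 0.7 mirrors the lightweight example in Section 6.2
+    headers = {
+        "X-TruthLens-Enabled": "true",
+        "X-TruthLens-Mode": "lightweight",
+        "X-TruthLens-Score": f"{score:.2f}",
+        "X-TruthLens-Detected": str(detected).lower(),
+    }
+    body = WARNING + content if detected else content   # warning is prepended, never blocks
+    return body, headers
+```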
+
+#### 5.3.3 Theoretical Guarantees
+
+**Proposition 3 (Detection Latency Bound).** *For ModernBERT-large with sequence length L ≤ 8192:*
+
+*T_detect ≤ O(L²/chunk_size) + O(L · d)*
+
+*In practice: T_detect ≤ 35ms for L ≤ 4096 on modern GPUs.*
+
+**Proposition 4 (No False Negatives on Pass-Through).** *In Lightweight Mode, all hallucinations above threshold τ are flagged. The mode never suppresses detection results.*
+
+#### 5.3.4 Ideal Use Cases
+
+- Internal knowledge bases (users can verify)
+- Developer assistants (technical users)
+- Creative writing tools (hallucination may be desired)
+- Low-stakes customer interactions (human escalation available)
+
+---
+
+### 5.4 Standard Mode: Balanced Self-Refinement
+
+**Philosophy**: Leverage the same model to self-correct detected hallucinations through iterative refinement. This mode implements a **closed-loop feedback system** where the LLM serves as both generator and corrector.
+
+#### 5.4.1 Theoretical Basis
+
+Standard Mode is grounded in **Self-Consistency Theory** and **Iterative Refinement**:
+
+**Theorem 3 (Self-Refinement Effectiveness).** *If an LLM has learned the correct answer distribution for a query class, then prompting with explicit error feedback increases the probability of correct output:*
+
+*P(correct | feedback on error) > P(correct | no feedback)*
+
+*provided the feedback is accurate and actionable.*
+
+**Intuition**: LLMs often "know" the right answer but fail to produce it on first attempt due to:
+
+- Sampling noise (temperature > 0)
+- Attention to wrong context regions
+- Competing patterns in weights
+
+Explicit error feedback redirects attention and suppresses incorrect patterns.
+
+#### 5.4.2 Convergence Analysis
+
+**Definition 9 (Refinement Sequence).** The sequence *\{Rₜ\}* for *t = 0, 1, 2, ...* where:
+
+- *R₀ = LLM(Q, C)* (initial response)
+- *Rₜ₊₁ = LLM(Prompt_refine(Q, C, Rₜ, Detect(Rₜ)))* (refined response)
+
+**Lemma 1 (Monotonic Score Decrease).** *Under mild assumptions (consistent LLM, accurate detection), the hallucination score sequence is non-increasing:*
+
+*s(Rₜ₊₁) ≤ s(Rₜ)* with high probability
+
+**Empirical Convergence Pattern**:
+
+| Iteration | Typical Score Reduction | Marginal Improvement |
+|-----------|------------------------|----------------------|
+| 1 → 2 | 30-50% | High |
+| 2 → 3 | 15-25% | Medium |
+| 3 → 4 | 5-15% | Low |
+| 4+ | \<5% | Diminishing |
+
+This motivates the default *max_iterations = 3* setting.
+
+#### 5.4.3 Mechanism
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant R as Router
+ participant D as Detector
+ participant L as Primary LLM
+
+ C->>R: Request
+ R->>L: Forward Request
+ L->>R: Response₀
+
+ loop Max N Iterations
+ R->>D: Detect(context, query, responseᵢ)
+ D->>R: Score + Hallucinated Spans
+
+ alt Score >= Threshold
+            R->>R: Build Correction Prompt with:<br/>• Original context<br/>• Detected spans<br/>• Correction instructions
+ R->>L: Correction Request
+ L->>R: Responseᵢ₊₁
+ else Score < Threshold
+ R->>C: Final Response (Verified)
+ end
+ end
+
+    Note over R,C: If max iterations reached,<br/>return best response with disclaimer
+```
+
+**Characteristics**:
+
+- **Iterative improvement** through self-reflection
+- **Same model** maintains consistency
+- **Bounded iterations** control costs
+- **Graceful degradation** if convergence fails
+
+**Research Foundation**: Based on Self-Refine (NeurIPS 2023) and Chain-of-Verification (ACL 2024) principles.
+
+**Ideal Use Cases**:
+
+- Business intelligence reports
+- Customer support (escalated queries)
+- Educational content
+- Technical documentation
+
+### 5.5 Premium Mode: Multi-Model Collaborative Verification
+
+**Philosophy**: Maximum accuracy through diverse model perspectives and collaborative error correction. This mode implements **ensemble verification** and **adversarial debate** mechanisms.
+
+#### 5.5.1 Theoretical Basis: Ensemble Learning
+
+Premium Mode is grounded in **Condorcet's Jury Theorem** (1785) and modern **ensemble learning** theory:
+
+**Theorem 4 (Condorcet's Jury Theorem, adapted).** *For M independent models each with accuracy p > 0.5 on a binary decision, the majority vote accuracy approaches 1 as M → ∞:*
+
+*P(majority correct) = Σ(k=⌈M/2⌉ to M) C(M,k) · pᵏ · (1-p)^(M-k) → 1*
+
+**Corollary (Diversity Requirement)**: The theorem requires **independence**. Correlated models (same training data, architecture) provide diminishing returns.
+
+**Practical Diversity Sources**:
+
+| Diversity Type | Example | Independence Level |
+|----------------|---------|-------------------|
+| Training data | GPT vs Claude | High |
+| Architecture | Transformer vs Mamba | Very High |
+| Fine-tuning | Base vs Instruct | Medium |
+| Prompting | Different system prompts | Low |
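+
+For small, odd-sized ensembles the jury theorem can be evaluated directly; a worked example with per-model accuracy p = 0.7, assuming full independence as the corollary requires:
+
+```python
+from math import comb
+
+def majority_accuracy(p: float, m: int) -> float:
+    """P(majority correct) for M independent models, M odd."""
+    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(m // 2 + 1, m + 1))
+
+for m in (1, 3, 5):
+    print(m, round(majority_accuracy(0.7, m), 3))   # 0.7, 0.784, 0.837
+```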
+
+#### 5.5.2 Theoretical Basis: Multi-Agent Debate
+
+Beyond voting, **debate** enables models to refine each other's reasoning:
+
+**Definition 10 (Debate Protocol).** A debate between models M₁, M₂ with judge J consists of:
+
+1. **Generation Phase**: Both models produce responses R₁, R₂
+2. **Critique Phase**: Each model critiques the other's response
+3. **Defense Phase**: Models defend their claims with evidence
+4. **Synthesis Phase**: Judge J produces final response based on arguments
+
+**Theorem 5 (Debate Improves Grounding).** *When models must justify claims with evidence from context C, the debate process filters ungrounded claims:*
+
+*An ungrounded claim in R₁ will be challenged by M₂ if M₂ cannot find supporting evidence in C.*
+
+**Information-Theoretic View**: Debate acts as a **lossy compression** of the argument space, preserving only claims that survive cross-examination.
+
+#### 5.5.3 Mechanism
+
+```mermaid
+sequenceDiagram
+ participant C as Client
+ participant R as Router
+ participant D as Detector
+    participant L1 as Primary LLM (e.g., GPT-4)
+    participant L2 as Verifier LLM (e.g., Claude)
+    participant L3 as Judge LLM (e.g., Llama-3)
+
+ C->>R: Request
+ R->>L1: Forward Request
+ L1->>R: Response₁
+
+ par Cross-Model Verification
+ R->>L2: Same Request
+ L2->>R: Response₂
+ and
+ R->>D: Detect(context, query, response₁)
+ D->>R: Initial Detection
+ end
+
+    R->>R: Compare Response₁ vs Response₂<br/>Identify Discrepancies
+
+ alt Significant Discrepancies Found
+        R->>L3: Arbitration Request:<br/>• Context + Query<br/>• Response₁ + Response₂<br/>• Discrepancy Analysis
+ L3->>R: Synthesized Response
+ R->>D: Final Verification
+ D->>R: Final Score
+ end
+
+    R->>C: Verified Response with<br/>Confidence Metadata
+```
+
+#### 5.5.4 Consensus Mechanisms
+
+**Mechanism 1: Segment-Level Voting**
+
+For each claim segment *S*:
+
+*vote(S) = Σₘ 𝟙[S ∈ Rₘ] / M*
+
+Accept *S* if *vote(S) > 0.5* (majority agreement).
+
+**Mechanism 2: Confidence-Weighted Fusion**
+
+*R_final = argmax_R Σₘ wₘ · sim(R, Rₘ)*
+
+where *wₘ* is model m's calibrated confidence and *sim* is semantic similarity.
+
+**Mechanism 3: Fine-Grained Replacement (Finch-Zk)**
+
+1. Segment R₁ into claims \{S₁, S₂, ..., Sₖ\}
+2. For each Sᵢ, check consistency with R₂
+3. If inconsistent: replace Sᵢ with version from more reliable model
+4. Output: hybrid response with highest-confidence segments
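+
+A sketch of this replacement step, assuming claim segmentation and a semantic similarity function are provided elsewhere (`primary_claims`, `secondary_claims`, and `semantic_similarity` are placeholders, not existing router APIs), with the secondary model treated as the more reliable source for disputed segments:
+
+```python
+def fuse_responses(primary_claims, secondary_claims, semantic_similarity,
+                   agree: float = 0.8, related: float = 0.5):
+    fused, flagged = [], []
+    for claim in primary_claims:
+        score, match = max((semantic_similarity(claim, other), other)
+                           for other in secondary_claims)
+        if score >= agree:
+            fused.append(claim)        # both models agree: keep the primary wording
+        elif score >= related:
+            fused.append(match)        # inconsistent: take the more reliable model's version
+        else:
+            flagged.append(claim)      # only in the primary response: potentially hallucinated
+    return fused, flagged
+```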
+
+#### 5.5.5 Cost-Accuracy Trade-off Analysis
+
+| Configuration | Models | Expected Accuracy Gain | Cost Multiplier |
+|---------------|--------|----------------------|-----------------|
+| Dual-model voting | 2 | +15-25% | 2x |
+| Triple-model voting | 3 | +25-35% | 3x |
+| Dual + Judge | 2+1 | +30-40% | 3x |
+| Full debate (3 rounds) | 2+1 | +40-50% | 5-6x |
+
+#### 5.5.6 Ideal Use Cases
+
+- **Medical diagnosis assistance**: Life-critical decisions
+- **Legal document analysis**: Liability implications
+- **Financial advisory**: Regulatory compliance required
+- **Safety-critical systems**: Aerospace, nuclear, etc.
+
+### 5.6 Mode Selection Decision Tree
+
+```mermaid
+flowchart TD
+    START[New Application] --> Q1{Regulatory<br/>Requirements?}
+
+ Q1 -->|Healthcare/Finance/Legal| P[🔴 Premium Mode]
+    Q1 -->|None/Low| Q2{User Impact<br/>of Errors?}
+
+    Q2 -->|High| Q3{Budget<br/>Constraints?}
+ Q2 -->|Low| L[🟢 Lightweight Mode]
+
+ Q3 -->|Flexible| S[🟡 Standard Mode]
+    Q3 -->|Tight| Q4{Can Users<br/>Verify?}
+
+ Q4 -->|Yes| L
+ Q4 -->|No| S
+
+    P --> CONFIG_P[Configure:<br/>• Multi-model backends<br/>• Max iterations: 5-10<br/>• Threshold: 0.3-0.5]
+
+    S --> CONFIG_S[Configure:<br/>• Self-refinement<br/>• Max iterations: 3-5<br/>• Threshold: 0.5-0.7]
+
+    L --> CONFIG_L[Configure:<br/>• Warning template<br/>• Threshold: 0.6-0.8<br/>• Metadata headers]
+```
+
+---
+
+## 6. Configuration Design
+
+### 6.1 Global Configuration
+
+```yaml
+# Global hallucination detection settings
+hallucination:
+ enabled: true
+
+ # Detection model (ModernBERT-based)
+ model_id: "models/lettucedetect-large-modernbert-en-v1"
+ use_cpu: false
+
+ # Default operational mode
+ default_mode: "standard" # lightweight | standard | premium
+
+ # Detection threshold (0.0 - 1.0)
+ # Lower = more strict, Higher = more lenient
+ threshold: 0.6
+
+ # Warning template for lightweight mode
+ warning_template: |
+ ⚠️ **Notice**: This response may contain information that could not be
+ fully verified against the provided context. Please verify critical facts
+ before taking action.
+
+ # Standard mode settings
+ standard:
+ max_iterations: 3
+ convergence_threshold: 0.4 # Stop if score drops below this
+
+ # Premium mode settings
+ premium:
+ verification_models:
+ - "claude-3-sonnet"
+ - "gpt-4-turbo"
+ judge_model: "llama-3.1-70b"
+ max_iterations: 5
+ require_consensus: true
+```
+
+### 6.2 Per-Decision Plugin Configuration
+
+```yaml
+decisions:
+ # Healthcare domain - Maximum accuracy required
+ - name: "medical_assistant"
+ description: "Medical information queries"
+ priority: 100
+ rules:
+ operator: "OR"
+ conditions:
+ - type: "domain"
+ name: "healthcare"
+ - type: "keyword"
+ name: "medical_terms"
+ modelRefs:
+ - model: "gpt-4-turbo"
+ plugins:
+ - type: "hallucination"
+ configuration:
+ enabled: true
+ mode: "premium"
+ threshold: 0.3 # Very strict
+ max_iterations: 5
+ require_disclaimer: true
+
+ # Financial services - High accuracy
+ - name: "financial_advisor"
+ description: "Financial analysis and advice"
+ priority: 90
+ rules:
+ operator: "OR"
+ conditions:
+ - type: "domain"
+ name: "finance"
+ plugins:
+ - type: "hallucination"
+ configuration:
+ enabled: true
+ mode: "standard"
+ threshold: 0.5
+ max_iterations: 4
+
+ # General customer support - Balanced
+ - name: "customer_support"
+ description: "General customer inquiries"
+ priority: 50
+ rules:
+ operator: "OR"
+ conditions:
+ - type: "domain"
+ name: "support"
+ plugins:
+ - type: "hallucination"
+ configuration:
+ enabled: true
+ mode: "standard"
+ threshold: 0.6
+ max_iterations: 2
+
+ # Internal tools - Cost optimized
+ - name: "internal_assistant"
+ description: "Internal knowledge base queries"
+ priority: 30
+ rules:
+ operator: "OR"
+ conditions:
+ - type: "domain"
+ name: "internal"
+ plugins:
+ - type: "hallucination"
+ configuration:
+ enabled: true
+ mode: "lightweight"
+ threshold: 0.7
+
+ # Creative writing - Detection disabled
+ - name: "creative_writing"
+ description: "Creative content generation"
+ priority: 20
+ rules:
+ operator: "OR"
+ conditions:
+ - type: "domain"
+ name: "creative"
+ plugins:
+ - type: "hallucination"
+ configuration:
+ enabled: false # "Hallucination" is a feature here
+```
+
+### 6.3 Response Headers
+
+The following headers are added to all responses when hallucination detection is enabled:
+
+| Header | Description | Example Values |
+|--------|-------------|----------------|
+| `X-TruthLens-Enabled` | Whether detection was performed | `true`, `false` |
+| `X-TruthLens-Mode` | Operational mode used | `lightweight`, `standard`, `premium` |
+| `X-TruthLens-Score` | Hallucination confidence score | `0.0` - `1.0` |
+| `X-TruthLens-Detected` | Whether hallucination exceeded threshold | `true`, `false` |
+| `X-TruthLens-Iterations` | Number of refinement iterations | `0`, `1`, `2`, ... |
+| `X-TruthLens-Latency-Ms` | Detection/mitigation latency | `35`, `450`, `2100` |
+
+### 6.4 Metrics and Observability
+
+**Prometheus Metrics:**
+
+| Metric | Type | Labels | Description |
+|--------|------|--------|-------------|
+| `truthlens_detections_total` | Counter | `decision`, `mode`, `detected` | Total detection operations |
+| `truthlens_score` | Histogram | `decision`, `mode` | Score distribution |
+| `truthlens_latency_seconds` | Histogram | `mode`, `operation` | Processing latency |
+| `truthlens_iterations` | Histogram | `decision`, `mode` | Refinement iteration count |
+| `truthlens_models_used` | Counter | `model`, `role` | Model usage in premium mode |
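+
+A sketch of how these metrics might be declared with `prometheus_client`; the bucket boundaries and label values are illustrative, and the router itself would emit the metrics from its own instrumentation layer rather than Python:
+
+```python
+from prometheus_client import Counter, Histogram
+
+DETECTIONS = Counter("truthlens_detections_total", "Total detection operations",
+                     ["decision", "mode", "detected"])
+SCORE = Histogram("truthlens_score", "Hallucination score distribution",
+                  ["decision", "mode"], buckets=[0.1 * i for i in range(1, 10)])
+LATENCY = Histogram("truthlens_latency_seconds", "Processing latency",
+                    ["mode", "operation"])
+
+# Example: one Standard Mode detection that exceeded the threshold.
+DETECTIONS.labels(decision="financial_advisor", mode="standard", detected="true").inc()
+SCORE.labels(decision="financial_advisor", mode="standard").observe(0.72)
+LATENCY.labels(mode="standard", operation="detect").observe(0.032)
+```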
+
+---
+
+## 7. References
+
+1. Kovács, Á., & Recski, G. (2025). *LettuceDetect: A Hallucination Detection Framework for RAG Applications*. arXiv:2502.17125
+
+2. Goel, A., Schwartz, D., & Qi, Y. (2025). *Finch-Zk: Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency*. arXiv:2508.14314
+
+3. Lin, Z., Niu, Z., Wang, Z., & Xu, Y. (2024). *Interpreting and Mitigating Hallucination in MLLMs through Multi-agent Debate*. arXiv:2407.20505
+
+4. Tran, K.T., et al. (2025). *Multi-Agent Collaboration Mechanisms: A Survey of LLMs*. arXiv:2501.06322
+
+5. Manakul, P., Liusie, A., & Gales, M.J. (2023). *SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models*. arXiv:2303.08896
+
+6. Tang, L., et al. (2024). *MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents*. EMNLP 2024
+
+7. Madaan, A., et al. (2023). *Self-Refine: Iterative Refinement with Self-Feedback*. NeurIPS 2023
+
+8. Dhuliawala, S., et al. (2024). *Chain-of-Verification Reduces Hallucination in Large Language Models*. ACL Findings 2024
+
+9. Su, W., et al. (2024). *Unsupervised Real-Time Hallucination Detection based on LLM Internal States (MIND)*. ACL Findings 2024
+
+10. Belyi, M., et al. (2025). *Luna: A Lightweight Evaluation Model to Catch Language Model Hallucinations*. COLING 2025
+
+---
+
+## Appendix A: Full System Flow Diagram
+
+```mermaid
+flowchart TB
+ subgraph "Client Layer"
+ CLIENT[Client Application]
+ end
+
+ subgraph "Gateway Layer"
+ ENVOY[Envoy Proxy]
+ end
+
+ subgraph "vLLM Semantic Router - ExtProc"
+ direction TB
+
+ subgraph "Request Processing"
+ REQ_H[handleRequestHeaders]
+ REQ_B[handleRequestBody]
+
+ subgraph "Request Security"
+ PII_REQ[PII Detection]
+ JAIL[Jailbreak Detection]
+ end
+
+ subgraph "Routing"
+ CLASS[Intent Classification]
+ DECISION[Decision Engine]
+ MODEL_SEL[Model Selection]
+ end
+
+ CACHE_CHK[Semantic Cache Check]
+ end
+
+ subgraph "Response Processing"
+ RES_H[handleResponseHeaders]
+ RES_B[handleResponseBody]
+
+ subgraph "TruthLens"
+        DETECT[Hallucination<br/>Detector]
+        SCORE[Score<br/>Evaluation]
+
+ subgraph "Mitigation"
+          WARN[Warning<br/>Injection]
+          REFINE[Iterative<br/>Refinement]
+ end
+ end
+
+ CACHE_UPD[Cache Update]
+ METRICS[Metrics Recording]
+ end
+ end
+
+ subgraph "Backend Layer"
+ VLLM1[vLLM Instance 1]
+ VLLM2[vLLM Instance 2]
+ VLLMn[vLLM Instance N]
+ end
+
+ subgraph "Storage Layer"
+ REDIS[(Redis Cache)]
+ MODELS[(Model Files)]
+ end
+
+ CLIENT --> ENVOY
+ ENVOY <--> REQ_H
+ REQ_H --> REQ_B
+ REQ_B --> PII_REQ
+ PII_REQ --> JAIL
+ JAIL --> CLASS
+ CLASS --> DECISION
+ DECISION --> MODEL_SEL
+ MODEL_SEL --> CACHE_CHK
+
+ CACHE_CHK -->|Cache Hit| RES_H
+ CACHE_CHK -->|Cache Miss| VLLM1
+ CACHE_CHK -->|Cache Miss| VLLM2
+ CACHE_CHK -->|Cache Miss| VLLMn
+
+ VLLM1 --> RES_H
+ VLLM2 --> RES_H
+ VLLMn --> RES_H
+
+ RES_H --> RES_B
+ RES_B --> DETECT
+ DETECT --> SCORE
+
+ SCORE -->|Below Threshold| CACHE_UPD
+ SCORE -->|Above Threshold| WARN
+ SCORE -->|Above Threshold| REFINE
+
+ WARN --> CACHE_UPD
+ REFINE -->|Retry| VLLM1
+ REFINE -->|Converged| CACHE_UPD
+
+ CACHE_UPD --> METRICS
+ METRICS --> ENVOY
+ ENVOY --> CLIENT
+
+ REDIS <-.-> CACHE_CHK
+ REDIS <-.-> CACHE_UPD
+ MODELS <-.-> DETECT
+```
+
+---
+
+## Appendix B: Glossary
+
+| Term | Definition |
+|------|------------|
+| **Hallucination** | LLM-generated content that is factually incorrect or unsupported by context |
+| **Intrinsic Hallucination** | Generated content that contradicts the provided context |
+| **Extrinsic Hallucination** | Generated content that cannot be verified from the provided context and is not common knowledge (common in RAG settings) |
+| **ExtProc** | Envoy External Processor - enables request/response modification at the gateway |
+| **Token-Level Detection** | Identifying specific tokens/spans that are hallucinated |
+| **Self-Refinement** | Iterative process where the same model corrects its own hallucinations |
+| **Cross-Model Verification** | Using multiple different models to verify factual consistency |
+| **Multi-Agent Debate** | Multiple LLM agents argue positions to converge on factual truth |
+| **RAG** | Retrieval-Augmented Generation - grounding LLMs with retrieved documents |
+| **ModernBERT** | State-of-the-art encoder architecture with 8K context support |
+| **Accuracy-Latency-Cost Triangle** | Fundamental trade-off in hallucination mitigation strategies |
+| **Convergence Threshold** | Score below which hallucination is considered resolved |
+
+---
+
+**Document Version:** 1.0 | **Last Updated:** December 2025
diff --git a/website/sidebars.ts b/website/sidebars.ts
index 790b265fd..31d64d218 100644
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@@ -131,6 +131,7 @@ const sidebars: SidebarsConfig = {
type: 'category',
label: 'Proposals',
items: [
+ 'proposals/hallucination-mitigation-milestone',
'proposals/prompt-classification-routing',
'proposals/nvidia-dynamo-integration',
'proposals/production-stack-integration',