Commit be24cd5

docs: emphasize fusion routing in examples instead of simple classification
- Update 'Concrete Example' to show 3-signal fusion routing:
  * Keyword matching (fast path)
  * Similarity search (semantic concepts)
  * BERT classification (deep understanding)
  * Multi-signal consensus for final decision
- Update 'Step 4: Intent Classification' to 'Step 4: Fusion Routing':
  * Show all three signals in the processing pipeline
  * Emphasize adaptive latency based on signal used
- Maintain consistency with fusion routing strategy described in section 2.2.2
- Ensure examples reflect the actual multi-signal routing implementation

This change clarifies that the system uses intelligent fusion routing rather than relying solely on ModernBERT classification, which better represents the actual architecture and provides better accuracy and performance characteristics.

Signed-off-by: bitliu <[email protected]>
1 parent 3761959 commit be24cd5
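
To make the fusion strategy in the commit message concrete, here is a minimal Go sketch of confidence-weighted consensus over the three signals; the `Signal` type, the weighting scheme, and `fuse` are illustrative assumptions, not the semantic-router API.

```go
package main

import "fmt"

// Signal is one routing vote: the category a signal chose and its confidence.
// The type and the weighting scheme are illustrative, not the project's API.
type Signal struct {
	Name       string
	Category   string
	Confidence float64
}

// fuse combines the keyword, similarity, and BERT votes by summing confidence
// per category and returning the highest-scoring category.
func fuse(signals []Signal) string {
	scores := map[string]float64{}
	for _, s := range signals {
		scores[s.Category] += s.Confidence
	}
	best, bestScore := "", -1.0
	for category, score := range scores {
		if score > bestScore {
			best, bestScore = category, score
		}
	}
	return best
}

func main() {
	// Confidences taken from the "Concrete Example" hunk further down.
	decision := fuse([]Signal{
		{Name: "keyword", Category: "math", Confidence: 0.80},
		{Name: "similarity", Category: "math", Confidence: 0.87},
		{Name: "bert", Category: "math", Confidence: 0.92},
	})
	fmt.Println("routed category:", decision) // routed category: math
}
```

With the example's confidences (0.8, 0.87, 0.92) all voting for "math", the consensus trivially agrees; the weighting only matters when the signals disagree.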

website/docs/proposals/nvidia-dynamo-integration.md

Lines changed: 12 additions & 5 deletions
@@ -191,7 +191,7 @@ Enriched Request → [Worker Selection] → KV Cache Optimization → GPU Schedu
 | **Fusion Routing** | ✅ BERT + keyword + similarity fusion | ❌ KV-aware only | ✅ Multi-signal intelligent routing |
 | **Caching** | ✅ Semantic similarity (Milvus) | ✅ KV cache reuse | ✅✅ **Dual-layer caching** |
 | **Security** | ✅ PII + jailbreak | ❌ No security layer | ✅ Pre-inference filtering |
-| **Cost Optimization** | ✅ Model-level | ✅ Infrastructure-level | ✅✅ **End-to-end optimization** |
+| **Cost Optimization** | Cross-Model-level | ✅ Infrastructure-level | ✅✅ **End-to-end optimization** |
 | **Latency** | Adaptive (fusion routing) | Low routing overhead | **Parallel execution** |
 
 **Concrete Example:**
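
The **Dual-layer caching** row above combines a router-side semantic cache with engine-side KV-cache reuse. A hedged sketch of that ordering, with hypothetical `SemanticCache` and `Backend` interfaces; the exact-match `mapCache` merely stands in for a Milvus-style similarity lookup.

```go
package main

import "fmt"

// SemanticCache and Backend are hypothetical interfaces used only to show the
// ordering of the two cache layers; they are not the semantic-router API.
type SemanticCache interface {
	// Lookup returns a cached response for a semantically similar prior query.
	Lookup(query string) (response string, hit bool)
}

type Backend interface {
	// Generate runs inference; the serving engine may reuse KV-cache prefixes internally.
	Generate(query string) string
}

// Serve answers from the semantic cache when possible and only reaches the
// GPU backend (where KV-cache reuse applies) on a miss.
func Serve(cache SemanticCache, backend Backend, query string) string {
	if resp, hit := cache.Lookup(query); hit {
		return resp // layer 1: semantic similarity hit, no inference needed
	}
	return backend.Generate(query) // layer 2: KV-aware inference path
}

// mapCache is an exact-match stand-in for a similarity-based semantic cache.
type mapCache map[string]string

func (m mapCache) Lookup(q string) (string, bool) { r, ok := m[q]; return r, ok }

type echoBackend struct{}

func (echoBackend) Generate(q string) string { return "generated answer for: " + q }

func main() {
	cache := mapCache{"what is 2+2": "4"}
	fmt.Println(Serve(cache, echoBackend{}, "what is 2+2"))                 // hit: served by the router layer
	fmt.Println(Serve(cache, echoBackend{}, "prove Fermat's Last Theorem")) // miss: falls through to the engine
}
```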
@@ -202,7 +202,12 @@ Query: "Explain the proof of Fermat's Last Theorem step-by-step"
 ┌─────────────────────────────────────────────────────────────────┐
 │ Semantic Router Layer │
 ├─────────────────────────────────────────────────────────────────┤
-│ 1. Classification: "math" category (confidence: 0.92) │
+│ 1. Fusion Routing (3-signal analysis): │
+│    a) Keyword Match: "theorem", "proof" → math (confidence: 0.8)│
+│    b) Similarity Search: matches "mathematical proofs" concept │
+│       (similarity: 0.87) │
+│    c) BERT Classification: "math" category (confidence: 0.92) │
+│    → Final Decision: "math" (multi-signal consensus) │
 │ 2. Model Selection: deepseek-v31 (best for math reasoning) │
 │ 3. System Prompt Injection: │
 │    "You are a mathematics expert. Provide step-by-step │
@@ -667,11 +672,13 @@ graph TB
 │   - Action: Return cached response if HIT │
 │   - Latency: Very low (cache hit), Low (cache miss) │
 │ │
-│ Step 4: Intent Classification │
-│   - ModernBERT classification (10 categories) │
+│ Step 4: Fusion Routing (Multi-Signal Classification) │
+│   - Signal 1: Keyword matching (fast path) │
+│   - Signal 2: Similarity search (semantic concepts) │
+│   - Signal 3: BERT classification (deep understanding) │
 │   - Entropy-based reasoning decision │
 │   - Category: math, code, reasoning, creative, etc. │
-│   - Latency: Moderate
+│   - Latency: Adaptive (keyword: minimal, similarity: low, BERT: moderate)
 │ │
 │ Step 5: Model Selection │
 │   - Lookup category → model scores mapping │
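
The "Latency: Adaptive" line can be read as an escalation policy: cheaper signals are tried first, and the router only falls through to BERT when they are not confident enough. A minimal sketch with stubbed classifiers and an assumed confidence threshold; none of these names come from the project.

```go
package main

import "fmt"

// Stub classifiers with rising cost; they simply return the confidences from
// the concrete example. Names, signatures, and the threshold are assumptions.
func keywordMatch(query string) (string, float64)     { return "math", 0.80 } // minimal latency
func similaritySearch(query string) (string, float64) { return "math", 0.87 } // low latency
func bertClassify(query string) (string, float64)     { return "math", 0.92 } // moderate latency

// classify escalates to the next (slower) signal only when the cheaper one is
// not confident enough, so overall latency adapts to how hard the query is.
func classify(query string, threshold float64) (category, usedSignal string) {
	if cat, conf := keywordMatch(query); conf >= threshold {
		return cat, "keyword"
	}
	if cat, conf := similaritySearch(query); conf >= threshold {
		return cat, "similarity"
	}
	cat, _ := bertClassify(query)
	return cat, "bert"
}

func main() {
	cat, signal := classify("Explain the proof of Fermat's Last Theorem step-by-step", 0.85)
	fmt.Printf("category=%s via %s signal\n", cat, signal) // category=math via similarity signal
}
```

With the example's confidences and a 0.85 threshold, the keyword signal alone is not enough, so the router stops at similarity search and never pays the BERT cost.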

0 commit comments