Commit be24cd5

docs: emphasize fusion routing in examples instead of simple classification
- Update 'Concrete Example' to show 3-signal fusion routing:
  * Keyword matching (fast path)
  * Similarity search (semantic concepts)
  * BERT classification (deep understanding)
  * Multi-signal consensus for final decision
- Update 'Step 4: Intent Classification' to 'Step 4: Fusion Routing':
  * Show all three signals in the processing pipeline
  * Emphasize adaptive latency based on signal used
- Maintain consistency with fusion routing strategy described in section 2.2.2
- Ensure examples reflect the actual multi-signal routing implementation

This change clarifies that the system uses intelligent fusion routing rather than relying solely on ModernBERT classification, which better represents the actual architecture and provides better accuracy and performance characteristics.

Signed-off-by: bitliu <[email protected]>
1 parent 3761959 commit be24cd5
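
To make the fusion strategy in the commit message concrete, here is a minimal Go sketch of confidence-weighted consensus over the three signals; the `Signal` type, the weighting scheme, and `fuse` are illustrative assumptions, not the semantic-router API.

```go
package main

import "fmt"

// Signal is one routing vote: the category a signal chose and its confidence.
// The type and the weighting scheme are illustrative, not the project's API.
type Signal struct {
	Name       string
	Category   string
	Confidence float64
}

// fuse combines the keyword, similarity, and BERT votes by summing confidence
// per category and returning the highest-scoring category.
func fuse(signals []Signal) string {
	scores := map[string]float64{}
	for _, s := range signals {
		scores[s.Category] += s.Confidence
	}
	best, bestScore := "", -1.0
	for category, score := range scores {
		if score > bestScore {
			best, bestScore = category, score
		}
	}
	return best
}

func main() {
	// Confidences taken from the "Concrete Example" hunk further down.
	decision := fuse([]Signal{
		{Name: "keyword", Category: "math", Confidence: 0.80},
		{Name: "similarity", Category: "math", Confidence: 0.87},
		{Name: "bert", Category: "math", Confidence: 0.92},
	})
	fmt.Println("routed category:", decision) // routed category: math
}
```

With the example's confidences (0.8, 0.87, 0.92) all voting for "math", the consensus trivially agrees; the weighting only matters when the signals disagree.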

website/docs/proposals/nvidia-dynamo-integration.md

Lines changed: 12 additions & 5 deletions
@@ -191,7 +191,7 @@ Enriched Request → [Worker Selection] → KV Cache Optimization → GPU Schedu
 | **Fusion Routing** | ✅ BERT + keyword + similarity fusion | ❌ KV-aware only | ✅ Multi-signal intelligent routing |
 | **Caching** | ✅ Semantic similarity (Milvus) | ✅ KV cache reuse | ✅✅ **Dual-layer caching** |
 | **Security** | ✅ PII + jailbreak | ❌ No security layer | ✅ Pre-inference filtering |
-| **Cost Optimization** | ✅ Model-level | ✅ Infrastructure-level | ✅✅ **End-to-end optimization** |
+| **Cost Optimization** | Cross-Model-level | ✅ Infrastructure-level | ✅✅ **End-to-end optimization** |
 | **Latency** | Adaptive (fusion routing) | Low routing overhead | **Parallel execution** |
 
 **Concrete Example:**
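
The **Dual-layer caching** row above combines a router-side semantic cache with engine-side KV-cache reuse. A hedged sketch of that ordering, with hypothetical `SemanticCache` and `Backend` interfaces; the exact-match `mapCache` merely stands in for a Milvus-style similarity lookup.

```go
package main

import "fmt"

// SemanticCache and Backend are hypothetical interfaces used only to show the
// ordering of the two cache layers; they are not the semantic-router API.
type SemanticCache interface {
	// Lookup returns a cached response for a semantically similar prior query.
	Lookup(query string) (response string, hit bool)
}

type Backend interface {
	// Generate runs inference; the serving engine may reuse KV-cache prefixes internally.
	Generate(query string) string
}

// Serve answers from the semantic cache when possible and only reaches the
// GPU backend (where KV-cache reuse applies) on a miss.
func Serve(cache SemanticCache, backend Backend, query string) string {
	if resp, hit := cache.Lookup(query); hit {
		return resp // layer 1: semantic similarity hit, no inference needed
	}
	return backend.Generate(query) // layer 2: KV-aware inference path
}

// mapCache is an exact-match stand-in for a similarity-based semantic cache.
type mapCache map[string]string

func (m mapCache) Lookup(q string) (string, bool) { r, ok := m[q]; return r, ok }

type echoBackend struct{}

func (echoBackend) Generate(q string) string { return "generated answer for: " + q }

func main() {
	cache := mapCache{"what is 2+2": "4"}
	fmt.Println(Serve(cache, echoBackend{}, "what is 2+2"))                 // hit: served by the router layer
	fmt.Println(Serve(cache, echoBackend{}, "prove Fermat's Last Theorem")) // miss: falls through to the engine
}
```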
@@ -202,7 +202,12 @@ Query: "Explain the proof of Fermat's Last Theorem step-by-step"
 ┌─────────────────────────────────────────────────────────────────┐
 │ Semantic Router Layer │
 ├─────────────────────────────────────────────────────────────────┤
-│ 1. Classification: "math" category (confidence: 0.92) │
+│ 1. Fusion Routing (3-signal analysis): │
+│    a) Keyword Match: "theorem", "proof" → math (confidence: 0.8)│
+│    b) Similarity Search: matches "mathematical proofs" concept │
+│       (similarity: 0.87) │
+│    c) BERT Classification: "math" category (confidence: 0.92) │
+│    → Final Decision: "math" (multi-signal consensus) │
 │ 2. Model Selection: deepseek-v31 (best for math reasoning) │
 │ 3. System Prompt Injection: │
 │    "You are a mathematics expert. Provide step-by-step │
@@ -667,11 +672,13 @@ graph TB
 │   - Action: Return cached response if HIT │
 │   - Latency: Very low (cache hit), Low (cache miss) │
 │ │
-│ Step 4: Intent Classification │
-│   - ModernBERT classification (10 categories) │
+│ Step 4: Fusion Routing (Multi-Signal Classification) │
+│   - Signal 1: Keyword matching (fast path) │
+│   - Signal 2: Similarity search (semantic concepts) │
+│   - Signal 3: BERT classification (deep understanding) │
 │   - Entropy-based reasoning decision │
 │   - Category: math, code, reasoning, creative, etc. │
-│   - Latency: Moderate
+│   - Latency: Adaptive (keyword: minimal, similarity: low, BERT: moderate)
 │ │
 │ Step 5: Model Selection │
 │   - Lookup category → model scores mapping │
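
The "Latency: Adaptive" line can be read as an escalation policy: cheaper signals are tried first, and the router only falls through to BERT when they are not confident enough. A minimal sketch with stubbed classifiers and an assumed confidence threshold; none of these names come from the project.

```go
package main

import "fmt"

// Stub classifiers with rising cost; they simply return the confidences from
// the concrete example. Names, signatures, and the threshold are assumptions.
func keywordMatch(query string) (string, float64)     { return "math", 0.80 } // minimal latency
func similaritySearch(query string) (string, float64) { return "math", 0.87 } // low latency
func bertClassify(query string) (string, float64)     { return "math", 0.92 } // moderate latency

// classify escalates to the next (slower) signal only when the cheaper one is
// not confident enough, so overall latency adapts to how hard the query is.
func classify(query string, threshold float64) (category, usedSignal string) {
	if cat, conf := keywordMatch(query); conf >= threshold {
		return cat, "keyword"
	}
	if cat, conf := similaritySearch(query); conf >= threshold {
		return cat, "similarity"
	}
	cat, _ := bertClassify(query)
	return cat, "bert"
}

func main() {
	cat, signal := classify("Explain the proof of Fermat's Last Theorem step-by-step", 0.85)
	fmt.Printf("category=%s via %s signal\n", cat, signal) // category=math via similarity signal
}
```

With the example's confidences and a 0.85 threshold, the keyword signal alone is not enough, so the router stops at similarity search and never pays the BERT cost.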

0 commit comments