Commit 0364191

docs: optimize architecture diagram layout with vertical flow
- Reorganize diagram to show clear top-to-bottom flow:
  ① vLLM Semantic Router Layer (top)
  ② NVIDIA Dynamo Layer (middle)
  ③ Execution Layer - Worker Pools (bottom)
- Place Storage Layer on the side for better space utilization
- Add numbered labels (①②③) to emphasize layer hierarchy
- Wrap main processing layers in a 'Main Processing Flow' subgraph
- Improve visual clarity with consistent direction indicators
- Update step numbers in flow annotations (1-7)

This layout better reflects the actual request flow from client through semantic intelligence, infrastructure routing, to execution.

Signed-off-by: bitliu <[email protected]>
1 parent 48a5a02 commit 0364191

1 file changed, +51 -46 lines changed

website/docs/proposals/nvidia-dynamo-integration.md

Lines changed: 51 additions & 46 deletions
@@ -519,57 +519,61 @@ prompt_guard:
 graph TB
     Client[LLM Application<br/>OpenAI SDK]

-    subgraph SIL["Semantic Intelligence Layer"]
+    subgraph Main["Main Processing Flow"]
         direction TB
-        Gateway[Envoy Gateway :8080]
-        ExtProc[Semantic Router ExtProc :50051]
-
-        subgraph SC["Semantic Components"]
-            direction LR
-            Classifier[BERT Classifier]
-            PIIDetector[PII Detector]
-            JailbreakGuard[Jailbreak Guard]
-        end

-        SemanticCache[Semantic Cache]
-        ToolSelector[Tool Selector]
-    end
+        subgraph SIL["① vLLM Semantic Router Layer"]
+            direction TB
+            Gateway[Envoy Gateway :8080]
+            ExtProc[Semantic Router ExtProc :50051]

-    subgraph DL["NVIDIA Dynamo Layer"]
-        direction TB
-        DynamoFrontend[Dynamo Frontend :8000]
+            subgraph SC["Semantic Components"]
+                direction LR
+                Classifier[BERT Classifier]
+                PIIDetector[PII Detector]
+                JailbreakGuard[Jailbreak Guard]
+            end

-        subgraph DR["Routing & Management"]
-            direction LR
-            DynamoRouter[KV Router]
-            KVBM[KV Block Manager]
+            SemanticCache[Semantic Cache]
+            ToolSelector[Tool Selector]
         end

-        Planner[Planner - Dynamic Scaling]
-    end
+        subgraph DL["② NVIDIA Dynamo Layer"]
+            direction TB
+            DynamoFrontend[Dynamo Frontend :8000]

-    subgraph EL["Execution Layer - Worker Pools"]
-        direction TB
+            subgraph DR["Routing & Management"]
+                direction LR
+                DynamoRouter[KV Router]
+                KVBM[KV Block Manager]
+            end

-        subgraph MP1["deepseek-v31"]
-            direction LR
-            W1[Prefill Worker]
-            W2[Decode Worker]
+            Planner[Planner - Dynamic Scaling]
         end

-        subgraph MP2["phi4"]
-            direction LR
-            W3[Prefill Worker]
-            W4[Decode Worker]
-        end
+        subgraph EL["③ Execution Layer - Worker Pools"]
+            direction TB
+
+            subgraph MP1["Model Pool: deepseek-v31"]
+                direction LR
+                W1[Prefill Worker]
+                W2[Decode Worker]
+            end

-        subgraph MP3["qwen3"]
-            W5[Worker - SGLang]
+            subgraph MP2["Model Pool: phi4"]
+                direction LR
+                W3[Prefill Worker]
+                W4[Decode Worker]
+            end
+
+            subgraph MP3["Model Pool: qwen3"]
+                W5[Worker - SGLang]
+            end
         end
     end

     subgraph SL["Storage Layer"]
-        direction LR
+        direction TB
         Milvus[(Milvus<br/>Semantic Cache)]
         SystemMem[(System Memory<br/>KV Offload)]
         NVMe[(NVMe<br/>Cold Cache)]
@@ -587,30 +591,31 @@ graph TB
     DynamoFrontend --> DynamoRouter
     DynamoRouter <--> KVBM

-    DynamoRouter --> W1
-    DynamoRouter --> W2
+    DynamoRouter -->|4. Worker Selection| W1
+    DynamoRouter -->|4. Worker Selection| W2
     DynamoRouter -.-> W3
     DynamoRouter -.-> W4
     DynamoRouter -.-> W5

-    Planner -.-> W1
-    Planner -.-> W2
-    Planner -.-> W3
-    Planner -.-> W4
-    Planner -.-> W5
+    Planner -.->|Scaling| W1
+    Planner -.->|Scaling| W2
+    Planner -.->|Scaling| W3
+    Planner -.->|Scaling| W4
+    Planner -.->|Scaling| W5

     SemanticCache <--> Milvus
     KVBM <--> SystemMem
     KVBM <--> NVMe

-    W1 -->|4. Response| DynamoFrontend
-    DynamoFrontend --> Gateway
-    Gateway --> Client
+    W1 -->|5. Response| DynamoFrontend
+    DynamoFrontend -->|6. Response| Gateway
+    Gateway -->|7. Response| Client

     style ExtProc fill:#e1f5ff
     style DynamoRouter fill:#c8e6c9
     style SemanticCache fill:#fff9c4
     style KVBM fill:#fff9c4
+    style SL fill:#f5f5f5
 ```

 **Architecture Layers:**
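
To make the renumbered annotations easier to check, the labeled edges on the new side of the second hunk can be modelled as a small graph and walked in Python. This is only an illustrative sketch, not part of the commit: the names `LABELED_EDGES`, `LAYER`, `response_path`, and `step_numbers` are hypothetical, and the edges for steps 1-3 are left out because they fall outside the hunks shown here.

```python
# Labeled edges from the new side of the diff (node IDs as in the diagram).
# Steps 1-3 are not visible in the hunks shown, so they are omitted here.
LABELED_EDGES = [
    ("DynamoRouter", "W1", "4. Worker Selection"),
    ("DynamoRouter", "W2", "4. Worker Selection"),
    ("W1", "DynamoFrontend", "5. Response"),
    ("DynamoFrontend", "Gateway", "6. Response"),
    ("Gateway", "Client", "7. Response"),
]

# Numbered layer (①②③) each node sits in after the reorganization.
# Client lives outside the 'Main Processing Flow' subgraph, so it has no layer.
LAYER = {
    "Gateway": 1,
    "DynamoFrontend": 2,
    "DynamoRouter": 2,
    "W1": 3,
    "W2": 3,
}


def response_path(start, edges):
    """Follow edges labeled 'Response' from start until none remains."""
    next_hop = {src: dst for src, dst, label in edges if "Response" in label}
    path = [start]
    while path[-1] in next_hop:
        path.append(next_hop[path[-1]])
    return path


def step_numbers(edges):
    """Return the distinct step numbers used in the edge labels, sorted."""
    return sorted({int(label.split(".")[0]) for _, _, label in edges})


if __name__ == "__main__":
    path = response_path("W1", LABELED_EDGES)
    print(" -> ".join(path))
    print([LAYER.get(node) for node in path])
    print(step_numbers(LABELED_EDGES))
```

Walking the response edges from `W1` climbs back through layers ③, ② and ① before leaving the main flow, which matches the vertical layout described in the commit message.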
