components/README.md
This directory contains the core components that make up the Dynamo inference framework. Each component serves a specific role in the distributed LLM serving architecture, enabling high-throughput, low-latency inference across multiple nodes and GPUs.
## Core Components
### Backends

Dynamo supports multiple inference engines, each with its own deployment configurations and capabilities:
- **[vLLM](/docs/backends/vllm/README.md)** - Full-featured vLLM integration with disaggregated serving, KV-aware routing, SLA-based planning, native KV cache events, and NIXL-based transfer mechanisms
- **[SGLang](/docs/backends/sglang/README.md)** - SGLang engine integration with ZMQ-based communication, supporting disaggregated serving and KV-aware routing
- **[TensorRT-LLM](/docs/backends/trtllm/README.md)** - TensorRT-LLM integration with disaggregated serving capabilities and TensorRT acceleration
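Whichever backend is deployed, clients interact with Dynamo through its OpenAI-compatible HTTP frontend, so the request shape is the same across engines. The sketch below builds such a request; the endpoint URL, port, and model name are illustrative assumptions, not values taken from this README.

```python
import json
import urllib.request

# Assumed endpoint: a Dynamo frontend serving an OpenAI-compatible
# API on localhost port 8000 (both are illustrative assumptions).
DYNAMO_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-compatible chat completion payload.

    The same payload works regardless of which engine (vLLM, SGLang,
    TensorRT-LLM) is serving the model behind the frontend.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "stream": False,
    }


if __name__ == "__main__":
    # "Qwen/Qwen3-0.6B" is a placeholder model name for illustration.
    payload = build_chat_request("Qwen/Qwen3-0.6B", "Hello!")
    req = urllib.request.Request(
        DYNAMO_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the frontend abstracts the engine away, switching backends requires no client-side changes beyond, at most, the model name.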