Dynamo Feature Compatibility Matrices

This document shows which key Dynamo features each supported backend (vLLM, TensorRT-LLM, SGLang) provides, and which feature combinations work together on each backend.

Updated for Dynamo v0.9.0

Legend:

✅ : Supported
🚧 : Work in Progress / Experimental / Limited
¹ ² ³ : See the note in the corresponding position under that backend's matrix

Quick Comparison

Feature	vLLM	TensorRT-LLM	SGLang	Source
Disaggregated Serving	✅	✅	✅	Design Doc
KV-Aware Routing	✅	✅	✅	Router Doc
SLA-Based Planner	✅	✅	✅	Planner Doc
KV Block Manager	✅	✅	🚧	KVBM Doc
Multimodal (Image)	✅	✅	✅	Multimodal Doc
Multimodal (Video)	✅			Multimodal Doc
Multimodal (Audio)	🚧			Multimodal Doc
Request Migration	✅	🚧	✅	Migration Doc
Request Cancellation	✅	✅	🚧	Backend READMEs
LoRA	✅			K8s Guide
Tool Calling	✅	✅	✅	Tool Calling Doc
Speculative Decoding	✅	✅	🚧	Backend READMEs
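
For scripted checks, the Quick Comparison table can be encoded as a small lookup structure. The following is a minimal sketch, not part of Dynamo: the names `QUICK_COMPARISON` and `backends_with` are hypothetical, and a blank cell in the table is represented as `None` because the table lists no status for it.

```python
from typing import Optional

# Status values mirroring the legend; None stands for a blank table cell.
SUPPORTED = "supported"   # ✅
WIP = "wip"               # 🚧

# Column order follows the Quick Comparison table header.
BACKENDS = ("vLLM", "TensorRT-LLM", "SGLang")

Row = tuple[Optional[str], Optional[str], Optional[str]]

# Hypothetical encoding of the table rows (values copied from the table).
QUICK_COMPARISON: dict[str, Row] = {
    "Disaggregated Serving": (SUPPORTED, SUPPORTED, SUPPORTED),
    "KV-Aware Routing": (SUPPORTED, SUPPORTED, SUPPORTED),
    "SLA-Based Planner": (SUPPORTED, SUPPORTED, SUPPORTED),
    "KV Block Manager": (SUPPORTED, SUPPORTED, WIP),
    "Multimodal (Image)": (SUPPORTED, SUPPORTED, SUPPORTED),
    "Multimodal (Video)": (SUPPORTED, None, None),
    "Multimodal (Audio)": (WIP, None, None),
    "Request Migration": (SUPPORTED, WIP, SUPPORTED),
    "Request Cancellation": (SUPPORTED, SUPPORTED, WIP),
    "LoRA": (SUPPORTED, None, None),
    "Tool Calling": (SUPPORTED, SUPPORTED, SUPPORTED),
    "Speculative Decoding": (SUPPORTED, SUPPORTED, WIP),
}


def backends_with(feature: str, status: str = SUPPORTED) -> list[str]:
    """Return the backends whose cell for `feature` has the given status."""
    row = QUICK_COMPARISON[feature]
    return [backend for backend, cell in zip(BACKENDS, row) if cell == status]


if __name__ == "__main__":
    print(backends_with("LoRA"))
    print(backends_with("KV Block Manager", WIP))
```

Keeping blank cells as `None` rather than mapping them to "unsupported" avoids asserting more than the table states.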

1. vLLM Backend

vLLM offers the broadest feature coverage in Dynamo, with support for disaggregated serving, KV-aware routing, KV block management, LoRA adapters, and multimodal inference, including video and experimental audio input.

Source: docs/backends/vllm/README.md

Feature	Disaggregated Serving	KV-Aware Routing	SLA-Based Planner	KV Block Manager	Multimodal	Request Migration	Request Cancellation	LoRA	Tool Calling	Speculative Decoding
Disaggregated Serving	—
KV-Aware Routing	✅	—
SLA-Based Planner	✅	✅	—
KV Block Manager	✅	✅	✅	—
Multimodal	✅	¹	—	✅	—
Request Migration	✅	✅	✅	✅	✅	—
Request Cancellation	✅	✅	✅	✅	✅	✅	—
LoRA	✅	✅²	—	✅	—	✅	✅	—
Tool Calling	✅	✅	✅	✅	✅	✅	✅	✅	—
Speculative Decoding	✅	✅	—	✅	—	✅	✅	—	✅	—
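
The matrix above is lower-triangular: each row only fills cells up to its own column, so a feature pair is read from the row of whichever feature comes later in the header. The sketch below illustrates that lookup under the assumption that the order of the pair does not matter; `VLLM_ROWS` and `pair_status` are hypothetical names, and the cell strings are copied verbatim from the matrix.

```python
DASH = "\u2014"  # the dash cell used in the matrix (diagonal and some pairs)

# Header order of the per-backend matrices.
FEATURES = [
    "Disaggregated Serving", "KV-Aware Routing", "SLA-Based Planner",
    "KV Block Manager", "Multimodal", "Request Migration",
    "Request Cancellation", "LoRA", "Tool Calling", "Speculative Decoding",
]

# Lower-triangular rows of the vLLM matrix, one list per feature row.
VLLM_ROWS = [
    [DASH],
    ["✅", DASH],
    ["✅", "✅", DASH],
    ["✅", "✅", "✅", DASH],
    ["✅", "¹", DASH, "✅", DASH],
    ["✅", "✅", "✅", "✅", "✅", DASH],
    ["✅", "✅", "✅", "✅", "✅", "✅", DASH],
    ["✅", "✅²", DASH, "✅", DASH, "✅", "✅", DASH],
    ["✅", "✅", "✅", "✅", "✅", "✅", "✅", "✅", DASH],
    ["✅", "✅", DASH, "✅", DASH, "✅", "✅", DASH, "✅", DASH],
]


def pair_status(a: str, b: str) -> str:
    """Return the raw matrix cell for the feature pair (a, b), in either order."""
    i, j = FEATURES.index(a), FEATURES.index(b)
    row, col = max(i, j), min(i, j)  # always read from the later feature's row
    return VLLM_ROWS[row][col]


if __name__ == "__main__":
    print(pair_status("KV-Aware Routing", "LoRA"))
```

The function returns the cell text unchanged, including any footnote marker, so the caller still has to consult the notes for that backend.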

Notes:

¹ Multimodal + KV-Aware Routing: The KV router uses token-based hashing and does not yet support image/video hashes, so multimodal requests fall back to random/round-robin routing. (Source)

² KV-Aware LoRA Routing: vLLM supports routing requests based on LoRA adapter affinity.

³ Audio Support: vLLM supports audio models such as Qwen2-Audio (experimental). (Source)

⁴ Video Support: vLLM supports video input with frame sampling. (Source)

⁵ Speculative Decoding: Eagle3 support is documented. (Source)
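
The routing fallback described in the first note can be pictured with a toy router. This is a minimal sketch and not Dynamo's router: `Request`, `Worker`, `Router`, `BLOCK_SIZE`, and the use of token-prefix tuples in place of real block hashes are all assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical number of tokens per KV block


@dataclass
class Request:
    token_ids: list[int]
    has_media: bool = False  # True when the request carries image/video input


@dataclass
class Worker:
    name: str
    # Token prefixes (as tuples) standing in for cached KV block hashes.
    cached_prefixes: set[tuple[int, ...]] = field(default_factory=set)


def cached_blocks(worker: Worker, token_ids: list[int]) -> int:
    """Count the leading token blocks of the request already cached on the worker."""
    hits = 0
    for end in range(BLOCK_SIZE, len(token_ids) + 1, BLOCK_SIZE):
        if tuple(token_ids[:end]) not in worker.cached_prefixes:
            break
        hits += 1
    return hits


class Router:
    def __init__(self, workers: list[Worker]) -> None:
        self.workers = workers
        self._round_robin = itertools.cycle(workers)

    def route(self, request: Request) -> Worker:
        # Media content is not covered by token-based matching in this sketch,
        # so such requests take the round-robin path.
        if request.has_media:
            return next(self._round_robin)
        # Text-only requests go to the worker with the longest cached prefix.
        return max(self.workers, key=lambda w: cached_blocks(w, request.token_ids))


if __name__ == "__main__":
    a = Worker("a")
    b = Worker("b", {(1, 2, 3, 4), (1, 2, 3, 4, 5, 6, 7, 8)})
    router = Router([a, b])
    print(router.route(Request([1, 2, 3, 4, 5, 6, 7, 8])).name)
    print(router.route(Request([1, 2, 3, 4], has_media=True)).name)
```

The point of the sketch is only the branch: requests the token-based matcher cannot score get no cache-affinity benefit and are spread across workers instead.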

2. SGLang Backend

SGLang is optimized for high-throughput serving with fast primitives, providing robust support for disaggregated serving, KV-aware routing, and request migration.

Source: docs/backends/sglang/README.md

Feature	Disaggregated Serving	KV-Aware Routing	SLA-Based Planner	KV Block Manager	Multimodal	Request Migration	Request Cancellation	LoRA	Tool Calling	Speculative Decoding
Disaggregated Serving	—
KV-Aware Routing	✅	—
SLA-Based Planner	✅	✅	—
KV Block Manager	🚧	🚧	🚧	—
Multimodal	✅²	¹	—	🚧	—
Request Migration	✅	✅	✅	🚧	✅	—
Request Cancellation	🚧³	✅	✅	🚧	🚧	✅	—
LoRA				🚧				—
Tool Calling	✅	✅	✅	🚧	✅	✅	✅		—
Speculative Decoding	🚧	🚧	—	🚧	—	🚧	—		🚧	—

Notes:

¹ Multimodal + KV-Aware Routing: Not supported. (Source)

² Multimodal Patterns: Supports E/PD and E/P/D only (both require a separate vision encoder). Does not support simple Aggregated (EPD) or Traditional Disagg (EP/D). (Source)

³ Request Cancellation: Cancellation during the remote prefill phase is not supported in disaggregated mode. (Source)

⁴ Speculative Decoding: Code hooks exist (spec_decode_stats in publisher), but there are no examples or documentation yet.
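
The multimodal pattern rule in the notes above reduces to one check: the encoder must be its own stage. The sketch below expresses that rule over the pattern strings used in this document; the function name `sglang_supports_pattern` is hypothetical and the check is an illustration, not a Dynamo API.

```python
def sglang_supports_pattern(pattern: str) -> bool:
    """Return True if the pattern runs the encoder (E) as a separate stage.

    Patterns use "/" to separate stages, e.g. "E/PD" is a standalone encoder
    followed by a combined prefill+decode stage.
    """
    stages = pattern.split("/")
    return "E" in stages


if __name__ == "__main__":
    for pattern in ("E/PD", "E/P/D", "EPD", "EP/D"):
        print(pattern, sglang_supports_pattern(pattern))
```

A deployment script could run such a check before launch to reject the EPD and EP/D layouts that the SGLang backend does not support.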

3. TensorRT-LLM Backend

TensorRT-LLM delivers maximum inference performance and optimization, with full KVBM integration and robust disaggregated serving support.

Source: docs/backends/trtllm/README.md

Feature	Disaggregated Serving	KV-Aware Routing	SLA-Based Planner	KV Block Manager	Multimodal	Request Migration	Request Cancellation	LoRA	Tool Calling	Speculative Decoding
Disaggregated Serving	—
KV-Aware Routing	✅	—
SLA-Based Planner	✅	✅	—
KV Block Manager	✅	✅	✅	—
Multimodal	✅¹	²	—	✅	—
Request Migration	✅	✅	✅	✅	🚧	—
Request Cancellation	✅³	✅³	✅³	✅³	✅³	✅³	—
LoRA								—
Tool Calling	✅	✅	✅	✅	✅	✅	✅		—
Speculative Decoding	✅	✅	—	✅	—	✅	✅		✅	—

Notes:

¹ Multimodal Disaggregation: Fully supports EP/D (Traditional) pattern. E/P/D (Full Disaggregation) is WIP and currently supports pre-computed embeddings only. (Source)

² Multimodal + KV-Aware Routing: Not supported. The KV router currently tracks token-based blocks only. (Source)

³ Request Cancellation: Due to known issues, the TensorRT-LLM engine is temporarily not notified of request cancellations, meaning allocated resources for cancelled requests are not freed.
