# Examples
For quick start instructions, see the TensorRT-LLM README. This document provides all deployment patterns for running TensorRT-LLM with Dynamo, including single-node, multi-node, and Kubernetes deployments.
For local/bare-metal development, start etcd and (optionally) NATS using Docker Compose:

```bash
docker compose -f deploy/docker-compose.yml up -d
```

For detailed information about the architecture and how KV-aware routing works, see the Router Guide.
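Once the services are up, local Dynamo processes need to know where to reach them. A minimal sketch, assuming the default local ports from `deploy/docker-compose.yml` (the variable names `ETCD_ENDPOINTS` and `NATS_SERVER` are assumptions; confirm them against your Dynamo version):

```shell
# Assumed defaults: etcd on its standard client port 2379, NATS on 4222.
# Adjust if you changed deploy/docker-compose.yml.
export ETCD_ENDPOINTS="http://localhost:2379"
export NATS_SERVER="nats://localhost:4222"
echo "etcd: $ETCD_ENDPOINTS, nats: $NATS_SERVER"
```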
## Aggregated Serving

```bash
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/agg.sh
```
## Aggregated Serving with KV Routing

```bash
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/agg_router.sh
```
## Disaggregated Serving

```bash
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/disagg.sh
```
## Disaggregated Serving with KV Routing

```bash
cd $DYNAMO_HOME/examples/backends/trtllm
./launch/disagg_router.sh
```
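The launch scripts read their model configuration from environment variables, so you can point them at a different model before launching. A minimal sketch using a hypothetical small model (the variable names follow the `MODEL_PATH`/`SERVED_MODEL_NAME` convention used in the DeepSeek-R1 example; the model id itself is only an illustration):

```shell
# Hypothetical example: serve a different Hugging Face model with the same scripts
export MODEL_PATH="Qwen/Qwen3-0.6B"          # HF repo id or local checkpoint path
export SERVED_MODEL_NAME="Qwen/Qwen3-0.6B"   # name exposed through the OpenAI API
echo "Serving $SERVED_MODEL_NAME from $MODEL_PATH"
```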
## Aggregated Serving with MTP (DeepSeek-R1)

```bash
cd $DYNAMO_HOME/examples/backends/trtllm
export AGG_ENGINE_ARGS=./engine_configs/deepseek-r1/agg/mtp/mtp_agg.yaml
export SERVED_MODEL_NAME="nvidia/DeepSeek-R1-FP4"
# nvidia/DeepSeek-R1-FP4 is a large model
export MODEL_PATH="nvidia/DeepSeek-R1-FP4"
./launch/agg.sh
```

## Multinode Deployment

For comprehensive instructions on multinode serving, see the Multinode Examples guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can adapt the process for any supported model by updating the relevant configuration files. See the Llama4 + Eagle guide to learn how to use these scripts when a single worker fits on a single node.
Model-specific guides:

- Gemma3 with Sliding Window Attention
- GPT-OSS-120b (reasoning model with tool calling support)
For complete Kubernetes deployment instructions, configurations, and troubleshooting, see the TensorRT-LLM Kubernetes Deployment Guide.
For detailed instructions on running comprehensive performance sweeps across both aggregated and disaggregated serving configurations, see the TensorRT-LLM Benchmark Scripts for DeepSeek R1 model.
See the client section to learn how to send requests to the deployment.
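As a quick illustration, here is a hedged sketch of a chat completion request against the frontend's OpenAI-compatible endpoint. The port `8000` and the exact payload shape are assumptions; the model name must match your `SERVED_MODEL_NAME`:

```shell
# Model name must match the SERVED_MODEL_NAME used by your launch script
MODEL="nvidia/DeepSeek-R1-FP4"
# Build an OpenAI-compatible chat completion payload
PAYLOAD=$(cat <<EOF
{
  "model": "${MODEL}",
  "messages": [{"role": "user", "content": "Hello!"}],
  "max_tokens": 64,
  "stream": false
}
EOF
)
echo "$PAYLOAD"
# Send it to the frontend (port 8000 is an assumption; adjust for your deployment):
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$PAYLOAD"
```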
To send a request to a multi-node deployment, target the node running `python3 -m dynamo.frontend`.

To benchmark your deployment with AIPerf, see the perf.sh utility script, configuring the model name and host to match your deployment.
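Before running a sweep, it can help to collect the deployment details in one place. A hedged sketch (the variable names here are hypothetical, not the actual knobs inside perf.sh; open the script to see what it expects):

```shell
# Hypothetical variables -- check perf.sh for the configuration it actually reads
export MODEL="nvidia/DeepSeek-R1-FP4"   # served model name for your deployment
export HOST="localhost"                 # node running python3 -m dynamo.frontend
export PORT="8000"                      # assumed default frontend port
echo "Benchmarking $MODEL at $HOST:$PORT"
```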