The table below contains `trtllm-serve` commands for quickly deploying popular models, including DeepSeek-R1, gpt-oss, Llama 4, Qwen3, and more.
We maintain LLM API configuration files for these models containing recommended performance settings in two locations:
- Curated Examples: `examples/configs/curated` - hand-picked configurations for common scenarios.
- Comprehensive Database: `examples/configs/database` - a broader set of known-good configurations for various GPUs and traffic patterns.
The TensorRT LLM Docker container makes these config files available at `/app/tensorrt_llm/examples/configs/curated` and `/app/tensorrt_llm/examples/configs/database`, respectively. You can reference them as needed:
```bash
export TRTLLM_DIR="/app/tensorrt_llm" # path to the TensorRT LLM repo in your local environment
```

This table is designed to provide a straightforward starting point; for detailed model-specific deployment guides, check out the guides below.
| Model Name | GPU | Inference Scenario | Config | Command |
|---|---|---|---|---|
| Nemotron v3 Super (NVFP4) | B200, GB200 | Max Throughput | nemotron-3-super-throughput.yaml | trtllm-serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 --config ${TRTLLM_DIR}/examples/configs/curated/nemotron-3-super-throughput.yaml |
| DeepSeek-R1 | H100, H200 | Max Throughput | deepseek-r1-throughput.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --config ${TRTLLM_DIR}/examples/configs/curated/deepseek-r1-throughput.yaml |
| DeepSeek-R1 | B200, GB200 | Max Throughput | deepseek-r1-deepgemm.yaml | trtllm-serve deepseek-ai/DeepSeek-R1-0528 --config ${TRTLLM_DIR}/examples/configs/curated/deepseek-r1-deepgemm.yaml |
| DeepSeek-R1 (NVFP4) | B200, GB200 | Max Throughput | deepseek-r1-throughput.yaml | trtllm-serve nvidia/DeepSeek-R1-FP4 --config ${TRTLLM_DIR}/examples/configs/curated/deepseek-r1-throughput.yaml |
| DeepSeek-R1 (NVFP4) | B200, GB200 | Min Latency | deepseek-r1-latency.yaml | trtllm-serve nvidia/DeepSeek-R1-FP4-v2 --config ${TRTLLM_DIR}/examples/configs/curated/deepseek-r1-latency.yaml |
| gpt-oss-120b | Any | Max Throughput | gpt-oss-120b-throughput.yaml | trtllm-serve openai/gpt-oss-120b --config ${TRTLLM_DIR}/examples/configs/curated/gpt-oss-120b-throughput.yaml |
| gpt-oss-120b | Any | Min Latency | gpt-oss-120b-latency.yaml | trtllm-serve openai/gpt-oss-120b --config ${TRTLLM_DIR}/examples/configs/curated/gpt-oss-120b-latency.yaml |
| Qwen3-Next-80B-A3B-Thinking | Any | Max Throughput | qwen3-next.yaml | trtllm-serve Qwen/Qwen3-Next-80B-A3B-Thinking --config ${TRTLLM_DIR}/examples/configs/curated/qwen3-next.yaml |
| Qwen3 family (e.g. Qwen3-30B-A3B) | Any | Max Throughput | qwen3.yaml | trtllm-serve Qwen/Qwen3-30B-A3B --config ${TRTLLM_DIR}/examples/configs/curated/qwen3.yaml (swap to another Qwen3 model name as needed) |
| Llama-3.3-70B (FP8) | Any | Max Throughput | llama-3.3-70b.yaml | trtllm-serve nvidia/Llama-3.3-70B-Instruct-FP8 --config ${TRTLLM_DIR}/examples/configs/curated/llama-3.3-70b.yaml |
| Llama 4 Scout (FP8) | Any | Max Throughput | llama-4-scout.yaml | trtllm-serve nvidia/Llama-4-Scout-17B-16E-Instruct-FP8 --config ${TRTLLM_DIR}/examples/configs/curated/llama-4-scout.yaml |
| Kimi-K2-Thinking (NVFP4) | B200, GB200 | Max Throughput | kimi-k2-thinking.yaml | trtllm-serve nvidia/Kimi-K2-Thinking-NVFP4 --config ${TRTLLM_DIR}/examples/configs/curated/kimi-k2-thinking.yaml |
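Each config file in the table is a YAML file of LLM API options tuned for its model and scenario. As an illustrative sketch only (the option names below are LLM API settings, but the values are placeholders, not the tuned values shipped in any curated file), a throughput-oriented config might look like:

```yaml
# Illustrative sketch of an LLM API config file.
# Values are placeholders, not the tuned settings from the curated configs.
tensor_parallel_size: 8          # shard model weights across 8 GPUs
max_batch_size: 256              # larger batches favor throughput over latency
max_num_tokens: 8192             # scheduler token budget per iteration
kv_cache_config:
  free_gpu_memory_fraction: 0.9  # fraction of free GPU memory reserved for KV cache
```

The curated files pin these knobs per GPU and traffic pattern, which is why a "Max Throughput" and a "Min Latency" entry for the same model reference different configs.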
The deployment guides below provide more detailed instructions for serving specific models with TensorRT LLM.
.. toctree::
   :maxdepth: 1
   :name: Deployment Guides

   deployment-guide-for-nemotron-3-super-on-trtllm.md
   deployment-guide-for-deepseek-r1-on-trtllm.md
   deployment-guide-for-llama3.3-70b-on-trtllm.md
   deployment-guide-for-llama4-scout-on-trtllm.md
   deployment-guide-for-gpt-oss-on-trtllm.md
   deployment-guide-for-qwen3-on-trtllm.md
   deployment-guide-for-qwen3-next-on-trtllm.md
   deployment-guide-for-kimi-k2-thinking-on-trtllm.md
The table below lists all available pre-configured model scenarios in the TensorRT LLM configuration database. Each row represents a specific model, GPU, and performance profile combination with recommended request settings.

.. trtllm_config_selector::