Commit 26901e4

[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM (#11462)

Authored by: chang-l, QiJune, claude, zhenhuaw-me

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>

1 parent: 19a3031


75 files changed (+19370, −199 lines)

README.md

Lines changed: 0 additions & 3 deletions

```diff
@@ -5,9 +5,6 @@ TensorRT LLM
 <h4>TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports
 state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.</h4>

-🌟 TensorRT LLM is experimenting with Image&Video Generation models in [TensorRT-LLM/feat/visual_gen](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/visual_gen/tensorrt_llm/visual_gen) branch.
-This branch is a prototype and not stable for production use. PRs are not accepted.
-
 [![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-LLM/)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/NVIDIA/TensorRT-LLM)
 [![python](https://img.shields.io/badge/python-3.12-green)](https://www.python.org/downloads/release/python-3123/)
```

examples/visual_gen/README.md

Lines changed: 172 additions & 0 deletions
# Visual Generation Examples

Quick reference for running visual generation models (WAN).

## Prerequisites

```bash
# Install dependencies (from the repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```

## Quick Start

```bash
# Set MODEL_ROOT to your model directory (required for the examples)
export MODEL_ROOT=/llm-models
# Optional: PROJECT_ROOT defaults to the repo root when run from examples/visual_gen

# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PROJECT_ROOT` | Auto-detected | Path to the repository root (set explicitly when running from `examples/visual_gen`) |
| `MODEL_ROOT` | `/llm-models` | Path to the model directory |
| `TLLM_LOG_LEVEL` | `INFO` | Logging level |

---

## WAN (Text-to-Video)

### Basic Usage

**Single GPU:**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --output_path output.mp4
```

**With TeaCache:**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --enable_teacache \
  --output_path output.mp4
```
### Multi-GPU Parallelism

WAN supports two parallelism modes, which can be combined:

- **CFG Parallelism**: splits the positive/negative prompts across GPUs
- **Ulysses Parallelism**: splits the sequence across GPUs for longer sequences

**Ulysses Only (2 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 1 --ulysses_size 2 \
  --output_path output.mp4
```

GPU Layout: GPUs 0-1 share the sequence (6 heads each)

**CFG Only (2 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 1 \
  --output_path output.mp4
```

GPU Layout: GPU 0 (positive) | GPU 1 (negative)

**CFG + Ulysses (4 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 2 \
  --output_path output.mp4
```

GPU Layout: GPUs 0-1 (positive, Ulysses) | GPUs 2-3 (negative, Ulysses)

**Large-Scale (8 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 4 \
  --output_path output.mp4
```

GPU Layout: GPUs 0-3 (positive) | GPUs 4-7 (negative)
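The four layouts above follow one pattern: `cfg_size × ulysses_size` equals the GPU count, with the CFG split applied first. As a sketch (this helper is hypothetical and not part of the shipped examples), the mapping could be derived from the detected GPU count:

```bash
# Hypothetical helper: map a GPU count to the --cfg_size/--ulysses_size
# split used by the examples above (1 -> 1x1, 2 -> 2x1, 4 -> 2x2, 8 -> 2x4).
pick_parallel_config() {
    case "$1" in
        1) echo "--cfg_size 1 --ulysses_size 1" ;;
        2) echo "--cfg_size 2 --ulysses_size 1" ;;
        4) echo "--cfg_size 2 --ulysses_size 2" ;;
        8) echo "--cfg_size 2 --ulysses_size 4" ;;
        *) echo "unsupported GPU count: $1" >&2; return 1 ;;
    esac
}

# GPU_COUNT could be filled from:
#   nvidia-smi --query-gpu=name --format=csv,noheader | wc -l
GPU_COUNT=${GPU_COUNT:-4}
pick_parallel_config "$GPU_COUNT"   # prints: --cfg_size 2 --ulysses_size 2
```

The 2-GPU case defaults to the CFG split here to match the "CFG Only" example; either split fits on 2 GPUs.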
---

## Common Arguments

| Argument | WAN | Default | Description |
|----------|-----|---------|-------------|
| `--height` | ✓ | 720 | Output height |
| `--width` | ✓ | 1280 | Output width |
| `--num_frames` | ✓ | 81 | Number of frames |
| `--steps` | ✓ | 50 | Denoising steps |
| `--guidance_scale` | ✓ | 5.0 | CFG guidance strength |
| `--seed` | ✓ | 42 | Random seed |
| `--enable_teacache` | ✓ | False | Enable TeaCache optimization |
| `--teacache_thresh` | ✓ | 0.2 | TeaCache similarity threshold |
| `--attention_backend` | ✓ | VANILLA | `VANILLA` or `TRTLLM` |
| `--cfg_size` | ✓ | 1 | CFG parallelism degree |
| `--ulysses_size` | ✓ | 1 | Sequence (Ulysses) parallelism degree |
| `--linear_type` | ✓ | default | Quantization type |
## Troubleshooting

**Out of Memory:**

- Use quantization: `--linear_type trtllm-fp8-blockwise`
- Reduce the resolution or number of frames
- Enable TeaCache: `--enable_teacache`
- Use Ulysses parallelism with more GPUs

**Slow Inference:**

- Enable TeaCache: `--enable_teacache`
- Use the TRTLLM attention backend: `--attention_backend TRTLLM`
- Use multiple GPUs: `--cfg_size 2` or `--ulysses_size 2`

**Import Errors:**

- Run from the repository root
- Install the necessary dependencies, e.g. `pip install -r requirements-dev.txt`

**Ulysses Errors:**

- `ulysses_size` must evenly divide 12 (the number of WAN attention heads)
- Total GPUs = `cfg_size × ulysses_size`
- The sequence length must be divisible by `ulysses_size`
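These constraints can be checked before launching a run. A minimal sketch, assuming a hypothetical pre-flight helper (12 is the WAN head count from this section; the function is illustrative, not shipped with the examples):

```bash
# Hypothetical pre-flight check for the Ulysses constraints:
# ulysses_size must divide the 12 WAN attention heads, and
# cfg_size * ulysses_size GPUs are needed in total.
check_parallel_config() {
    local cfg=$1 ulysses=$2 gpus=$3
    local heads=12  # WAN attention heads

    if [ $((heads % ulysses)) -ne 0 ]; then
        echo "error: ulysses_size=$ulysses does not divide $heads heads" >&2
        return 1
    fi
    local needed=$((cfg * ulysses))
    if [ "$needed" -gt "$gpus" ]; then
        echo "error: need $needed GPUs (cfg_size x ulysses_size), have $gpus" >&2
        return 1
    fi
    echo "ok: $needed GPU(s), $((heads / ulysses)) heads per GPU"
}

check_parallel_config 2 2 4   # prints: ok: 4 GPU(s), 6 heads per GPU
```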
## Output Formats

- **WAN**: `.mp4` (video), `.gif` (animated), `.png` (single frame)

## Baseline Validation

Compare against the official HuggingFace Diffusers implementation:

```bash
# Run the HuggingFace baselines
./hf_examples.sh

# Or run an individual model
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers
```

Compare outputs generated with the same seed to verify correctness.
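Concretely, the verification loop might look like the following (the script names and flags are the ones used above; the exact pairing of output files is illustrative):

```bash
# Generate with both implementations using the same seed, then inspect
# the outputs side by side.
SEED=42
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --seed $SEED --output_path trtllm_output.mp4

python hf_wan.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --seed $SEED --output_path hf_baseline.gif

ls -lh trtllm_output.mp4 hf_baseline.gif
```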

examples/visual_gen/cat_piano.png

445 KB

examples/visual_gen/hf_examples.sh

Lines changed: 128 additions & 0 deletions
```bash
#!/bin/bash
# HuggingFace Baseline Tests - Official Diffusers Implementation
#
# Usage:
#   export PROJECT_ROOT=/path/to/tekit
#   export MODEL_ROOT=/path/to/models
#   ./hf_examples.sh
#
# Or inline:
#   PROJECT_ROOT=/workspace/gitlab/tekit-b200 MODEL_ROOT=/llm-models ./hf_examples.sh

set -e  # Exit on error

# Environment variables with defaults
PROJECT_ROOT=${PROJECT_ROOT:-"/workspace/gitlab/tekit-b200"}
MODEL_ROOT=${MODEL_ROOT:-"/llm-models"}

# Log configuration
export TLLM_LOG_LEVEL=${TLLM_LOG_LEVEL:-"INFO"}

echo "============================================"
echo "HuggingFace Diffusers Baseline Tests"
echo "============================================"
echo "PROJECT_ROOT: $PROJECT_ROOT"
echo "MODEL_ROOT: $MODEL_ROOT"
echo "LOG_LEVEL: $TLLM_LOG_LEVEL"
echo ""
echo "Purpose: Establish baseline results using"
echo "         official diffusers implementations"
echo "============================================"
echo ""

# Check Python dependencies
echo "Checking dependencies..."
MISSING_DEPS=""

if ! python -c "import diffusers" 2>/dev/null; then
    echo "❌ ERROR: diffusers not found"
    MISSING_DEPS="$MISSING_DEPS diffusers"
fi

if ! python -c "import torch" 2>/dev/null; then
    echo "❌ ERROR: torch not found"
    MISSING_DEPS="$MISSING_DEPS torch"
fi

if [ -n "$MISSING_DEPS" ]; then
    echo ""
    echo "❌ Missing required dependencies:$MISSING_DEPS"
    echo "Install with: pip install$MISSING_DEPS"
    exit 1
fi

echo "✅ All required dependencies found"
echo ""

# Detect GPUs
if command -v nvidia-smi &> /dev/null; then
    GPU_COUNT=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
    echo "Detected $GPU_COUNT GPU(s)"
    GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -1)
    echo "GPU: $GPU_NAME"
else
    echo "⚠️ WARNING: nvidia-smi not found"
    echo "   Continuing with CPU (very slow!)"
    GPU_COUNT=0
fi
echo ""

# Create output directory (in the current directory)
OUTPUT_DIR="./baseline_outputs"
mkdir -p "$OUTPUT_DIR"
echo "Output directory: $OUTPUT_DIR ($(pwd)/baseline_outputs)"
echo ""

#############################################
# WAN (Wan2.1) Baseline Test
#############################################

echo "============================================"
echo "1/1: WAN Baseline Test"
echo "============================================"
echo ""

WAN_MODEL="${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers/"
WAN_OUTPUT="${OUTPUT_DIR}/wan_baseline.gif"

if [ -d "$WAN_MODEL" ]; then
    echo "Testing WAN with official diffusers..."
    python ${PROJECT_ROOT}/examples/visual_gen/hf_wan.py \
        --model_path "$WAN_MODEL" \
        --output_path "$WAN_OUTPUT" \
        --prompt "A cute cat playing piano" \
        --height 480 \
        --width 832 \
        --num_frames 33 \
        --steps 50 \
        --guidance_scale 7.0 \
        --seed 42
    echo ""
    echo "✅ WAN baseline test completed"
    echo "   Output: $WAN_OUTPUT"
else
    echo "⚠️ SKIPPED: WAN model not found at $WAN_MODEL"
fi

echo ""

#############################################
# Summary
#############################################

echo "============================================"
echo "Baseline Tests Complete!"
echo "============================================"
echo ""
echo "Output files saved to: $OUTPUT_DIR"
echo ""
ls -lh "$OUTPUT_DIR" 2>/dev/null || echo "No outputs generated"
echo ""
echo "Next Steps:"
echo "  1. Verify outputs are correct (images/videos generated)"
echo "  2. Compare with custom implementation outputs"
echo "  3. Use these as reference/baseline for debugging"
echo ""
echo "Comparison command:"
echo "  diff -r $OUTPUT_DIR <custom_implementation_outputs>"
echo "============================================"
```
