Commit 26901e4

[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM (#11462)

Authored by: chang-l, QiJune, claude, zhenhuaw-me

Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>

1 parent: 19a3031


75 files changed (+19370, −199 lines)

README.md

Lines changed: 0 additions & 3 deletions

```diff
@@ -5,9 +5,6 @@ TensorRT LLM
 <h4>TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports
 state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.</h4>

-🌟 TensorRT LLM is experimenting with Image&Video Generation models in [TensorRT-LLM/feat/visual_gen](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/visual_gen/tensorrt_llm/visual_gen) branch.
-This branch is a prototype and not stable for production use. PRs are not accepted.
-
 [![Documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg?style=flat)](https://nvidia.github.io/TensorRT-LLM/)
 [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/NVIDIA/TensorRT-LLM)
 [![python](https://img.shields.io/badge/python-3.12-green)](https://www.python.org/downloads/release/python-3123/)
```

examples/visual_gen/README.md

Lines changed: 172 additions & 0 deletions
# Visual Generation Examples

Quick reference for running visual generation models (WAN).

## Prerequisites

```bash
# Install dependencies (from the repository root)
pip install -r requirements-dev.txt
pip install git+https://github.com/huggingface/diffusers.git
pip install av
```

## Quick Start

```bash
# Set MODEL_ROOT to your model directory (required for the examples)
export MODEL_ROOT=/llm-models
# Optional: PROJECT_ROOT defaults to the repo root when run from examples/visual_gen

# Run all examples (auto-detects GPUs)
cd examples/visual_gen
./visual_gen_examples.sh
```

## Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `PROJECT_ROOT` | Auto-detected | Path to the repository root (set explicitly when running from `examples/visual_gen`) |
| `MODEL_ROOT` | `/llm-models` | Path to the model directory |
| `TLLM_LOG_LEVEL` | `INFO` | Logging level |

---

## WAN (Text-to-Video)

### Basic Usage

**Single GPU:**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --output_path output.mp4
```

**With TeaCache:**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --enable_teacache \
  --output_path output.mp4
```
### Multi-GPU Parallelism

WAN supports two parallelism modes, which can be combined:

- **CFG Parallelism**: splits the positive/negative prompts across GPUs
- **Ulysses Parallelism**: splits the sequence across GPUs for longer sequences

**Ulysses Only (2 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 1 --ulysses_size 2 \
  --output_path output.mp4
```

GPU Layout: GPUs 0-1 share the sequence (6 heads each)

**CFG Only (2 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 1 \
  --output_path output.mp4
```

GPU Layout: GPU 0 (positive) | GPU 1 (negative)

**CFG + Ulysses (4 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 2 \
  --output_path output.mp4
```

GPU Layout: GPUs 0-1 (positive, Ulysses) | GPUs 2-3 (negative, Ulysses)

**Large-Scale (8 GPUs):**

```bash
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --attention_backend TRTLLM \
  --cfg_size 2 --ulysses_size 4 \
  --output_path output.mp4
```

GPU Layout: GPUs 0-3 (positive) | GPUs 4-7 (negative)
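The four layouts above follow one pattern: `cfg_size × ulysses_size` equals the GPU count, with the CFG split applied first. As a sketch (this helper is hypothetical and not part of the shipped examples), the mapping could be derived from the detected GPU count:

```bash
# Hypothetical helper: map a GPU count to the --cfg_size/--ulysses_size
# split used by the examples above (1 -> 1x1, 2 -> 2x1, 4 -> 2x2, 8 -> 2x4).
pick_parallel_config() {
    case "$1" in
        1) echo "--cfg_size 1 --ulysses_size 1" ;;
        2) echo "--cfg_size 2 --ulysses_size 1" ;;
        4) echo "--cfg_size 2 --ulysses_size 2" ;;
        8) echo "--cfg_size 2 --ulysses_size 4" ;;
        *) echo "unsupported GPU count: $1" >&2; return 1 ;;
    esac
}

# GPU_COUNT could be filled from:
#   nvidia-smi --query-gpu=name --format=csv,noheader | wc -l
GPU_COUNT=${GPU_COUNT:-4}
pick_parallel_config "$GPU_COUNT"   # prints: --cfg_size 2 --ulysses_size 2
```

The 2-GPU case defaults to the CFG split here to match the "CFG Only" example; either split fits on 2 GPUs.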
---

## Common Arguments

| Argument | WAN | Default | Description |
|----------|-----|---------|-------------|
| `--height` | ✓ | 720 | Output height |
| `--width` | ✓ | 1280 | Output width |
| `--num_frames` | ✓ | 81 | Number of frames |
| `--steps` | ✓ | 50 | Denoising steps |
| `--guidance_scale` | ✓ | 5.0 | CFG guidance strength |
| `--seed` | ✓ | 42 | Random seed |
| `--enable_teacache` | ✓ | False | Enable TeaCache optimization |
| `--teacache_thresh` | ✓ | 0.2 | TeaCache similarity threshold |
| `--attention_backend` | ✓ | VANILLA | `VANILLA` or `TRTLLM` |
| `--cfg_size` | ✓ | 1 | CFG parallelism degree |
| `--ulysses_size` | ✓ | 1 | Sequence (Ulysses) parallelism degree |
| `--linear_type` | ✓ | default | Quantization type |
## Troubleshooting

**Out of Memory:**

- Use quantization: `--linear_type trtllm-fp8-blockwise`
- Reduce the resolution or number of frames
- Enable TeaCache: `--enable_teacache`
- Use Ulysses parallelism with more GPUs

**Slow Inference:**

- Enable TeaCache: `--enable_teacache`
- Use the TRTLLM attention backend: `--attention_backend TRTLLM`
- Use multiple GPUs: `--cfg_size 2` or `--ulysses_size 2`

**Import Errors:**

- Run from the repository root
- Install the necessary dependencies, e.g. `pip install -r requirements-dev.txt`

**Ulysses Errors:**

- `ulysses_size` must evenly divide 12 (the number of WAN attention heads)
- Total GPUs = `cfg_size × ulysses_size`
- The sequence length must be divisible by `ulysses_size`
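These constraints can be checked before launching a run. A minimal sketch, assuming a hypothetical pre-flight helper (12 is the WAN head count from this section; the function is illustrative, not shipped with the examples):

```bash
# Hypothetical pre-flight check for the Ulysses constraints:
# ulysses_size must divide the 12 WAN attention heads, and
# cfg_size * ulysses_size GPUs are needed in total.
check_parallel_config() {
    local cfg=$1 ulysses=$2 gpus=$3
    local heads=12  # WAN attention heads

    if [ $((heads % ulysses)) -ne 0 ]; then
        echo "error: ulysses_size=$ulysses does not divide $heads heads" >&2
        return 1
    fi
    local needed=$((cfg * ulysses))
    if [ "$needed" -gt "$gpus" ]; then
        echo "error: need $needed GPUs (cfg_size x ulysses_size), have $gpus" >&2
        return 1
    fi
    echo "ok: $needed GPU(s), $((heads / ulysses)) heads per GPU"
}

check_parallel_config 2 2 4   # prints: ok: 4 GPU(s), 6 heads per GPU
```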
## Output Formats

- **WAN**: `.mp4` (video), `.gif` (animated), `.png` (single frame)

## Baseline Validation

Compare against the official HuggingFace Diffusers implementation:

```bash
# Run the HuggingFace baselines
./hf_examples.sh

# Or run an individual model
python hf_wan.py --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers
```

Compare outputs generated with the same seed to verify correctness.
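Concretely, the verification loop might look like the following (the script names and flags are the ones used above; the exact pairing of output files is illustrative):

```bash
# Generate with both implementations using the same seed, then inspect
# the outputs side by side.
SEED=42
python visual_gen_wan_t2v.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --height 480 --width 832 --num_frames 33 \
  --seed $SEED --output_path trtllm_output.mp4

python hf_wan.py \
  --model_path ${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers \
  --prompt "A cute cat playing piano" \
  --seed $SEED --output_path hf_baseline.gif

ls -lh trtllm_output.mp4 hf_baseline.gif
```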

examples/visual_gen/cat_piano.png

445 KB

examples/visual_gen/hf_examples.sh

Lines changed: 128 additions & 0 deletions
```bash
#!/bin/bash
# HuggingFace Baseline Tests - Official Diffusers Implementation
#
# Usage:
#   export PROJECT_ROOT=/path/to/tekit
#   export MODEL_ROOT=/path/to/models
#   ./hf_examples.sh
#
# Or inline:
#   PROJECT_ROOT=/workspace/gitlab/tekit-b200 MODEL_ROOT=/llm-models ./hf_examples.sh

set -e  # Exit on error

# Environment variables with defaults
PROJECT_ROOT=${PROJECT_ROOT:-"/workspace/gitlab/tekit-b200"}
MODEL_ROOT=${MODEL_ROOT:-"/llm-models"}

# Log configuration
export TLLM_LOG_LEVEL=${TLLM_LOG_LEVEL:-"INFO"}

echo "============================================"
echo "HuggingFace Diffusers Baseline Tests"
echo "============================================"
echo "PROJECT_ROOT: $PROJECT_ROOT"
echo "MODEL_ROOT: $MODEL_ROOT"
echo "LOG_LEVEL: $TLLM_LOG_LEVEL"
echo ""
echo "Purpose: Establish baseline results using"
echo "         official diffusers implementations"
echo "============================================"
echo ""

# Check Python dependencies
echo "Checking dependencies..."
MISSING_DEPS=""

if ! python -c "import diffusers" 2>/dev/null; then
    echo "❌ ERROR: diffusers not found"
    MISSING_DEPS="$MISSING_DEPS diffusers"
fi

if ! python -c "import torch" 2>/dev/null; then
    echo "❌ ERROR: torch not found"
    MISSING_DEPS="$MISSING_DEPS torch"
fi

if [ -n "$MISSING_DEPS" ]; then
    echo ""
    echo "❌ Missing required dependencies:$MISSING_DEPS"
    echo "Install with: pip install$MISSING_DEPS"
    exit 1
fi

echo "✅ All required dependencies found"
echo ""

# Detect GPUs
if command -v nvidia-smi &> /dev/null; then
    GPU_COUNT=$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)
    echo "Detected $GPU_COUNT GPU(s)"
    GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -1)
    echo "GPU: $GPU_NAME"
else
    echo "⚠️ WARNING: nvidia-smi not found"
    echo "   Continuing with CPU (very slow!)"
    GPU_COUNT=0
fi
echo ""

# Create output directory (in the current directory)
OUTPUT_DIR="./baseline_outputs"
mkdir -p "$OUTPUT_DIR"
echo "Output directory: $OUTPUT_DIR ($(pwd)/baseline_outputs)"
echo ""

#############################################
# WAN (Wan2.1) Baseline Test
#############################################

echo "============================================"
echo "1/1: WAN Baseline Test"
echo "============================================"
echo ""

WAN_MODEL="${MODEL_ROOT}/Wan2.1-T2V-1.3B-Diffusers/"
WAN_OUTPUT="${OUTPUT_DIR}/wan_baseline.gif"

if [ -d "$WAN_MODEL" ]; then
    echo "Testing WAN with official diffusers..."
    python ${PROJECT_ROOT}/examples/visual_gen/hf_wan.py \
        --model_path "$WAN_MODEL" \
        --output_path "$WAN_OUTPUT" \
        --prompt "A cute cat playing piano" \
        --height 480 \
        --width 832 \
        --num_frames 33 \
        --steps 50 \
        --guidance_scale 7.0 \
        --seed 42
    echo ""
    echo "✅ WAN baseline test completed"
    echo "   Output: $WAN_OUTPUT"
else
    echo "⚠️ SKIPPED: WAN model not found at $WAN_MODEL"
fi

echo ""

#############################################
# Summary
#############################################

echo "============================================"
echo "Baseline Tests Complete!"
echo "============================================"
echo ""
echo "Output files saved to: $OUTPUT_DIR"
echo ""
ls -lh "$OUTPUT_DIR" 2>/dev/null || echo "No outputs generated"
echo ""
echo "Next Steps:"
echo "  1. Verify outputs are correct (images/videos generated)"
echo "  2. Compare with custom implementation outputs"
echo "  3. Use these as reference/baseline for debugging"
echo ""
echo "Comparison command:"
echo "  diff -r $OUTPUT_DIR <custom_implementation_outputs>"
echo "============================================"
```
