**docs/.nav.yml** (+3 -0)

@@ -25,11 +25,13 @@ nav:
- Online Serving:
- BAGEL-7B-MoT: user_guide/examples/online_serving/bagel.md
- Image-To-Image: user_guide/examples/online_serving/image_to_image.md
- Image-To-Video: user_guide/examples/online_serving/image_to_video.md
- LoRA Inference(Diffusion): user_guide/examples/online_serving/lora_inference.md
- Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md
- Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md
- Qwen3-TTS: user_guide/examples/online_serving/qwen3_tts.md
- Text-To-Image: user_guide/examples/online_serving/text_to_image.md
- Text-To-Video: user_guide/examples/online_serving/text_to_video.md
- General:
- usage/*
- Configuration:
@@ -48,6 +50,7 @@ nav:
- FP8: user_guide/diffusion/quantization/fp8.md
- Parallelism Acceleration: user_guide/diffusion/parallelism_acceleration.md
- CPU Offloading: user_guide/diffusion/cpu_offload_diffusion.md
- Custom Pipeline: features/custom_pipeline.md
- ComfyUI: features/comfyui.md
- Developer Guide:
- General:
**docs/api/README.md** (+2 -0)

@@ -92,8 +92,10 @@ Configuration classes.
Worker classes and model runners for distributed inference.

- [vllm_omni.diffusion.worker.diffusion_model_runner.DiffusionModelRunner][]
- [vllm_omni.diffusion.worker.diffusion_worker.CustomPipelineWorkerExtension][]
- [vllm_omni.diffusion.worker.diffusion_worker.DiffusionWorker][]
- [vllm_omni.diffusion.worker.diffusion_worker.WorkerProc][]
- [vllm_omni.diffusion.worker.diffusion_worker.WorkerWrapperBase][]
- [vllm_omni.platforms.npu.worker.npu_ar_model_runner.ExecuteModelState][]
- [vllm_omni.platforms.npu.worker.npu_ar_model_runner.NPUARModelRunner][]
- [vllm_omni.platforms.npu.worker.npu_ar_worker.NPUARWorker][]
**docs/user_guide/examples/offline_inference/bagel.md** (+18 -0)

@@ -154,6 +154,24 @@ The default yaml configuration deploys Thinker and DiT on the same GPU. You can

------

#### Tensor Parallelism (TP)

For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) by modifying the stage configuration (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)).

1. **Set `tensor_parallel_size`**: Increase this value (e.g., to `2` or `4`).
2. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the stage (e.g., `"0,1"`).

Example configuration for TP=2 on GPUs 0 and 1:
```yaml
engine_args:
tensor_parallel_size: 2
...
runtime:
devices: "0,1"
```
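Before launching, it can help to confirm that the edited file still parses as valid YAML. A minimal sketch, assuming PyYAML is installed in your environment and using a placeholder path:

```bash
# Sanity-check that the edited stage config is valid YAML (path is a placeholder)
python -c "import yaml, sys; yaml.safe_load(open(sys.argv[1])); print('OK')" /path/to/bagel.yaml
```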

------

#### 🔗 Runtime Configuration

| Parameter | Value | Description |
**docs/user_guide/examples/online_serving/bagel.md** (+19 -2)

@@ -25,12 +25,29 @@ cd /workspace/vllm-omni/examples/online_serving/bagel
bash run_server.sh
```

If you have a custom stage config file, launch the server with the command below:

```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
```
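Once the server is up, a quick reachability check can confirm it is listening on the chosen port. A minimal sketch, assuming the standard vLLM `/health` endpoint is exposed:

```bash
# Expect HTTP 200 once the server is ready (assumes vLLM's /health endpoint)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8091/health
```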

#### 🚀 Tensor Parallelism (TP)

For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) for the server.

1. **Modify Stage Config**: Create or modify a stage configuration YAML file (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).

```yaml
engine_args:
tensor_parallel_size: 2
...
runtime:
devices: "0,1"
```

2. **Launch Server**:
```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/your/custom_bagel.yaml
```
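After startup, you can verify that both GPUs are actually in use. A quick check, assuming an NVIDIA environment with `nvidia-smi` on the `PATH`:

```bash
# GPUs 0 and 1 should both report memory in use once the TP=2 server has loaded
nvidia-smi --query-gpu=index,name,memory.used --format=csv
```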

### Send Multi-modal Request

Get into the bagel folder:
**docs/user_guide/examples/online_serving/image_to_video.md** (+97 -0, new file)
@@ -0,0 +1,97 @@
# Image-To-Video

Source: <https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_video>.


This example demonstrates how to deploy the Wan2.2 image-to-video model for online video generation using vLLM-Omni.

## Start Server

### Basic Start

```bash
vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091
```

### Start with Parameters

Alternatively, use the startup script:

```bash
bash run_server.sh
```

The script allows overriding the following environment variables (see the launch sketch after this list):
- `MODEL` (default: `Wan-AI/Wan2.2-I2V-A14B-Diffusers`)
- `PORT` (default: `8091`)

> **Review (P1): Correct image-to-video script default port in docs**
>
> The document states that `run_server.sh` defaults `PORT` to `8091`, but the actual script uses `8099` (`examples/online_serving/image_to_video/run_server.sh`), and the example curl script also posts to `8099`, so following this page causes users to send requests to the wrong port after launching with the provided script.

- `BOUNDARY_RATIO` (default: `0.875`)
- `FLOW_SHIFT` (default: `12.0`)
> **Review on lines +27 to +28 (P2): Remove unsupported image-to-video env override claims**
>
> This section claims `run_server.sh` supports `BOUNDARY_RATIO` and `FLOW_SHIFT` overrides, but the script does not read or pass either value (it only wires the model, port, and cache flags), so users setting those environment variables will have them silently ignored, making tuning attempts non-reproducible.

- `CACHE_BACKEND` (default: `none`)
- `ENABLE_CACHE_DIT_SUMMARY` (default: `0`)
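Putting these together, a launch sketch (the values here are illustrative, and per the review comments above, only some of the listed overrides may actually be honored by the script):

```bash
# Launch the example server with an overridden model and port
MODEL="Wan-AI/Wan2.2-I2V-A14B-Diffusers" PORT=8099 bash run_server.sh
```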

## API Calls

### Using curl

```bash
# Basic image-to-video generation
bash run_curl_image_to_video.sh

# Or execute directly (OpenAI-style multipart)
curl -X POST http://localhost:8091/v1/videos \
-H "Accept: application/json" \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png" \
-F "width=832" \
-F "height=480" \
-F "num_frames=33" \
-F "fps=16" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F "seed=42" | jq -r '.data[0].b64_json' | base64 -d > wan22_i2v_output.mp4
```

## Request Format

### Required Fields

```bash
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png"
```

### Generation with Parameters

```bash
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png" \
-F "width=832" \
-F "height=480" \
-F "num_frames=33" \
-F "fps=16" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F "seed=42"
```

## Example materials

??? abstract "run_curl_image_to_video.sh"
``````sh
--8<-- "examples/online_serving/image_to_video/run_curl_image_to_video.sh"
``````
??? abstract "run_server.sh"
``````sh
--8<-- "examples/online_serving/image_to_video/run_server.sh"
``````
**docs/user_guide/examples/online_serving/qwen2_5_omni.md** (+1 -1)

@@ -30,7 +30,7 @@ cd examples/online_serving/qwen2_5_omni
#### Send request via python

```bash
python openai_chat_completion_client_for_multimodal_generation.py --query-type mixed_modalities
python openai_chat_completion_client_for_multimodal_generation.py --query-type mixed_modalities --port 8091 --host "localhost"
```

The Python client supports the following command-line arguments:
**docs/user_guide/examples/online_serving/qwen3_omni.md** (+1 -1)

@@ -36,7 +36,7 @@ cd examples/online_serving/qwen3_omni
#### Send request via python

```bash
python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image
python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image --port 8091 --host "localhost"
```

The Python client supports the following command-line arguments: