diff --git a/docs/.nav.yml b/docs/.nav.yml index 07db1b4651..73edba1f40 100644 --- a/docs/.nav.yml +++ b/docs/.nav.yml @@ -25,11 +25,13 @@ nav: - Online Serving: - BAGEL-7B-MoT: user_guide/examples/online_serving/bagel.md - Image-To-Image: user_guide/examples/online_serving/image_to_image.md + - Image-To-Video: user_guide/examples/online_serving/image_to_video.md - LoRA Inference(Diffusion): user_guide/examples/online_serving/lora_inference.md - Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md - Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md - Qwen3-TTS: user_guide/examples/online_serving/qwen3_tts.md - Text-To-Image: user_guide/examples/online_serving/text_to_image.md + - Text-To-Video: user_guide/examples/online_serving/text_to_video.md - General: - usage/* - Configuration: @@ -48,6 +50,7 @@ nav: - FP8: user_guide/diffusion/quantization/fp8.md - Parallelism Acceleration: user_guide/diffusion/parallelism_acceleration.md - CPU Offloading: user_guide/diffusion/cpu_offload_diffusion.md + - Custom Pipeline: features/custom_pipeline.md - ComfyUI: features/comfyui.md - Developer Guide: - General: diff --git a/docs/api/README.md b/docs/api/README.md index b4ec398f24..85beca0260 100644 --- a/docs/api/README.md +++ b/docs/api/README.md @@ -92,8 +92,10 @@ Configuration classes. Worker classes and model runners for distributed inference. - [vllm_omni.diffusion.worker.diffusion_model_runner.DiffusionModelRunner][] +- [vllm_omni.diffusion.worker.diffusion_worker.CustomPipelineWorkerExtension][] - [vllm_omni.diffusion.worker.diffusion_worker.DiffusionWorker][] - [vllm_omni.diffusion.worker.diffusion_worker.WorkerProc][] +- [vllm_omni.diffusion.worker.diffusion_worker.WorkerWrapperBase][] - [vllm_omni.platforms.npu.worker.npu_ar_model_runner.ExecuteModelState][] - [vllm_omni.platforms.npu.worker.npu_ar_model_runner.NPUARModelRunner][] - [vllm_omni.platforms.npu.worker.npu_ar_worker.NPUARWorker][] diff --git a/docs/user_guide/examples/offline_inference/bagel.md b/docs/user_guide/examples/offline_inference/bagel.md index 8bab929779..84e2e703c8 100644 --- a/docs/user_guide/examples/offline_inference/bagel.md +++ b/docs/user_guide/examples/offline_inference/bagel.md @@ -154,6 +154,24 @@ The default yaml configuration deploys Thinker and DiT on the same GPU. You can ------ +#### Tensor Parallelism (TP) + +For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) by modifying the stage configuration (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). + +1. **Set `tensor_parallel_size`**: Increase this value (e.g., to `2` or `4`). +2. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the stage (e.g., `"0,1"`). + +Example configuration for TP=2 on GPUs 0 and 1: +```yaml + engine_args: + tensor_parallel_size: 2 + ... 
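+  # TP=2 runs this stage on the two GPUs listed in runtime.devices ("0,1")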
+ runtime: + devices: "0,1" +``` + +------ + #### 🔗 Runtime Configuration | Parameter | Value | Description | diff --git a/docs/user_guide/examples/online_serving/bagel.md b/docs/user_guide/examples/online_serving/bagel.md index f29c095f4f..cc1973e6cf 100644 --- a/docs/user_guide/examples/online_serving/bagel.md +++ b/docs/user_guide/examples/online_serving/bagel.md @@ -25,12 +25,29 @@ cd /workspace/vllm-omni/examples/online_serving/bagel bash run_server.sh ``` -If you have a custom stage configs file, launch the server with the command below: - ```bash vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file ``` +#### 🚀 Tensor Parallelism (TP) + +For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) for the server. + +1. **Modify Stage Config**: Create or modify a stage configuration yaml (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`). + +```yaml + engine_args: + tensor_parallel_size: 2 + ... + runtime: + devices: "0,1" +``` + +2. **Launch Server**: +```bash +vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/your/custom_bagel.yaml +``` + ### Send Multi-modal Request Get into the bagel folder: diff --git a/docs/user_guide/examples/online_serving/image_to_video.md b/docs/user_guide/examples/online_serving/image_to_video.md new file mode 100644 index 0000000000..dcbb100671 --- /dev/null +++ b/docs/user_guide/examples/online_serving/image_to_video.md @@ -0,0 +1,97 @@ +# Image-To-Video + +Source . + + +This example demonstrates how to deploy the Wan2.2 image-to-video model for online video generation using vLLM-Omni. 
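+
+The examples below use `curl` with multipart form fields. As a rough Python sketch of the same request (assuming the `requests` package is installed, the server from the next section is running on `localhost:8091`, and the image path is a placeholder), you can post the form and decode the returned base64 video as shown here; the full parameter list appears in the curl examples further down:
+
+```python
+# Sketch only: mirrors the multipart curl examples below using the `requests` package.
+import base64
+
+import requests
+
+with open("/path/to/qwen-bear.png", "rb") as f:
+    resp = requests.post(
+        "http://localhost:8091/v1/videos",
+        # The reference image is sent as a multipart file part named "input_reference".
+        files={"input_reference": ("qwen-bear.png", f, "image/png")},
+        data={
+            "prompt": "A bear playing with yarn, smooth motion",
+            "negative_prompt": "low quality, blurry, static",
+            "width": 832,
+            "height": 480,
+            "num_frames": 33,
+            "fps": 16,
+            "num_inference_steps": 40,
+            "seed": 42,
+        },
+        timeout=3600,
+    )
+resp.raise_for_status()
+
+# The response carries the video as base64 under data[0].b64_json.
+with open("wan22_i2v_output.mp4", "wb") as out:
+    out.write(base64.b64decode(resp.json()["data"][0]["b64_json"]))
+```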
+ +## Start Server + +### Basic Start + +```bash +vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091 +``` + +### Start with Parameters + +Or use the startup script: + +```bash +bash run_server.sh +``` + +The script allows overriding: +- `MODEL` (default: `Wan-AI/Wan2.2-I2V-A14B-Diffusers`) +- `PORT` (default: `8091`) +- `BOUNDARY_RATIO` (default: `0.875`) +- `FLOW_SHIFT` (default: `12.0`) +- `CACHE_BACKEND` (default: `none`) +- `ENABLE_CACHE_DIT_SUMMARY` (default: `0`) + +## API Calls + +### Method 1: Using curl + +```bash +# Basic image-to-video generation +bash run_curl_image_to_video.sh + +# Or execute directly (OpenAI-style multipart) +curl -X POST http://localhost:8091/v1/videos \ + -H "Accept: application/json" \ + -F "prompt=A bear playing with yarn, smooth motion" \ + -F "negative_prompt=low quality, blurry, static" \ + -F "input_reference=@/path/to/qwen-bear.png" \ + -F "width=832" \ + -F "height=480" \ + -F "num_frames=33" \ + -F "fps=16" \ + -F "num_inference_steps=40" \ + -F "guidance_scale=1.0" \ + -F "guidance_scale_2=1.0" \ + -F "boundary_ratio=0.875" \ + -F "flow_shift=12.0" \ + -F "seed=42" | jq -r '.data[0].b64_json' | base64 -d > wan22_i2v_output.mp4 +``` + +## Request Format + +### Required Fields + +```bash +curl -X POST http://localhost:8091/v1/videos \ + -F "prompt=A bear playing with yarn, smooth motion" \ + -F "negative_prompt=low quality, blurry, static" \ + -F "input_reference=@/path/to/qwen-bear.png" +``` + +### Generation with Parameters + +```bash +curl -X POST http://localhost:8091/v1/videos \ + -F "prompt=A bear playing with yarn, smooth motion" \ + -F "negative_prompt=low quality, blurry, static" \ + -F "input_reference=@/path/to/qwen-bear.png" \ + -F "width=832" \ + -F "height=480" \ + -F "num_frames=33" \ + -F "fps=16" \ + -F "num_inference_steps=40" \ + -F "guidance_scale=1.0" \ + -F "guidance_scale_2=1.0" \ + -F "boundary_ratio=0.875" \ + -F "flow_shift=12.0" \ + -F "seed=42" +``` + +## Example materials + +??? abstract "run_curl_image_to_video.sh" + ``````sh + --8<-- "examples/online_serving/image_to_video/run_curl_image_to_video.sh" + `````` +??? 
abstract "run_server.sh" + ``````sh + --8<-- "examples/online_serving/image_to_video/run_server.sh" + `````` diff --git a/docs/user_guide/examples/online_serving/qwen2_5_omni.md b/docs/user_guide/examples/online_serving/qwen2_5_omni.md index 361044ed8f..976d39a966 100644 --- a/docs/user_guide/examples/online_serving/qwen2_5_omni.md +++ b/docs/user_guide/examples/online_serving/qwen2_5_omni.md @@ -30,7 +30,7 @@ cd examples/online_serving/qwen2_5_omni #### Send request via python ```bash -python openai_chat_completion_client_for_multimodal_generation.py --query-type mixed_modalities +python openai_chat_completion_client_for_multimodal_generation.py --query-type mixed_modalities --port 8091 --host "localhost" ``` The Python client supports the following command-line arguments: diff --git a/docs/user_guide/examples/online_serving/qwen3_omni.md b/docs/user_guide/examples/online_serving/qwen3_omni.md index d262465339..c903215f2c 100644 --- a/docs/user_guide/examples/online_serving/qwen3_omni.md +++ b/docs/user_guide/examples/online_serving/qwen3_omni.md @@ -36,7 +36,7 @@ cd examples/online_serving/qwen3_omni #### Send request via python ```bash -python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image +python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image --port 8091 --host "localhost" ``` The Python client supports the following command-line arguments: diff --git a/docs/user_guide/examples/online_serving/qwen3_tts.md b/docs/user_guide/examples/online_serving/qwen3_tts.md index f899e362ee..ca4ad6e36f 100644 --- a/docs/user_guide/examples/online_serving/qwen3_tts.md +++ b/docs/user_guide/examples/online_serving/qwen3_tts.md @@ -3,9 +3,7 @@ Source . -## 🛠️ Installation - -Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/README.md) +This directory contains examples for running Qwen3-TTS models with vLLM-Omni's online serving API. ## Supported Models @@ -17,62 +15,29 @@ Please refer to [README.md](https://github.com/vllm-project/vllm-omni/tree/main/ | `Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice` | CustomVoice | Smaller/faster variant | | `Qwen/Qwen3-TTS-12Hz-0.6B-Base` | Base | Smaller/faster variant for voice cloning | -## Run examples (Qwen3-TTS) +## Quick Start -### Launch the Server +### 1. 
Start the Server ```bash -# CustomVoice model (predefined speakers) -vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \ - --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \ - --omni \ - --port 8091 \ - --trust-remote-code \ - --enforce-eager +# CustomVoice model (default) +./run_server.sh -# VoiceDesign model -vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign \ - --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \ - --omni \ - --port 8091 \ - --trust-remote-code \ - --enforce-eager - -# Base model (voice cloning) -vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-Base \ - --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \ - --omni \ - --port 8091 \ - --trust-remote-code \ - --enforce-eager +# Or specify task type +./run_server.sh CustomVoice +./run_server.sh VoiceDesign +./run_server.sh Base ``` -If you have custom stage configs file, launch the server with command below -```bash -vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \ - --stage-configs-path /path/to/stage_configs_file \ - --omni \ - --port 8091 \ - --trust-remote-code \ - --enforce-eager -``` +Or launch directly with vllm serve: -Alternatively, use the convenience script: ```bash -./run_server.sh # Default: CustomVoice model -./run_server.sh CustomVoice # CustomVoice model -./run_server.sh VoiceDesign # VoiceDesign model -./run_server.sh Base # Base (voice clone) model -``` - -### Send TTS Request - -Get into the example folder -```bash -cd examples/online_serving/qwen3_tts +vllm serve Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice \ + --stage-configs-path vllm_omni/model_executor/stage_configs/qwen3_tts.yaml \ + --omni --port 8091 --trust-remote-code --enforce-eager ``` -#### Send request via python +### 2. Run the Client ```bash # CustomVoice: Use predefined speaker @@ -103,21 +68,7 @@ python openai_speech_client.py \ --ref-text "Original transcript of the reference audio" ``` -The Python client supports the following command-line arguments: - -- `--api-base`: API base URL (default: `http://localhost:8091`) -- `--model` (or `-m`): Model name/path (default: `Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice`) -- `--task-type` (or `-t`): TTS task type. Options: `CustomVoice`, `VoiceDesign`, `Base` -- `--text`: Text to synthesize (required) -- `--voice`: Speaker/voice name (default: `vivian`). Options: `vivian`, `ryan`, `aiden`, etc. -- `--language`: Language. Options: `Auto`, `Chinese`, `English`, `Japanese`, `Korean`, `German`, `French`, `Russian`, `Portuguese`, `Spanish`, `Italian` -- `--instructions`: Voice style/emotion instructions -- `--ref-audio`: Reference audio file path or URL for voice cloning (Base task) -- `--ref-text`: Reference audio transcript for voice cloning (Base task) -- `--response-format`: Audio output format (default: `wav`). Options: `wav`, `mp3`, `flac`, `pcm`, `aac`, `opus` -- `--output` (or `-o`): Output audio file path (default: `tts_output.wav`) - -#### Send request via curl +### 3. 
Using curl ```bash # Simple TTS request @@ -142,56 +93,12 @@ curl -X POST http://localhost:8091/v1/audio/speech \ curl http://localhost:8091/v1/audio/voices ``` -### Using OpenAI SDK - -```python -from openai import OpenAI - -client = OpenAI(base_url="http://localhost:8091/v1", api_key="none") - -response = client.audio.speech.create( - model="Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", - voice="vivian", - input="Hello, how are you?", -) - -response.stream_to_file("output.wav") -``` - -### Using Python httpx - -```python -import httpx - -response = httpx.post( - "http://localhost:8091/v1/audio/speech", - json={ - "input": "Hello, how are you?", - "voice": "vivian", - "language": "English", - }, - timeout=300.0, -) - -with open("output.wav", "wb") as f: - f.write(response.content) -``` - -### FAQ - -If you encounter error about backend of librosa, try to install ffmpeg with command below. -``` -sudo apt update -sudo apt install ffmpeg -``` - ## API Reference ### Endpoint ``` POST /v1/audio/speech -Content-Type: application/json ``` This endpoint follows the [OpenAI Audio Speech API](https://platform.openai.com/docs/api-reference/audio/createSpeech) format with additional Qwen3-TTS parameters. @@ -231,7 +138,7 @@ Lists available voices for the loaded model: ### Response -Returns binary audio data with appropriate `Content-Type` header (e.g., `audio/wav`). +Returns audio data in the requested format (default: WAV). ## Parameters @@ -258,11 +165,48 @@ Returns binary audio data with appropriate `Content-Type` header (e.g., `audio/w ### Voice Clone Parameters (Base task) -| Parameter | Type | Required | Description | +| Parameter | Type | Default | Description | |-----------|------|----------|-------------| -| `ref_audio` | string | **Yes** | Reference audio (URL or base64 data URL) | -| `ref_text` | string | No | Transcript of reference audio (for ICL mode) | -| `x_vector_only_mode` | bool | No | Use speaker embedding only (no ICL) | +| `ref_audio` | string | null | Reference audio (URL or base64 data URL) | +| `ref_text` | string | null | Transcript of reference audio | +| `x_vector_only_mode` | bool | null | Use speaker embedding only (no ICL) | + +## Python Usage + +### Using OpenAI SDK + +```python +from openai import OpenAI + +client = OpenAI(base_url="http://localhost:8091/v1", api_key="none") + +response = client.audio.speech.create( + model="Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", + voice="vivian", + input="Hello, how are you?", +) + +response.stream_to_file("output.wav") +``` + +### Using httpx + +```python +import httpx + +response = httpx.post( + "http://localhost:8091/v1/audio/speech", + json={ + "input": "Hello, how are you?", + "voice": "vivian", + "language": "English", + }, + timeout=300.0, +) + +with open("output.wav", "wb") as f: + f.write(response.content) +``` ## Limitations @@ -271,7 +215,7 @@ Returns binary audio data with appropriate `Content-Type` header (e.g., `audio/w ## Troubleshooting -1. **TTS model did not produce audio output**: Ensure you're using the correct model variant for your task type (CustomVoice task → CustomVoice model, etc.) +1. **"TTS model did not produce audio output"**: Ensure you're using the correct model variant for your task type (CustomVoice task → CustomVoice model, etc.) 2. **Connection refused**: Make sure the server is running on the correct port 3. **Out of memory**: Use smaller model variant (`Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice`) or reduce `--gpu-memory-utilization` 4. 
**Unsupported speaker**: Use `/v1/audio/voices` to list available voices for the loaded model diff --git a/docs/user_guide/examples/online_serving/text_to_video.md b/docs/user_guide/examples/online_serving/text_to_video.md new file mode 100644 index 0000000000..a632c49a73 --- /dev/null +++ b/docs/user_guide/examples/online_serving/text_to_video.md @@ -0,0 +1,130 @@ +# Text-To-Video + +Source . + + +This example demonstrates how to deploy the Wan2.2 text-to-video model for online video generation using vLLM-Omni. + +## Start Server + +### Basic Start + +```bash +vllm serve Wan-AI/Wan2.2-T2V-A14B-Diffusers --omni --port 8091 +``` + +### Start with Parameters + +Or use the startup script: + +```bash +bash run_server.sh +``` + +The script allows overriding: +- `MODEL` (default: `Wan-AI/Wan2.2-T2V-A14B-Diffusers`) +- `PORT` (default: `8091`) +- `BOUNDARY_RATIO` (default: `0.875`) +- `FLOW_SHIFT` (default: `5.0`) +- `CACHE_BACKEND` (default: `none`) +- `ENABLE_CACHE_DIT_SUMMARY` (default: `0`) + +## API Calls + +### Method 1: Using curl + +```bash +# Basic text-to-video generation +bash run_curl_text_to_video.sh + +# Or execute directly (OpenAI-style multipart) +curl -s http://localhost:8091/v1/videos \ + -H "Accept: application/json" \ + -F "prompt=Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \ + -F "width=832" \ + -F "height=480" \ + -F "num_frames=33" \ + -F "negative_prompt=色调艳丽 ,过曝,静态,细节模糊不清,字幕,风格,作品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" \ + -F "fps=16" \ + -F "num_inference_steps=40" \ + -F "guidance_scale=4.0" \ + -F "guidance_scale_2=4.0" \ + -F "boundary_ratio=0.875" \ + -F "seed=42" | jq -r '.data[0].b64_json' | base64 -d > wan22_output.mp4 +``` + +## Request Format + +### Simple Text-to-Video Generation + +```bash +curl -X POST http://localhost:8091/v1/videos \ + -F "prompt=A cinematic view of a futuristic city at sunset" +``` + +### Generation with Parameters + +```bash +curl -X POST http://localhost:8091/v1/videos \ + -F "prompt=A cinematic view of a futuristic city at sunset" \ + -F "width=832" \ + -F "height=480" \ + -F "num_frames=33" \ + -F "negative_prompt=low quality, blurry, static" \ + -F "fps=16" \ + -F "num_inference_steps=40" \ + -F "guidance_scale=4.0" \ + -F "guidance_scale_2=4.0" \ + -F "boundary_ratio=0.875" \ + -F "flow_shift=5.0" \ + -F "seed=42" +``` + +## Generation Parameters + +| Parameter | Type | Default | Description | +| --------------------- | ------ | ------- | ------------------------------------------------ | +| `prompt` | str | - | Text description of the desired video | +| `negative_prompt` | str | None | Negative prompt | +| `n` | int | 1 | Number of videos to generate | +| `width` | int | None | Video width in pixels | +| `height` | int | None | Video height in pixels | +| `num_frames` | int | None | Number of frames to generate | +| `fps` | int | None | Frames per second for output video | +| `num_inference_steps` | int | None | Number of denoising steps | +| `guidance_scale` | float | None | CFG guidance scale (low-noise stage) | +| `guidance_scale_2` | float | None | CFG guidance scale (high-noise stage, Wan2.2) | +| `boundary_ratio` | float | None | Boundary split ratio for low/high DiT (Wan2.2) | +| `flow_shift` | float | None | Scheduler flow shift (Wan2.2) | +| `seed` | int | None | Random seed (reproducible) | +| `lora` | object | None | LoRA configuration | +| `extra_body` | object | None | Model-specific 
extra parameters | + +## Response Format + +```json +{ + "created": 1234567890, + "data": [ + { "b64_json": "" } + ] +} +``` + +## Extract Video + +```bash +# Extract base64 from response and decode to video +cat response.json | jq -r '.data[0].b64_json' | base64 -d > wan22_output.mp4 +``` + +## Example materials + +??? abstract "run_curl_text_to_video.sh" + ``````sh + --8<-- "examples/online_serving/text_to_video/run_curl_text_to_video.sh" + `````` +??? abstract "run_server.sh" + ``````sh + --8<-- "examples/online_serving/text_to_video/run_server.sh" + ``````
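+
+## Python Usage
+
+For scripted use, here is a rough Python sketch of the multipart request from the curl example above (assuming the `requests` package is installed and the server is running on `localhost:8091`); it sends the same form fields and decodes the base64 response into an MP4 file:
+
+```python
+# Sketch only: mirrors the multipart curl example above using the `requests` package.
+import base64
+
+import requests
+
+fields = {
+    "prompt": "A cinematic view of a futuristic city at sunset",
+    "negative_prompt": "low quality, blurry, static",
+    "width": "832",
+    "height": "480",
+    "num_frames": "33",
+    "fps": "16",
+    "num_inference_steps": "40",
+    "guidance_scale": "4.0",
+    "guidance_scale_2": "4.0",
+    "boundary_ratio": "0.875",
+    "flow_shift": "5.0",
+    "seed": "42",
+}
+
+resp = requests.post(
+    "http://localhost:8091/v1/videos",
+    # (None, value) tuples make requests send each field as a multipart form part,
+    # matching the curl -F flags above.
+    files={k: (None, v) for k, v in fields.items()},
+    timeout=3600,
+)
+resp.raise_for_status()
+
+# The response carries the video as base64 under data[0].b64_json.
+with open("wan22_output.mp4", "wb") as out:
+    out.write(base64.b64decode(resp.json()["data"][0]["b64_json"]))
+```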