**docs/.nav.yml** (+3 -0)

@@ -25,11 +25,13 @@ nav:
- Online Serving:
- BAGEL-7B-MoT: user_guide/examples/online_serving/bagel.md
- Image-To-Image: user_guide/examples/online_serving/image_to_image.md
- Image-To-Video: user_guide/examples/online_serving/image_to_video.md
- LoRA Inference(Diffusion): user_guide/examples/online_serving/lora_inference.md
- Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md
- Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md
- Qwen3-TTS: user_guide/examples/online_serving/qwen3_tts.md
- Text-To-Image: user_guide/examples/online_serving/text_to_image.md
- Text-To-Video: user_guide/examples/online_serving/text_to_video.md
- General:
- usage/*
- Configuration:
@@ -48,6 +50,7 @@ nav:
- FP8: user_guide/diffusion/quantization/fp8.md
- Parallelism Acceleration: user_guide/diffusion/parallelism_acceleration.md
- CPU Offloading: user_guide/diffusion/cpu_offload_diffusion.md
- Custom Pipeline: features/custom_pipeline.md
- ComfyUI: features/comfyui.md
- Developer Guide:
- General:
**docs/api/README.md** (+2 -0)

@@ -92,8 +92,10 @@ Configuration classes.
Worker classes and model runners for distributed inference.

- [vllm_omni.diffusion.worker.diffusion_model_runner.DiffusionModelRunner][]
- [vllm_omni.diffusion.worker.diffusion_worker.CustomPipelineWorkerExtension][]
- [vllm_omni.diffusion.worker.diffusion_worker.DiffusionWorker][]
- [vllm_omni.diffusion.worker.diffusion_worker.WorkerProc][]
- [vllm_omni.diffusion.worker.diffusion_worker.WorkerWrapperBase][]
- [vllm_omni.platforms.npu.worker.npu_ar_model_runner.ExecuteModelState][]
- [vllm_omni.platforms.npu.worker.npu_ar_model_runner.NPUARModelRunner][]
- [vllm_omni.platforms.npu.worker.npu_ar_worker.NPUARWorker][]
**docs/user_guide/examples/offline_inference/bagel.md** (+18 -0)

@@ -154,6 +154,24 @@ The default yaml configuration deploys Thinker and DiT on the same GPU. You can

------

#### Tensor Parallelism (TP)

For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) by modifying the stage configuration (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)).

1. **Set `tensor_parallel_size`**: Increase this value (e.g., to `2` or `4`).
2. **Set `devices`**: Specify the comma-separated GPU IDs to be used for the stage (e.g., `"0,1"`).

Example configuration for TP=2 on GPUs 0 and 1:
```yaml
engine_args:
tensor_parallel_size: 2
...
runtime:
devices: "0,1"
```
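Before launching, it can help to confirm that the edited file still parses as valid YAML. A minimal sketch, assuming PyYAML is installed in your environment and using a placeholder path:

```bash
# Sanity-check that the edited stage config is valid YAML (path is a placeholder)
python -c "import yaml, sys; yaml.safe_load(open(sys.argv[1])); print('OK')" /path/to/bagel.yaml
```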

------

#### 🔗 Runtime Configuration

| Parameter | Value | Description |
**docs/user_guide/examples/online_serving/bagel.md** (+19 -2)

@@ -25,12 +25,29 @@ cd /workspace/vllm-omni/examples/online_serving/bagel
bash run_server.sh
```

If you have a custom stage config file, launch the server with the command below:

```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/stage_configs_file
```
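Once the server is up, a quick reachability check can confirm it is listening on the chosen port. A minimal sketch, assuming the standard vLLM `/health` endpoint is exposed:

```bash
# Expect HTTP 200 once the server is ready (assumes vLLM's /health endpoint)
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8091/health
```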

#### 🚀 Tensor Parallelism (TP)

For larger models or multi-GPU environments, you can enable Tensor Parallelism (TP) for the server.

1. **Modify Stage Config**: Create or modify a stage configuration YAML file (e.g., [`bagel.yaml`](https://github.com/vllm-project/vllm-omni/tree/main/vllm_omni/model_executor/stage_configs/bagel.yaml)). Set `tensor_parallel_size` to `2` (or more) and update `devices` to include multiple GPU IDs (e.g., `"0,1"`).

```yaml
engine_args:
tensor_parallel_size: 2
...
runtime:
devices: "0,1"
```

2. **Launch Server**:
```bash
vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091 --stage-configs-path /path/to/your/custom_bagel.yaml
```
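After startup, you can verify that both GPUs are actually in use. A quick check, assuming an NVIDIA environment with `nvidia-smi` on the `PATH`:

```bash
# GPUs 0 and 1 should both report memory in use once the TP=2 server has loaded
nvidia-smi --query-gpu=index,name,memory.used --format=csv
```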

### Send Multi-modal Request

Get into the bagel folder:
**docs/user_guide/examples/online_serving/image_to_video.md** (+97 -0, new file)
@@ -0,0 +1,97 @@
# Image-To-Video

Source: <https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_video>.


This example demonstrates how to deploy the Wan2.2 image-to-video model for online video generation using vLLM-Omni.

## Start Server

### Basic Start

```bash
vllm serve Wan-AI/Wan2.2-I2V-A14B-Diffusers --omni --port 8091
```

### Start with Parameters

Alternatively, use the startup script:

```bash
bash run_server.sh
```

The script allows overriding the following environment variables (see the launch sketch after this list):
- `MODEL` (default: `Wan-AI/Wan2.2-I2V-A14B-Diffusers`)
- `PORT` (default: `8091`)

> **Review (P1): Correct image-to-video script default port in docs**
>
> The document states that `run_server.sh` defaults `PORT` to `8091`, but the actual script uses `8099` (`examples/online_serving/image_to_video/run_server.sh`), and the example curl script also posts to `8099`, so following this page causes users to send requests to the wrong port after launching with the provided script.

- `BOUNDARY_RATIO` (default: `0.875`)
- `FLOW_SHIFT` (default: `12.0`)
> **Review on lines +27 to +28 (P2): Remove unsupported image-to-video env override claims**
>
> This section claims `run_server.sh` supports `BOUNDARY_RATIO` and `FLOW_SHIFT` overrides, but the script does not read or pass either value (it only wires the model, port, and cache flags), so users setting those environment variables will have them silently ignored, making tuning attempts non-reproducible.

- `CACHE_BACKEND` (default: `none`)
- `ENABLE_CACHE_DIT_SUMMARY` (default: `0`)
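Putting these together, a launch sketch (the values here are illustrative, and per the review comments above, only some of the listed overrides may actually be honored by the script):

```bash
# Launch the example server with an overridden model and port
MODEL="Wan-AI/Wan2.2-I2V-A14B-Diffusers" PORT=8099 bash run_server.sh
```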

## API Calls

### Using curl

```bash
# Basic image-to-video generation
bash run_curl_image_to_video.sh

# Or execute directly (OpenAI-style multipart)
curl -X POST http://localhost:8091/v1/videos \
-H "Accept: application/json" \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png" \
-F "width=832" \
-F "height=480" \
-F "num_frames=33" \
-F "fps=16" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F "seed=42" | jq -r '.data[0].b64_json' | base64 -d > wan22_i2v_output.mp4
```

## Request Format

### Required Fields

```bash
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png"
```

### Generation with Parameters

```bash
curl -X POST http://localhost:8091/v1/videos \
-F "prompt=A bear playing with yarn, smooth motion" \
-F "negative_prompt=low quality, blurry, static" \
-F "input_reference=@/path/to/qwen-bear.png" \
-F "width=832" \
-F "height=480" \
-F "num_frames=33" \
-F "fps=16" \
-F "num_inference_steps=40" \
-F "guidance_scale=1.0" \
-F "guidance_scale_2=1.0" \
-F "boundary_ratio=0.875" \
-F "flow_shift=12.0" \
-F "seed=42"
```

## Example materials

??? abstract "run_curl_image_to_video.sh"
``````sh
--8<-- "examples/online_serving/image_to_video/run_curl_image_to_video.sh"
``````
??? abstract "run_server.sh"
``````sh
--8<-- "examples/online_serving/image_to_video/run_server.sh"
``````
**docs/user_guide/examples/online_serving/qwen2_5_omni.md** (+1 -1)

@@ -30,7 +30,7 @@ cd examples/online_serving/qwen2_5_omni
#### Send request via python

```bash
python openai_chat_completion_client_for_multimodal_generation.py --query-type mixed_modalities
python openai_chat_completion_client_for_multimodal_generation.py --query-type mixed_modalities --port 8091 --host "localhost"
```

The Python client supports the following command-line arguments:
**docs/user_guide/examples/online_serving/qwen3_omni.md** (+1 -1)

@@ -36,7 +36,7 @@ cd examples/online_serving/qwen3_omni
#### Send request via python

```bash
python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image
python openai_chat_completion_client_for_multimodal_generation.py --query-type use_image --port 8091 --host "localhost"
```

The Python client supports the following command-line arguments: