
Commit 5bb122e

Merge remote-tracking branch 'origin/main' into device_memory_monitor
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2 parents: e2c4a03 + 11f59f6

File tree

122 files changed: +16771 −517 lines


.buildkite/pipeline.yml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ steps:
     key: image-build
     commands:
       - "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
-      - "docker build --file docker/Dockerfile.ci -t vllm-omni-ci ."
+      - "docker build --progress=plain --file docker/Dockerfile.ci -t vllm-omni-ci ."
       - "docker tag vllm-omni-ci public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"
       - "docker push public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"
     agents:

.buildkite/test-nightly.yml

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ steps:
   - kubernetes:
       podSpec:
         containers:
-          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+          - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
            resources:
              limits:
                nvidia.com/gpu: 2
@@ -69,7 +69,7 @@ steps:
   - kubernetes:
       podSpec:
         containers:
-          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+          - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
            resources:
              limits:
                nvidia.com/gpu: 2
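The image references above changed from the public ECR gallery to a private ECR pull-through cache, which serves an upstream image under `<account>.dkr.ecr.<region>.amazonaws.com/<cache-prefix>/<upstream-path>` with the tag unchanged. As a sketch, the rewrite applied throughout this commit can be expressed mechanically (the registry and cache prefix are taken from the diff; the helper function itself is ours, not part of this repo):

```python
def to_pull_through_cache(public_image: str,
                          registry: str = "936637512419.dkr.ecr.us-west-2.amazonaws.com",
                          cache_prefix: str = "vllm-ci-pull-through-cache") -> str:
    """Rewrite a public.ecr.aws image reference to its pull-through-cache form.

    ECR pull-through caching exposes an upstream repository under
    <registry>/<cache_prefix>/<upstream-path>, keeping the tag unchanged.
    """
    upstream_host = "public.ecr.aws/"
    if not public_image.startswith(upstream_host):
        raise ValueError(f"not a public.ecr.aws reference: {public_image}")
    upstream_path = public_image[len(upstream_host):]
    return f"{registry}/{cache_prefix}/{upstream_path}"


print(to_pull_through_cache(
    "public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT"))
# -> 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
```

Routing pulls through the cache keeps CI agents inside the private registry (avoiding public-gallery rate limits) while still tracking the upstream repository.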

.buildkite/test-ready.yml

Lines changed: 4 additions & 4 deletions
@@ -132,7 +132,7 @@ steps:
 #  - kubernetes:
 #      podSpec:
 #        containers:
-#          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+#          - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
 #           resources:
 #             limits:
 #               nvidia.com/gpu: 2
@@ -192,7 +192,7 @@ steps:
 #  - kubernetes:
 #      podSpec:
 #        containers:
-#          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+#          - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
 #           resources:
 #             limits:
 #               nvidia.com/gpu: 2
@@ -251,7 +251,7 @@ steps:
 #  - kubernetes:
 #      podSpec:
 #        containers:
-#          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+#          - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
 #           resources:
 #             limits:
 #               nvidia.com/gpu: 1
@@ -288,7 +288,7 @@ steps:
 #  - kubernetes:
 #      podSpec:
 #        containers:
-#          - image: public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
+#          - image: 936637512419.dkr.ecr.us-west-2.amazonaws.com/vllm-ci-pull-through-cache/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT
 #           resources:
 #             limits:
 #               nvidia.com/gpu: 1

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 3 additions & 3 deletions
@@ -12,10 +12,10 @@ PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTT
 <summary> Essential Elements of an Effective PR Description Checklist </summary>
 
 - [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
-- [ ] The test plan. Please providing the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the [test style doc](https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_style/)
-- [ ] The test results. Please pasting the results comparison before and after, or e2e results.
+- [ ] The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the [test style doc](https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_style/)
+- [ ] The test results. Please paste the results comparison before and after, or the e2e results.
 - [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model. **Please run `mkdocs serve` to sync the documentation editions to `./docs`.**
-- [ ] (Optional) Release notes update. If your change is user facing, please update the release notes draft.
+- [ ] (Optional) Release notes update. If your change is user-facing, please update the release notes draft.
 </details>
 
 **BEFORE SUBMITTING, PLEASE READ <https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md>** (anything written below this line will be removed by GitHub Actions)

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -243,5 +243,7 @@ Dockerfile.dev
 discussion
 tmp_test
 
+# Auto-generated version file (created by setuptools_scm during build)
+vllm_omni/_version.py
 # output files
 *.wav
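The newly ignored `vllm_omni/_version.py` is the module setuptools_scm generates at build time, which is why it should never be committed. As a hedged sketch only, a typical `pyproject.toml` fragment that would produce such a file looks like this (the exact configuration used by this repo is not shown in the diff, so treat the keys below as an assumption):

```toml
[build-system]
requires = ["setuptools>=64", "setuptools_scm>=8"]
build-backend = "setuptools.build_meta"

[tool.setuptools_scm]
# Derive the package version from git tags and write it to a generated
# module; that generated file is what the .gitignore entry excludes.
write_to = "vllm_omni/_version.py"
```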

docker/Dockerfile.ci

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ COPY . .
 
 # Install system dependencies
 RUN apt-get update && \
-    apt-get install -y ffmpeg sox libsox-fmt-all jq && \
+    apt-get install -y ffmpeg git sox libsox-fmt-all jq && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
 

docker/Dockerfile.rocm

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ WORKDIR ${COMMON_WORKDIR}
 
 # Step 1: Setup - Install system dependencies
 RUN apt-get update && \
-    apt-get install -y ffmpeg && \
+    apt-get install -y ffmpeg git sox libsox-fmt-all jq && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
 

docs/.nav.yml

Lines changed: 9 additions & 2 deletions
@@ -13,9 +13,9 @@ nav:
   - examples/README.md
   - Offline Inference:
     - BAGEL-7B-MoT: user_guide/examples/offline_inference/bagel.md
+    - GLM-Image Multistage End-to-End Inference: user_guide/examples/offline_inference/glm_image.md
     - Image-To-Image: user_guide/examples/offline_inference/image_to_image.md
     - Image-To-Video: user_guide/examples/offline_inference/image_to_video.md
-    - LoRA Inference(Diffusion): user_guide/examples/offline_inference/lora_inference.md
     - Qwen2.5-Omni: user_guide/examples/offline_inference/qwen2_5_omni.md
     - Qwen3-Omni: user_guide/examples/offline_inference/qwen3_omni.md
     - Qwen3-TTS: user_guide/examples/offline_inference/qwen3_tts.md
@@ -24,12 +24,14 @@ nav:
     - Text-To-Video: user_guide/examples/offline_inference/text_to_video.md
   - Online Serving:
     - BAGEL-7B-MoT: user_guide/examples/online_serving/bagel.md
+    - GLM-Image Online Serving: user_guide/examples/online_serving/glm_image.md
     - Image-To-Image: user_guide/examples/online_serving/image_to_image.md
-    - LoRA Inference(Diffusion): user_guide/examples/online_serving/lora_inference.md
+    - Image-To-Video: user_guide/examples/online_serving/image_to_video.md
     - Qwen2.5-Omni: user_guide/examples/online_serving/qwen2_5_omni.md
     - Qwen3-Omni: user_guide/examples/online_serving/qwen3_omni.md
     - Qwen3-TTS: user_guide/examples/online_serving/qwen3_tts.md
     - Text-To-Image: user_guide/examples/online_serving/text_to_image.md
+    - Text-To-Video: user_guide/examples/online_serving/text_to_video.md
   - General:
     - usage/*
   - Configuration:
@@ -46,13 +48,17 @@ nav:
     - Quantization:
       - Overview: user_guide/diffusion/quantization/overview.md
       - FP8: user_guide/diffusion/quantization/fp8.md
+      - GGUF: user_guide/diffusion/quantization/gguf.md
     - Parallelism Acceleration: user_guide/diffusion/parallelism_acceleration.md
     - CPU Offloading: user_guide/diffusion/cpu_offload_diffusion.md
     - LoRA: user_guide/diffusion/lora.md
+    - Hybrid Sharded Data Parallel: design/feature/hsdp.md
+    - Custom Pipeline: features/custom_pipeline.md
     - ComfyUI: features/comfyui.md
   - Developer Guide:
     - General:
       - contributing/README.md
+      - pr_reviewer.md
      - glob: contributing/*
        flatten_single_child_sections: true
     - Model Implementation:
@@ -72,6 +78,7 @@ nav:
     - design/feature/tensor_parallel.md
     - design/feature/cache_dit.md
     - design/feature/teacache.md
+    - design/feature/async_chunk_design.md
   - Module Design:
     - design/module/ar_module.md
     - design/module/dit_module.md

docs/api/README.md

Lines changed: 9 additions & 1 deletion
@@ -7,7 +7,6 @@ Main entry points for vLLM-Omni inference and serving.
 - [vllm_omni.entrypoints.async_omni.AsyncOmni][]
 - [vllm_omni.entrypoints.async_omni_diffusion.AsyncOmniDiffusion][]
 - [vllm_omni.entrypoints.async_omni_llm.AsyncOmniLLM][]
-- [vllm_omni.entrypoints.chat_utils.extract_audio_from_video_async][]
 - [vllm_omni.entrypoints.cli.benchmark.base.OmniBenchmarkSubcommandBase][]
 - [vllm_omni.entrypoints.cli.benchmark.main.OmniBenchmarkSubcommand][]
 - [vllm_omni.entrypoints.cli.benchmark.serve.OmniBenchmarkServingSubcommand][]
@@ -19,6 +18,7 @@ Main entry points for vLLM-Omni inference and serving.
 - [vllm_omni.entrypoints.omni_llm.OmniLLM][]
 - [vllm_omni.entrypoints.omni_stage.OmniStage][]
 - [vllm_omni.entrypoints.stage_utils.OmniStageTaskType][]
+- [vllm_omni.entrypoints.zmq_utils.ZmqQueue][]
 
 ## Inputs
 
@@ -36,6 +36,12 @@ Input data structures for multi-modal inputs.
 Engine classes for offline and online inference.
 
 - [vllm_omni.diffusion.diffusion_engine.DiffusionEngine][]
+- [vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.BufferAllocator][]
+- [vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.ManagedBuffer][]
+- [vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.MooncakeAgentMetadata][]
+- [vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.MooncakeTransferEngineConnector][]
+- [vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.QueryRequest][]
+- [vllm_omni.distributed.omni_connectors.connectors.mooncake_transfer_engine_connector.QueryResponse][]
 - [vllm_omni.engine.AdditionalInformationEntry][]
 - [vllm_omni.engine.AdditionalInformationPayload][]
 - [vllm_omni.engine.OmniEngineCoreOutput][]
@@ -89,8 +95,10 @@ Configuration classes.
 Worker classes and model runners for distributed inference.
 
 - [vllm_omni.diffusion.worker.diffusion_model_runner.DiffusionModelRunner][]
+- [vllm_omni.diffusion.worker.diffusion_worker.CustomPipelineWorkerExtension][]
 - [vllm_omni.diffusion.worker.diffusion_worker.DiffusionWorker][]
 - [vllm_omni.diffusion.worker.diffusion_worker.WorkerProc][]
+- [vllm_omni.diffusion.worker.diffusion_worker.WorkerWrapperBase][]
 - [vllm_omni.platforms.npu.worker.npu_ar_model_runner.ExecuteModelState][]
 - [vllm_omni.platforms.npu.worker.npu_ar_model_runner.NPUARModelRunner][]
 - [vllm_omni.platforms.npu.worker.npu_ar_worker.NPUARWorker][]

docs/contributing/profiling.md

Lines changed: 0 additions & 3 deletions
@@ -131,9 +131,6 @@ python image_to_video.py \
 
 2. **Wan-AI/Wan2.2-I2V-A14B-Diffusers**: [https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_video](https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_video)
 
-> **Note:**
-As of now, asynchronous (online) profiling is not fully supported in vLLM-Omni. While start_profile() and stop_profile() methods exist, they are only reliable in offline inference scripts (e.g., the provided end2end.py examples). Do not use them in server-mode or streaming scenarios—traces may be incomplete or fail to flush.
-
 ### 4. Analyzing Omni Traces
 
 Output files are saved to your configured ```VLLM_TORCH_PROFILER_DIR```.
