[CI] Modify some CI test cases to run on L4 environment to reduce H100 resource usage.#1543
hsliuustc0106 merged 10 commits into vllm-project:main
Conversation
- Updated the nightly test script to handle multiple pytest commands and capture exit statuses.
- Changed model from "Qwen/Qwen3-Omni-30B-A3B-Instruct" to "Qwen/Qwen2.5-Omni-7B" in benchmark tests.
- Updated stage configuration file for qwen2.5-omni.
- Adjusted prompt in the online serving test to specify a word limit for the answer.

Signed-off-by: yenuo26 <410167048@qq.com>
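The exit-status handling described in the first bullet can be sketched as below. This is a hypothetical illustration, not the repo's actual nightly script: the `run_suite` helper name is invented, and `true`/`false` stand in for the real pytest invocations so the sketch is self-contained.

```shell
#!/usr/bin/env bash
# Sketch: run several pytest commands, record any failure, and report
# a combined exit status at the end instead of aborting on the first one.
overall=0

run_suite() {
  # Run one command; if it fails, remember its exit status but keep going.
  "$@" || overall=$?
}

run_suite true   # stand-in for: python -m pytest -sv tests/benchmarks/test_serve_cli.py
run_suite false  # stand-in for a suite that fails
run_suite true   # stand-in for: python -m pytest -sv tests/engine/test_async_omni_engine_abort.py

echo "overall exit status: $overall"
# The real script would end with: exit "$overall"
```

With this shape, one failing suite does not mask the results of the suites after it, and the job still exits non-zero overall.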
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 291364629f
- Consolidated the Benchmark & Engine Test steps in both test-merge.yml and test-ready.yml.
- Changed the agent queue to "gpu_4_queue" and updated the Docker plugin configuration for better resource management.
- Removed the deprecated stage configuration file for Qwen3 Omni Thinker.

Signed-off-by: yenuo26 <410167048@qq.com>
Signed-off-by: wangyu <53896905+yenuo26@users.noreply.github.com>
Signed-off-by: yenuo26 <410167048@qq.com>
- Set mm_processor_cache_gb to 0 in qwen2_5_omni_ci.yaml, qwen2_5_omni_multiconnector.yaml, and qwen2_5_omni.yaml.
- Removed skip marker from test_qwen2_5_omni.py to enable the test.

Signed-off-by: yenuo26 <410167048@qq.com>
lishunyang12
left a comment
Left a couple of comments. The H100 -> L4 migration itself makes sense, but a few things need attention.
    engine_output_type: latent
    enable_prefix_caching: false
    max_num_batched_tokens: 32768
    mm_processor_cache_gb: 0
Please see #1534 for the reason of the change.
I saw #1534, and it makes sense for the CI config. But the same change is also applied to the production stage configs (qwen2_5_omni.yaml and qwen2_5_omni_multiconnector.yaml), which disables the mm processor cache for all users, not just CI. Was that intentional? If it is only needed to work around an L4 memory constraint, please keep it in the CI configs only.
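The reviewer's suggestion would look roughly like this. The fragment is illustrative (only the four keys shown in this PR's diff are real; everything else about the files is assumed), with the cache disabled only in the CI variant:

```yaml
# qwen2_5_omni_ci.yaml -- CI-only config (illustrative fragment)
engine_output_type: latent
enable_prefix_caching: false
max_num_batched_tokens: 32768
mm_processor_cache_gb: 0   # disable mm processor cache only where L4 memory is tight

# qwen2_5_omni.yaml / qwen2_5_omni_multiconnector.yaml -- production configs
# would simply omit the mm_processor_cache_gb override and keep the default.
```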
We make accuracy the higher priority.
    engine_output_type: latent
    enable_prefix_caching: false
    max_num_batched_tokens: 32768
    mm_processor_cache_gb: 0
Please see #1534 for the reason of the change.
Before:

    - label: "Benchmark & Engine Test with H100"
      timeout_in_minutes: 15

After:

    - label: "Benchmark & Engine Test"
The old config had `timeout_in_minutes: 15` at the Buildkite level. The inner `timeout 15m` only kills the bash process; if the Docker pull or container startup hangs, Buildkite will wait forever. Please add `timeout_in_minutes` back.
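A minimal sketch of what restoring the step-level timeout could look like. The label and queue name are taken from this PR; the plugin version and image are placeholders, not the repo's actual values:

```yaml
- label: "Benchmark & Engine Test"
  timeout_in_minutes: 15      # Buildkite-level: also bounds docker pull / container startup
  agents:
    queue: gpu_4_queue
  plugins:
    - docker#v5.0.0:          # plugin version is illustrative
        image: "ci-image:latest"   # placeholder image name
```

The key difference from an in-script `timeout 15m` is that `timeout_in_minutes` is enforced by the Buildkite agent itself, so it covers everything the step does, not just the test command.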
Removed:

    @pytest.mark.skip(reason="There is a known issue with stream error.")

Kept:

    @pytest.mark.advanced_model
Which fix resolved the stream error? Worth adding a comment or linking the PR in the commit message so this does not get re-skipped later.
Before:

    models = ["Qwen/Qwen3-Omni-30B-A3B-Instruct"]
    stage_configs = [str(Path(__file__).parent.parent / "e2e" / "stage_configs" / "qwen3_omni_ci.yaml")]

After:

    models = ["Qwen/Qwen2.5-Omni-7B"]
Switching from Qwen3-30B to Qwen2.5-7B means benchmark numbers are no longer comparable across runs. If this test is meant to track perf regressions over time, consider keeping a Qwen3 benchmark on H100 (even if less frequent) alongside this L4 one.
    engine_output_type: latent
    enable_prefix_caching: false
    max_num_batched_tokens: 32768
    mm_processor_cache_gb: 0
We make accuracy the higher priority.
Purpose
Modify some CI test cases to run on L4 environment to reduce H100 resource usage.
Test Plan
1. Run the pytest suites locally:

       /workspace/.venv/bin/python -m pytest -sv tests/benchmarks/test_serve_cli.py tests/engine/test_async_omni_engine_abort.py --html=report.html --self-contained-html

2. Test the qwen2.5 example testcase.
3. Run in CI.
Test Result
2. qwen2.5 example testcase:

Essential Elements of an Effective PR Description Checklist
---
- [ ] The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
- [ ] The test plan. Please provide the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the [test style doc](https://docs.vllm.ai/projects/vllm-omni/en/latest/contributing/ci/tests_style/).
- [ ] The test results. Please paste the results comparison before and after, or the e2e results.
- [ ] (Optional) The necessary documentation update, such as updating `supported_models.md` and `examples` for a new model. **Please run `mkdocs serve` to sync the documentation editions to `./docs`.**
- [ ] (Optional) Release notes update. If your change is user-facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)