[Refactor]: Phase1 for rebasing_additional_info by divyanshsinghvi · Pull Request #1394 · vllm-project/vllm-omni

divyanshsinghvi · 2026-02-17T19:01:53Z

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

Test Plan

pytest tests/worker/test_omni_gpu_model_runner.py -v
pytest tests/e2e/offline_inference/test_qwen3_omni.py -v
pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v

Test Result

pytest tests/worker/test_omni_gpu_model_runner.py -v
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm-omni
configfile: pyproject.toml
plugins: anyio-4.12.1, asyncio-1.3.0, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 6 items                                                                                                                                                                                    

tests/worker/test_omni_gpu_model_runner.py::test_talker_mtp_forward_cpu_updates_inputs_and_info PASSED                                                                                         [ 16%]
tests/worker/test_omni_gpu_model_runner.py::test_talker_mtp_forward_cpu_empty_batch_noop PASSED                                                                                                [ 33%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_writes_to_buffer_and_setattr PASSED                                                                                 [ 50%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_accumulates PASSED                                                                                                  [ 66%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_skips_empty_update PASSED                                                                                           [ 83%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_skips_unknown_req_id PASSED                                                                                         [100%]

pytest tests/e2e/offline_inference/test_qwen3_omni.py -v   
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                                                         

tests/e2e/offline_inference/test_qwen3_omni.py::test_video_to_audio[omni_runner0] 
PASSED                                                                                                           [100%]

============================================================================================ warnings summary ============================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /workspace/vllm-omni/.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================== 1 passed, 16 warnings in 400.67s (0:06:40) ===============================================================================

pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v                              
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                                                                                        

tests/e2e/offline_inference/test_qwen2_5_omni.py::test_mix_to_audio[omni_runner0] PASSED                                                                                                           [ 50%]
tests/e2e/offline_inference/test_qwen2_5_omni.py::test_text_to_text[omni_runner0] PASSED                                                                                                           [100%]

============================================================================================ warnings summary ============================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /workspace/vllm-omni/.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================== 2 passed, 16 warnings in 287.08s (0:04:47) ===============================================================================

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan. Please providing the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
The test results. Please pasting the results comparison before and after, or e2e results.
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
(Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

lishunyang12

A few questions on the fallback logic and a potential behavior change.

vllm_omni/model_executor/models/qwen3_omni/qwen3_omni.py

vllm_omni/worker/gpu_model_runner.py

lishunyang12 · 2026-02-21T04:58:42Z

vllm_omni/worker/gpu_model_runner.py

            try:
                if getattr(new_req_data, "additional_information", None) is not None:
+                    warnings.warn(
+                        "additional_information on request data is deprecated, use model_intermediate_buffer",


These deprecation warnings fire on every request with the old field name. In high-throughput scenarios that could get noisy — worth gating with a _compat_warned flag so they only fire once per process.

@lishunyang12 Doesn't warnings.warn default filter handle it by itself? Usually I think it filters with filename,linenumber indexing?
Or the concern was because it's only being filtered but the code will run it and then only during logging it's filtered could be a problem in high throughput situation.

Though I think it's better to add flag for clarity.

tests/worker/test_omni_gpu_model_runner.py

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

divyanshsinghvi · 2026-02-24T00:04:24Z

cc: @tzhouam

hsliuustc0106 · 2026-02-24T01:27:48Z

PR Review

This PR implements Phase 1 of RFC #1351 to rebase additional_information into model_intermediate_buffer.

Summary of Changes

New model_intermediate_buffer — Centralized request-indexed dictionary in model runners replacing scattered additional_information_cpu setattr usage
Backward compatibility — Dual-writes to both old and new paths with deprecation warnings
Model layer updates — qwen3_omni.py and qwen3_tts.py check new buffer first, fall back to old kwarg
Renamed merge method — _merge_additional_information_update → _merge_intermediate_buffer (old name aliased)
Test coverage — 4 new tests for merge behavior

Strengths

Well-structured migration matching RFC Phase 1
Comprehensive backward compatibility with deprecation warnings
Good test coverage for the new merge logic
Consistent changes across GPU/NPU runners

Blocking Issues

Missing e2e test results — The PR description shows empty results for pytest tests/e2e/offline_inference/test_qwen3_omni.py -v and pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v. Please run these tests and report the results before merging.

Non-blocking Concerns

Dual-write overhead — Every update writes to both model_intermediate_buffer and additional_information_cpu. Consider the performance impact in hot paths.
Alias clarity — The line _merge_additional_information_update = _merge_intermediate_buffer is a clever compatibility shim, but could confuse readers. Consider adding a comment like # Deprecated alias for backward compat.
Memory cleanup — The model_intermediate_buffer.pop(req_id, None) in _update_states is good, but please verify this handles all request lifecycle paths (cancellation, errors, etc.).

Verdict

Conditional Approve — The Phase 1 implementation is solid and well-aligned with the RFC. Please address the missing e2e test results before merging.

tzhouam · 2026-02-24T02:03:06Z

Qwen 2.5 omni may need similar modifications

hsliuustc0106 · 2026-02-24T02:07:21Z

PR Review: Phase 1 for rebasing additional_info

This PR implements Phase 1 of RFC #1351 to rebase additional_information into model_intermediate_buffer.

Summary of Changes

New model_intermediate_buffer — Centralized request-indexed dictionary in model runners replacing scattered additional_information_cpu setattr usage
Backward compatibility — Dual-writes to both old and new paths with deprecation warnings
Model layer updates — qwen3_omni.py and qwen3_tts.py check new buffer first, fall back to old kwarg
Renamed merge method — _merge_additional_information_update → _merge_intermediate_buffer (old name aliased)
Test coverage — 4 new tests for merge behavior

Strengths

Well-structured migration matching RFC Phase 1
Comprehensive backward compatibility with deprecation warnings
Good test coverage for the new merge logic
Consistent changes across GPU/NPU runners
Proper cleanup of model_intermediate_buffer when requests finish

Inline Comments Posted

See specific line-level comments for:

Typo in deprecation warning class name (2 instances)
Good defensive copy in qwen3_tts.py

Non-blocking Concerns

Dual-write overhead — Every update writes to both model_intermediate_buffer and additional_information_cpu. Consider performance impact in hot paths during migration period.
Alias naming — The line _merge_additional_information_update = _merge_intermediate_buffer is a clever compatibility shim, but could confuse readers. Consider adding a comment like # Deprecated alias for backward compat.
Missing e2e tests — PR description shows empty results for pytest tests/e2e/offline_inference/test_qwen3_omni.py -v and pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v.

Verdict

Conditional Approve — The Phase 1 implementation is solid and well-aligned with the RFC. Please address the typo and run the missing e2e tests before merging.

Note: Once Phase 2 and Phase 3 are complete and the old paths are removed, the dual-write overhead and alias naming will become moot.

hsliuustc0106 · 2026-02-24T07:07:01Z

@vllm-omni-reviewer

divyanshsinghvi · 2026-02-24T08:26:38Z

Qwen 2.5 omni may need similar modifications

Qwen 2.5 doesn't use runtime_additional_info directly so we won't be requiring that, it's being passed from info_dict only which should be handled by the current internal code.

Qwen3 has some info passed from info_dict and runtime_additional_info also. So had to add if else.

There are "dead" additional_info in forward() calls for both, but I think we can remove them in second stage.

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

This reverts commit 57814cb. Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

tzhouam · 2026-02-25T04:30:58Z

@amy-why-3459 @R2-Y Please review this PR if it affects the async chunk as the runtime additional information will not be passed to scheduler in the future design.

Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>

tzhouam

I have added some comments here.
Please see if it helps.

vllm_omni/model_executor/models/qwen3_tts/qwen3_tts.py

vllm_omni/model_executor/models/qwen3_tts/qwen3_tts_talker.py

vllm_omni/platforms/npu/worker/npu_model_runner.py

vllm_omni/worker/gpu_model_runner.py

tzhouam · 2026-03-01T11:57:54Z

Also please solve the conflicts then we can merge.
Thanks

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

tzhouam

LGTM

Phase1; needs to be tested

23b2308

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

divyanshsinghvi force-pushed the rebase_additional_info branch from 444f1f4 to 23b2308 Compare February 17, 2026 20:26

refactor checks

6df545c

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

lishunyang12 reviewed Feb 21, 2026

View reviewed changes

divyanshsinghvi added 7 commits February 23, 2026 20:44

issue with empty list; wrong fallback

469aef3

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

fix [] vs None

a3bf610

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

gpu npu self request behaviour as previous

f7ef8de

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

Merge branch 'main' into rebase_additional_info

162e616

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

warning only once

3a67ddd

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

test

f0d1d1c

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

warning

dd53c82

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

divyanshsinghvi marked this pull request as ready for review February 23, 2026 22:31

divyanshsinghvi requested a review from hsliuustc0106 as a code owner February 23, 2026 22:31

divyanshsinghvi requested a review from lishunyang12 February 23, 2026 22:33

divyanshsinghvi changed the title ~~[WIP][Refactor]: Phase1 for rebasing_additional_info~~ [Refactor]: Phase1 for rebasing_additional_info Feb 23, 2026

tzhouam requested review from tzhouam and removed request for lishunyang12 February 24, 2026 01:26

This comment was marked as off-topic.

Sign in to view

divyanshsinghvi added 3 commits February 24, 2026 08:53

remove addinfo

57814cb

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

Revert "remove addinfo"

2720976

This reverts commit 57814cb. Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>

qwen3_tts talker

69407da

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

divyanshsinghvi force-pushed the rebase_additional_info branch from 8847d35 to 69407da Compare February 24, 2026 09:53

Merge branch 'main' into rebase_additional_info

30c574b

divyanshsinghvi requested a review from hsliuustc0106 February 24, 2026 13:35

tzhouam added the ready label to trigger buildkite CI label Feb 25, 2026

Merge branch 'main' into rebase_additional_info

cf37b5c

Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>

tzhouam reviewed Mar 1, 2026

View reviewed changes

warnings_once

66ba84a

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

divyanshsinghvi force-pushed the rebase_additional_info branch from 60a101d to 66ba84a Compare March 1, 2026 20:50

divyanshsinghvi added 3 commits March 1, 2026 21:52

Merge branch 'main' into rebase_additional_info

3b49ab3

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

logger once

c5a54aa

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

remove old function

0eb5cc2

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>

tzhouam self-requested a review March 2, 2026 02:08

tzhouam approved these changes Mar 2, 2026

View reviewed changes

tzhouam merged commit 68764cc into vllm-project:main Mar 2, 2026
7 checks passed

Conversation

divyanshsinghvi commented Feb 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

lishunyang12 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lishunyang12 Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

divyanshsinghvi Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

divyanshsinghvi commented Feb 24, 2026

Uh oh!

hsliuustc0106 commented Feb 24, 2026

PR Review

Summary of Changes

Strengths

Blocking Issues

Non-blocking Concerns

Verdict

Uh oh!

This comment was marked as off-topic.

Uh oh!

tzhouam commented Feb 24, 2026

Uh oh!

hsliuustc0106 commented Feb 24, 2026

PR Review: Phase 1 for rebasing additional_info

Summary of Changes

Strengths

Inline Comments Posted

Non-blocking Concerns

Verdict

Uh oh!

hsliuustc0106 commented Feb 24, 2026

Uh oh!

divyanshsinghvi commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tzhouam commented Feb 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tzhouam left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tzhouam commented Mar 1, 2026

Uh oh!

tzhouam left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

divyanshsinghvi commented Feb 17, 2026 •

edited

Loading

lishunyang12 left a comment •

edited

Loading

lishunyang12 Feb 21, 2026 •

edited

Loading

divyanshsinghvi Feb 23, 2026 •

edited

Loading

divyanshsinghvi commented Feb 24, 2026 •

edited

Loading

tzhouam commented Feb 25, 2026 •

edited

Loading