Skip to content

[Refactor]: Phase1 for rebasing_additional_info#1394

Merged
tzhouam merged 18 commits intovllm-project:mainfrom
divyanshsinghvi:rebase_additional_info
Mar 2, 2026
Merged

[Refactor]: Phase1 for rebasing_additional_info#1394
tzhouam merged 18 commits intovllm-project:mainfrom
divyanshsinghvi:rebase_additional_info

Conversation

@divyanshsinghvi
Copy link
Contributor

@divyanshsinghvi divyanshsinghvi commented Feb 17, 2026

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS (AT THE BOTTOM) HAVE BEEN CONSIDERED.

Purpose

#1351

Test Plan

  1. pytest tests/worker/test_omni_gpu_model_runner.py -v
  2. pytest tests/e2e/offline_inference/test_qwen3_omni.py -v
  3. pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v

Test Result

pytest tests/worker/test_omni_gpu_model_runner.py -v
======================================================================================== test session starts =========================================================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm-omni
configfile: pyproject.toml
plugins: anyio-4.12.1, asyncio-1.3.0, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 6 items                                                                                                                                                                                    

tests/worker/test_omni_gpu_model_runner.py::test_talker_mtp_forward_cpu_updates_inputs_and_info PASSED                                                                                         [ 16%]
tests/worker/test_omni_gpu_model_runner.py::test_talker_mtp_forward_cpu_empty_batch_noop PASSED                                                                                                [ 33%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_writes_to_buffer_and_setattr PASSED                                                                                 [ 50%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_accumulates PASSED                                                                                                  [ 66%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_skips_empty_update PASSED                                                                                           [ 83%]
tests/worker/test_omni_gpu_model_runner.py::test_merge_intermediate_buffer_skips_unknown_req_id PASSED                                                                                         [100%]

pytest tests/e2e/offline_inference/test_qwen3_omni.py -v   
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                                                         

tests/e2e/offline_inference/test_qwen3_omni.py::test_video_to_audio[omni_runner0] 
PASSED                                                                                                           [100%]

============================================================================================ warnings summary ============================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /workspace/vllm-omni/.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================== 1 passed, 16 warnings in 400.67s (0:06:40) ===============================================================================


pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v                              
========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.12.12, pytest-9.0.2, pluggy-1.6.0 -- /workspace/vllm-omni/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /workspace/vllm-omni
configfile: pyproject.toml
plugins: asyncio-1.3.0, anyio-4.12.1, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 2 items                                                                                                                                                                                        

tests/e2e/offline_inference/test_qwen2_5_omni.py::test_mix_to_audio[omni_runner0] PASSED                                                                                                           [ 50%]
tests/e2e/offline_inference/test_qwen2_5_omni.py::test_text_to_text[omni_runner0] PASSED                                                                                                           [100%]

============================================================================================ warnings summary ============================================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: 14 warnings
  /workspace/vllm-omni/.venv/lib/python3.12/site-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================================================================== 2 passed, 16 warnings in 287.08s (0:04:47) ===============================================================================

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan. Please providing the test scripts & test commands. Please state the reasons if your codes don't require additional test scripts. For test file guidelines, please check the test style doc
  • The test results. Please pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model. Please run mkdocs serve to sync the documentation editions to ./docs.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft.

BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Copy link
Contributor

@lishunyang12 lishunyang12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions on the fallback logic and a potential behavior change.

try:
if getattr(new_req_data, "additional_information", None) is not None:
warnings.warn(
"additional_information on request data is deprecated, use model_intermediate_buffer",
Copy link
Contributor

@lishunyang12 lishunyang12 Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These deprecation warnings fire on every request with the old field name. In high-throughput scenarios that could get noisy — worth gating with a _compat_warned flag so they only fire once per process.

Copy link
Contributor Author

@divyanshsinghvi divyanshsinghvi Feb 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lishunyang12 Doesn't warnings.warn default filter handle it by itself? Usually I think it filters with filename,linenumber indexing?
Or the concern was because it's only being filtered but the code will run it and then only during logging it's filtered could be a problem in high throughput situation.

Though I think it's better to add flag for clarity.

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
@divyanshsinghvi divyanshsinghvi marked this pull request as ready for review February 23, 2026 22:31
@divyanshsinghvi divyanshsinghvi changed the title [WIP][Refactor]: Phase1 for rebasing_additional_info [Refactor]: Phase1 for rebasing_additional_info Feb 23, 2026
@divyanshsinghvi
Copy link
Contributor Author

cc: @tzhouam

@tzhouam tzhouam requested review from tzhouam and removed request for lishunyang12 February 24, 2026 01:26
@hsliuustc0106
Copy link
Collaborator

PR Review

This PR implements Phase 1 of RFC #1351 to rebase additional_information into model_intermediate_buffer.

Summary of Changes

  1. New model_intermediate_buffer — Centralized request-indexed dictionary in model runners replacing scattered additional_information_cpu setattr usage
  2. Backward compatibility — Dual-writes to both old and new paths with deprecation warnings
  3. Model layer updatesqwen3_omni.py and qwen3_tts.py check new buffer first, fall back to old kwarg
  4. Renamed merge method_merge_additional_information_update_merge_intermediate_buffer (old name aliased)
  5. Test coverage — 4 new tests for merge behavior

Strengths

  • Well-structured migration matching RFC Phase 1
  • Comprehensive backward compatibility with deprecation warnings
  • Good test coverage for the new merge logic
  • Consistent changes across GPU/NPU runners

Blocking Issues

  1. Missing e2e test results — The PR description shows empty results for pytest tests/e2e/offline_inference/test_qwen3_omni.py -v and pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v. Please run these tests and report the results before merging.

Non-blocking Concerns

  • Dual-write overhead — Every update writes to both model_intermediate_buffer and additional_information_cpu. Consider the performance impact in hot paths.
  • Alias clarity — The line _merge_additional_information_update = _merge_intermediate_buffer is a clever compatibility shim, but could confuse readers. Consider adding a comment like # Deprecated alias for backward compat.
  • Memory cleanup — The model_intermediate_buffer.pop(req_id, None) in _update_states is good, but please verify this handles all request lifecycle paths (cancellation, errors, etc.).

Verdict

Conditional Approve — The Phase 1 implementation is solid and well-aligned with the RFC. Please address the missing e2e test results before merging.

hsliuustc0106

This comment was marked as off-topic.

@tzhouam
Copy link
Collaborator

tzhouam commented Feb 24, 2026

Qwen 2.5 omni may need similar modifications

@hsliuustc0106
Copy link
Collaborator

PR Review: Phase 1 for rebasing additional_info

This PR implements Phase 1 of RFC #1351 to rebase additional_information into model_intermediate_buffer.

Summary of Changes

  1. New model_intermediate_buffer — Centralized request-indexed dictionary in model runners replacing scattered additional_information_cpu setattr usage
  2. Backward compatibility — Dual-writes to both old and new paths with deprecation warnings
  3. Model layer updatesqwen3_omni.py and qwen3_tts.py check new buffer first, fall back to old kwarg
  4. Renamed merge method_merge_additional_information_update_merge_intermediate_buffer (old name aliased)
  5. Test coverage — 4 new tests for merge behavior

Strengths

  • Well-structured migration matching RFC Phase 1
  • Comprehensive backward compatibility with deprecation warnings
  • Good test coverage for the new merge logic
  • Consistent changes across GPU/NPU runners
  • Proper cleanup of model_intermediate_buffer when requests finish

Inline Comments Posted

See specific line-level comments for:

  • Typo in deprecation warning class name (2 instances)
  • Good defensive copy in qwen3_tts.py

Non-blocking Concerns

  • Dual-write overhead — Every update writes to both model_intermediate_buffer and additional_information_cpu. Consider performance impact in hot paths during migration period.
  • Alias naming — The line _merge_additional_information_update = _merge_intermediate_buffer is a clever compatibility shim, but could confuse readers. Consider adding a comment like # Deprecated alias for backward compat.
  • Missing e2e tests — PR description shows empty results for pytest tests/e2e/offline_inference/test_qwen3_omni.py -v and pytest tests/e2e/offline_inference/test_qwen2_5_omni.py -v.

Verdict

Conditional Approve — The Phase 1 implementation is solid and well-aligned with the RFC. Please address the typo and run the missing e2e tests before merging.

Note: Once Phase 2 and Phase 3 are complete and the old paths are removed, the dual-write overhead and alias naming will become moot.

@hsliuustc0106
Copy link
Collaborator

@vllm-omni-reviewer

@divyanshsinghvi
Copy link
Contributor Author

divyanshsinghvi commented Feb 24, 2026

Qwen 2.5 omni may need similar modifications

Qwen 2.5 doesn't use runtime_additional_info directly so we won't be requiring that, it's being passed from info_dict only which should be handled by the current internal code.

Qwen3 has some info passed from info_dict and runtime_additional_info also. So had to add if else.

There are "dead" additional_info in forward() calls for both, but I think we can remove them in second stage.

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
This reverts commit 57814cb.

Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
@tzhouam
Copy link
Collaborator

tzhouam commented Feb 25, 2026

@amy-why-3459 @R2-Y Please review this PR if it affects the async chunk as the runtime additional information will not be passed to scheduler in the future design.

@tzhouam tzhouam added the ready label to trigger buildkite CI label Feb 25, 2026
Signed-off-by: Zhou Taichang <tzhouam@connect.ust.hk>
Copy link
Collaborator

@tzhouam tzhouam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added some comments here.
Please see if it helps.

@tzhouam
Copy link
Collaborator

tzhouam commented Mar 1, 2026

Also please solve the conflicts then we can merge.
Thanks

Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
@divyanshsinghvi divyanshsinghvi force-pushed the rebase_additional_info branch from 60a101d to 66ba84a Compare March 1, 2026 20:50
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
@tzhouam tzhouam self-requested a review March 2, 2026 02:08
Copy link
Collaborator

@tzhouam tzhouam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tzhouam tzhouam merged commit 68764cc into vllm-project:main Mar 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready label to trigger buildkite CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants