[BugFix][Core] Fix a bug running multi-modal with ascend_scheduler #3675

Merged
wangxiyuan merged 1 commit into vllm-project:main from whx-sjtu:fix_ascend_scheduler
Oct 25, 2025

Conversation

@whx-sjtu
Collaborator

@whx-sjtu whx-sjtu commented Oct 23, 2025

This PR fixes a bug in running multi-modal models with AscendScheduler. The bug was introduced by PR #2372, which reused vLLM's parameter names but gave them different default values. The error is as follows:

Details
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] WorkerProc failed to start.
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] Traceback (most recent call last):
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     worker = WorkerProc(*args, **kwargs)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.worker.load_model()
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/worker/worker_v1.py", line 307, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.model_runner.load_model()
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/worker/model_runner_v1.py", line 2656, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.model = get_model(vllm_config=self.vllm_config)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/__init__.py", line 119, in get_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return loader.load_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     model = initialize_model(vllm_config=vllm_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/model_loader/utils.py", line 63, in initialize_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return model_class(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-ascend-main/vllm_ascend/models/qwen2_5_vl.py", line 513, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     super().__init__(vllm_config=vllm_config, prefix=prefix)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/models/qwen2_5_vl.py", line 1023, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     self.language_model = init_vllm_registered_model(
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/model_executor/models/utils.py", line 316, in init_vllm_registered_model
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     vllm_config = vllm_config.with_hf_config(hf_config,
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/mnt/share/whx/repos/vllm-main/vllm/config/__init__.py", line 300, in with_hf_config
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return replace(self, model_config=model_config)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.10/lib/python3.11/dataclasses.py", line 1503, in replace
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     return obj.__class__(**changes)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]            ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   File "/usr/local/python3.11.10/lib/python3.11/site-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]     s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597] scheduler_config
(EngineCore_DP0 pid=1339259) (Worker_TP2 pid=1339276) ERROR 10-23 15:55:51 [multiproc_executor.py:597]   Value error, max_long_partial_prefills (2147483647) must be greater than or equal to 1 and less than or equal to max_num_partial_prefills (1). [type=value_error, input_value=AscendSchedulerConfig(run..., decode_max_num_seqs=0), input_type=AscendSchedulerConfig]

I fix this bug by changing the default values of these two parameters to align with vLLM's. Please take a look: @Csrayz @frankie-ys @xueliangyang-oeuler @wangxiyuan
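For illustration, the failure mode can be reproduced with a minimal sketch. Plain Python dataclasses stand in for vLLM's pydantic-validated configs here; the two field names mirror vLLM's, but the class names and everything else are hypothetical, not the actual vLLM or vllm-ascend code:

```python
from dataclasses import dataclass, replace


@dataclass
class DemoSchedulerConfig:
    # Invariant enforced by vLLM's scheduler-config validation:
    # 1 <= max_long_partial_prefills <= max_num_partial_prefills
    max_num_partial_prefills: int = 1
    max_long_partial_prefills: int = 1

    def __post_init__(self) -> None:
        if not (1 <= self.max_long_partial_prefills
                <= self.max_num_partial_prefills):
            raise ValueError(
                f"max_long_partial_prefills "
                f"({self.max_long_partial_prefills}) must be >= 1 and "
                f"<= max_num_partial_prefills "
                f"({self.max_num_partial_prefills})")


@dataclass
class DivergentSchedulerConfig(DemoSchedulerConfig):
    # A subclass that reuses the parent's field name but with a different
    # default, as the pre-fix AscendSchedulerConfig effectively did.
    max_long_partial_prefills: int = 2**31 - 1  # 2147483647


DemoSchedulerConfig()  # the aligned defaults satisfy the invariant

try:
    # Validation re-runs on every construction, including the
    # dataclasses.replace() call visible in the traceback above, so the
    # divergent default is caught as soon as the config is rebuilt.
    DivergentSchedulerConfig()
except ValueError as exc:
    print(f"validation failed: {exc}")
```

Aligning the subclass defaults with the parent's, as this PR does, removes the conflict without renaming any fields.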

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR fixes a bug when running multi-modal models with AscendScheduler. The fix involves renaming configuration parameters in AscendSchedulerConfig to avoid conflicts with the base SchedulerConfig from vLLM. This is a good approach. The implementation looks correct, but I found that one of the new parameter names is misleading, which could lead to confusion and incorrect usage. I've provided a suggestion to improve the naming for better clarity and maintainability.

@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling out the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 23, 2025
@wangxiyuan
Collaborator

@Csrayz can you take a look at this change?

@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch from 86263c5 to 0349604 on October 24, 2025 02:21
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Oct 24, 2025
@whx-sjtu
Collaborator Author

whx-sjtu commented Oct 24, 2025

@Csrayz can you take a look at this change?

Today I found that the meaning of these two parameters is actually aligned with vLLM; the problem lies only in the default values. We should simply align the default values with vLLM. cc @Csrayz @wangxiyuan

@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch 2 times, most recently from 95cd053 to ffee8b2 on October 24, 2025 06:19
@wangxiyuan wangxiyuan added the ready read for review label Oct 24, 2025
@whx-sjtu whx-sjtu added the ready-for-test start test by label for PR label Oct 24, 2025
Signed-off-by: hw_whx <wanghexiang7@huawei.com>
@whx-sjtu whx-sjtu force-pushed the fix_ascend_scheduler branch from 79ab771 to 2148652 on October 24, 2025 16:08
@wangxiyuan wangxiyuan merged commit e33751e into vllm-project:main Oct 25, 2025
26 of 28 checks passed
luolun pushed a commit to luolun/vllm-ascend that referenced this pull request Nov 19, 2025
…llm-project#3675)

This PR fix the bug related with running multi-modal models with
AscendScheduler. This bug was introduced by PR vllm-project#2372 by using the same
parameter names as vLLM with different default values. 

Currently I fix this bug by changing the default values of these two
parameters to align with vLLM. 

- vLLM version: v0.11.0rc3
- vLLM main:
vllm-project/vllm@17c540a

Signed-off-by: hw_whx <wanghexiang7@huawei.com>
Co-authored-by: hw_whx <wanghexiang7@huawei.com>
Signed-off-by: luolun <luolun1995@cmbchina.com>
hwhaokun pushed a commit to hwhaokun/vllm-ascend that referenced this pull request Nov 19, 2025
NSDie pushed a commit to NSDie/vllm-ascend that referenced this pull request Nov 24, 2025
Clorist33 pushed a commit to Clorist33/vllm-ascend that referenced this pull request Dec 10, 2025

Labels

module:tests ready read for review ready-for-test start test by label for PR


2 participants