
Commit 050d202

[Quickfix] Fix dp+ep+tp error when sp chunked the hidden_states (#3246)
### What this PR does / why we need it?

Fix the dp+ep+tp in-place copy error that occurs when sequence parallelism chunks the `hidden_states`.

### How was this patch tested?

Tested locally with the following script:

```bash
python examples/offline_data_parallel.py \
    --model="Qwen/Qwen3-30B-A3B" \
    --dp-size=2 \
    --tp-size=2 \
    --enable-expert-parallel
```

Signed-off-by: MengqingCao <[email protected]>
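The title refers to an in-place copy that breaks once sequence parallelism (SP) chunks `hidden_states`. Below is a minimal sketch of that general failure mode, assuming nothing beyond plain PyTorch; the tensor names and sizes are illustrative and not taken from the repository:

```python
# Minimal, standalone sketch (not code from this commit) of the failure mode:
# with sequence parallelism each rank only holds a chunk of hidden_states, so an
# in-place copy into a buffer sized for the full token count no longer lines up.
import torch

num_tokens, hidden_size, sp_size = 8, 16, 2
full_buffer = torch.empty(num_tokens, hidden_size)           # sized for all tokens
sp_chunk = torch.randn(num_tokens // sp_size, hidden_size)   # this rank's SP chunk

try:
    full_buffer.copy_(sp_chunk)  # shape mismatch -> RuntimeError
except RuntimeError as err:
    print("inplace copy error:", err)

# Copying into the slice that corresponds to the chunk avoids the mismatch.
full_buffer[: sp_chunk.shape[0]].copy_(sp_chunk)
```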
1 parent cf445c4 commit 050d202

File tree: 2 files changed, +4 -0 lines

vllm_ascend/ops/fused_moe.py

Lines changed: 1 addition & 0 deletions
@@ -295,6 +295,7 @@ def __init__(
             in_dtype=params_dtype,
         )
         self.moe_config = moe
+        # TODO: The self.moe_config.tp_size here is not correct, fixme soon

         if quant_config is None:
             self.quant_method = AscendUnquantizedFusedMoEMethod(moe)

vllm_ascend/platform.py

Lines changed: 3 additions & 0 deletions
@@ -16,6 +16,7 @@
 #

 import gc
+import os
 from datetime import timedelta
 from typing import TYPE_CHECKING, Optional, Tuple

@@ -260,6 +261,8 @@ def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
             compilation_config.level = CompilationLevel.NO_COMPILATION

         if parallel_config and parallel_config.worker_cls == "auto":
+            # TODO: this is a tricky way to disable `use_sequence_parallel_moe` in vllm.
+            os.environ["VLLM_ALL2ALL_BACKEND"] = "flashinfer_all2allv"
             if ascend_config.torchair_graph_config.enabled or ascend_config.enable_shared_expert_dp:
                 parallel_config.worker_cls = "vllm_ascend.torchair.torchair_worker.NPUTorchairWorker"
             else:
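The `check_and_update_config` change above pins the all2all backend from inside the plugin. For reference, the same workaround could be applied by hand before running the reproduction command from the description; this is a sketch assuming only that the environment variable must be set before vLLM initializes the platform (the subprocess wrapper is illustrative, not part of the commit):

```python
# Sketch: apply the platform.py workaround manually before engine startup,
# then run the reproduction command from the PR description.
import os
import subprocess

# Must be set before vLLM's platform config check runs.
os.environ["VLLM_ALL2ALL_BACKEND"] = "flashinfer_all2allv"

subprocess.run(
    [
        "python", "examples/offline_data_parallel.py",
        "--model=Qwen/Qwen3-30B-A3B",
        "--dp-size=2",
        "--tp-size=2",
        "--enable-expert-parallel",
    ],
    check=True,
    env=os.environ,  # propagate the backend override to the child process
)
```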
