Commit 950c4b2

[main] refactor alltoallv in fused_moe (#2487)

### What this PR does / why we need it?
Refactor all2all-related fused_experts (both quantized/unquantized) into TokenDispatcherWithAll2AllV, including dispatch & combine calculation.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
E2E & UT

- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@65197a5

Signed-off-by: Pr0Wh1teGivee <[email protected]>
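For readers unfamiliar with the dispatch/combine split mentioned above, here is a minimal single-process sketch of the general pattern. It is not the code from this commit: `ToyTokenDispatcher`, `run_experts`, and every method signature are hypothetical names chosen for illustration, plain Python lists stand in for tensors, and no real all-to-all communication takes place. The assumption is that dispatch groups token copies by expert and records per-expert counts (the kind of variable split sizes an all-to-all-v collective would need), and that combine restores the original token order and applies the top-k routing weights.

```python
from typing import Callable, List, Sequence, Tuple

Row = List[float]


class ToyTokenDispatcher:
    """Hypothetical, single-process illustration of dispatch/combine."""

    def __init__(self, num_experts: int) -> None:
        self.num_experts = num_experts
        # (token index, top-k slot) for each permuted row, saved for combine().
        self._order: List[Tuple[int, int]] = []

    def dispatch(
        self, tokens: Sequence[Row], topk_ids: Sequence[Sequence[int]]
    ) -> Tuple[List[Row], List[int]]:
        # One (expert, token, slot) triple per routed copy of a token.
        pairs = [
            (expert, t, k)
            for t, ids in enumerate(topk_ids)
            for k, expert in enumerate(ids)
        ]
        # Stable sort by expert id so each expert gets a contiguous chunk.
        pairs.sort(key=lambda p: p[0])
        self._order = [(t, k) for _, t, k in pairs]
        counts = [0] * self.num_experts
        for expert, _, _ in pairs:
            counts[expert] += 1
        permuted = [list(tokens[t]) for t, _ in self._order]
        return permuted, counts

    def combine(
        self, expert_out: Sequence[Row], topk_weights: Sequence[Sequence[float]]
    ) -> List[Row]:
        # Undo the permutation and accumulate the weighted expert outputs.
        hidden = len(expert_out[0]) if expert_out else 0
        out = [[0.0] * hidden for _ in range(len(topk_weights))]
        for row, (t, k) in zip(expert_out, self._order):
            weight = topk_weights[t][k]
            for j, value in enumerate(row):
                out[t][j] += weight * value
        return out


def run_experts(
    permuted: Sequence[Row],
    counts: Sequence[int],
    expert_fns: Sequence[Callable[[Row], Row]],
) -> List[Row]:
    """Apply expert i to its contiguous chunk of the permuted rows."""
    out: List[Row] = []
    start = 0
    for expert_id, n in enumerate(counts):
        for row in permuted[start:start + n]:
            out.append(expert_fns[expert_id](row))
        start += n
    return out
```

A caller would run `dispatch`, feed the permuted rows through `run_experts`, and pass the result to `combine` together with the routing weights. Keeping the permutation bookkeeping inside one object is one plausible reason to group dispatch and combine in a single dispatcher class.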

1 parent 4af5b80 commit 950c4b2

2 files changed: +560 -107 lines changed

- tests/ut/ops/test_token_dispatcher.py
- vllm_ascend/ops/moe_dispatcher/token_dispatcher.py

Comments

(0)
