Commit c94b38c

[Readme] EPLB Support Scenarios (#4315)
### What this PR does / why we need it?

Add information on the scope of EPLB support.

---------

Signed-off-by: shenchuxiaofugui <[email protected]>
1 parent 9c6d0b4 commit c94b38c

File tree: 2 files changed, +14 -0 lines changed


docs/source/user_guide/feature_guide/eplb_swift_balancer.md

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,13 @@ Expert balancing for MoE models in LLM serving is essential for optimal performa
 - Adaptive Scaling: Automatically adjusts to workload fluctuations while maintaining stable performance.
 - Fault Tolerance: Redundant expert placement ensures system resilience during hardware failures.
 
+## Support Scenarios
+
+### Models:
+DeepseekV3/V3.1/R1、Qwen3-MOE
+### MOE QuantType:
+W8A8-dynamic
+
 ## How to Use EPLB
 
 ### Dynamic EPLB
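For orientation, here is a minimal sketch of how the support matrix documented above could be expressed as a pre-flight check. The helper name and the identifier strings are illustrative only; they are not part of this commit or of the vLLM Ascend API.

```python
# Illustrative sketch only: encodes the documented EPLB support scope
# (DeepseekV3/V3.1/R1 and Qwen3-MOE models, W8A8-dynamic MoE quantization)
# as a simple pre-flight check. Names and identifier strings are hypothetical.
SUPPORTED_EPLB_MODELS = {"deepseek_v3", "deepseek_v3.1", "deepseek_r1", "qwen3_moe"}
SUPPORTED_EPLB_MOE_QUANT_TYPES = {"w8a8_dynamic"}


def check_eplb_support(model_type: str, moe_quant_type: str) -> None:
    """Raise early if the model / quantization pair is outside the EPLB support scope."""
    if model_type not in SUPPORTED_EPLB_MODELS:
        raise ValueError(f"EPLB does not support model type: {model_type}")
    if moe_quant_type not in SUPPORTED_EPLB_MOE_QUANT_TYPES:
        raise ValueError(
            f"EPLB supports only w8a8_dynamic MoE quantization, got: {moe_quant_type}")


check_eplb_support("qwen3_moe", "w8a8_dynamic")  # passes; unsupported pairs raise ValueError
```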

vllm_ascend/ops/common_fused_moe.py

Lines changed: 7 additions & 0 deletions
@@ -38,6 +38,8 @@
 from vllm_ascend.ops.expert_load_balancer import ExpertLoadBalancer
 from vllm_ascend.ops.moe.experts_selector import select_experts
 from vllm_ascend.ops.moe.moe_comm_method import setup_moe_comm_method
+from vllm_ascend.quantization.w8a8_dynamic import \
+    AscendW8A8DynamicFusedMoEMethod
 from vllm_ascend.utils import (ACL_FORMAT_FRACTAL_NZ, enable_sp, is_310p,
                                is_enable_nz, npu_stream_switch,
                                shared_expert_dp_enabled,
@@ -247,6 +249,11 @@ def __init__(self, *args, **kwargs):
         self.moe_load = torch.zeros(local_num_experts,
                                     dtype=torch.int64).npu()
 
+        eplb_enable = self.dynamic_eplb or (self.expert_map_path is not None)
+        if eplb_enable and (not isinstance(self.quant_method,
+                                           AscendW8A8DynamicFusedMoEMethod)):
+            raise ValueError("Eplb supports only w8a8_dynamic quantization.")
+
         self.moe_config.num_experts = self.global_num_experts
         self.moe_config.num_local_experts = self.local_num_experts
         self.moe_config.original_num_experts = num_experts
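The guard added in `__init__` is easier to read in isolation. Below is a self-contained sketch of the same condition, with stub classes standing in for the real vLLM Ascend types so the logic can be exercised without the package installed; only the condition itself mirrors the committed code.

```python
# Self-contained sketch of the EPLB quantization guard added above.
# The stub classes stand in for vLLM Ascend types; only the condition
# (dynamic EPLB enabled or an expert map path given, but the MoE quant
# method is not the W8A8-dynamic one) mirrors the committed code.
class AscendW8A8DynamicFusedMoEMethod:  # stub for the supported quant method
    pass


class SomeOtherQuantMethod:  # stub for any unsupported quant method
    pass


def validate_eplb_quantization(dynamic_eplb, expert_map_path, quant_method):
    # EPLB is considered enabled when dynamic EPLB is on or an expert map is provided.
    eplb_enable = dynamic_eplb or (expert_map_path is not None)
    if eplb_enable and not isinstance(quant_method, AscendW8A8DynamicFusedMoEMethod):
        raise ValueError("Eplb supports only w8a8_dynamic quantization.")


validate_eplb_quantization(True, None, AscendW8A8DynamicFusedMoEMethod())  # ok
# validate_eplb_quantization(True, None, SomeOtherQuantMethod())  # raises ValueError
```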
