[Feature] Support redundant expert for eplb #5918
Conversation
Thanks for your contribution!
Codecov Report

❌ Patch coverage details and impacted files:

@@ Coverage Diff @@
##        develop    #5918   +/- ##
==========================================
  Coverage        ?    67.02%
==========================================
  Files           ?       348
  Lines           ?     44673
  Branches        ?      6876
==========================================
  Hits            ?     29941
  Misses          ?     12520
  Partials        ?      2212
Pull request overview
This PR adds redundant expert support to EPLB (Expert Parallel Load Balancing). The main changes move the redundant expert configuration from model_config to eplb_config and make expert-parallel computation use the redundant expert count correctly. It also fixes a few latent bugs, such as uninitialized variables.
Key changes:
- Initialize eplb_config earlier so the redundant expert count is available when computing the expert-parallel configuration
- Switch the redundant expert configuration source from model_config.redundant_experts_num to eplb_config.redundant_experts_num throughout
- Add support for 7 and 17 experts per rank in the CUDA kernel to cover redundant expert scenarios
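The arithmetic behind these changes can be sketched as follows. This is an illustrative example, not FastDeploy code: the helper `physical_to_logical` is hypothetical, and the counts mirror the PR's config field names (`moe_num_experts`, `redundant_experts_num`).

```python
# Hypothetical sketch of the redundant-expert arithmetic: redundant experts
# are extra physical copies that share weights with logical experts, so a
# physical ID maps back to its logical expert via modulo.

def physical_to_logical(physical_id: int, moe_num_experts: int) -> int:
    """Map a (possibly redundant) physical expert ID to the logical expert
    whose weights it shares."""
    return physical_id % moe_num_experts

moe_num_experts = 64        # logical experts in the model
redundant_experts_num = 32  # extra copies added by EPLB
num_replicas = moe_num_experts + redundant_experts_num  # physical slots: 96

# Physical IDs 64..95 are redundant copies of logical experts 0..31.
assert physical_to_logical(64, moe_num_experts) == 0
assert physical_to_logical(95, moe_num_experts) == 31
```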
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| fastdeploy/worker/worker_process.py | Initialize eplb_config earlier and include redundant_experts_num when computing num_experts |
| fastdeploy/worker/experts_manager.py | Change num_nodes from a dynamic computation to the hardcoded value 8 |
| fastdeploy/model_executor/models/ernie4_5_moe.py | Switch the redundant_experts_num source from model_config to eplb_config |
| fastdeploy/model_executor/load_weight_utils.py | Add a modulo operation mapping redundant expert IDs back to actual expert IDs for weight loading |
| fastdeploy/model_executor/layers/moe/moe.py | Remove the is_rearrange check and change how state_dict is accessed |
| fastdeploy/model_executor/layers/moe/fused_moe_backend_base.py | Switch the redundant_experts_num source from model_config to eplb_config |
| fastdeploy/model_executor/layers/moe/ep.py | Switch the redundant_experts_num source from model_config to eplb_config |
| fastdeploy/model_executor/layers/backends/xpu/moe/ep.py | Switch the redundant_experts_num source from model_config to eplb_config |
| fastdeploy/entrypoints/engine_client.py | Initialize content and status_code to avoid undefined-variable errors |
| custom_ops/gpu_ops/moe/ep_moe_expert_dispatch.cu | Add support for 7 and 17 experts per rank |
    for i in range(fd_config.model_config.moe_layer_start_index, fd_config.model_config.num_hidden_layers):
        for j in get_expert_ranges(fd_config):
            j = j % fd_config.model_config.moe_num_experts
Copilot AI · Jan 7, 2026
When fd_config.model_config.moe_num_experts is a list, the modulo operation can fail. The code checks whether moe_num_experts is a list at line 237, but the modulo at line 252 uses it directly, which would raise TypeError: unsupported operand type(s) for %: 'int' and 'list'.
Suggested change:
    moe_num_experts = fd_config.model_config.moe_num_experts
    if isinstance(moe_num_experts, list):
        moe_num_experts = moe_num_experts[0]
    j = j % moe_num_experts
fastdeploy/worker/experts_manager.py (Outdated)
      self.num_replicas = self.num_expert + self.redundant_experts_num
    - self.num_nodes = max(ep_size // 8, 1)
    + self.num_nodes = 8
Copilot AI · Jan 7, 2026
Hardcoding num_nodes to 8 instead of max(ep_size // 8, 1) may break deployments where ep_size is less than 8. The original logic guaranteed at least one node, while the hardcoded value 8 may not fit every configuration. Either restore the dynamic computation, add validation that ep_size >= 8, or explain in a code comment why hardcoding 8 is justified.
    # Determine number of nodes based on ep_size to support small ep_size values
    self.num_nodes = max(ep_size // 8, 1)
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
fastdeploy/entrypoints/engine_client.py:646
Initializing content, status_code = None, HTTPStatus.OK at the start of the function is an improvement, since it guarantees both variables always have defined values. Note, however, that in the logic around lines 644-646, an error is set when content is None and "ips" is missing from the request, but status_code has already been set to BAD_REQUEST at line 643, so the check at line 648 may observe an inconsistent state. Consider always setting status_code together with content, or restructuring the logic for clarity.
content, status_code = None, HTTPStatus.OK
eplb_config = self.fd_config.eplb_config
if not eplb_config.enable_eplb:
content = {"code": 1, "msg": "redundant expert is disabled"}
status_code = HTTPStatus.BAD_REQUEST
return content, status_code
if (
request_dict.get("user", "") != eplb_config.redundant_expert_api_user
or request_dict.get("passwd", "") != eplb_config.redundant_expert_api_password
):
content = {"code": 1, "msg": "user or passwd is invalid"}
status_code = HTTPStatus.UNAUTHORIZED
return content, status_code
if self.fd_config.parallel_config.tensor_parallel_rank != 0:
content = {
"code": 1,
"msg": f"actual rank {self.fd_config.parallel_config.tensor_parallel_rank}, expect rank 0",
}
status_code = HTTPStatus.BAD_REQUEST
return content, status_code
action = request_dict.get("action", "")
api_server_logger.info(f"redundant_expert: rearrange_experts recv request, action {action}")
if action == "":
# action: start rearrange experts
# params: {'user': 'xxx', 'passwd': 'xxx', 'ips': ['10.54.99.77:8000', '10.54.99.77:8300']}
if self.rearrange_experts_signal.value[0] != RearrangeExpertStatus.FREE.value:
content = {
"code": 1,
"msg": f"rearrange is doing. actual status {self.rearrange_experts_signal.value[0]}, expect status {RearrangeExpertStatus.FREE.value}",
}
status_code = HTTPStatus.BAD_REQUEST
if "ips" not in request_dict and content is None:
content = {"code": 1, "msg": "ips in request is None"}
status_code = HTTPStatus.BAD_REQUEST
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| """ | ||
| logical_expert_ids = [ | ||
| i | ||
| i % self.fd_config.model_config.moe_num_experts |
Copilot AI · Jan 8, 2026
The added modulo maps redundant expert IDs back to actual expert IDs, which is a key logic change. Consider adding an English comment here explaining why the mapping is needed, for example: "Map redundant expert IDs back to actual expert IDs since redundant experts share weights with actual experts".
    moe_num_experts = fd_config.model_config.moe_num_experts
    if isinstance(moe_num_experts, list):
        moe_num_experts = moe_num_experts[0]
    j = j % moe_num_experts
Copilot AI · Jan 8, 2026
Repeating the extraction and type check of moe_num_experts inside the loop is inefficient. Consider hoisting the logic at lines 253-256 out of the loop so it runs once before iteration begins.
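The hoisted form suggested above can be sketched like this. `fd_config` is a stand-in object here, not the real FDConfig, and the expert range is invented for illustration:

```python
# Illustrative sketch: resolve moe_num_experts once, before the loop,
# instead of re-checking its type on every iteration.
from types import SimpleNamespace

# Stand-in config; moe_num_experts may be an int or a per-group list.
fd_config = SimpleNamespace(
    model_config=SimpleNamespace(moe_num_experts=[64, 64])
)

# Hoisted extraction and type check (runs once).
moe_num_experts = fd_config.model_config.moe_num_experts
if isinstance(moe_num_experts, list):
    moe_num_experts = moe_num_experts[0]

# Loop body now only does the cheap modulo; range is a made-up example
# of physical expert IDs spanning the redundant region.
mapped = [j % moe_num_experts for j in range(60, 70)]
# mapped == [60, 61, 62, 63, 0, 1, 2, 3, 4, 5]
```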
…into eplb_redun
Pull request overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
            up_gate_proj_expert_weight_key.format(logical_expert_ids[0] if is_rearrange else self.expert_id_offset)
            in state_dict
        )
    +   is_ffn_merged = up_gate_proj_expert_weight_key.format(logical_expert_ids[0]) in state_dict
Copilot AI · Jan 9, 2026
Removing the is_rearrange check means the rearrange logic now runs whenever redundant_table_manger is not None. This simplifies the flow, but all callers of this method must be updated to stop passing is_rearrange. The is_ffn_merged check generated at line 495 also drops its dependency on is_rearrange, which keeps the logic consistent. Please verify this change does not affect weight loading in non-rearrange scenarios.
    if logical_expert_ids:
        first_expert_key = up_gate_proj_expert_weight_key.format(logical_expert_ids[0])
        is_ffn_merged = first_expert_key in state_dict
    else:
        # No local experts found, fall back to non-merged FFN loading path
        is_ffn_merged = False
Motivation
--eplb-config '{"redundant_experts_num": 32, "redundant_expert_async_load_model_shmem_size_gb": 10}'
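The flag above passes the EPLB settings as a JSON string. A minimal sketch of how such a value could be parsed and defaulted (the parsing helper is an assumption for illustration; only the key names come from the PR description):

```python
# Hypothetical parsing of the --eplb-config JSON string shown above.
import json

raw = '{"redundant_experts_num": 32, "redundant_expert_async_load_model_shmem_size_gb": 10}'

# Merge user-supplied keys over defaults; unknown keys would normally be rejected.
defaults = {
    "redundant_experts_num": 0,
    "redundant_expert_async_load_model_shmem_size_gb": 0,
}
eplb_config = {**defaults, **json.loads(raw)}

assert eplb_config["redundant_experts_num"] == 32
assert eplb_config["redundant_expert_async_load_model_shmem_size_gb"] == 10
```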
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag the PR title with one of: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For the release branch, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.