-
Notifications
You must be signed in to change notification settings - Fork 681
[Cherry-Pick][Feature] Support redundant expert for eplb (#5918) #5923
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Cherry-Pick][Feature] Support redundant expert for eplb (#5918) #5923
Conversation
|
Thanks for your contribution! |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release/online/20251131 #5923 +/- ##
==========================================================
Coverage ? 58.46%
==========================================================
Files ? 320
Lines ? 39204
Branches ? 5918
==========================================================
Hits ? 22921
Misses ? 14443
Partials ? 1840
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| self.num_hidden_layers = num_hidden_layers | ||
|
|
||
| self.num_replicas = self.num_expert + self.redundant_experts_num | ||
| self.num_nodes = max(ep_size // 8, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个改动的目的是啥?那如果是单机呢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这部分修改就是为了适配单机/多机,测试来看num_nodes=8能够保证初始化专家排布的正确性
gongshaotian
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
RichardWooSJTU
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
c2ad0a9
into
PaddlePaddle:release/online/20251131
Motivation
--eplb-config '{"redundant_experts_num": 32, "redundant_expert_async_load_model_shmem_size_gb": 10}'
Modifications
Usage or Command
Accuracy Tests
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.