[Cherry-Pick][BugFix] Support redundant expert for eplb (#5918) #5922
Closed
…nd optimize config (PaddlePaddle#5213) * update extract_mtp_model * modify config usage
* fix ds type bug * update code
…dlePaddle#5227) * [CI] Add Cherry-Pick PR check logic (cherry picked from commit 8bc2e13) * [Cherry-Pick][CI] Add check trigger and logic
* fix eplb noaux * fix eplb noaux
…path(PaddlePaddle#5205) (PaddlePaddle#5231) * refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs * Restore Request.__repr__ implementation * ci * add envs * fix unittest
…ddle#5134) (PaddlePaddle#5256) * [Feature] support flash_mask_attention backend * fix unittest * clean code
…ddle#5277) (PaddlePaddle#5281) * Set pip global index URL to Tsinghua mirror * Update Docker image tag in CI workflow
…ddle#5347) * fix mm to_dict bug * pd support async download * update code * update test case * update log * Revert "update log" This reverts commit 6e88315. * update code * fix mtp bug
…e#5338 (PaddlePaddle#5395) * [Optimize] Robust stability for PD deployment --------- Co-authored-by: Kaipeng Deng <[email protected]>
…lePaddle#5282 PaddlePaddle#5429) (PaddlePaddle#5535) * [Quantization] Support w4afp8 MoE dynamic quantization (PaddlePaddle#5282) * support dynamic activation quant for w4afp8 * support dynamic w4afp8 * add test * fix * fix --------- Co-authored-by: zhoutianzi666 <[email protected]> * support w4afp8 mtp (PaddlePaddle#5429) * fix ep --------- Co-authored-by: zhoutianzi666 <[email protected]>
) (PaddlePaddle#5589) * [CI] Add Cherry-Pick PR check logic (cherry picked from commit 8bc2e13) * [Cherry-Pick][CI] Add check trigger and logic * [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576) * [Cherry-Pick][CI] Adapt unit_test due to Paddle update(5576)
…dlePaddle#5590 (PaddlePaddle#5591) * fix bug * fix bug --------- Co-authored-by: YuBaoku <[email protected]>
…ddle#5624) (PaddlePaddle#5695) * support multi-step mtp with cudagraph * fix usage * fix unit test
… & load and DeepEP low latency two stage(PaddlePaddle#5613 PaddlePaddle#5608) (PaddlePaddle#5677) * support w4afp8 moe offline permute & load (PaddlePaddle#5613) * support w4afp8 two stage (PaddlePaddle#5608) * fix
Co-authored-by: Jiang-Jia-Jun <[email protected]>
…e update(PaddlePaddle#5732) (PaddlePaddle#5734) * [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(PaddlePaddle#5732)
…ed and PD-split modes (PaddlePaddle#5738) (PaddlePaddle#5792) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operator register * update pmtp multi-step mtp strategy in d-split-mode * add note * fix xpu register
* initial commit * remove split_batch_decoder_layers1 * format code * restore /root/paddlejob/workspace/env_run/output/zkk/DBO/FastDeploy/fastdeploy/model_executor/layers/normalization.py * restore rms --------- Co-authored-by: yangjianfeng01 <[email protected]>
Thanks for your contribution!
Motivation
--eplb-config '{"redundant_experts_num": 32, "redundant_expert_async_load_model_shmem_size_gb": 10}'
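The flag above takes a JSON object as a single shell argument, so quoting mistakes are easy to make when it is typed by hand. A minimal sketch of building the same argument programmatically follows. The two keys and their values are taken from the line above; the helper name `build_eplb_flag` is hypothetical and not part of FastDeploy, and nothing here checks which keys the server actually accepts.

```python
import json
import shlex


def build_eplb_flag(redundant_experts_num, shmem_size_gb):
    """Return the --eplb-config flag and its JSON value as an argv pair.

    Hypothetical helper for illustration only. The two keys are copied
    from the PR description; no other keys are assumed.
    """
    config = {
        "redundant_experts_num": redundant_experts_num,
        "redundant_expert_async_load_model_shmem_size_gb": shmem_size_gb,
    }
    # json.dumps emits valid JSON with double-quoted keys.
    return ["--eplb-config", json.dumps(config)]


args = build_eplb_flag(32, 10)

# shlex.join quotes the JSON so it survives as one shell argument.
print(shlex.join(args))
```

Passing `args` as list elements to `subprocess.run` avoids shell quoting entirely; `shlex.join` is only needed when the command is pasted into a shell.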
Modifications
Usage or Command
Accuracy Tests
Checklist
* Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
* Run pre-commit before commit.
* If the PR targets the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.