[Cherry-Pick][BugFix] Support redundant expert for eplb (#5918) #5922

xiaoxiaohehe001 · 2026-01-07T06:50:32Z

Motivation

Support redundant experts for EPLB.
Add the following option when starting the service:
--eplb-config '{"redundant_experts_num": 32, "redundant_expert_async_load_model_shmem_size_gb": 10}'
This PR is a cherry-pick of "[Feature] Support redundant expert for eplb" (#5918).
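
The `--eplb-config` value above is a JSON object passed as a single shell argument, so quoting mistakes are easy to make when the flag is assembled in a launch script. A minimal sketch of building it programmatically follows. The two keys and their values are copied from the description above; the helper name `build_eplb_flag` is hypothetical and not part of FastDeploy, and how the service parses the flag is not shown in this PR.

```python
import json
import shlex

# Keys and values copied from the PR description; nothing else is assumed
# about which other fields the EPLB config may accept.
eplb_config = {
    "redundant_experts_num": 32,
    "redundant_expert_async_load_model_shmem_size_gb": 10,
}


def build_eplb_flag(config: dict) -> str:
    """Hypothetical helper: render the flag with a shell-quoted JSON payload."""
    payload = json.dumps(config)
    # shlex.quote wraps the JSON in single quotes so the shell passes it
    # through as one argument with the inner double quotes intact.
    return f"--eplb-config {shlex.quote(payload)}"


flag = build_eplb_flag(eplb_config)
print(flag)

# Round-trip check: splitting the flag the way a POSIX shell would and
# parsing the second token as JSON should give back the original dict.
assert json.loads(shlex.split(flag)[1]) == eplb_config
```

Appending the resulting string to the usual service start command should reproduce the option shown above.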

Modifications

Usage or Command

Accuracy Tests

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

…lePaddle#5282 PaddlePaddle#5429) (PaddlePaddle#5535) * [Quantization] Support w4afp8 MoE dynamic quantization (PaddlePaddle#5282) * support dynamic activation quant for w4afp8 * support dynamic w4afp8 * add test * fix * fix --------- Co-authored-by: zhoutianzi666 <[email protected]> * support w4afp8 mtp (PaddlePaddle#5429) * fix ep --------- Co-authored-by: zhoutianzi666 <[email protected]>

) (PaddlePaddle#5589) * [CI] Add Cherry-Pick PR check logic (cherry picked from commit 8bc2e13) * [Cherry-Pick][CI] Add check trigger and logic * [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576) * [Cherry-Pick][CI] Adape unit_test due to Paddle update(5576)

… & load and DeepEP low latency two stage(PaddlePaddle#5613 PaddlePaddle#5608) (PaddlePaddle#5677) * support w4afp8 moe offline permute & load (PaddlePaddle#5613) * support w4afp8 two stage (PaddlePaddle#5608) * fix

…ed and PD-split modes (PaddlePaddle#5738) (PaddlePaddle#5792) * fix attn_mask_offset in mtp with multi-step and pd-split-mode * fix xpu operater register * update pmtp multi-step mtp strategy in d-split -mode * add note * fix xpu register

* initial commit * remove split_batch_decoder_layers1 * format code * restore /root/paddlejob/workspace/env_run/output/zkk/DBO/FastDeploy/fastdeploy/model_executor/layers/normalization.py * restore rms --------- Co-authored-by: yangjianfeng01 <[email protected]>

paddle-bot · 2026-01-07T06:50:38Z

Thanks for your contribution!

freeliuzc and others added 30 commits November 25, 2025 14:25

fix kernel output extract (PaddlePaddle#5212)

e581b7d

[Speculative Decoding][Cherry Pick]Update extract_mtp_weight script a…

a11d17c

…nd optimize config (PaddlePaddle#5213) * update extract_mtp_model * modify config usage

[BugFix][Cherry Pick] fix ds type bug (PaddlePaddle#5220)

e0c7ebf

* fix ds type bug * update code

[Cherry-Pick][CI] Add check trigger and logic(PaddlePaddle#5191) (Pad…

49be443

…dlePaddle#5227) * [CI] Add Cherry-Pick PR check logic (cherry picked from commit 8bc2e13) * [Cherry-Pick][CI] Add check trigger and logic

[Cherry-Pick] Fix eplb noaux(PaddlePaddle#5239) (PaddlePaddle#5240)

7107533

* fix eplb noaux * fix eplb noaux

fix pd-split first step bug (PaddlePaddle#5246)

bdcc952

[Cherry-Pick] MTP split draft_tokens into standalone post-processing …

3d74a4b

…path(PaddlePaddle#5205) (PaddlePaddle#5231) * refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs * Restore Request.__repr__ implementation * ci * add envs * fix unittest

cp_fix_bug (PaddlePaddle#5253)

69b4d05

Add method to disable sequence parallel MoE if needed (PaddlePaddle#5268)

9b0c65b

[Cherry-Pick][Feature] support flash_mask_attention backend(PaddlePa…

fd1313c

…ddle#5134) (PaddlePaddle#5256) * [Feature] suppert flash_mask_attention backend * fix unittest * clean code

[Cherry-pick][XPU][CI] Set pip index URL to Tsinghua mirror (PaddlePa…

89ed1a9

…ddle#5277) (PaddlePaddle#5281) * Set pip global index URL to Tsinghua mirror * Update Docker image tag in CI workflow

Update load_weight_utils.py (PaddlePaddle#5285)

b990644

fix mm to_dict bug (PaddlePaddle#5299)

f1e1f5d

[Optimization] 1.fix tp+ep moe_forward; 2.set max_prefill_batch=env.M…

04b2c43

…AX_PREFILL_NUM (PaddlePaddle#5316)

supports mtp split_kv_attn (PaddlePaddle#5344)

cae2c1c

[Cherry-Pick][BugFix] Fix async download(PaddlePaddle#5349) (PaddlePa…

9b5b08c

…ddle#5347) * fix mm to_dict bug * pd support async download * update code * update test case * update log * Revert "update log" This reverts commit 6e88315. * update code * fix mtp bug

[Others] Maintain the mtp branch temporarily. (PaddlePaddle#5447)

f08fb25

fix limit_thinking bug (PaddlePaddle#5477)

c5973c2

fix attention bug in spec decoding (PaddlePaddle#5480)

6715196

[CI] disable test_cuda_graph_dynamic_subgraph.py in unit_test

f133ce5

[Optimize][Cherry-pick] Robust stabilty for PD deployment PaddlePaddl…

4c76171

…e#5338 (PaddlePaddle#5395) * [Optimize] Robust stabilty for PD deployment --------- Co-authored-by: Kaipeng Deng <[email protected]>

[Cherry-Pick][BugFix] fix speculate_limit_thinking_content_length Pad…

e65000a

…dlePaddle#5590 (PaddlePaddle#5591) * fix bug * fix bug --------- Co-authored-by: YuBaoku <[email protected]>

[Speculative Decoding]Support multi-step mtp with cudagraph (PaddlePa…

52280be

…ddle#5624) (PaddlePaddle#5695) * support multi-step mtp with cudagraph * fix usage * fix unit test

fix eplb weight updating (PaddlePaddle#5529) (PaddlePaddle#5661)

1b74540

Co-authored-by: Jiang-Jia-Jun <[email protected]>

[Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddl…

db774a6

…e update(PaddlePaddle#5732) (PaddlePaddle#5734) * [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(PaddlePaddle#5732)

[Cherry-Pick][CI]Fix multistep MTP in splitewise-prefill mode (Paddle…

b018c49

…Paddle#5723) (PaddlePaddle#5724)

commit (PaddlePaddle#5791)

d054cf6

freeliuzc and others added 6 commits December 26, 2025 17:00

[CI] Remove useless cases in 1131 and fix XPU (PaddlePaddle#5801)

9807f2b

fix quant (PaddlePaddle#5837)

44cbf2e

add del to decrease peak memory (PaddlePaddle#5862)

7aea651

[Cherry-Pick] Support redundant expert for eplb

20ef041

xiaoxiaohehe001 had a problem deploying to Metax_ci January 7, 2026 06:50 — with GitHub Actions Failure

xiaoxiaohehe001 had a problem deploying to Metax_ci January 7, 2026 06:51 — with GitHub Actions Failure

xiaoxiaohehe001 closed this Jan 7, 2026

13 participants