@xiaoxiaohehe001
Collaborator

Motivation

  • Support redundant expert for eplb
  • When starting the service, add:
    --eplb-config '{"redundant_experts_num": 32, "redundant_expert_async_load_model_shmem_size_gb": 10}'
  • [Cherry-Pick][BugFix] Support redundant expert for eplb ([Feature] Support redundant expert for eplb #5918)

Modifications

Usage or Command
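
As a minimal sketch, the flag from the Motivation section can be assembled programmatically so that the JSON value is always valid and correctly quoted for a POSIX shell. The helper name `build_eplb_flag` is hypothetical and not part of FastDeploy; only the `--eplb-config` flag and its two keys come from this PR. The launch command the flag is appended to is not shown in the PR, so it is left out here.

```python
import json
import shlex

# Keys and values copied from the Motivation section of this PR.
eplb_config = {
    "redundant_experts_num": 32,
    "redundant_expert_async_load_model_shmem_size_gb": 10,
}


def build_eplb_flag(config: dict) -> str:
    """Hypothetical helper: render --eplb-config with a shell-quoted JSON value."""
    payload = json.dumps(config)  # serialize the config dict to a JSON string
    return f"--eplb-config {shlex.quote(payload)}"  # quote it for a POSIX shell


flag = build_eplb_flag(eplb_config)
print(flag)
```

Append the printed flag to whatever command is already used to start the service.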

Accuracy Tests

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are none, explain why in this PR.
  • Provide accuracy results.
  • If the current PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

freeliuzc and others added 30 commits November 25, 2025 14:25
…nd optimize config (PaddlePaddle#5213)

* update extract_mtp_model

* modify config usage
…dlePaddle#5227)

* [CI] Add Cherry-Pick PR check logic

(cherry picked from commit 8bc2e13)

* [Cherry-Pick][CI] Add check trigger and logic
…path(PaddlePaddle#5205) (PaddlePaddle#5231)

* refactor(mtp): split draft_tokens into standalone post-processing path for MTP + logprobs

* Restore Request.__repr__ implementation

* ci

* add envs

* fix unittest
…ddle#5134) (PaddlePaddle#5256)

* [Feature] support flash_mask_attention backend

* fix unittest

* clean code
…ddle#5277)  (PaddlePaddle#5281)

* Set pip global index URL to Tsinghua mirror

* Update Docker image tag in CI workflow
…ddle#5347)

* fix mm to_dict bug

* pd support async download

* update code

* update test case

* update log

* Revert "update log"

This reverts commit 6e88315.

* update code

* fix mtp bug
…e#5338 (PaddlePaddle#5395)

* [Optimize] Robust stability for PD deployment

---------

Co-authored-by: Kaipeng Deng
…lePaddle#5282 PaddlePaddle#5429) (PaddlePaddle#5535)

* [Quantization] Support w4afp8 MoE dynamic quantization (PaddlePaddle#5282)

* support dynamic activation quant for w4afp8

* support dynamic w4afp8

* add test

* fix

* fix

---------

Co-authored-by: zhoutianzi666

* support w4afp8 mtp (PaddlePaddle#5429)

* fix ep

---------

Co-authored-by: zhoutianzi666
) (PaddlePaddle#5589)

* [CI] Add Cherry-Pick PR check logic

(cherry picked from commit 8bc2e13)

* [Cherry-Pick][CI] Add check trigger and logic

* [CI] Adapt vl_model baseline changes due to Paddle update (PaddlePaddle#5576)

* [Cherry-Pick][CI] Adapt unit_test due to Paddle update(5576)
…ddle#5624) (PaddlePaddle#5695)

* support multi-step mtp with cudagraph

* fix usage

* fix unit test
… & load and DeepEP low latency two stage(PaddlePaddle#5613 PaddlePaddle#5608) (PaddlePaddle#5677)

* support w4afp8 moe offline permute & load (PaddlePaddle#5613)

* support w4afp8 two stage (PaddlePaddle#5608)

* fix
…e update(PaddlePaddle#5732) (PaddlePaddle#5734)

* [Cherry-Pick][CI] Revert adapt vl_model baseline changes due to Paddle update(PaddlePaddle#5732)
freeliuzc and others added 6 commits December 26, 2025 17:00
…ed and PD-split modes (PaddlePaddle#5738) (PaddlePaddle#5792)

* fix attn_mask_offset in mtp with multi-step and pd-split-mode

* fix xpu operator register

* update pmtp multi-step mtp strategy in pd-split-mode

* add note

* fix xpu register
* initial commit

* remove split_batch_decoder_layers1

* format code

* restore /root/paddlejob/workspace/env_run/output/zkk/DBO/FastDeploy/fastdeploy/model_executor/layers/normalization.py

* restore rms
---------

Co-authored-by: yangjianfeng01
@paddle-bot

paddle-bot bot commented Jan 7, 2026

Thanks for your contribution!
