feat: support muon optimizer#3984

Open
xxyux wants to merge 1 commit into PaddlePaddle:develop from xxyux:feature/add-muon-optimizer

Conversation

xxyux (Contributor) commented Mar 3, 2026

Before submitting

  • Lint code. If there are lint issues, please format the code first.

    ```shell
    # Install and register `pre-commit` in the project folder
    pip install pre-commit && pre-commit install

    # Run the hooks on changed code files individually
    pre-commit run --files XXXX.py
    ```

  • Add test cases into the tests folder. If there are codecov issues, please add test cases first.

PR types

New features

PR changes

Others

Description

Integrate the Muon optimizer into PaddleFormers trainer and add ShardingV3
distributed training support.

Muon optimizer integration

  • trainer.py: create paddle.optimizer.Muon when optim=muon; annotate
    fused QKV weights with per-head metadata (needs_qkv_split, head_num,
    kv_head_num) for per-head orthogonalisation; handle Muon's
    _moment_acc_str (vs AdamW's _moment1_acc_str) in optimizer state
    save/restore
  • trainer_utils.py: add OptimizerNames.MUON and Muon optimizer
    construction logic with default hyperparameters
  • training_args.py: register muon as a valid optimizer choice
  • offload_optimizer.py: monkey-patch Muon's _muon_update and
    _apply_optimize for CPU offload support
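The per-head metadata annotation can be sketched as follows. This is an illustrative reconstruction, not the PR's actual code: the helper name `annotate_qkv_params`, the `qkv_proj` substring heuristic, and the example head counts are all assumptions; only the metadata keys (`needs_qkv_split`, `head_num`, `kv_head_num`) come from the PR description.

```python
# Hypothetical sketch: tag fused QKV weights with per-head metadata so a
# Muon optimizer could orthogonalise each attention head separately.
def annotate_qkv_params(named_params, head_num, kv_head_num):
    """Return a mapping from fused-QKV parameter names to the metadata
    the PR description says the trainer attaches for Muon."""
    annotations = {}
    for name, _param in named_params:
        # Heuristic (assumed): fused QKV projections carry "qkv_proj"
        # in their parameter name.
        if "qkv_proj" in name:
            annotations[name] = {
                "needs_qkv_split": True,
                "head_num": head_num,
                "kv_head_num": kv_head_num,
            }
    return annotations

# Example usage with hypothetical parameter names and head counts.
params = [
    ("layers.0.self_attn.qkv_proj.weight", None),
    ("layers.0.mlp.up_proj.weight", None),
]
meta = annotate_qkv_params(params, head_num=32, kv_head_num=8)
```

Only the fused QKV weight is annotated; other parameters (here the MLP projection) are left untouched and would fall through to Muon's default whole-matrix update.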

ShardingV3 support

  • training_args.py: add sharding_v3 boolean argument, propagated via
    FLAGS_sharding_v3 environment variable
  • trainer_utils.py: DygraphShardingOptimizerV3 initialisation path
  • reshard/sharding_v3.py (new): V3-specific checkpoint reshard logic for
    save/restore with full-parameter ownership model
  • reshard/common.py: add SHARDING_STRATEGY_V3 constant
  • sharding_io.py: adapt optimizer state unwrapping for V3
  • zero_cost_checkpoint.py: adapt EMA and buffer handling for V3
  • moe_hybrid_parallel_optimizer.py: V3 optimizer routing for MoE
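The flag propagation in the first bullet can be sketched as below. This is a minimal illustration of the pattern (training argument exported through an environment variable before the distributed engine initialises); the helper name `propagate_sharding_v3` is hypothetical, while `FLAGS_sharding_v3` is the variable named in the PR.

```python
import os

def propagate_sharding_v3(sharding_v3: bool) -> None:
    """Export FLAGS_sharding_v3 so downstream components can select the
    DygraphShardingOptimizerV3 code path (sketch of the PR's approach)."""
    os.environ["FLAGS_sharding_v3"] = "1" if sharding_v3 else "0"

# Example: a training_args-style boolean is turned into the env flag.
propagate_sharding_v3(True)
```

Using an environment variable here lets framework code that cannot see the `TrainingArguments` object (e.g. the hybrid-parallel optimizer internals) still branch on the V3 strategy.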

Tests

  • tests/muon/test_muon_smoke.py: smoke tests exercising both ShardingV2 and
    ShardingV3 code paths on 2 GPUs with AMP O2, validating loss is finite
    across 3 training steps
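The core assertion of the smoke test can be sketched in isolation. The 2-GPU distributed run and AMP O2 setup are not reproduced here; `all_losses_finite` and the example loss values are hypothetical stand-ins for the per-step losses the real test records.

```python
import math

def all_losses_finite(step_losses) -> bool:
    """True if no training step produced a NaN or Inf loss, the
    validity check the smoke tests apply after 3 steps."""
    return all(math.isfinite(loss) for loss in step_losses)

# A healthy run: losses are finite (and here, decreasing).
ok = all_losses_finite([10.4, 9.8, 9.1])
```

A finiteness check is a deliberately weak but robust criterion for a smoke test: it catches NaN/Inf blow-ups from the optimizer update or mixed precision without depending on exact loss values, which vary across hardware and parallelism configurations.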

paddle-bot commented Mar 3, 2026

Thanks for your contribution!

xxyux (Contributor, Author) commented Mar 4, 2026

/re-run all-failed

xxyux force-pushed the feature/add-muon-optimizer branch 2 times, most recently from 054248e to 546b763 on March 4, 2026 at 12:04
xxyux (Contributor, Author) commented Mar 9, 2026

/re-run all-failed

codecov-commenter commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 14.91228% with 194 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@ed15c99). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| paddleformers/trainer/utils/reshard/sharding_v3.py | 12.06% | 51 Missing ⚠️ |
| paddleformers/trainer/trainer_utils.py | 9.09% | 50 Missing ⚠️ |
| paddleformers/trainer/utils/offload_optimizer.py | 0.00% | 35 Missing ⚠️ |
| paddleformers/trainer/trainer.py | 28.00% | 18 Missing ⚠️ |
| paddleformers/trainer/utils/sharding_io.py | 22.22% | 14 Missing ⚠️ |
| ...ddleformers/utils/moe_hybrid_parallel_optimizer.py | 0.00% | 8 Missing ⚠️ |
| paddleformers/trainer/training_args.py | 12.50% | 7 Missing ⚠️ |
| paddleformers/trainer/utils/reshard/common.py | 41.66% | 7 Missing ⚠️ |
| ...addleformers/trainer/utils/zero_cost_checkpoint.py | 50.00% | 4 Missing ⚠️ |
Additional details and impacted files
```
@@            Coverage Diff             @@
##             develop    #3984   +/-   ##
==========================================
  Coverage           ?   33.69%
==========================================
  Files              ?      453
  Lines              ?    86319
  Branches           ?        0
==========================================
  Hits               ?    29081
  Misses             ?    57238
  Partials           ?        0
```

☔ View full report in Codecov by Sentry.

xxyux force-pushed the feature/add-muon-optimizer branch from 546b763 to 1678630 on March 10, 2026 at 13:18. Commit message:
Muon optimizer integration:
- Create Muon optimizer in trainer when `optim=muon`, with per-head
  QKV metadata annotation for fused QKV weight orthogonalisation
- Handle Muon's `_moment_acc_str` (vs AdamW's `_moment1_acc_str`)
  in optimizer state save/restore
- Add Muon `_muon_update`/`_apply_optimize` offload support in
  `offload_optimizer.py`

ShardingV3 support:
- Add `sharding_v3` training argument and `FLAGS_sharding_v3`
  environment variable dispatch
- Implement `DygraphShardingOptimizerV3` init path in
  `trainer_utils.py`
- Add V3 reshard logic (`reshard/sharding_v3.py`) for checkpoint
  save/restore
- Adapt `sharding_io.py`, `zero_cost_checkpoint.py`, and
  `moe_hybrid_parallel_optimizer.py` for V3 optimizer unwrapping

Tests:
- Add Muon smoke tests (`tests/muon/`) exercising both V2 and V3
  sharding paths on 2 GPUs with AMP O2
xxyux force-pushed the feature/add-muon-optimizer branch from 1678630 to 7db2abc on March 10, 2026 at 13:33