Commit 7db2abc
feat: add Muon optimizer integration and ShardingV3 support

Muon optimizer integration:
- Create Muon optimizer in trainer when `optim=muon`, with per-head
  QKV metadata annotation for fused QKV weight orthogonalisation
- Handle Muon's `_moment_acc_str` (vs AdamW's `_moment1_acc_str`)
  in optimizer state save/restore
- Add Muon `_muon_update`/`_apply_optimize` offload support in
  `offload_optimizer.py`
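
The accumulator-name difference in the second bullet can be handled with a small lookup. This is a minimal sketch, not the code from the diff: it assumes the optimizer object exposes one of the two attributes named in the commit message. The helper `moment_acc_name`, the stand-in classes, and the string values they hold are hypothetical placeholders.

```python
class FakeAdamW:
    """Stand-in for an AdamW-style optimizer (hypothetical, for illustration)."""

    _moment1_acc_str = "moment1"  # placeholder value
    _moment2_acc_str = "moment2"  # placeholder value


class FakeMuon:
    """Stand-in for a Muon-style optimizer with a single momentum buffer."""

    _moment_acc_str = "moment"  # placeholder value


def moment_acc_name(optimizer):
    """Return the first-moment accumulator name for either optimizer flavour."""
    # Check Muon's attribute first, then fall back to AdamW's.
    for attr in ("_moment_acc_str", "_moment1_acc_str"):
        name = getattr(optimizer, attr, None)
        if name is not None:
            return name
    raise AttributeError("optimizer exposes no known moment accumulator name")


if __name__ == "__main__":
    print(moment_acc_name(FakeMuon()))
    print(moment_acc_name(FakeAdamW()))
```

Keeping the lookup in one place means save and restore code can ask for the name once instead of branching on the optimizer type at every call site.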

ShardingV3 support:
- Add `sharding_v3` training argument and `FLAGS_sharding_v3`
  environment variable dispatch
- Implement `DygraphShardingOptimizerV3` init path in
  `trainer_utils.py`
- Add V3 reshard logic (`reshard/sharding_v3.py`) for checkpoint
  save/restore
- Adapt `sharding_io.py`, `zero_cost_checkpoint.py`, and
  `moe_hybrid_parallel_optimizer.py` for V3 optimizer unwrapping
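
A minimal sketch of how a training argument and an environment variable might jointly select the V3 path. Only `sharding_v3`, `FLAGS_sharding_v3`, and `DygraphShardingOptimizerV3` come from the commit message. The parsing rules, the helper names `use_sharding_v3` and `pick_sharding_optimizer`, and the V2 fallback name are assumptions for illustration.

```python
import os


def use_sharding_v3(sharding_v3_arg=False, environ=None):
    """Decide whether the V3 path is requested (assumed precedence: either source enables it)."""
    environ = os.environ if environ is None else environ
    # Assumed parsing: "1" or "true" (case-insensitive) turns the flag on.
    flag = environ.get("FLAGS_sharding_v3", "").strip().lower()
    return bool(sharding_v3_arg) or flag in {"1", "true"}


def pick_sharding_optimizer(sharding_v3_arg=False, environ=None):
    """Return the optimizer class name to instantiate (names only, no Paddle import)."""
    if use_sharding_v3(sharding_v3_arg, environ):
        return "DygraphShardingOptimizerV3"
    # Assumed fallback name for the pre-existing V2 path.
    return "DygraphShardingOptimizerV2"


if __name__ == "__main__":
    print(pick_sharding_optimizer(False, {"FLAGS_sharding_v3": "1"}))
```

Passing `environ` explicitly keeps the decision testable without mutating the process environment.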

Tests:
- Add Muon smoke tests (`tests/muon/`) exercising both V2 and V3
  sharding paths on 2 GPUs with AMP O2

1 parent ed15c99 commit 7db2abc
File tree
13 files changed, +588 -18 lines changed
- paddleformers
- trainer
- utils
- reshard
- utils
- tests/muon