314 commits
ef48a13
ci: Update golden values dev
ko3n1g Oct 26, 2025
3281c01
ci: Approval gate
ko3n1g Oct 26, 2025
8fe0c3a
ci: Approval bot
ko3n1g Oct 26, 2025
a33936d
ci: Increase time limit for main tests
ko3n1g Oct 26, 2025
5176823
ci: Auto-assign milestone (#1952)
ko3n1g Oct 26, 2025
4b6ba60
ci: Run on push to release branch (#1960) (#1962)
ko3n1g Oct 26, 2025
221747d
[DEV] support split qkv in muon (#1915)
FDecaYed Oct 27, 2025
9069e12
[Dev] feat(moe): Fine-grained activation offloading (#1912)
lhb8125 Oct 27, 2025
a0a1866
ci: Add golden values for gpt_dynamic_inference_tp1_pp1_583m_cuda_gra…
ko3n1g Oct 27, 2025
c9fb78b
ci: Add more golden values
ko3n1g Oct 27, 2025
6f51284
ci: Aggregate throughput
ko3n1g Oct 27, 2025
eb07b69
Update dev branch codeowners (#1963)
chtruong814 Oct 27, 2025
fa384d2
[Dev] JIT for MoE router and preprocess (#1918)
yaox12 Oct 27, 2025
65c8f40
tests: Fix paths for test_cases
ko3n1g Oct 27, 2025
2155c47
Revert "[Dev] feat(moe): Fine-grained activation offloading (#1912)"
ko3n1g Oct 27, 2025
d95e86a
fix: Missing logger (#1966)
ko3n1g Oct 27, 2025
113cefb
ci: Update copyright checker (#1974)
ko3n1g Oct 27, 2025
d9e0806
[Dev] Update symmetric registration interface to sync-up with upstrea…
youngeunkwon0405 Oct 28, 2025
cc33e00
cp: `Megatron-FSDP Expert Parallel (DeepSeek-v3) Support` into `dev` …
chtruong814 Oct 28, 2025
13edb58
Revert "cp: `Megatron-FSDP Expert Parallel (DeepSeek-v3) Support` int…
ko3n1g Oct 28, 2025
c22c2aa
[Was PR1912][Dev] feat(moe): Fine-grained activation offloading (#1969)
lhb8125 Oct 28, 2025
bada8f9
ci(fix): `Run tests` label (#1970) (#2006)
ko3n1g Oct 29, 2025
ccf794e
Renaming golden values (#2020)
lhb8125 Oct 29, 2025
7342f67
Ko3n1g/chore/sync main to dev (#2018)
ko3n1g Oct 29, 2025
0d0f29c
Ko3n1g/fix/golden values (#2037)
ko3n1g Oct 29, 2025
1d1ac73
cp: `Megatron-FSDP Expert Parallel (DeepSeek-v3) Support` into `dev` …
chtruong814 Oct 30, 2025
2c85448
Update golden values due to PR #2007 (#2057)
chtruong814 Oct 31, 2025
402bc50
Add DeepSeek-V3 GB200 NVL72 optimization guide (#2059)
sbhavani Oct 31, 2025
4959f1f
ci(hotfix): Disable flaky tests
ko3n1g Oct 31, 2025
4008d3d
[DEV] feat(MoE): Refactor cuda_graph_scope (#1917)
buptzyb Oct 31, 2025
44539d1
Add mirror to main workflow (#2042)
chtruong814 Oct 31, 2025
41c6e66
ci: Remove cluster specific golden values (#2069)
ko3n1g Oct 31, 2025
9d05926
ci: Disable inference test
ko3n1g Nov 2, 2025
effebd8
[Dev] feat(moe): Support placing MTP layers into standalone stages (#…
BestJuly Nov 2, 2025
1d9edcf
[Dev] A Guide to Reproduce DeepSeek-V3 Pre-training Performance on GB…
yaox12 Nov 3, 2025
eb0783b
[Dev] Minor updates to the guide (#2103)
yaox12 Nov 3, 2025
ad226e4
Revert "[Dev] feat(moe): Support placing MTP layers into standalone s…
ko3n1g Nov 3, 2025
35eeab7
Reapply "[Dev] feat(moe): Support placing MTP layers into standalone …
ko3n1g Nov 3, 2025
21355dd
ci: Disable broken unit
ko3n1g Nov 3, 2025
4facf29
[Dev] Fixes for gpt-oss (#2116)
cuichenx Nov 4, 2025
bd199dc
[Dev] Support LayerWiseDistributedOptimizer with torch_dist checkpoin…
BoxiangW Nov 4, 2025
ee14b5b
[Dev] Nemotron nano v2 vl (#2115)
cuichenx Nov 4, 2025
09abfad
[DEV] remove training dependency from megatron core for fsdp checkpoi…
ananthsub Nov 4, 2025
ec37ae3
[Dev] Fix UT `TestPartialCudaGraph` which incorrectly set MTP in UT (…
BestJuly Nov 4, 2025
f3cbf03
ci: Restore `gpt3_moe_mcore_te_tp4_ep2_etp2_pp2_scoped_cudagraph`
ko3n1g Nov 4, 2025
ecbfe70
chore: Fix autoformatter (#2073) (#2134)
ko3n1g Nov 4, 2025
79e8592
add device and dtype to empty inv_dt init (#2137)
maanug-nv Nov 5, 2025
3c1b98e
Remove DS-V3 doc - draft being updated (#2155)
sbhavani Nov 6, 2025
b2fdd94
[DEV] torch_dist fixes, speed improvements and memory reduction for l…
FDecaYed Nov 6, 2025
6cc224d
[Dev] Fix Qwen3-Next hang on Blackwell, add a flag to control torch.c…
yuzhongw-nvidia Nov 6, 2025
f4bd87e
[dev] Fix cuda graph scope check in `language_model.py` (#2158)
ananthsub Nov 6, 2025
3207c23
[Dev] Remove experimental tags for fused kernels (#2172)
Victarry Nov 10, 2025
ae4cda5
Merge branch 'main' into dev
FDecaYed Nov 11, 2025
3cbe5c6
[DEV] pull changes from main(f150f42e3929f7f2171e3687e67990332a76285b…
ko3n1g Nov 11, 2025
6b59d71
fix(transformer_config): Initialize cuda_graph_scope if not set regar…
cuichenx Nov 11, 2025
442a7f2
[Dev] fix(offloading): Accuracy mismatch when offloading and recomput…
lhb8125 Nov 11, 2025
6b01330
Ko3n1g/chore/update dev release settings (#2099)
ko3n1g Nov 11, 2025
d7d71e0
Merge remote-tracking branch 'github/main' into dev
ko3n1g Nov 11, 2025
56019e6
[DEV] Cherry-pick: M4 + Dist Checkpoint: Replace global parallel stat…
yaoyu-33 Nov 11, 2025
2c2ee22
[Dev] Remove redundant reduce in aux_loss logging (#2094)
BestJuly Nov 12, 2025
b7c1e75
[DEV] Make CUDA graph compatible with FP8 params (tensorwise & blockw…
kunlunl Nov 12, 2025
ca68395
remove workflow
ko3n1g Nov 12, 2025
1d502cd
[Dev] Reduce Overhead in Timers (#2208)
yaox12 Nov 12, 2025
a2048c8
Revert "[DEV] Cherry-pick: M4 + Dist Checkpoint: Replace global paral…
ko3n1g Nov 12, 2025
a2a1c89
[Dev] replay: Cherry-pick: M4 + Dist Checkpoint: Replace global paral…
yaoyu-33 Nov 12, 2025
8427584
[Dev]Revert torch ckpt format change for LayerwiseDistOpt (#2228)
BoxiangW Nov 13, 2025
7020e1f
[Dev] Add more tests for LayerwiseDistOpt with dist_ckpt (#2132)
BoxiangW Nov 13, 2025
693587d
[Dev] Add muon golden value (#2247)
BoxiangW Nov 14, 2025
b55a544
bump deps
ko3n1g Nov 14, 2025
bfbf13f
Merge remote-tracking branch 'github/dev' into ko3n1g/chore/main-to-dev
ko3n1g Nov 14, 2025
658931e
ci: Create weekly dev branch (#2223)
ko3n1g Nov 14, 2025
1211348
[20251111] Ko3n1g/chore/main to dev (#2211)
ko3n1g Nov 14, 2025
71fa2e6
Revert "[20251111] Ko3n1g/chore/main to dev (#2211)" (#2266)
chtruong814 Nov 17, 2025
0bf9ff9
chroe: [Dev]Disable muon test for now (#2275)
BoxiangW Nov 17, 2025
565202f
Fixes of Merge main into dev
ko3n1g Nov 17, 2025
4b78163
Replay: [20251111] Ko3n1g/chore/main to dev (#2267)
ko3n1g Nov 17, 2025
d1a31a3
[Dev] MuonClip support (non-split version) on dev branch (#2194)
BoxiangW Nov 18, 2025
7968d5f
[dev] Add assertion for mxfp8 params without dp overlap (#2270)
kunlunl Nov 18, 2025
d09482c
[DEV] Save memory using main_param for moe in param_l2_norm (#2234)
BestJuly Nov 18, 2025
7da6e5b
[DEV][NVFP4] Fix NVFP4 Selective Activation Recompute (#2036)
zhongbozhu Nov 18, 2025
ca4c03e
[DEV] Fix aux loss scale when cp enabled (#2217)
Victarry Nov 18, 2025
157bec9
[Community][Dev] feat(moe): Adding context parallel support to eager …
nrailg Nov 18, 2025
5c1d294
[HOT FIX] Fix bug of hybrid-ep backend in flex-dispatcher (#2287)
Autumn1998 Nov 18, 2025
2782acf
Ko3n1g/ci/golden values weeklies (#2279)
ko3n1g Nov 18, 2025
dc9a38d
[DEV] Add support of fake distributed process group (#2254)
Victarry Nov 18, 2025
a8fc591
Cherrypick CI changes between 20251111 - 20251118 (#2292)
ko3n1g Nov 18, 2025
d547462
[DEV] Update emerging optimizers (#2261)
skyw Nov 18, 2025
056ebc5
ci(hotfix): Do not run on main/dev
ko3n1g Nov 18, 2025
5880674
[dev] ci(moe): Add a functional test case for Qwen3Next-specific feat…
yuzhongw-nvidia Nov 19, 2025
a4fce1d
[DEV] fix layerwise torch_dist checkpointing fails due to empty rank …
FDecaYed Nov 20, 2025
c6e2b29
[Dev] fix(megatron-fsdp): Resolve hang caused by non-deterministic re…
xuwchen Nov 20, 2025
c6f277a
ci: Disable flaky unit test (#2338)
ko3n1g Nov 20, 2025
716bb4a
feat: check: api backwards compatibility [dev] (#2341)
pablo-garay Nov 20, 2025
7b8e39e
Revert "[Dev] fix(megatron-fsdp): Resolve hang caused by non-determin…
ko3n1g Nov 20, 2025
cb88c6e
ci: Upload to testpypi only on main (#2342) (#2343)
ko3n1g Nov 21, 2025
c241d0c
Reapply "[Dev] fix(megatron-fsdp): Resolve hang caused by non-determi…
ko3n1g Nov 21, 2025
31f5049
feat: required check adjustment (#2349)
pablo-garay Nov 21, 2025
56682f8
[DEV] pull main Nov 25 (#2395)
FDecaYed Nov 28, 2025
b9c48ec
adding action for checking whether PR author is nvidia employee or no…
theothermike Nov 25, 2025
3aa0c4e
fix: exit failure when PR author is external contributor removed (#2410)
theothermike Nov 26, 2025
b750bdb
fix: adding k8s taints for ephermeral jobs (#2420)
theothermike Nov 27, 2025
c12909b
ci: Enable functional tests (#2419)
ko3n1g Nov 27, 2025
44933d7
Reapply "build: Upgrade deps (NVIDIA#2289)" (#2408)
ko3n1g Nov 27, 2025
98c64b2
fix: use a script to do node tainting in the cicd workflow (#2421)
theothermike Nov 27, 2025
c8fb49e
cp: CI changes until 20251128 (#2426)
ko3n1g Nov 28, 2025
03150b4
Revert "[DEV] pull main Nov 25 (#2395)"
ko3n1g Nov 28, 2025
6ca67bc
[Dev] Support packed seq in MTP (#2043)
BestJuly Dec 1, 2025
11caf01
Fix runaway Etpt in straggler detector by resetting FLOPs accumulator…
sbhavani Dec 1, 2025
92c8482
[Dev] feat(MoE): Refactor cuda_graph_scope - part2 (#2353)
buptzyb Dec 1, 2025
b0c96b3
[dev] DeepSeek V3.2 support (#2154)
kunlunl Dec 1, 2025
71357e2
Revert "[Dev] feat(MoE): Refactor cuda_graph_scope - part2 (#2353)"
ko3n1g Dec 1, 2025
fdcb0a4
Replay "[Dev] feat(MoE): Refactor cuda_graph_scope - part2 (#2353)" (…
buptzyb Dec 2, 2025
14b19b1
[Dev] Optimize TE cudagraph input memory (#2391)
buptzyb Dec 2, 2025
b0f5746
Fix HSDP Registering Device Mesh (#2388)
tomlifu Dec 2, 2025
5375ad4
fix: update baseline (#2468)
pablo-garay Dec 2, 2025
79660b7
fix: Add merge_group support with pre-flight pattern (#2469)
pablo-garay Dec 2, 2025
d72b218
DeepSeek V3 FSDP Fix for Precision-Aware Optimizer (#2204)
tomlifu Dec 3, 2025
436065a
[Dev] fix(moe): minor refactor for fine-grained activation offloading…
lhb8125 Dec 3, 2025
a4bee49
[Dev] feat: m4 leftover changes (#2226)
yaoyu-33 Dec 4, 2025
ad5a222
feat: add decorator: experimental_api (#2546)
pablo-garay Dec 4, 2025
7d17116
feat: API compat: ignore AttributeChangedValueBreakage (not a signatu…
pablo-garay Dec 4, 2025
274e04d
[Dev] Hybrid Data x Context Parallelism Feature (#2054)
parthmannan Dec 4, 2025
87ac13d
update API compat check baseline to 274e04d (#2548)
pablo-garay Dec 4, 2025
f0c1b55
feat: mcore trigger mbridge (#2340) (#2552)
pablo-garay Dec 5, 2025
8de5a7f
[Dev] Optimize TE CUDA Graph capturing time (#2483)
buptzyb Dec 5, 2025
1f08ceb
[Dev] Feature: linear cross entropy fusion (#2256)
Jianbing-D Dec 5, 2025
9cf6838
Fix gpt_layer_spec for frequently linear attention (#2481)
yuzhongw-nvidia Dec 5, 2025
89fe895
Skip trainloader when `args.skip_train` is True (#2501)
Niccolo-Ajroldi Dec 5, 2025
a6d86a6
[DEV] fixes for muon(qwen3-next, ep multi-adam) (#2564)
FDecaYed Dec 5, 2025
aee4a74
[Dev] remove fp16 assert in moe_grouped_gemm & EP (#2494)
HaochenYuan Dec 8, 2025
dfe4da2
Update tp support in muon (#2385)
skyw Dec 8, 2025
1d462bd
[DEV] Update GitHub MoE functional test cases (#2449)
Victarry Dec 8, 2025
23e092f
Fix: don't enter branch if mtp_num_layers == 0 (#2581)
rj42 Dec 9, 2025
c60d5c2
[Dev] fix(moe): Support HybridEP and reduce memory overhead for 1F1B …
lhb8125 Dec 10, 2025
4db2f11
Merge branch 'main' into dev
FDecaYed Dec 10, 2025
ed804b4
[dev] pull main 1201 (#2448)
ko3n1g Dec 11, 2025
2d398b4
chore: Bump baseline (#2626)
ko3n1g Dec 11, 2025
e8a9275
[Dev] Use the latest Hybrid-EP (#2424)
Autumn1998 Dec 12, 2025
305957a
API compat: ignore ParameterMovedBreakage for __init__ methods (#2649)
pablo-garay Dec 12, 2025
e93814b
[training migration] add training config dataclass and arg generation…
maanug-nv Dec 16, 2025
288b8ea
[Dev] Optimize TE CUDA Graph _get_sample_arguments() Time (#2568)
buptzyb Dec 17, 2025
0eec631
Reopen qwen3next functional test in lightweight mode (#2493)
yuzhongw-nvidia Dec 17, 2025
2ebff67
[Dev] Fix CUDA RNG Tracker (#2640)
buptzyb Dec 17, 2025
368e580
[Dev] Mark API backwards compatibility checks as OPTIONAL (non-blocki…
pablo-garay Dec 17, 2025
3714d81
[Dev] FP8 params support for megatron-fsdp (MXFP8/Blockwise) (#2086)
kunlunl Dec 18, 2025
a935008
[Dev] Feat(moe): Gated delta net context parallel (CP) (#2614)
yuzhongw-nvidia Dec 19, 2025
fd932c9
ci: Gridify test configs (#2707)
ko3n1g Dec 19, 2025
2b1fc70
Revert "[dev] Add assertion for mxfp8 params without dp overlap (#2270)"
ko3n1g Dec 22, 2025
4665be4
Revert "[Dev] Use the latest Hybrid-EP (#2424)" (#2732)
ko3n1g Dec 22, 2025
46b5505
[Dev] Fix ep overlap missing final layernorm (#2691)
Wohox Dec 23, 2025
0b6714e
[Dev] Remove calculation of padding token in moe routing loss (#2121)
HaochenYuan Dec 24, 2025
1068d77
Revert "[Dev] Remove calculation of padding token in moe routing loss…
chtruong814 Dec 24, 2025
9885ddb
[Dev] Disable ep overlap memory optimization (#2750)
Wohox Dec 30, 2025
14c35dc
Merge branch 'main' into dev
FDecaYed Dec 30, 2025
929e77f
feat: Cherry-pick PR of PR!2661 for dev branch (#2757)
youngeunkwon0405 Dec 30, 2025
b361561
Merge branch 'dev' into deyuf/dev_pull_main_1217_test
FDecaYed Dec 31, 2025
922e8e9
cp: Allow disabling external contributors (#2784) (#2786)
chtruong814 Dec 31, 2025
5455f0a
build: Pin down `nvidia-nvshmem-cu13` (#2798)
ko3n1g Jan 3, 2026
71d5c84
[dev] Fix bug of reuse_grad_buf_for_mxfp8_param_ag (#2801)
kunlunl Jan 5, 2026
8b93e0d
[Dev] Partial CUDA Graph support for EP Overlap (#2168)
Wohox Jan 5, 2026
c1045f6
Revert "[Dev] FP8 params support for megatron-fsdp (MXFP8/Blockwise) …
ko3n1g Jan 5, 2026
bd06945
Revert "[Dev] Partial CUDA Graph support for EP Overlap (#2168)"
ko3n1g Jan 5, 2026
29ffe43
Merge branch 'dev' into deyuf/dev_pull_main_1217_test
FDecaYed Jan 5, 2026
d8464fc
PR for testing pull main 1217 (#2716)
ko3n1g Jan 5, 2026
dfa6cc1
[Dev] Remove calculation of padding token in moe routing loss (#2754)
HaochenYuan Jan 6, 2026
5823534
[dev] Reapply fsdp mxfp8 (#2828)
kunlunl Jan 6, 2026
1ec0beb
[Dev] Partial CUDA Graph support for EP Overlap (#2810)
Wohox Jan 6, 2026
0bc4114
[Dev] fix EP Overlap Partial Cuda Graph Unit Test hang issue (#2838)
Wohox Jan 7, 2026
28c586e
build: Bump jet-client (#2877)
ko3n1g Jan 8, 2026
46d1f47
FP8 attention knob for nvFP4 recipe (#2818)
vasunvidia Jan 9, 2026
ed6ebff
[DEV][NVFP4][MOE] 128 Zero Padding for Grouped Quantization kernels a…
zhongbozhu Jan 9, 2026
ebe7079
Add check for full_iteration scope before instantiating CudaGraphMana…
vasunvidia Jan 9, 2026
736da3c
Reapply "[Dev] Use the latest Hybrid-EP (#2423)" (#2867)
ko3n1g Jan 9, 2026
9d741cf
build: Main dependency bump for 26.02 (#2682)
ko3n1g Jan 12, 2026
de866fa
ci(fix): Update golden values (#2921)
ko3n1g Jan 13, 2026
ae3dbc0
ci(hotfix): Re-add `gpt3_mcore_te_tp4_pp2_resume_torch_dist_reshard_8…
ko3n1g Jan 13, 2026
583dd58
ci: Skip broken tests after dependency update (#2935)
chtruong814 Jan 13, 2026
b0a702b
Cherry-pick optimizer override refactor from #2723 (#2835)
yaoyu-33 Jan 14, 2026
1964d39
ci(hotfix): Disable gpt_grpo_tp1_pp1_dp8_583m_throughputtest
ko3n1g Jan 14, 2026
383505c
[dev]: ci: Onboard GB200 (#2922)
ko3n1g Jan 14, 2026
ab3ae8a
ci(hotfix): Repair recipe
ko3n1g Jan 14, 2026
dce8e88
Fix clip_qk for virtual pipeline size > 1 (#2776)
juntaowww Jan 15, 2026
748ab80
ci(hotfix): GB200 to nightly
ko3n1g Jan 15, 2026
a32b198
ci(fix): GB200 racecondition (#2962)
ko3n1g Jan 15, 2026
7c6c4e9
Revert "ci(fix): GB200 racecondition (#2962)"
ko3n1g Jan 15, 2026
619115a
ci: Fix GB200 change (#2969) (#2974)
ko3n1g Jan 16, 2026
b395016
[Dev] TE cudagraph recompute (#2694)
buptzyb Jan 16, 2026
b927e1f
[Dev] docs(megatron-fsdp): add Megatron-FSDP user guide (#2397)
xuwchen Jan 16, 2026
6b157e0
[Dev] Optimizer State and Master Weight Offloading (#2760)
hxbai Jan 16, 2026
8ac3a9f
Revert "[Dev] Optimizer State and Master Weight Offloading (#2760)" (…
ko3n1g Jan 16, 2026
bd8411c
Forced load imbalance (#2917)
nanz-nv Jan 19, 2026
0a2e01f
[Dev] [Reapply] Optimizer State and Master Weight Offloading (#2987)
hxbai Jan 19, 2026
8abc086
ci(fix): CI_COMMIT_BRANCH on forks (#2982) (#2989)
ko3n1g Jan 19, 2026
5b17f19
[Dev] Update MoE readme. (#2808)
Victarry Jan 19, 2026
9ea50a9
feat: add routing replay for Mcore (#2693)
litianjian Jan 20, 2026
ac9f665
[dev] feat(moe): Support apply wd to qk layernorm for Qwen3-Next (#2825)
yuzhongw-nvidia Jan 21, 2026
6e2153b
[dev] feat(moe): Cherry-pick #1989 back to dev (#3011)
yuzhongw-nvidia Jan 21, 2026
68e5fec
[Dev]feat(moe): code refactor for fine grained activation offloading …
lhb8125 Jan 22, 2026
6807df4
[Dev] [fix] Bug fix for offloading in evaluate() (#3041)
lhb8125 Jan 22, 2026
b3bba3f
ci: Log node name (#3081) (#3082)
ko3n1g Jan 26, 2026
a4e3fb3
[dev] pull main 260122 (#3045)
FDecaYed Jan 27, 2026
420aa6a
ci: Skip test_precision_aware_optimizer (#3062)
thomasdhc Jan 23, 2026
da56650
Merge branch 'main' into deyuf/dev_pull_main_260122_fix_git
FDecaYed Jan 27, 2026
08357d8
[dev] fix git history for dev pull main 260122 (#3094)
ko3n1g Jan 27, 2026
0f82f05
[dev] fixes for pull main 260122 (#3103)
FDecaYed Jan 28, 2026
0ceb698
ci: Disable broken test (#3121)
ko3n1g Jan 28, 2026
f6f2abe
[Dev] Param offset in _ParamAndGradBucket should be aligned (#3010)
BestJuly Jan 29, 2026
d587dd1
[Dev] fix cg missing wgrad hook (#2999)
Wohox Jan 29, 2026
8f8f735
[Megatron-FSDP] Add fsdp_all_gather_in_start_param_sync option in DDP…
shjwudp Jan 29, 2026
bde9e32
[Dev] Support EP with HSDP (#2800)
wplf Jan 29, 2026
27fcfb2
Cherrypick CI improvements to dev branch (#3118)
ko3n1g Jan 29, 2026
a9fb6c8
Merge branch 'main' into deyuf/dev_pull_main_260130
FDecaYed Jan 30, 2026
55e3a0a
[dev] ci: Add DSv3 proxy (#3144)
ko3n1g Jan 30, 2026
a78ae49
[dev] ci: Fix DSv3 (#3187)
ko3n1g Jan 31, 2026
9375be4
Fix: nccl-ub in ddp path (#3181)
youngeunkwon0405 Feb 1, 2026
0f73a8a
[dev] perf(moe): Refine gated delta net implementation (#3040)
yuzhongw-nvidia Feb 2, 2026
5035cbe
[Dev] Add the missing part to support 1F1B overlap for Qwen3-Next (#2…
BestJuly Feb 2, 2026
4aac3fe
Use the latest hybrid-ep (#3092)
Autumn1998 Feb 2, 2026
bfa1d31
[BUG FIX] Try to enable cuda graph ut (#3192)
Autumn1998 Feb 2, 2026
13ad653
[Dev] Fix Linear-Cross-Entropy Convergence Issue (#2739)
shjwudp Feb 3, 2026
b8b8662
Revert "[Dev] Fix Linear-Cross-Entropy Convergence Issue (#2739)" (#3…
chtruong814 Feb 3, 2026
2ab74ab
Fix missing PackedSeqParams import (#3215)
parthmannan Feb 3, 2026
20e8ac8
fix merge main issues
FDecaYed Jan 30, 2026
77b5a3d
[dev] pull main 260130 (#3166)
ko3n1g Feb 3, 2026
c5b282b
ci(hotfix): Pin uv (#3233) (#3234)
ko3n1g Feb 3, 2026
8a29fd5
[DEV] Reapply fix Linear CE Fusion (#3226)
shjwudp Feb 4, 2026
dd17acc
Missing import fix (#3242)
parthmannan Feb 4, 2026
fa5bcf6
[Dev] Fix EP Overlap Bugs for Full-Iter CG (#3163)
Wohox Feb 4, 2026
a592819
[Refactor] Decouple topk and loss from DSA Indexer (#3013)
laixinn Feb 4, 2026
54f4feb
cp: Fix uv install for GH actions (#3259) (#3261)
chtruong814 Feb 5, 2026
ef336ca
[Dev] Fix EP Overlap missing record stream for shared expert (#3244)
Wohox Feb 5, 2026
ec94d63
Restore missing linear-cross-entropy option accidentally removed from…
shjwudp Feb 6, 2026
500e080
Fix reload_model_params failure when loading MoE models with explicit…
eternally-z Feb 9, 2026
433c169
ci: Disable moe20 tests (#3312)
ko3n1g Feb 9, 2026
fd4801e
ci: Pin down setuptools to lt 82 (#3316)
ko3n1g Feb 9, 2026
52eabf0
[None][Fix] Prevent resource leak warnings (#3216)
IanBoyanZhang Feb 10, 2026
c0030d6
[Dev] Fix backward dw dependency (#3338)
Wohox Feb 10, 2026
2c2e749
ci: Rely exclusively on GitHub CI (#3341)
ko3n1g Feb 10, 2026
98f6f81
[dev] ci: skip queue in merge-gate (#3344)
ko3n1g Feb 10, 2026
28b130f
Revert "[None][Fix] Prevent resource leak warnings (#3216)" (#3366)
ko3n1g Feb 11, 2026
e868e8f
ci: Fix dev branch merge queue (#3397)
chtruong814 Feb 13, 2026
c4b910f
[Dev] Add Qwen3-VL support with Megatron-FSDP (#2842)
xuwchen Feb 13, 2026
6059f36
Add absorbed-mla (#3193)
kunlunl Feb 13, 2026
9f2ca96
cp: Remove gpu sanity check (#3420) into dev (#3421)
chtruong814 Feb 13, 2026
1dcf0da
[dev] ci: Fix merge queue (#3385)
ko3n1g Feb 14, 2026
cd1c215
[dev] `cp: Cherrypick CI changes` (#3543)
ko3n1g Feb 23, 2026
aa86018
[Dev] Fix MoE aux loss tracker hang with MTP enabled (#3400)
Victarry Feb 25, 2026
2b4b9c4
ci: Remove multi-approval action from dev branch (#3576)
chtruong814 Feb 25, 2026
bf3cdb1
support GDN packed sequence
yuzhongw-nvidia Dec 12, 2025
e94395d
Fix several bugs
yuzhongw-nvidia Jan 22, 2026
55 changes: 5 additions & 50 deletions .github/CODEOWNERS
@@ -1,59 +1,14 @@
megatron/core/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/models/gpt/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/gpt

megatron/core/models/multimodal/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/multi-modal

megatron/core/models/mamba/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba
megatron/core/ssm/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/hybrid-mamba

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/distributed/fsdp/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/transformer/fsdp_dtensor_checkpoint.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/megatron-fsdp

megatron/core/dist_checkpointing/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-checkpointing

megatron/core/optimizer/distrib_optimizer/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/dist-optimizer

megatron/core/inference/modelopt_support @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/quantization-and-inference

megatron/core/datasets/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/datasets

megatron/core/pipeline_parallel/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/pipeline-parallelism

megatron/core/transformer/ @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/transformer/moe/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/mixture-of-experts-adlr @NVIDIA/mixture-of-experts-devtech

megatron/core/inference/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/inference

megatron/core/parallel_state.py @NVIDIA/core-adlr @NVIDIA/core-nemo

megatron/core/post_training/ @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/post-training

megatron/post_training/ @NVIDIA/post-training
* @NVIDIA/core-nemo @NVIDIA/core-devtech

megatron/core/transformer/cuda_graphs.py @NVIDIA/core-adlr @NVIDIA/core-nemo @NVIDIA/cuda-graphs

.gitlab/ @NVIDIA/ci
.github/ @NVIDIA/ci
.gitlab-ci.yml @NVIDIA/ci
docker/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci
tests/test_utils/python_scripts/
tests/functional_tests/python_test_utils/ @NVIDIA/ci
tests/functional_tests/shell_test_utils/ @NVIDIA/ci
tests/test_utils/recipes/ @NVIDIA/ci
tests/unit_tests/run_ci_test.sh @NVIDIA/ci

# API Backwards Compatibility Check
scripts/check_api_backwards_compatibility.py @NVIDIA/ci @pablo-garay
scripts/README_API_COMPAT.md @NVIDIA/ci @pablo-garay
.github/workflows/check_api_backwards_compatibility_workflow.yml @NVIDIA/ci @pablo-garay
docs/api-backwards-compatibility-check.md @NVIDIA/ci @pablo-garay
tests/unit_tests/test_api_backwards_compat_setup.py @NVIDIA/ci @pablo-garay

megatron/rl/ @NVIDIA/reinforcement-learning
examples/rl/ @NVIDIA/reinforcement-learning
test/unit_tests/test_rl_utils.py @NVIDIA/reinforcement-learning
train_rl.py @NVIDIA/reinforcement-learning
pyproject.toml @NVIDIA/ci
uv.lock @NVIDIA/ci
55 changes: 17 additions & 38 deletions .github/actions/action.yml
@@ -48,46 +48,16 @@ inputs:
is_ci_workload:
description: "Is CI workload"
required: true

is_merge_group:
description: "Is merge group"
required: true
runs:
using: "composite"
steps:
- name: Print node name
shell: bash -x -e -u -o pipefail {0}
run: echo "node_name=$NODE_NAME" | tee -a "$GITHUB_OUTPUT"

- name: GPU Sanity Check
shell: bash -x -e -u -o pipefail {0}
run: |
echo "Starting GPU Sanity Check..."

# 1. Check for active Compute Processes
# query-compute-apps returns a list of PIDs using the GPU. If empty, we are good.
OPEN_PROCESSES=$(docker run --rm --gpus all ubuntu nvidia-smi --query-compute-apps=pid,process_name --format=csv,noheader)

if [ -n "$OPEN_PROCESSES" ]; then
echo "::error::❌ GPU is not clean! Found active processes:"
echo "$OPEN_PROCESSES"
else
echo "✅ No active compute processes found."
fi

# 2. Check VRAM Usage (Optional but recommended)
# We allow a small buffer (e.g., < 300MiB) for driver overhead/Xorg,
# though on headless K8s nodes this should be very close to 0.

MEMORY_USAGES=$(docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits)

# Check each GPU visible to the container
for MEMORY in $MEMORY_USAGES; do
if [ "$MEMORY" -gt 300 ]; then
echo "::error::❌ GPU VRAM usage is suspiciously high: ${MEMORY} MiB"
fi
done

echo "✅ GPU Memory is clean (all < 300 MiB)."
echo "Ready to start workflow."

- name: Checkout repository
uses: actions/checkout@v2

@@ -117,8 +87,10 @@ runs:
export PYTHONPATH=$(pwd)
export NEMORUN_HOME=$(pwd)
export NCCL_DEBUG=INFO
pip install --no-cache-dir uv
uv sync --only-group test
pip install --no-cache-dir "uv<0.9.29"
uv venv .venv
uv cache clean
uv sync --no-cache --only-group test
uv run python tests/test_utils/python_scripts/launch_nemo_run_workload.py \
--scope unit-tests \
--model unit-tests \
@@ -177,7 +149,12 @@ runs:
#!/bin/bash
set -euxo pipefail

if [ "${{ steps.has-run-tests-label.outputs.main }}" == "true" ]; then
if [ "${{ inputs.is_merge_group }}" == "true" ]; then
ARGS=(
--scope mr-github
--n-repeat 1
)
elif [ "${{ steps.has-run-tests-label.outputs.main }}" == "true" ]; then
ARGS=(
--scope mr-github
--enable-lightweight-mode
@@ -197,8 +174,10 @@ runs:

export PYTHONPATH=$(pwd)
export NEMORUN_HOME=$(pwd)
pip install --no-cache-dir uv
uv sync --only-group test
pip install --no-cache-dir "uv<0.9.29"
uv venv .venv
uv cache clean
uv sync --no-cache --only-group test
uv run python tests/test_utils/python_scripts/launch_nemo_run_workload.py \
${ARGS[@]} \
--model ${{ inputs.model }} \
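The new `is_merge_group` branch above decides which functional-test arguments are passed before launching the workload. A standalone sketch of the two visible branches of that selection (the function name is illustrative, and the remaining branches are truncated in the diff, so the fallback is left empty):

```shell
# Mirrors the visible branches of the composite action's scope selection:
# merge-group runs get a single full repeat, while PRs carrying the
# `Run tests` label run in lightweight mode.
select_scope_args() {
  local is_merge_group="$1" has_run_tests_label="$2"
  if [ "$is_merge_group" == "true" ]; then
    echo "--scope mr-github --n-repeat 1"
  elif [ "$has_run_tests_label" == "true" ]; then
    echo "--scope mr-github --enable-lightweight-mode"
  else
    # Remaining branches are elided in the diff above.
    echo ""
  fi
}
```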
2 changes: 1 addition & 1 deletion .github/copy-pr-bot.yaml
@@ -1,4 +1,4 @@
enabled: true
auto_sync_draft: false
auto_sync_ready: true
trustees_override: ["AAnoosheh", "ArEsKay3", "Autumn1998", "BestJuly", "BoxiangW", "ChenhanYu", "FDecaYed", "HaochenYuan", "ISEEKYAN", "JRD971000", "Phlip79", "QiZhangNV", "ShriyaRishab", "Victarry", "Wohox", "ZhiyuLi-Nvidia", "ahmadki", "aklife97", "ananthsub", "asolergi-nv", "buptzyb", "chtruong814", "cspades", "cuichenx", "deepakn94", "dimapihtar", "duncanriach", "erhoo82", "ericharper", "fanshiqing", "frsun-nvda", "gautham-kollu", "gdengk", "guyueh1", "hxbai", "jalbericiola", "janEbert", "jaredcasper", "jenchen13", "jiemingz", "jingqiny-99", "jkamalu", "jon-barker", "jstjohn", "kanz-nv", "kevalmorabia97", "ko3n1g", "kunlunl", "kvareddy", "kwyss-nvidia", "layalir", "lhb8125", "lmcafee-nvidia", "maanug-nv", "mathemakitten", "matthieule", "mehraakash", "mkhona-nvidia", "parthmannan", "prajwal1210", "pthombre", "rogerwaleffe", "sanandaraj5597", "sancha", "santhnm2", "sbak5", "shanmugamr1992", "shifangx", "shjwudp", "sidsingh-nvidia", "skyw", "sudhakarsingh27", "tdene", "theothermike", "thomasdhc", "trintamaki", "tylerpoon", "wdykas", "xiaoyao0115", "xuwchen", "yanring", "yaox12", "yaoyu-33", "yashaswikarnati", "yeyu-nvidia", "yobibyte", "youngeunkwon0405", "yuzhongw-nvidia", "zhongbozhu"]
trustees_override: ["AAnoosheh", "ArEsKay3", "Autumn1998", "BestJuly", "BoxiangW", "ChenhanYu", "FDecaYed", "HaochenYuan", "ISEEKYAN", "JRD971000", "Phlip79", "QiZhangNV", "RPrenger", "ShriyaRishab", "Victarry", "Wohox", "ZhiyuLi-Nvidia", "ahmadki", "aklife97", "ananthsub", "asolergi-nv", "buptzyb", "chtruong814", "cspades", "cuichenx", "deepakn94", "dimapihtar", "dingqingy-nv", "duncanriach", "erhoo82", "ericharper", "fanshiqing", "frsun-nvda", "gautham-kollu", "gdengk", "guyueh1", "hxbai", "ilml", "jalbericiola", "janEbert", "jaredcasper", "jenchen13", "jiemingz", "jingqiny-99", "jkamalu", "jon-barker", "jstjohn", "kanz-nv", "kevalmorabia97", "ko3n1g", "kunlunl", "kvareddy", "kwyss-nvidia", "layalir", "lhb8125", "lmcafee-nvidia", "maanug-nv", "mathemakitten", "matthieule", "mehraakash", "mkhona-nvidia", "parthmannan", "prajwal1210", "pthombre", "rogerwaleffe", "sajadn", "sanandaraj5597", "sancha", "santhnm2", "sbak5", "shanmugamr1992", "sharathts", "shengf-nv", "shifangx", "shjwudp", "sidsingh-nvidia", "skyw", "sudhakarsingh27", "tdene", "theothermike", "thomasdhc", "trintamaki", "tylerpoon", "wdykas", "xiaoyao0115", "xuwchen", "yanring", "yaox12", "yaoyu-33", "yashaswikarnati", "yeyu-nvidia", "yobibyte", "youngeunkwon0405", "yueshen2016", "yuzhongw-nvidia", "zhongbozhu"]
24 changes: 12 additions & 12 deletions .github/oncall_schedule.json
@@ -1,18 +1,6 @@
[
{
"user": "dimapihtar",
"date": "2026-01-28"
},
{
"user": "gautham-kollu",
"date": "2026-02-04"
},
{
"user": "janEbert",
"date": "2026-02-11"
},
{
"user": "Phlip79",
"date": "2026-02-18"
},
{
@@ -46,5 +34,17 @@
{
"user": "BoxiangW",
"date": "2026-04-15"
},
{
"user": "Phlip79",
"date": "2026-04-22"
},
{
"user": "asolergi-nv",
"date": "2026-04-29"
},
{
"user": "dimapihtar",
"date": "2026-05-06"
}
]
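The schedule above is a weekly rotation keyed by start date. Assuming each entry takes effect on its listed date, the active on-call is the last entry whose date is on or before today; since ISO-8601 dates compare correctly as strings, plain lexicographic comparison suffices. A sketch (the function and the flattened `user date` input format are illustrative, not part of the repository):

```shell
# Resolves the active on-call from date-sorted "user date" pairs using
# string comparison, which is valid for ISO-8601 dates.
current_oncall() {
  local today="$1" user="" u d
  while read -r u d; do
    [ -n "$u" ] || continue
    if [[ ! "$d" > "$today" ]]; then  # d <= today
      user="$u"
    fi
  done <<< "$2"
  echo "$user"
}

# Sample entries mirroring the tail of the schedule above.
schedule='BoxiangW 2026-04-15
Phlip79 2026-04-22
asolergi-nv 2026-04-29'
```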
65 changes: 65 additions & 0 deletions .github/scripts/readme.sh
@@ -0,0 +1,65 @@
#!/bin/bash

cat << 'EOF'
╔══════════════════════════════════════════════════════════════════════╗
║ ║
║ ███╗ ███╗██████╗ ██████╗ ██╗██████╗ ██████╗ ███████╗ ║
║ ████╗ ████║██╔══██╗██╔══██╗██║██╔══██╗██╔════╝ ██╔════╝ ║
║ ██╔████╔██║██████╔╝██████╔╝██║██║ ██║██║ ███╗█████╗ ║
║ ██║╚██╔╝██║██╔══██╗██╔══██╗██║██║ ██║██║ ██║██╔══╝ ║
║ ██║ ╚═╝ ██║██████╔╝██║ ██║██║██████╔╝╚██████╔╝███████╗ ║
║ ╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝╚═╝╚═════╝ ╚═════╝ ╚══════╝ ║
║ ║
║ H O W T O : M B R I D G E T E S T I N G ║
╚══════════════════════════════════════════════════════════════════════╝

MBridge unit tests run automatically on every PR. To also trigger
functional tests, attach the label and re-run the workflow step.

┌─────────────────────────────────────────────────────────────────┐
│ DEFAULT │ Unit tests run on every PR (no action needed) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Every PR ──► cicd-mbridge-testing ──► unit tests only │
│ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ STEP 1 │ Attach the label to your PR (for functional tests) │
├─────────────────────────────────────────────────────────────────┤
│ │
│ PR Labels ──► [ + Add label ] ──► "Run MBridge tests" │
│ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ STEP 2 │ Re-run this workflow step │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Actions ──► [ Re-run jobs ] ──► Re-run failed jobs │
│ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ RESULT │ Unit + functional tests run! │
├─────────────────────────────────────────────────────────────────┤
│ │
│ cicd-mbridge-testing ◄── unit + functional tests │
│ │
│ Tests run against MBridge using the merge commit │
│ SHA of your pull request. │
│ │
└─────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────┐
│ Label present? NO → unit │
│ Label present? YES → unit + │
│ functional│
└────────────────────────────────────┘

NOTE: The label must be present BEFORE the re-run is triggered.
The CI checks for "Run MBridge tests" at runtime.

NOTE: All MBridge test results are optional — failures do not
block merging your PR.
EOF
5 changes: 1 addition & 4 deletions .github/workflows/_build_test_publish_wheel.yml
@@ -17,8 +17,6 @@ on:
type: boolean
default: true
secrets:
TWINE_USERNAME:
required: true
TWINE_PASSWORD:
required: true

@@ -147,7 +145,6 @@ jobs:
needs: [build-and-test-wheels]
runs-on: ubuntu-latest
if: inputs.no-publish == false
environment: ${{ (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/r')) && 'main' || 'public' }}
strategy:
fail-fast: false
matrix:
@@ -170,7 +167,7 @@

- name: Publish wheels
env:
TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }}
TWINE_USERNAME: __token__
TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }}
TWINE_REPOSITORY: ${{ (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/r')) && 'pypi' || 'testpypi' }}
PLATFORM: ${{ matrix.PLATFORM }}
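The `TWINE_REPOSITORY` expression above routes uploads by branch: pushes to `main` and release (`r*`) branches publish to PyPI, everything else goes to TestPyPI. A bash sketch of that selection (the function name is an assumption for illustration):

```shell
# Mirrors the workflow expression:
# (ref == 'refs/heads/main' || startsWith(ref, 'refs/heads/r'))
#   ? 'pypi' : 'testpypi'
twine_repository() {
  case "$1" in
    refs/heads/main) echo "pypi" ;;
    refs/heads/r*)   echo "pypi" ;;
    *)               echo "testpypi" ;;
  esac
}
```

Together with the hard-coded `TWINE_USERNAME: __token__` change, this moves the workflow to token-based authentication while keeping only the password in secrets.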