Signed-off-by: Maanu Grover <maanug@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@oci-hsg-cs-001-vscode-01.cm.cluster> Co-authored-by: Guyue Huang <guyueh@oci-hsg-cs-001-vscode-01.cm.cluster> Co-authored-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
…IA#2765) Co-authored-by: Xin Yao <xiny@nvidia.com>
…VIDIA#3300) Co-authored-by: Xin Yao <xiny@nvidia.com>
Signed-off-by: nvidia <nvidia@TRY-64956-gpu01.nvidialaunchpad.com>
Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
… Manager patch, docs (NVIDIA#3507) Signed-off-by: Faradawn Yang <73060648+faradawn@users.noreply.github.com>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Made-with: Cursor
/ok to test 8096db6
Preseed uv-created environments with setuptools and related build tools so git-source dependencies resolve reliably in CI, align the Transformer Engine source pin with the preserved lockfile, and restore the missing grouped_gemm helper that broke install-time imports.

Made-with: Cursor
/ok to test c2cbe54
- Add InferenceGroupedMLP class to experts.py from main (was lost in merge conflict resolution while backends.py imported it)
- Add megatron/core/inference/moe/ module from main (dependency of InferenceGroupedMLP)
- Update mxfp8_tensor.py and add mxfp8_quantize.py from main (needed for triton backend support in inference MoE)
- Fix duplicate autodoc items: remove duplicate _EMERGING_OPTIMIZERS placeholder, duplicate fsdp_all_gather_in_start_param_sync field, duplicate logger assignments in spec_utils.py and token_dispatcher.py
- Fix docs warnings: replace H3 heading in moe_utils.py docstring with bold text, remove orphaned docs/source/api-guide/router_replay.md, remove redundant docs/api-guide/fine_grained_activation_offloading.md, exclude deepseek reproduce guide from Sphinx toctree check

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ok to test 57a5344
- param_and_grad_buffer.py: keep layerwise optimizer all_gather path from main and dev's grad_enabled caching + no_grad wrapper for the standard distributed optimizer path
- transformer_config.py: keep both fused_residual_rmsnorm (main) and use_transformer_engine_op_fuser (dev) config fields
- test_mamba_moe_model.py: keep golden config entries from both branches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ok to test d057244
The process_mtp_loss function now passes input_ as a keyword argument (from dev branch changes), but the test mock expected a positional hidden parameter. Updated the mock signature to match.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
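The mismatch can be sketched in plain Python (the real test uses unittest.mock, and the actual `process_mtp_loss` signature may differ; these stand-ins are hypothetical):

```python
# Old test double: expects the hidden states as a positional `hidden` arg.
def old_mock(loss, hidden):
    return loss

# After the dev-branch change, the production call site passes the hidden
# states as the keyword argument `input_`, which the old double rejects:
try:
    old_mock(0.5, input_="hidden_states")
except TypeError:
    print("old mock rejects the keyword-based call")

# Updated double: signature matches the new keyword-based call site.
def new_mock(loss, *, input_):
    return loss

assert new_mock(0.5, input_="hidden_states") == 0.5
```

The same fix applies when the double is a `unittest.mock.Mock` with a function spec: the spec's signature must accept `input_` as a keyword.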
/ok to test 1e9a599
The float32 variant deterministically times out with an NCCL ALLREDUCE timeout (SeqNum=361) in some CI shards while passing in others. The test and fusion code are identical to the dev branch, indicating a pre-existing infrastructure issue with multi-GPU JIT compilation timing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ok to test fc2d334
…nction

The router.py passes dense_output=True for inference mode, but the merge took dev's version of moe_utils.py, which lacks this parameter. Added from main to fix TypeError in InferenceTopKRouter.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
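A minimal pure-Python sketch of the kind of switch involved (the real moe_utils helper operates on tensors, and its exact name and signature may differ; `topk_routing` here is hypothetical):

```python
import math

def _softmax(xs):
    # Numerically stable softmax over a plain list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def topk_routing(logits, k, dense_output=False):
    """Route to the top-k experts; with dense_output=True, return a dense
    per-expert probability map (zeros for unselected experts) instead of
    the compact (values, indices) pair, as an inference router might ask for."""
    probs = _softmax(logits)
    top_idx = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    if not dense_output:
        return [probs[i] for i in top_idx], top_idx
    dense = [p if i in top_idx else 0.0 for i, p in enumerate(probs)]
    return dense, top_idx

vals, idx = topk_routing([0.1, 2.0, 1.0, 3.0], k=2)
dense, _ = topk_routing([0.1, 2.0, 1.0, 3.0], k=2, dense_output=True)
assert len(vals) == 2 and len(dense) == 4 and dense[0] == 0.0
```

A caller that passes `dense_output=True` to a version of the helper lacking the parameter fails with exactly the kind of TypeError described above.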
/ok to test 6521ee2
/ok to test bd44a67
The test was passing layer_wise_distributed_optimizer as a keyword argument to get_megatron_muon_optimizer(), but that function doesn't accept it. Set it on the OptimizerConfig object instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
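The shape of the fix, with hypothetical minimal stand-ins (the real OptimizerConfig and get_megatron_muon_optimizer in Megatron-Core have far richer signatures):

```python
from dataclasses import dataclass

@dataclass
class OptimizerConfig:
    lr: float = 1e-3
    layer_wise_distributed_optimizer: bool = False

def get_megatron_muon_optimizer(config):
    # The factory reads the flag from the config object; it does not accept
    # `layer_wise_distributed_optimizer` as a keyword argument.
    return {"lr": config.lr, "layer_wise": config.layer_wise_distributed_optimizer}

# Broken test pattern (raises TypeError: unexpected keyword argument):
#   get_megatron_muon_optimizer(config, layer_wise_distributed_optimizer=True)

# Fixed test pattern: set the flag on the config instead.
config = OptimizerConfig(layer_wise_distributed_optimizer=True)
assert get_megatron_muon_optimizer(config)["layer_wise"] is True
```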
Force-pushed bd44a67 to 5c04917
/ok to test 5c04917
Pass async_allgather and model_chunks from the optimizer config to the LayerWiseDistributedOptimizer constructor so overlap param gather works correctly with layer-wise optimizers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
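A sketch of threading the two fields through (hypothetical constructor and builder; the real LayerWiseDistributedOptimizer takes many more arguments):

```python
class LayerWiseDistributedOptimizer:
    """Toy stand-in: records the overlap-related settings it is given."""
    def __init__(self, params, *, async_allgather=False, model_chunks=None):
        self.params = params
        self.async_allgather = async_allgather
        self.model_chunks = model_chunks if model_chunks is not None else []

def build_layer_wise_optimizer(params, optimizer_config):
    # Forward async_allgather and model_chunks from the optimizer config so
    # overlapped parameter all-gather is wired up for layer-wise optimizers,
    # instead of silently falling back to the constructor defaults.
    return LayerWiseDistributedOptimizer(
        params,
        async_allgather=optimizer_config.get("async_allgather", False),
        model_chunks=optimizer_config.get("model_chunks"),
    )

opt = build_layer_wise_optimizer(
    params=["w1", "w2"],
    optimizer_config={"async_allgather": True, "model_chunks": ["chunk0"]},
)
assert opt.async_allgather and opt.model_chunks == ["chunk0"]
```

The bug described above is the builder simply not forwarding these keys, so the defaults (`async_allgather=False`, no chunks) were always used.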
/ok to test d8caf0a
What does this PR do?

Sync the latest `main` branch changes into the development line and preserve ongoing `dev`-only work through the required conflict resolutions.

Summary

- Merged `main` changes into the merge branch, including broad updates across CI/workflows, docs, examples, inference, FSDP/resharding, and test coverage.
- Preserved `dev` work around emerging optimizers and layer-wise optimizer refactoring, HyperConnection, Dynamic-CP / THD handling, MoE, MLA/MTP, and related attention and training paths.
- Ensured `dev` behavior is retained while adopting the latest `main` changes.
- Kept `dev`'s `uv.lock` during the merge because regenerating it with the installed `uv` currently fails on upstream `nemo-run` metadata.

Contribution process
Pre-checks
Code review
Feel free to message or tag @mcore-oncall in a comment to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!
All PRs start as draft. If you open a non-draft PR, it will be automatically converted to draft.
Step 1: Mark PR as "Ready for Review"

Reviewers are assigned based on `.github/CODEOWNERS`. Final Review might get declined if these requirements are not fulfilled.
Step 2: Final Review
For PRs that change `megatron/core`, once all expert reviewers have approved, the `Final Review` label is applied automatically and final reviewers are assigned. For PRs outside `megatron/core`, this step is skipped.

Step 3: Approved
Once all required reviewers have approved, the `Approved` label is applied automatically.

Merge
Any member of mcore-engineers will be able to merge your PR.
For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion. MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.