
Fuse MLA DOWN projection GEMMs#3039

Merged
yaox12 merged 20 commits into NVIDIA:main from cjld:main
Mar 11, 2026

Conversation

@cjld
Contributor

@cjld cjld commented Jan 22, 2026

dev PR: #2960

What does this PR do ?

Adds the CLI arg `--mla-down-proj-fusion` and the `mla_down_proj_fusion` field to the transformer config to enable the fusion.
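Conceptually, the fusion combines MLA's separate Q and KV down-projection GEMMs into a single GEMM over concatenated weights, then splits the output back into the two branches. A minimal NumPy sketch of the idea (the dimension names follow the DeepSeek-style MLA config fields referenced by the tests in this PR; the values here are arbitrary illustrations, not Megatron-LM defaults):

```python
import numpy as np

# Illustrative dimensions only; the real values come from the TransformerConfig.
hidden_size = 64
q_lora_rank = 24
kv_lora_rank = 16
qk_pos_emb_head_dim = 8

rng = np.random.default_rng(0)
x = rng.standard_normal((4, hidden_size)).astype(np.float32)  # [tokens, hidden]

# Unfused: two separate down-projection GEMMs.
w_q_down = rng.standard_normal((hidden_size, q_lora_rank)).astype(np.float32)
w_kv_down = rng.standard_normal(
    (hidden_size, kv_lora_rank + qk_pos_emb_head_dim)
).astype(np.float32)
q_unfused = x @ w_q_down
kv_unfused = x @ w_kv_down

# Fused: concatenate the weights along the output dimension, launch one GEMM,
# then split the result back into the q and kv branches.
w_fused = np.concatenate([w_q_down, w_kv_down], axis=1)
out = x @ w_fused
q_fused, kv_fused = np.split(out, [q_lora_rank], axis=1)

assert np.allclose(q_fused, q_unfused)
assert np.allclose(kv_fused, kv_unfused)
```

Because each output column of a GEMM depends only on the corresponding weight column, the fused result is identical to the unfused one; the benefit is one kernel launch over a larger GEMM instead of two smaller ones.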

⚠️ For major changes (either in lines of code or in impact), please first share a design doc with the team. If you're unsure of the best way to do so, contact the @mcore-oncall.

Contribution process

flowchart LR
    A[Pre-checks] --> B[PR Tests]
    subgraph Code Review/Approval
        C1[Expert Review] --> C2[Final Review]
    end
    B --> C1
    C2 --> D[Merge]

Pre-checks

  • I want this PR in a versioned release and have added the appropriate Milestone (e.g., Core 0.8)
  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code Typing guidelines
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

Code review

The following process is enforced via the CODEOWNERS file for changes into megatron/core. For changes outside of megatron/core, it is up to the PR author whether or not to tag the Final Reviewer team.

For MRs into `main` branch

Feel free to message or comment the @mcore-oncall to help accelerate your merge into main. The less complex your PR is, the faster it will be approved and merged!

(Step 1): Add PR label Expert Review

(Step 2): Collect the expert reviewers' reviews

  1. Attach the Expert Review label when your PR is ready for review.
  2. GitHub auto-assigns expert reviewers based on your changes. They will get notified and pick up your PR soon.

⚠️ Only proceed to the next step once all reviewers have approved, merge conflicts are resolved, and the CI is passing.
The Final Review may be declined if these requirements are not fulfilled.

(Step 3): Final Review

  1. Add Final Review label
  2. GitHub auto-assigns final reviewers based on your changes. They will get notified and pick up your PR soon.

(Optional Step 4): Cherry-pick into release branch

If this PR also needs to be merged into core_r* release branches, after this PR has been merged, select Cherry-pick to open a new PR into the release branch.

For MRs into `dev` branch

The proposed review process for the `dev` branch is under active discussion.

MRs are mergeable after one approval by either eharper@nvidia.com or zijiey@nvidia.com.

Merging your PR

Any member of core-adlr and core-nemo will be able to merge your PR.

@cjld cjld requested review from a team as code owners January 22, 2026 06:38
@copy-pr-bot

copy-pr-bot bot commented Jan 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ko3n1g ko3n1g requested a review from a team January 22, 2026 06:39
@erhoo82 erhoo82 added the dev2main: mbridge label (dev to main: this PR is needed in main for mbridge) Jan 22, 2026
@erhoo82 erhoo82 added this to the Core 0.16 milestone Jan 22, 2026
@maanug-nv
Contributor

Is there a PR to dev that we can link? I don't see the same changes on the dev branch currently.

@maanug-nv maanug-nv added the complexity: medium and Expert Review labels Jan 22, 2026
@maanug-nv maanug-nv requested a review from deepakn94 January 22, 2026 09:44
@erhoo82 erhoo82 mentioned this pull request Jan 23, 2026
6 tasks
@maanug-nv
Contributor

Hi @cjld , are you able to update this PR to remove usage of env vars? I think that's the main blocker

@cjld
Contributor Author

cjld commented Jan 29, 2026

Hi @maanug-nv, I updated the PR. If it still has problems, please let me know.

@Phlip79
Member

Phlip79 commented Feb 3, 2026

/ok to test 54b205d

@github-actions
Contributor

github-actions bot commented Feb 3, 2026

Thank you for your contribution!

NVIDIA Megatron-LM is currently transitioning to development on Github. We will aim to review your PR after we complete our transition and stabilize our Github development process.

Thank you for your understanding.

@yaox12
Member

yaox12 commented Mar 10, 2026

/ok to test a4050c1

@yaox12
Member

yaox12 commented Mar 10, 2026

/ok to test 7597a63

@yaox12
Member

yaox12 commented Mar 10, 2026

/ok to test 2b76d60

@cjld
Contributor Author

cjld commented Mar 11, 2026

> Can we add unit tests for this feature?

Hi @ericharper, could you please review these unit tests?

Merged FusedMLA unit tests into test_multi_latent_attention.py, covering:

  • Constructor & weight shape: Verifies FusedMLASelfAttention instantiation and fused down-projection weight dimensions (q_lora_rank + kv_lora_rank + qk_pos_emb_head_dim).
  • Forward pass: Tests GPU forward in both fp32 and bf16, parametrized over rope_type (yarn/rope).
  • QKV split: Validates _qkv_down_projection output shapes for q and kv branches.
  • Backward pass: Confirms gradient flow through linear_qkv_down_proj and input.
  • State dict compatibility: Loads unfused state dict into fused model and verifies weight fusion; checks sharded_state_dict() splits fused weight back into separate q/kv keys.
  • Error handling: Asserts that q_lora_rank=None raises an AssertionError.

All 14 tests pass.
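The state-dict compatibility behavior described above amounts to a fuse/unfuse round-trip over the down-projection weights. A hypothetical NumPy sketch of that round-trip (the fused key `linear_qkv_down_proj` is named in the tests; the unfused key names and the helper functions here are illustrative, not Megatron-LM's actual API):

```python
import numpy as np

# Illustrative dimensions; real values come from the model config.
q_lora_rank, kv_lora_rank, qk_pos_emb_head_dim, hidden = 24, 16, 8, 64


def fuse(unfused):
    # Stack the q and kv down-projection weights along the output (row) dim.
    return {
        "linear_qkv_down_proj.weight": np.concatenate(
            [
                unfused["linear_q_down_proj.weight"],
                unfused["linear_kv_down_proj.weight"],
            ],
            axis=0,
        )
    }


def unfuse(fused):
    # Split the fused weight back into separate q and kv entries, as a
    # sharded_state_dict-style export would.
    w = fused["linear_qkv_down_proj.weight"]
    q, kv = np.split(w, [q_lora_rank], axis=0)
    return {"linear_q_down_proj.weight": q, "linear_kv_down_proj.weight": kv}


rng = np.random.default_rng(1)
unfused = {
    "linear_q_down_proj.weight": rng.standard_normal((q_lora_rank, hidden)),
    "linear_kv_down_proj.weight": rng.standard_normal(
        (kv_lora_rank + qk_pos_emb_head_dim, hidden)
    ),
}
fused = fuse(unfused)
roundtrip = unfuse(fused)
assert all(np.array_equal(roundtrip[k], unfused[k]) for k in unfused)
```

The fused weight's leading dimension is q_lora_rank + kv_lora_rank + qk_pos_emb_head_dim, matching the shape check in the constructor test above.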

@Phlip79 Phlip79 requested a review from a team March 11, 2026 01:11
@svcnvidia-nemo-ci svcnvidia-nemo-ci added the Approved label (All necessary approvals have been made) and removed the Final Review label (PR is in the "final review" stage) Mar 11, 2026
@chtruong814 chtruong814 removed the needs-follow-up label (Issue needs follow-up) Mar 11, 2026
@yaox12
Member

yaox12 commented Mar 11, 2026

/ok to test f3cb7b4

@yaox12 yaox12 added this pull request to the merge queue Mar 11, 2026
@svcnvidia-nemo-ci

🔄 Merge queue validation started!

You can track the progress here: https://github.com/NVIDIA/Megatron-LM/actions/runs/22938248348

Merged via the queue into NVIDIA:main with commit 8fd390d Mar 11, 2026
52 of 53 checks passed
HollowMan6 pushed a commit to HollowMan6/Megatron-LM that referenced this pull request Mar 16, 2026
Signed-off-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>