
Conversation


@conver334 conver334 commented Jan 22, 2026

What does this PR do?

#1838 added exporting of model.layers.*.self_attn.rotary_emb.inv_freq, but its shape was not correct. The following error occurred:

model.layers.9.self_attn.rotary_emb.inv_freq        │ torch.Size([32]) != torch.Size([64])
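The mismatch above comes down to the step argument of torch.arange in the inverse-frequency calculation. A minimal sketch (not the actual bridge code; dim and base values are assumed for illustration) shows how stepping by 2 over a 64-wide rotary dimension yields 32 entries, while stepping by 1 yields the 64 entries the HF checkpoint expects:

```python
import torch

# Assumed rotary dimension and base for illustration only.
dim, base = 64, 10000.0

# Stepping by 2 produces dim/2 = 32 frequencies -- the shape the export produced.
inv_freq_step2 = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
print(inv_freq_step2.shape)  # torch.Size([32])

# Stepping by 1 produces dim = 64 frequencies -- the shape HF expects here.
inv_freq_step1 = 1.0 / (base ** (torch.arange(0, dim, 1, dtype=torch.float32) / dim))
print(inv_freq_step1.shape)  # torch.Size([64])
```

The exponent normalization is illustrative; the point is that the arange step alone determines the length of inv_freq, which is exactly the [32] vs [64] discrepancy in the error.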

Changelog

Modified the calculation of this layer and verified it against the imported HF parameters.

GitHub Actions CI

See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced weight verification with dtype normalization and shape-mismatch detection during model conversion.
    • Fixed rotary frequency calculations to ensure proper dtype consistency across devices.
  • Tests

    • Added functional test coverage for Moonlight model conversion with distributed training parallelism verification.
    • Updated rotary embedding tests to reflect corrected calculation methodology.

✏️ Tip: You can customize this high-level summary in your review settings.


copy-pr-bot bot commented Jan 22, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: conver334 <[email protected]>
yaoyu-33 previously approved these changes Jan 22, 2026
@yaoyu-33
Contributor

/ok to test a023beb

Signed-off-by: conver334 <[email protected]>
yaoyu-33 previously approved these changes Jan 23, 2026
@yaoyu-33
Contributor

/ok to test 8d4c394

auto-merge was automatically disabled January 23, 2026 05:22

Head branch was pushed to by a user without write access

@BoxiangW BoxiangW enabled auto-merge (squash) January 28, 2026 02:51
Contributor

@BoxiangW BoxiangW left a comment


LGTM

@BoxiangW
Contributor

/ok to test 13ac99d

@yaoyu-33
Contributor

/ok to test 664ed69

@coderabbitai

coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Updates weight verification logic with dtype normalization and shape-mismatch detection, corrects rotary frequency calculation in DeepSeek V3 bridge from step 2 to step 1, refactors benchmark script to use weights verification table, adds comprehensive Moonlight model conversion functional test, and adjusts corresponding unit test expectations.

Changes

  • Weight Verification Enhancements (src/megatron/bridge/models/conversion/utils.py)
    Added dtype-normalized comparison for parameters and shape-mismatch detection prior to value comparison; the "Matches Original" column now indicates shape mismatches or boolean allclose results.
  • DeepSeek Rotary Frequency Fixes (src/megatron/bridge/models/deepseek/deepseek_v3_bridge.py, tests/unit_tests/models/deepseek/test_deepseek_bridges.py)
    Changed the torch.arange step from 2 to 1 in the inverse-frequency calculation; added explicit dtype coercion to the reference tensor's dtype; updated test assertions to match the new expected values.
  • Benchmark Script Refactoring (examples/conversion/hf_megatron_roundtrip_benchmark.py)
    Replaced the per-iteration weight export loop with a single weights_verification_table(pool) call; prints the table only on rank 0; adjusted the timing measurement to encapsulate table printing.
  • Moonlight Model Conversion Test (tests/functional_tests/models/deepseek/test_moonlight_conversion.py)
    New functional test module with a TestMoonlightConversion class; adds a fixture for toy-model creation with config overrides and dynamic model-class loading; includes a GPU-only parallelism test validating conversion across multiple TP/PP/EP configurations and config preservation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes ⚠️ Warning: the PR contains significant changes to the inv_freq calculation, but the PR description lacks test results, numerical validation, or confirmation that the shape-mismatch errors are resolved. Resolution: update the PR description to include test execution results, explicit numerical validation of inv_freq shapes, and console output from the test suite.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the pull request title accurately describes the main fix: correcting the inv_freq shape in exported DeepSeek models' rotary embeddings.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@BoxiangW
Contributor

/ok to test 5acf621

3 participants