Conversation

@kinjalpatel27 (Contributor)

What does this PR do?

Type of change: New Feature

Overview:
This PR adds support for quantizing Transformer Engine (TE) ops in Megatron, specifically TERowParallelLinear, TEColumnParallelLinear, and TELayerNormColumnParallelLinear.

Usage

Quantization of these ops is enabled by building the Megatron model with the TE layer spec; see the sketch below.
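
A minimal sketch of the flow, assuming a GPT model built with Megatron's TE layer spec and modelopt's standard quantize API; build_gpt_model and calib_dataloader are hypothetical placeholders, and the FP8 config is shown only as an example:

    import modelopt.torch.quantization as mtq
    from megatron.core.models.gpt.gpt_layer_specs import (
        get_gpt_layer_with_transformer_engine_spec,
    )

    # Build the model with the TE layer spec so that TERowParallelLinear,
    # TEColumnParallelLinear, and TELayerNormColumnParallelLinear are used.
    layer_spec = get_gpt_layer_with_transformer_engine_spec()
    model = build_gpt_model(layer_spec)  # hypothetical helper for model setup

    def forward_loop(m):
        # Run a few calibration batches through the model.
        for batch in calib_dataloader:  # hypothetical dataloader
            m(**batch)

    # Quantize the TE ops in place.
    model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)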

Testing

Added unit tests covering the new functionality:
  • test_homogeneous_sharded_state_dict_te_spec
  • test_convert_mcore_te_gpt_model
  • test_quantize_forward_backward

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Additional Information

Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Co-authored-by: Kinjal Patel <[email protected]>
Signed-off-by: Kinjal Patel <[email protected]>
@kinjalpatel27 kinjalpatel27 requested review from a team as code owners December 2, 2025 20:35
codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 83.78378% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.55%. Comparing base (d0b0c0f) to head (d7c4802).
⚠️ Report is 21 commits behind head on main.

Files with missing lines            Patch %   Lines
modelopt/torch/utils/logging.py     80.00%    6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #632      +/-   ##
==========================================
- Coverage   74.64%   74.55%   -0.09%     
==========================================
  Files         183      183              
  Lines       18542    18432     -110     
==========================================
- Hits        13840    13742      -98     
+ Misses       4702     4690      -12     


Signed-off-by: Kinjal Patel <[email protected]>
@mxinO (Contributor) left a comment


Looks great to me! Do we still need a special modelopt layer spec after this PR?

Review comment on the diff at:

    def test_convert_mcore_te_gpt_model(distributed_setup_size_1):

Do we have basic tests for TE? That is, ref_output == test_output with quantizers disabled, and ref_output != test_output with quantizers enabled?
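
A minimal sketch of such a check, assuming a helper that builds the TE-spec model; build_te_gpt_model and the input shape are hypothetical placeholders, and mtq.disable_quantizer is used under the assumption that disabled quantizers are pass-through:

    import torch
    import modelopt.torch.quantization as mtq

    def test_te_quantize_changes_output():
        model = build_te_gpt_model()  # hypothetical helper
        inputs = torch.randn(2, 8, model.config.hidden_size)

        ref_output = model(inputs)

        # Insert and calibrate quantizers; the output should now differ.
        mtq.quantize(model, mtq.FP8_DEFAULT_CFG, lambda m: m(inputs))
        assert not torch.allclose(model(inputs), ref_output)

        # With all quantizers disabled, the original output should be recovered.
        mtq.disable_quantizer(model, "*")
        assert torch.allclose(model(inputs), ref_output)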

Review comment on the diff at:

    - weight, weight_fp8, inputs = args[0], args[1], args[2]
    - remaining_args = args[3:]
    + idx = 1 if func_name == "_forward" else 0
    + weight, weight_fp8, inputs = args[idx], args[idx + 1], args[idx + 2]
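
A plausible reading of the new indexing, assuming _forward here is a patched autograd.Function forward that receives the ctx object as its first positional argument (an assumption; the surrounding function is not shown in this snippet):

    # For "_forward" (an autograd.Function.forward), args[0] is the ctx
    # object, so the tensor arguments start at index 1; for the plain
    # functional entry point they start at index 0.
    idx = 1 if func_name == "_forward" else 0
    weight, weight_fp8, inputs = args[idx], args[idx + 1], args[idx + 2]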

We should raise an error somewhere if TE fp8 training is enabled when using modelopt.
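
One possible shape for such a guard, assuming TE's FP8GlobalStateManager.is_fp8_enabled() reports whether an fp8_autocast region is active (API name per Transformer Engine; verify against the installed version):

    from transformer_engine.pytorch.fp8 import FP8GlobalStateManager

    def _check_no_te_fp8():
        # modelopt quantizers and TE's native FP8 recipe should not both be
        # active; fail fast instead of silently double-quantizing.
        if FP8GlobalStateManager.is_fp8_enabled():
            raise RuntimeError(
                "TE FP8 training (fp8_autocast) cannot be combined with "
                "modelopt quantization of TE ops."
            )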

Signed-off-by: Kinjal Patel <[email protected]>