Conversation

@kinjalpatel27 (Contributor)

What does this PR do?

Type of change: New Feature

Overview:
This PR adds support for quantizing Transformer Engine (TE) ops in Megatron, specifically TERowParallelLinear, TEColumnParallelLinear, and TELayerNormColumnParallelLinear.

Usage

Quantization of these ops is enabled by building the Megatron model with the TE layer spec; see the sketch below.
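
A minimal sketch of the flow, assuming a GPT model built with Megatron's TE layer spec and modelopt's standard quantize API; build_gpt_model and calib_dataloader are hypothetical placeholders, and the FP8 config is shown only as an example:

    import modelopt.torch.quantization as mtq
    from megatron.core.models.gpt.gpt_layer_specs import (
        get_gpt_layer_with_transformer_engine_spec,
    )

    # Build the model with the TE layer spec so that TERowParallelLinear,
    # TEColumnParallelLinear, and TELayerNormColumnParallelLinear are used.
    layer_spec = get_gpt_layer_with_transformer_engine_spec()
    model = build_gpt_model(layer_spec)  # hypothetical helper for model setup

    def forward_loop(m):
        # Run a few calibration batches through the model.
        for batch in calib_dataloader:  # hypothetical dataloader
            m(**batch)

    # Quantize the TE ops in place.
    model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)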

Testing

Added unit tests covering the new functionality:
  • test_homogeneous_sharded_state_dict_te_spec
  • test_convert_mcore_te_gpt_model
  • test_quantize_forward_backward

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: Yes
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Additional Information

Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Co-authored-by: Kinjal Patel <[email protected]>
Signed-off-by: Kinjal Patel <[email protected]>
@kinjalpatel27 kinjalpatel27 requested review from a team as code owners December 2, 2025 20:35
codecov bot commented Dec 2, 2025

Codecov Report

❌ Patch coverage is 83.78378% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.55%. Comparing base (d0b0c0f) to head (d7c4802).
⚠️ Report is 21 commits behind head on main.

Files with missing lines            Patch %   Lines
modelopt/torch/utils/logging.py     80.00%    6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #632      +/-   ##
==========================================
- Coverage   74.64%   74.55%   -0.09%     
==========================================
  Files         183      183              
  Lines       18542    18432     -110     
==========================================
- Hits        13840    13742      -98     
+ Misses       4702     4690      -12     


Signed-off-by: Kinjal Patel <[email protected]>
@mxinO (Contributor) left a comment


Looks great to me! Do we still need a special modelopt layer spec after this PR?

Review comment on the diff at:

    def test_convert_mcore_te_gpt_model(distributed_setup_size_1):

Do we have basic tests for TE? That is, ref_output == test_output with quantizers disabled, and ref_output != test_output with quantizers enabled?
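
A minimal sketch of such a check, assuming a helper that builds the TE-spec model; build_te_gpt_model and the input shape are hypothetical placeholders, and mtq.disable_quantizer is used under the assumption that disabled quantizers are pass-through:

    import torch
    import modelopt.torch.quantization as mtq

    def test_te_quantize_changes_output():
        model = build_te_gpt_model()  # hypothetical helper
        inputs = torch.randn(2, 8, model.config.hidden_size)

        ref_output = model(inputs)

        # Insert and calibrate quantizers; the output should now differ.
        mtq.quantize(model, mtq.FP8_DEFAULT_CFG, lambda m: m(inputs))
        assert not torch.allclose(model(inputs), ref_output)

        # With all quantizers disabled, the original output should be recovered.
        mtq.disable_quantizer(model, "*")
        assert torch.allclose(model(inputs), ref_output)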

Review comment on the diff at:

    - weight, weight_fp8, inputs = args[0], args[1], args[2]
    - remaining_args = args[3:]
    + idx = 1 if func_name == "_forward" else 0
    + weight, weight_fp8, inputs = args[idx], args[idx + 1], args[idx + 2]
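
A plausible reading of the new indexing, assuming _forward here is a patched autograd.Function forward that receives the ctx object as its first positional argument (an assumption; the surrounding function is not shown in this snippet):

    # For "_forward" (an autograd.Function.forward), args[0] is the ctx
    # object, so the tensor arguments start at index 1; for the plain
    # functional entry point they start at index 0.
    idx = 1 if func_name == "_forward" else 0
    weight, weight_fp8, inputs = args[idx], args[idx + 1], args[idx + 2]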

We should raise an error somewhere if TE fp8 training is enabled when using modelopt.
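
One possible shape for such a guard, assuming TE's FP8GlobalStateManager.is_fp8_enabled() reports whether an fp8_autocast region is active (API name per Transformer Engine; verify against the installed version):

    from transformer_engine.pytorch.fp8 import FP8GlobalStateManager

    def _check_no_te_fp8():
        # modelopt quantizers and TE's native FP8 recipe should not both be
        # active; fail fast instead of silently double-quantizing.
        if FP8GlobalStateManager.is_fp8_enabled():
            raise RuntimeError(
                "TE FP8 training (fp8_autocast) cannot be combined with "
                "modelopt quantization of TE ops."
            )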

Signed-off-by: Kinjal Patel <[email protected]>