Added support for TE linear ops #632
base: main
Conversation
Co-authored-by: Asma Kuriparambil Thekkumpate <[email protected]>
Co-authored-by: Kinjal Patel <[email protected]>
Signed-off-by: Kinjal Patel <[email protected]>
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #632      +/-   ##
==========================================
- Coverage   74.64%   74.55%   -0.09%
==========================================
  Files         183      183
  Lines       18542    18432     -110
==========================================
- Hits        13840    13742      -98
+ Misses       4702     4690      -12
```

☔ View full report in Codecov by Sentry.
mxinO left a comment:
Looks great to me! Do we still need a special modelopt layer spec after this MR?
```python
def test_convert_mcore_te_gpt_model(distributed_setup_size_1):
```
Do we have basic tests for TE? ref_output == test_output for quantizers disabled, and ref_output != test_output for quantizers enabled?
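Something along these lines would cover it (a minimal sketch, assuming pytest-style tests; the `te_model`, `ref_output`, and `batch` fixtures are hypothetical, while `TensorQuantizer.enable()`/`.disable()` are modelopt's existing toggles):

```python
import torch

from modelopt.torch.quantization.nn import TensorQuantizer


def _set_quantizers(model, enabled):
    # Toggle every TensorQuantizer inserted by the conversion.
    for module in model.modules():
        if isinstance(module, TensorQuantizer):
            if enabled:
                module.enable()
            else:
                module.disable()


def test_te_quantizer_effect(te_model, ref_output, batch):
    # Quantizers disabled: the converted TE model should reproduce
    # the unquantized reference output exactly.
    _set_quantizers(te_model, enabled=False)
    with torch.no_grad():
        assert torch.allclose(te_model(**batch), ref_output)

    # Quantizers enabled: fake-quantization noise should change the output.
    _set_quantizers(te_model, enabled=True)
    with torch.no_grad():
        assert not torch.allclose(te_model(**batch), ref_output)
```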
```diff
-        weight, weight_fp8, inputs = args[0], args[1], args[2]
-        remaining_args = args[3:]
+        idx = 1 if func_name == "_forward" else 0
+        weight, weight_fp8, inputs = args[idx], args[idx + 1], args[idx + 2]
```
We should raise an error somewhere if TE fp8 training is enabled when using modelopt.
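One possible shape for that guard (a hedged sketch; the helper name and call site are hypothetical, and `TransformerConfig.fp8` is the Megatron flag that enables TE FP8 training):

```python
def _check_no_te_fp8(config):
    # Megatron turns on TE FP8 training via TransformerConfig.fp8
    # (e.g. "e4m3" or "hybrid"); None means disabled.
    if getattr(config, "fp8", None) is not None:
        raise ValueError(
            f"Transformer Engine FP8 training is enabled (config.fp8={config.fp8!r}); "
            "this is not supported together with modelopt quantization. "
            "Disable one of the two."
        )
```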
Co-authored-by: Asma Kuriparambil Thekkumpate [email protected]
What does this PR do?
Type of change: New Feature
Overview:
This PR adds support for quantizing TE ops in Megatron, specifically TERowParallelLinear, TEColumnParallelLinear, and TELayerNormColumnParallelLinear.
Usage
It can be used by enabling the TE layer spec in Megatron, as sketched below.
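A minimal sketch of the intended flow, assuming the standard Megatron-LM TE layer-spec helper and modelopt's existing `mtq.quantize` API; `transformer_config`, `vocab_size`, `seq_len`, and `calib_batches` are placeholders, not part of this PR:

```python
import modelopt.torch.quantization as mtq
from megatron.core.models.gpt import GPTModel
from megatron.core.models.gpt.gpt_layer_specs import (
    get_gpt_layer_with_transformer_engine_spec,
)

# Build a GPT model with the TE layer spec; its TERowParallelLinear,
# TEColumnParallelLinear, and TELayerNormColumnParallelLinear layers
# are now handled by the quantization conversion.
model = GPTModel(
    config=transformer_config,  # a TransformerConfig created elsewhere
    transformer_layer_spec=get_gpt_layer_with_transformer_engine_spec(),
    vocab_size=vocab_size,
    max_sequence_length=seq_len,
)


def forward_loop(m):
    # Calibration: run a few representative batches through the model.
    for batch in calib_batches:  # placeholder iterable
        m(**batch)


# Quantize as usual; the TE linear ops are picked up by the conversion.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```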
Testing
Added unit tests covering the new functionality:
- test_homogeneous_sharded_state_dict_te_spec
- test_convert_mcore_te_gpt_model
- test_quantize_forward_backward
Before your PR is "Ready for review"
Additional Information