Bugfix for bmm style expert nvfp4 weight scale export #384
Conversation
Walkthrough
Adds buffer registration of the corrected weight_scale in _export_quantized_weight.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Caller
participant E as _export_quantized_weight
participant M as sub_module
C->>E: invoke(weight, weight_scale, ...)
E->>E: Quantize weight
E->>M: Assign quantized weight
alt weight_scale is not None
E->>M: register_buffer("weight_scale", corrected_scale)
Note right of M #DFF2E1: New: buffer persisted on module
else weight_scale is None
Note over E #FFEFD5: No buffer registration
end
E-->>C: return
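
For readers who prefer code to diagrams, below is a minimal sketch of the flow the diagram describes. The function name, the placeholder quantization step, and the toy module are assumptions for illustration only and do not mirror the real helpers in modelopt/torch/export/unified_export_hf.py; only the conditional buffer registration at the end corresponds to the change under review.

from typing import Optional

import torch
import torch.nn as nn


def export_quantized_weight_sketch(sub_module: nn.Module, weight_scale: Optional[torch.Tensor]) -> None:
    # Placeholder for the real quantization step (the actual exporter computes nvfp4-quantized weights).
    quantized = sub_module.weight.data.clone()
    sub_module.weight = nn.Parameter(quantized, requires_grad=False)

    # The change under review: persist the corrected scale on the module,
    # but only when a scale was actually produced.
    if weight_scale is not None:
        sub_module.register_buffer("weight_scale", weight_scale)


m = nn.Linear(8, 8)
export_quantized_weight_sketch(m, torch.ones(1))
print("weight_scale" in dict(m.named_buffers()))  # True
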
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modelopt/torch/export/unified_export_hf.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
# Register the corrected weight_scale as a buffer
if weight_scale is not None:
    sub_module.register_buffer(quantizer_attrs.weight_scale, weight_scale)
Fix duplicate buffer registration

register_buffer throws if the name is already registered. Earlier in this function we always register quantizer_attrs.weight_scale, so this new call will raise KeyError for every quantized module, breaking export. Update the existing buffer instead of re-registering it.
-    if weight_scale is not None:
-        sub_module.register_buffer(quantizer_attrs.weight_scale, weight_scale)
+    if weight_scale is not None:
+        setattr(sub_module, quantizer_attrs.weight_scale, weight_scale)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  # Register the corrected weight_scale as a buffer
  if weight_scale is not None:
-     sub_module.register_buffer(quantizer_attrs.weight_scale, weight_scale)
+     setattr(sub_module, quantizer_attrs.weight_scale, weight_scale)
🤖 Prompt for AI Agents
In modelopt/torch/export/unified_export_hf.py around lines 335-337, the code
unconditionally calls sub_module.register_buffer(quantizer_attrs.weight_scale,
weight_scale) which raises when that buffer name was already registered earlier;
instead check whether the buffer name already exists on sub_module and if so
update the existing buffer value (e.g., assign to
sub_module._buffers[quantizer_attrs.weight_scale] or equivalent), otherwise
register it; ensure you only call register_buffer when the name is absent so
duplicate registration (KeyError) is avoided.
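
As a standalone illustration of the check-then-update pattern described above, here is a small sketch. The helper name set_or_register_buffer and the toy nn.Linear module are made up for this example; the actual fix lives in _export_quantized_weight and uses quantizer_attrs.weight_scale as the buffer name.

import torch
import torch.nn as nn


def set_or_register_buffer(module: nn.Module, name: str, tensor: torch.Tensor) -> None:
    # Update an existing buffer in place, or register it if it is absent.
    if name in module._buffers:
        # Assigning through setattr on an already-registered buffer name
        # updates module._buffers[name] rather than raising.
        setattr(module, name, tensor)
    else:
        module.register_buffer(name, tensor)


m = nn.Linear(4, 4)
set_or_register_buffer(m, "weight_scale", torch.ones(1))          # first call registers the buffer
set_or_register_buffer(m, "weight_scale", torch.full((1,), 2.0))  # second call updates it in place
print(m.weight_scale)  # tensor([2.])
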
Signed-off-by: Zhiyu Cheng <[email protected]>
9dd1a3f to 7acbe57 (Compare)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #384 +/- ##
==========================================
+ Coverage 73.76% 73.86% +0.09%
==========================================
Files 171 171
Lines 17618 17629 +11
==========================================
+ Hits 12996 13021 +25
+ Misses 4622 4608 -14
☔ View full report in Codecov by Sentry.
Signed-off-by: Zhiyu Cheng <[email protected]>
What does this PR do?
Type of change: Bug fix
Overview: Fixes nvfp4 weight scale export for bmm-style experts by registering the corrected weight_scale as a buffer on the exported module.
Testing
python hf_ptq.py --pyt_ckpt_path /home/scratch.omniml_data_2/models/Llama-4-Scout-17B-16E-Instruct/ --qformat nvfp4 --export_path /home/scratch.omniml_data_2/zhiyuc/checkpoints/Llama-4-Scout-17B-16E-Instruct-nvfp4-0925 --trust_remote_code
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit