Bugfix for bmm style expert nvfp4 weight scale export #384
Conversation
Walkthrough
Adds buffer registration of the corrected weight_scale in _export_quantized_weight.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant C as Caller
participant E as _export_quantized_weight
participant M as sub_module
C->>E: invoke(weight, weight_scale, ...)
E->>E: Quantize weight
E->>M: Assign quantized weight
alt weight_scale is not None
E->>M: register_buffer("weight_scale", corrected_scale)
Note right of M #DFF2E1: New: buffer persisted on module
else weight_scale is None
Note over E #FFEFD5: No buffer registration
end
E-->>C: return
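
For readers who prefer code to diagrams, below is a minimal sketch of the flow the diagram describes. The function name, the placeholder quantization step, and the toy module are assumptions for illustration only and do not mirror the real helpers in modelopt/torch/export/unified_export_hf.py; only the conditional buffer registration at the end corresponds to the change under review.

from typing import Optional

import torch
import torch.nn as nn


def export_quantized_weight_sketch(sub_module: nn.Module, weight_scale: Optional[torch.Tensor]) -> None:
    # Placeholder for the real quantization step (the actual exporter computes nvfp4-quantized weights).
    quantized = sub_module.weight.data.clone()
    sub_module.weight = nn.Parameter(quantized, requires_grad=False)

    # The change under review: persist the corrected scale on the module,
    # but only when a scale was actually produced.
    if weight_scale is not None:
        sub_module.register_buffer("weight_scale", weight_scale)


m = nn.Linear(8, 8)
export_quantized_weight_sketch(m, torch.ones(1))
print("weight_scale" in dict(m.named_buffers()))  # True
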
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
modelopt/torch/export/unified_export_hf.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: wait-checks / wait
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
# Register the corrected weight_scale as a buffer
if weight_scale is not None:
    sub_module.register_buffer(quantizer_attrs.weight_scale, weight_scale)
Fix duplicate buffer registration

register_buffer throws if the name is already registered. Earlier in this function we always register quantizer_attrs.weight_scale, so this new call will raise KeyError for every quantized module, breaking export. Update the existing buffer instead of re-registering it.
-    if weight_scale is not None:
-        sub_module.register_buffer(quantizer_attrs.weight_scale, weight_scale)
+    if weight_scale is not None:
+        setattr(sub_module, quantizer_attrs.weight_scale, weight_scale)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  # Register the corrected weight_scale as a buffer
  if weight_scale is not None:
-     sub_module.register_buffer(quantizer_attrs.weight_scale, weight_scale)
+     setattr(sub_module, quantizer_attrs.weight_scale, weight_scale)
🤖 Prompt for AI Agents
In modelopt/torch/export/unified_export_hf.py around lines 335-337, the code
unconditionally calls sub_module.register_buffer(quantizer_attrs.weight_scale,
weight_scale) which raises when that buffer name was already registered earlier;
instead check whether the buffer name already exists on sub_module and if so
update the existing buffer value (e.g., assign to
sub_module._buffers[quantizer_attrs.weight_scale] or equivalent), otherwise
register it; ensure you only call register_buffer when the name is absent so
duplicate registration (KeyError) is avoided.
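
As a standalone illustration of the check-then-update pattern described above, here is a small sketch. The helper name set_or_register_buffer and the toy nn.Linear module are made up for this example; the actual fix lives in _export_quantized_weight and uses quantizer_attrs.weight_scale as the buffer name.

import torch
import torch.nn as nn


def set_or_register_buffer(module: nn.Module, name: str, tensor: torch.Tensor) -> None:
    # Update an existing buffer in place, or register it if it is absent.
    if name in module._buffers:
        # Assigning through setattr on an already-registered buffer name
        # updates module._buffers[name] rather than raising.
        setattr(module, name, tensor)
    else:
        module.register_buffer(name, tensor)


m = nn.Linear(4, 4)
set_or_register_buffer(m, "weight_scale", torch.ones(1))          # first call registers the buffer
set_or_register_buffer(m, "weight_scale", torch.full((1,), 2.0))  # second call updates it in place
print(m.weight_scale)  # tensor([2.])
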
Signed-off-by: Zhiyu Cheng <[email protected]>
9dd1a3f to 7acbe57 (Compare)
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #384 +/- ##
==========================================
+ Coverage 73.76% 73.86% +0.09%
==========================================
Files 171 171
Lines 17618 17629 +11
==========================================
+ Hits 12996 13021 +25
+ Misses 4622 4608 -14
☔ View full report in Codecov by Sentry.
Signed-off-by: Zhiyu Cheng <[email protected]>
What does this PR do?
Type of change: Bug fix
Overview: Fixes nvfp4 weight scale export for bmm-style experts by registering the corrected weight_scale as a buffer on the exported module.
Testing
python hf_ptq.py --pyt_ckpt_path /home/scratch.omniml_data_2/models/Llama-4-Scout-17B-16E-Instruct/ --qformat nvfp4 --export_path /home/scratch.omniml_data_2/zhiyuc/checkpoints/Llama-4-Scout-17B-16E-Instruct-nvfp4-0925 --trust_remote_code
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit