
Conversation

cjluo-nv (Collaborator) commented Sep 5, 2025

What does this PR do?

Bug fix.

Overview:

Phi4MM has weights with a leading dimension of 1. Squeezing them yields a shape mismatch with the original weights.
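For context, a minimal sketch (shapes hypothetical) of how the unconditional squeeze breaks round-tripping:

```python
import torch

# Hypothetical Phi4MM-style weight stored with a leading singleton
# dimension, e.g. (1, out_features, in_features).
original = torch.randn(1, 4, 8)

# The previous export path squeezed any 3D tensor whose leading dim is 1:
exported = original.squeeze(0)

# The exported shape no longer matches the module's declared weight shape,
# so loading the checkpoint fails with a size mismatch.
print(original.shape, "->", exported.shape)  # torch.Size([1, 4, 8]) -> torch.Size([4, 8])
```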

Testing

scripts/huggingface_example.sh --model /models/microsoft/Phi-4-multimodal-instruct/ --quant fp8 --export_fmt hf --trust_remote_code

and check the weight shapes.
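One way to spot-check the result (a sketch; the checkpoint path and file name are assumptions about where the script writes its output):

```python
from safetensors import safe_open

# Assumed export location; adjust to wherever the script saves the
# quantized checkpoint.
ckpt = "/models/microsoft/Phi-4-multimodal-instruct/fp8/model.safetensors"

with safe_open(ckpt, framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        # After the fix, only *scale tensors should have lost their leading
        # singleton dim; weight tensors keep their original shape.
        print(name, tuple(t.shape))
```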

Summary by CodeRabbit

  • Bug Fixes
    • Avoids unintended collapsing of leading singleton dimensions for certain 3D tensors during export/quantization; only scale-related tensors are collapsed.
    • Reduces shape mismatches and runtime errors when loading, converting, or serving quantized models, improving checkpoint stability and compatibility.

@cjluo-nv cjluo-nv requested a review from a team as a code owner September 5, 2025 17:49

coderabbitai bot commented Sep 5, 2025

Caution

Review failed

An error occurred during the review process. Please try again later.


post_state_dict[prefix + new_suffix] = value
break

# Squeeze tensors with a leading dimension of 1
Contributor commented:

Can you specify what errors you get for phi4-mm?

We needed this to stick with our ckpt format, and I remember that if we don't squeeze, it breaks some deployment flows with TRT-LLM. It works for all our supported models so far.


codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.93%. Comparing base (b233ad1) to head (ab4e6d9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #294   +/-   ##
=======================================
  Coverage   73.93%   73.93%           
=======================================
  Files         172      172           
  Lines       17408    17408           
=======================================
  Hits        12870    12870           
  Misses       4538     4538           


Signed-off-by: Chenjie Luo <[email protected]>
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modelopt/torch/export/quant_utils.py (1)

872-881: Narrow the key match and avoid mutating during iteration.

"scale" substring can overmatch (e.g., keys containing "rescaled"). Safer to target leaf names ending with "scale" (e.g., k_scale, v_scale, pre_quant_scale) and iterate over a snapshot for clarity.

Apply:

-    # Squeeze scales with a leading dimension of 1
-    for key, value in post_state_dict.items():
-        if (
-            "scale" in key
-            and isinstance(value, torch.Tensor)
-            and value.dim() == 3
-            and value.shape[0] == 1
-        ):
-            post_state_dict[key] = value.squeeze(0)
+    # Only squeeze scale tensors (not weights) when the leading expert dim is 1.
+    for key, value in list(post_state_dict.items()):
+        if (
+            isinstance(value, torch.Tensor)
+            and value.dim() == 3
+            and value.shape[0] == 1
+            and key.rsplit(".", 1)[-1].endswith("scale")
+        ):
+            post_state_dict[key] = value.squeeze(0)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 867922a and 76ce70a.

📒 Files selected for processing (1)
  • modelopt/torch/export/quant_utils.py (1 hunks)
🔇 Additional comments (2)
modelopt/torch/export/quant_utils.py (2)

872-881: Good fix: stop squeezing weights; only squeeze scale tensors with leading dim 1.

This aligns with the Phi-4-MM report and should avoid shape mismatches on weights while keeping exported scale shapes sane.


872-881: Add regression test for squeeze behavior of scales vs weights. Ensure that weight tensors (e.g., shape (1,4,8)) retain their leading dimension while scale tensors (e.g., shape (1,4,1)) are squeezed to (4,1).
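A minimal sketch of such a test, re-implementing the squeeze rule inline rather than importing the real export helper (key names assumed):

```python
import torch

def test_squeeze_scales_not_weights():
    state = {
        "layer.weight": torch.randn(1, 4, 8),        # must keep leading dim
        "layer.weight_scale": torch.randn(1, 4, 1),  # should be squeezed
    }
    # Stand-in for the squeeze rule in modelopt/torch/export/quant_utils.py.
    for key, value in list(state.items()):
        if (
            isinstance(value, torch.Tensor)
            and value.dim() == 3
            and value.shape[0] == 1
            and key.rsplit(".", 1)[-1].endswith("scale")
        ):
            state[key] = value.squeeze(0)

    assert state["layer.weight"].shape == (1, 4, 8)
    assert state["layer.weight_scale"].shape == (4, 1)
```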

Edwardf0t1 (Contributor) left a comment

Discussed offline: we should whitelist the scaling-factor tensors for squeezing rather than remove the squeeze entirely.
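A sketch of that whitelist idea (the suffix list is an illustrative assumption, not the merged change):

```python
import torch

# Whitelisted scale suffixes; plain weights never match, so their
# leading dim is preserved.
SCALE_SUFFIXES = ("weight_scale", "input_scale", "k_scale", "v_scale", "pre_quant_scale")

def maybe_squeeze(key: str, value: torch.Tensor) -> torch.Tensor:
    leaf = key.rsplit(".", 1)[-1]
    if leaf.endswith(SCALE_SUFFIXES) and value.dim() == 3 and value.shape[0] == 1:
        return value.squeeze(0)
    return value
```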

@Edwardf0t1 Edwardf0t1 changed the title Do not squeeze weights with leading dim 1 Avoid squeezing original weight tensors with leading dim 1 Sep 5, 2025
Signed-off-by: Chenjie Luo <[email protected]>
@cjluo-nv cjluo-nv enabled auto-merge (squash) September 5, 2025 22:12

@cjluo-nv cjluo-nv merged commit 512dbb7 into main Sep 9, 2025
25 of 28 checks passed
@cjluo-nv cjluo-nv deleted the cjluo-nv-patch-1 branch September 9, 2025 00:12
benchislett pushed a commit that referenced this pull request Sep 15, 2025