
Conversation

cjluo-nv (Collaborator) commented Sep 5, 2025

What does this PR do?

Bug fix.

Overview:

Phi4MM has weights with a leading dimension of 1. Squeezing them yields a shape mismatch with the original weights.
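For context, a minimal sketch (shapes hypothetical) of how the unconditional squeeze breaks round-tripping:

```python
import torch

# Hypothetical Phi4MM-style weight stored with a leading singleton
# dimension, e.g. (1, out_features, in_features).
original = torch.randn(1, 4, 8)

# The previous export path squeezed any 3D tensor whose leading dim is 1:
exported = original.squeeze(0)

# The exported shape no longer matches the module's declared weight shape,
# so loading the checkpoint fails with a size mismatch.
print(original.shape, "->", exported.shape)  # torch.Size([1, 4, 8]) -> torch.Size([4, 8])
```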

Testing

scripts/huggingface_example.sh --model /models/microsoft/Phi-4-multimodal-instruct/ --quant fp8 --export_fmt hf --trust_remote_code

and check the weight shapes.
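One way to spot-check the result (a sketch; the checkpoint path and file name are assumptions about where the script writes its output):

```python
from safetensors import safe_open

# Assumed export location; adjust to wherever the script saves the
# quantized checkpoint.
ckpt = "/models/microsoft/Phi-4-multimodal-instruct/fp8/model.safetensors"

with safe_open(ckpt, framework="pt") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        # After the fix, only *scale tensors should have lost their leading
        # singleton dim; weight tensors keep their original shape.
        print(name, tuple(t.shape))
```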

Summary by CodeRabbit

  • Bug Fixes
    • Avoids unintended collapsing of leading singleton dimensions for certain 3D tensors during export/quantization; only scale-related tensors are collapsed.
    • Reduces shape mismatches and runtime errors when loading, converting, or serving quantized models, improving checkpoint stability and compatibility.

@cjluo-nv cjluo-nv requested a review from a team as a code owner September 5, 2025 17:49

coderabbitai bot commented Sep 5, 2025

Caution

Review failed

An error occurred during the review process. Please try again later.


post_state_dict[prefix + new_suffix] = value
break

# Squeeze tensors with a leading dimension of 1
Contributor commented:

Can you specify what errors you get for phi4-mm?

We needed this to stick with our ckpt format, and I remember that if we don't squeeze, it breaks some deployment flows with TRT-LLM. It works for all our supported models so far.


codecov bot commented Sep 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.93%. Comparing base (b233ad1) to head (ab4e6d9).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #294   +/-   ##
=======================================
  Coverage   73.93%   73.93%           
=======================================
  Files         172      172           
  Lines       17408    17408           
=======================================
  Hits        12870    12870           
  Misses       4538     4538           


Signed-off-by: Chenjie Luo <[email protected]>
coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
modelopt/torch/export/quant_utils.py (1)

872-881: Narrow the key match and avoid mutating during iteration.

"scale" substring can overmatch (e.g., keys containing "rescaled"). Safer to target leaf names ending with "scale" (e.g., k_scale, v_scale, pre_quant_scale) and iterate over a snapshot for clarity.

Apply:

-    # Squeeze scales with a leading dimension of 1
-    for key, value in post_state_dict.items():
-        if (
-            "scale" in key
-            and isinstance(value, torch.Tensor)
-            and value.dim() == 3
-            and value.shape[0] == 1
-        ):
-            post_state_dict[key] = value.squeeze(0)
+    # Only squeeze scale tensors (not weights) when the leading expert dim is 1.
+    for key, value in list(post_state_dict.items()):
+        if (
+            isinstance(value, torch.Tensor)
+            and value.dim() == 3
+            and value.shape[0] == 1
+            and key.rsplit(".", 1)[-1].endswith("scale")
+        ):
+            post_state_dict[key] = value.squeeze(0)
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 867922a and 76ce70a.

📒 Files selected for processing (1)
  • modelopt/torch/export/quant_utils.py (1 hunks)
🔇 Additional comments (2)
modelopt/torch/export/quant_utils.py (2)

872-881: Good fix: stop squeezing weights; only squeeze scale tensors with leading dim 1.

This aligns with the Phi-4-MM report and should avoid shape mismatches on weights while keeping exported scale shapes sane.


872-881: Add regression test for squeeze behavior of scales vs weights. Ensure that weight tensors (e.g., shape (1,4,8)) retain their leading dimension while scale tensors (e.g., shape (1,4,1)) are squeezed to (4,1).
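A minimal sketch of such a test, re-implementing the squeeze rule inline rather than importing the real export helper (key names assumed):

```python
import torch

def test_squeeze_scales_not_weights():
    state = {
        "layer.weight": torch.randn(1, 4, 8),        # must keep leading dim
        "layer.weight_scale": torch.randn(1, 4, 1),  # should be squeezed
    }
    # Stand-in for the squeeze rule in modelopt/torch/export/quant_utils.py.
    for key, value in list(state.items()):
        if (
            isinstance(value, torch.Tensor)
            and value.dim() == 3
            and value.shape[0] == 1
            and key.rsplit(".", 1)[-1].endswith("scale")
        ):
            state[key] = value.squeeze(0)

    assert state["layer.weight"].shape == (1, 4, 8)
    assert state["layer.weight_scale"].shape == (4, 1)
```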

Edwardf0t1 (Contributor) left a comment

Discussed offline: we should whitelist the scaling-factor tensors for squeezing rather than remove the squeeze entirely.
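A sketch of that whitelist idea (the suffix list is an illustrative assumption, not the merged change):

```python
import torch

# Whitelisted scale suffixes; plain weights never match, so their
# leading dim is preserved.
SCALE_SUFFIXES = ("weight_scale", "input_scale", "k_scale", "v_scale", "pre_quant_scale")

def maybe_squeeze(key: str, value: torch.Tensor) -> torch.Tensor:
    leaf = key.rsplit(".", 1)[-1]
    if leaf.endswith(SCALE_SUFFIXES) and value.dim() == 3 and value.shape[0] == 1:
        return value.squeeze(0)
    return value
```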

@Edwardf0t1 Edwardf0t1 changed the title Do not squeeze weights with leading dim 1 Avoid squeezing original weight tensors with leading dim 1 Sep 5, 2025
Signed-off-by: Chenjie Luo <[email protected]>
@cjluo-nv cjluo-nv enabled auto-merge (squash) September 5, 2025 22:12

@cjluo-nv cjluo-nv merged commit 512dbb7 into main Sep 9, 2025
25 of 28 checks passed
@cjluo-nv cjluo-nv deleted the cjluo-nv-patch-1 branch September 9, 2025 00:12
benchislett pushed a commit that referenced this pull request Sep 15, 2025