Fix DQ1 output type error in DQ1->DQ2 for FP4 weights in NVFP4 model #513

vishalpandya1990 · 2025-11-05T10:56:28Z

What does this PR do?

Type of change: Bug Fix

Overview:

In post-processing after NVFP4 PTQ and ONNX export, we convert the FP4 QDQ nodes into a DQ1->DQ2 pair for the FP4 weights of the MatMuls. The output of DQ1 takes the original weight type (FP16 for an FP16 base model), but its scale is in FP32. There is a cast-to-FP16 after DQ2.
In this setting, with FP16 base model weights, DQ1 has x_scale in FP32 but its output type is set to FP16. This hybrid precision mode is not allowed up to opset 21, so the model fails with a type error when run with ONNX Runtime.
Note that this hybrid precision mode is allowed in opset 23+, but it is not yet fully supported by the ONNX Runtime EPs, and even in the future we would want to support opset < 23 too.
So this change sets the output of DQ1 to FP32, since its scale is in FP32. There is already a cast-to-FP16 after DQ2 (before the Gemm).
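
For reviewers, the type rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `DequantNode`, `output_type_is_valid`, and `fix_dq1_output` are hypothetical names, not the code in `qdq_utils.py` and not an ONNX API, and the opset-23 cutoff is taken from the description above.

```python
from dataclasses import dataclass, replace


# Hypothetical stand-in for a DequantizeLinear node; not a modelopt or onnx type.
@dataclass(frozen=True)
class DequantNode:
    name: str
    scale_dtype: str   # dtype of x_scale
    output_dtype: str  # declared dtype of the node output


def output_type_is_valid(node: DequantNode, opset: int) -> bool:
    """Below opset 23, require the output dtype to match the scale dtype."""
    if opset >= 23:
        return True  # hybrid precision is permitted from opset 23 on (per the PR text)
    return node.output_dtype == node.scale_dtype


def fix_dq1_output(node: DequantNode) -> DequantNode:
    """Align the declared output dtype with the scale dtype."""
    return replace(node, output_dtype=node.scale_dtype)


# Before the fix: FP32 scale, but output declared as FP16.
dq1_before = DequantNode("DQ1", scale_dtype="float32", output_dtype="float16")
dq1_after = fix_dq1_output(dq1_before)

print(output_type_is_valid(dq1_before, opset=21))  # False
print(output_type_is_valid(dq1_after, opset=21))   # True
```

The existing cast-to-FP16 after DQ2 is what keeps the MatMul input in FP16, so in this sketch only the DQ1 output declaration changes.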

Testing

Checked with the trtexec binary and the onnxruntime-trt-rtx EP, using the sd3.5-medium model, on Windows with an RTX 5090.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes/No
Did you write any new necessary tests?: Yes/No
Did you add or update any necessary documentation?: Yes/No
Did you update Changelog?: Yes/No

Additional Information

Signed-off-by: vipandya <[email protected]>

codecov · 2025-11-05T11:12:06Z

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.45%. Comparing base (b660d39) to head (e1d4af9).
⚠️ Report is 15 commits behind head on main.

Files with missing lines	Patch %	Lines
modelopt/onnx/quantization/qdq_utils.py	0.00%	6 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #513      +/-   ##
==========================================
+ Coverage   73.39%   73.45%   +0.05%     
==========================================
  Files         180      180              
  Lines       18134    18141       +7     
==========================================
+ Hits        13310    13325      +15     
+ Misses       4824     4816       -8


jingyu-ml · 2025-11-06T04:37:14Z

Did you test the performance before and after the change? Please attach the results to the PR. I’m asking because I’m concerned about potential performance impacts.

vishalpandya1990 added 3 commits November 5, 2025 10:39

fix type error in dq1-dq2 for FP4 weights

e9169a2

Signed-off-by: vipandya <[email protected]>

remove unused function param

2ed5d7e

Signed-off-by: vipandya <[email protected]>

remove unused function param from caller

e1d4af9

Signed-off-by: vipandya <[email protected]>

vishalpandya1990 requested a review from a team as a code owner November 5, 2025 10:56

vishalpandya1990 requested review from ajrasane and jingyu-ml November 5, 2025 10:56

i-riyad approved these changes Nov 5, 2025
