@vishalpandya1990
Contributor

What does this PR do?

Type of change: Bug Fix

Overview:

  • In post-processing after NVFP4 PTQ and ONNX export, we convert the FP4 QDQ nodes into a DQ1->DQ2 chain for the FP4 weights of the MatMuls. The output of DQ1 has the original weight type (FP16 for an FP16 base model), but its scale is in FP32. There is a cast to FP16 after DQ2.
  • In this setting, with FP16 base-model weights, DQ1 has an FP32 x_scale while its output type is set to FP16. This mixed-precision combination is not allowed up to opset 21, so it causes an error when the model is run with ONNX Runtime.
  • Such mixed precision is allowed in opset 23+, but it is not fully supported by ONNX Runtime EPs today, and we want to keep supporting opset < 23 in the future as well.
  • This change therefore sets the output of DQ1 to FP32, matching its FP32 scale. There is already a cast to FP16 after DQ2 (before the Gemm).
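The type rule described in the bullets above can be illustrated with a small, self-contained sketch. This is a toy model only: `DQNode`, `is_valid_dq`, and `align_output_with_scale` are hypothetical names invented for illustration, not code from `qdq_utils.py` and not part of the ONNX API. It assumes only what the description states: up to opset 21 the DequantizeLinear output type must match the type of `x_scale`, while opset 23+ allows them to differ.

```python
from dataclasses import dataclass, replace

FP16, FP32 = "float16", "float32"


@dataclass(frozen=True)
class DQNode:
    """Toy stand-in for a DequantizeLinear node (hypothetical, illustration only)."""
    name: str
    scale_dtype: str   # dtype of the x_scale input
    output_dtype: str  # dtype declared for the node's output


def is_valid_dq(node: DQNode, opset: int) -> bool:
    """Check the rule described above: before opset 23 the output type
    must equal the x_scale type; opset 23+ lets them differ."""
    if opset < 23:
        return node.output_dtype == node.scale_dtype
    return True


def align_output_with_scale(node: DQNode) -> DQNode:
    """Return a copy whose output dtype follows its scale dtype,
    mirroring the approach taken in this PR for DQ1."""
    return replace(node, output_dtype=node.scale_dtype)


# DQ1 as exported for an FP16 base model: FP32 scale, FP16 output.
dq1_before = DQNode("DQ1", scale_dtype=FP32, output_dtype=FP16)
dq1_after = align_output_with_scale(dq1_before)

# The existing cast to FP16 after DQ2 restores the base-model precision,
# so the Gemm input dtype is unchanged by the fix.
gemm_input_dtype = FP16
```

In this sketch `dq1_before` fails the opset-21 check and `dq1_after` passes it, while the dtype seen by the Gemm stays FP16 because of the cast that already follows DQ2.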

Testing

  • Checked with the trtexec binary and the onnxruntime-trt-rtx EP, using the sd3.5-medium model on Windows with an RTX 5090.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

@vishalpandya1990 vishalpandya1990 requested a review from a team as a code owner November 5, 2025 10:56
@codecov
codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.45%. Comparing base (b660d39) to head (e1d4af9).
⚠️ Report is 15 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/qdq_utils.py 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #513      +/-   ##
==========================================
+ Coverage   73.39%   73.45%   +0.05%     
==========================================
  Files         180      180              
  Lines       18134    18141       +7     
==========================================
+ Hits        13310    13325      +15     
+ Misses       4824     4816       -8     

@jingyu-ml
Contributor

Did you test performance before and after the change? Please attach the results to the PR; I'm concerned about a potential performance impact.