Enable direct INT8/FP8 input in ONNX graph #354
Conversation
Signed-off-by: gcunhase <[email protected]>
Walkthrough

Introduces a new utility to remove QuantizeLinear nodes at graph inputs, integrates it into the quantization flow when direct_io_types is enabled, and updates CLI help text for quantize_mode and direct_io_types. No public signatures changed; the ONNX model IR version is set to 10 when applying the new cleanup.
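For illustration, here is a minimal sketch of what removing a QuantizeLinear node at a graph input involves at the ONNX proto level. This is not the PR's actual implementation: the function name strip_input_q is invented here, and it hard-codes INT8 retyping for brevity.

```python
import onnx
from onnx import TensorProto

def strip_input_q(model: onnx.ModelProto) -> onnx.ModelProto:
    """Illustrative sketch: drop QuantizeLinear nodes fed directly by graph inputs."""
    graph = model.graph
    inputs_by_name = {i.name: i for i in graph.input}
    for node in list(graph.node):
        if node.op_type == "QuantizeLinear" and node.input[0] in inputs_by_name:
            q_output = node.output[0]
            # Rewire every consumer of the Q output to read the graph input directly.
            for consumer in graph.node:
                consumer.input[:] = [
                    node.input[0] if name == q_output else name for name in consumer.input
                ]
            # Retype the graph input to the quantized dtype (INT8 shown; FP8 would
            # use TensorProto.FLOAT8E4M3FN instead).
            inputs_by_name[node.input[0]].type.tensor_type.elem_type = TensorProto.INT8
            graph.node.remove(node)
    model.ir_version = 10  # per the walkthrough, IR is set to 10 with this cleanup
    return model
```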
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant U as User
    participant QZ as quantize.py
    participant QDU as qdq_utils
    participant M as ONNX Model
    U->>QZ: run quantize(...)
    QZ->>QDU: remove_input_dq_and_output_q(M)
    QDU-->>QZ: M (updated)
    alt direct_io_types == True
        QZ->>QDU: remove_graph_input_q(M)
        QDU-->>QZ: M (inputs rewired, IR=10)
    else direct_io_types == False
        Note over QZ: Skip input Q removal
    end
    QZ->>QZ: topological sort / export
    QZ-->>U: quantized ONNX model
```
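A rough condensation of the wiring this diagram implies (signatures assumed; the real quantize.py likely threads more arguments through):

```python
from modelopt.onnx.quantization.qdq_utils import (
    remove_graph_input_q,
    remove_input_dq_and_output_q,
)

def apply_direct_io_cleanup(onnx_model, direct_io_types: bool):
    """Hypothetical condensation of the flow above, not the actual source."""
    onnx_model = remove_input_dq_and_output_q(onnx_model)  # existing Q/DQ cleanup
    if direct_io_types:
        # New in this PR: drop QuantizeLinear at graph inputs so INT8/FP8
        # tensors can be fed directly; the model IR version becomes 10.
        onnx_model = remove_graph_input_q(onnx_model)
    return onnx_model
```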
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
modelopt/onnx/quantization/__main__.py (1)
245-251: Clarify scope and behavior of --direct_io_types.

Consider making the help explicit that this currently:
- Applies only for INT8/FP8 quantization paths (inputs), not INT4; and
- Adjusts inputs only (outputs remain unchanged).
This avoids user confusion.
Apply this minimal tweak:
- "If True, the I/O types in the quantized ONNX model will be modified to be lower precision whenever " - "possible. Else, they will match the I/O types in the given ONNX model. " - "The currently supported precisions are {fp16, int8, fp8}." + "If True, attempts to use lower‑precision graph I/O where possible. Currently affects inputs only for " + "INT8/FP8 quantization; outputs are unchanged. Otherwise, I/O types match the input ONNX model. " + "Supported input precisions: {fp16, int8, fp8}."
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)

- modelopt/onnx/quantization/__main__.py (2 hunks)
- modelopt/onnx/quantization/qdq_utils.py (1 hunk)
- modelopt/onnx/quantization/quantize.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
modelopt/onnx/quantization/quantize.py (1)

- modelopt/onnx/quantization/qdq_utils.py (2)
  - remove_graph_input_q (874-943)
  - remove_input_dq_and_output_q (738-871)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (3)
modelopt/onnx/quantization/__main__.py (1)
39-40: Help text reflow looks good.

No behavioral change; wording is clear.
modelopt/onnx/quantization/quantize.py (2)
63-67: Import wiring looks correct.

New utility is imported alongside existing Q/DQ helpers; consistent with module boundaries.
505-507: Guard against models without explicit zero-points at input DQ.

remove_graph_input_q assumes a 3-input DQ (scale + zero_point) and that the zero_point is an initializer. Some toolchains emit a 2-input DQ (no zero_point) or feed the zero_point via a Constant node; either case would raise at runtime.

Action: Harden remove_graph_input_q (preferred; see the suggested patch in the qdq_utils.py comment) or add a precheck here before calling it.

Would you confirm that our INT8/FP8 activation quant flows always materialize zero_point as an initializer for input DQ? If not, I'll push a defensive change.
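For concreteness, a defensive precheck along the suggested lines could look like the sketch below. The helper names are hypothetical, and the actual hardening may instead live inside remove_graph_input_q.

```python
import onnx

def zero_point_is_initializer(model: onnx.ModelProto, node: onnx.NodeProto) -> bool:
    """True if the node's 3rd input (zero_point) resolves to a graph initializer
    rather than being omitted or produced by a Constant node."""
    if len(node.input) < 3 or not node.input[2]:
        return False  # 2-input form: zero_point omitted entirely
    initializer_names = {init.name for init in model.graph.initializer}
    return node.input[2] in initializer_names

def graph_input_q_nodes(model: onnx.ModelProto):
    """Yield QuantizeLinear nodes that consume a top-level graph input."""
    graph_inputs = {i.name for i in model.graph.input}
    for node in model.graph.node:
        if node.op_type == "QuantizeLinear" and node.input[0] in graph_inputs:
            yield node

# Only proceed with the removal when every input Q node is in the supported shape:
# safe = all(zero_point_is_initializer(model, n) for n in graph_input_q_nodes(model))
```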
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #354      +/-   ##
==========================================
- Coverage   73.84%   73.72%   -0.12%
==========================================
  Files         172      172
  Lines       17453    17484      +31
==========================================
+ Hits        12888    12890       +2
- Misses       4565     4594      +29
```

☔ View full report in Codecov by Sentry.
Signed-off-by: gcunhase <[email protected]>
Signed-off-by: gcunhase <[email protected]>
Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
Type of change: Expand existing feature
Overview: Allows users to generate a quantized ONNX model with a directly quantized input via the existing --direct_io_types flag. This flag currently only supports FP16 as the direct I/O type; this PR extends it to allow INT8/FP8 precision at the graph input. For example, if the 1st layer of a model is being quantized and the user enables --direct_io_types, the graph input connects directly to the 1st DQ node, setting the graph input's type to the network's quantize_mode.

Usage
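A minimal invocation sketch, assuming --onnx_path and --output_path as flag names (--quantize_mode and --direct_io_types are the flags this PR touches):

```bash
python -m modelopt.onnx.quantization \
    --onnx_path=model.onnx \
    --quantize_mode=int8 \
    --direct_io_types \
    --output_path=model.quant.onnx
```

With this, the quantized model's first input accepts INT8 (or FP8) tensors directly, matching the network's quantize_mode, rather than a floating-point input followed by an input-side Q node.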
Testing
Before your PR is "Ready for review"
Summary by CodeRabbit
New Features
Documentation