Add command-line options in Windows' llm-ptq example for Gather nodes' INT4 ONNX quantization #418

vishalpandya1990 · 2025-10-09T15:38:03Z

What does this PR do?

Type of change: Windows' example update.

Overview: Support for ONNX INT4 quantization of Gather nodes has already been added. This PR updates the Windows GenAI llm-ptq example to expose command-line options for that support.

Usage

Use command-line options such as --gather_quantize_axis=1 --gather_block_size=64 for GenAI LLM quantization. A full command-line example and the other useful command-line options are explained in this example's README.

Testing

Tested INT4 AWQ quantization of Gather nodes on Windows with an RTX 5090, using the llama3.2-1B-instruct model.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed.
Is this change backward compatible?: Yes/No
Did you write any new necessary tests?: Yes/No
Did you add or update any necessary documentation?: Yes/No
Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

New Features
- Added CLI options to tune INT4 quantization of Gather nodes: --gather_block_size (default 32) and --gather_quantize_axis (optional).
- These options are now applied during quantization runs.
Documentation
- Updated README arguments table with the new options, defaults, and descriptions.
- Clarified help text to state that --block_size applies to INT4 quantization of MatMul/Gemm nodes.
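To make the two knobs concrete, here is a minimal pure-Python sketch of generic block-wise symmetric INT4 quantization of a 2-D table. It is an illustration only, not ModelOpt's implementation: the function name `blockwise_int4`, the symmetric scaling scheme, and the reading of `axis` as "the dimension along which consecutive elements share a scale" are all assumptions made for this example. How ModelOpt interprets `--gather_quantize_axis` is not spelled out in this thread.

```python
def blockwise_int4(weights, block_size, axis):
    """Quantize a 2-D list of floats to signed INT4 with one scale per block.

    Hypothetical helper for illustration. axis=1 groups consecutive elements
    within each row; axis=0 groups consecutive elements within each column.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    # Transpose for axis=0 so that blocks always run along a row of `work`.
    work = weights if axis == 1 else [list(col) for col in zip(*weights)]

    quantized, scales = [], []
    for vec in work:
        q_vec, s_vec = [], []
        for start in range(0, len(vec), block_size):
            block = vec[start:start + block_size]
            max_abs = max(abs(v) for v in block)
            # Symmetric scale: the largest magnitude in the block maps to 7.
            scale = max_abs / 7.0 if max_abs > 0 else 1.0
            s_vec.append(scale)
            # Round to the nearest integer and clamp to the INT4 range [-8, 7].
            q_vec.extend(max(-8, min(7, round(v / scale))) for v in block)
        quantized.append(q_vec)
        scales.append(s_vec)

    if axis == 0:
        # Transpose the quantized values back to the original layout.
        quantized = [list(col) for col in zip(*quantized)]
    return quantized, scales


if __name__ == "__main__":
    table = [[1.0, -2.0, 0.5, 7.0], [0.0, 0.0, 3.5, -3.5]]
    q, s = blockwise_int4(table, block_size=2, axis=1)
    print(q)
    print(s)
```

A smaller block size yields more scales (finer granularity, more metadata), while the axis decides which elements share a scale.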

Signed-off-by: vipandya <[email protected]>

coderabbitai · 2025-10-09T15:38:16Z

Walkthrough

Adds new CLI options for INT4 Gather-node quantization in the Windows ONNX PTQ example, threads them through the argument parser, updates the quantization call to pass new parameters, and documents the options in the README. Also clarifies help text for the existing block_size option.

Changes

Cohort / File(s)	Summary of Changes
Docs update (CLI options) `examples/windows/onnx_ptq/genai_llm/README.md`	Added CLI flags table entries for `--gather_quantize_axis` and `--gather_block_size` with defaults and descriptions.
CLI and quantization wiring `examples/windows/onnx_ptq/genai_llm/quantize.py`	Introduced CLI args `--gather_block_size` (int, default 32) and `--gather_quantize_axis` (int, default None). Passed them to `modelopt.onnx.quantization.int4.quantize_int4`. Updated help text for `--block_size` to target INT4 MatMul/Gemm.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor U as User
    participant CLI as quantize.py (CLI)
    participant Q as Quantization Config
    participant INT4 as modelopt.onnx.quantization.int4.quantize_int4
    participant M as ONNX Model

    U->>CLI: Run with args (--gather_block_size, --gather_quantize_axis, ...)
    CLI->>Q: Parse args into config
    note right of Q: New fields:<br/>gather_block_size<br/>gather_quantize_axis
    Q->>INT4: quantize_int4(model, ..., gather_block_size, gather_quantize_axis)
    INT4->>M: Apply INT4 quantization<br/>incl. Gather-specific settings
    INT4-->>CLI: Quantized model artifact
    CLI-->>U: Save/print completion
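The flow in the diagram can be sketched in a few lines of Python. This is a hedged illustration, not the code in `quantize.py`: `quantize_int4_stub` is a hypothetical stand-in that only echoes its settings (the real `quantize_int4` signature is not reproduced here), and `model.onnx` is a placeholder path. Only the two option names and their defaults are taken from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser holding only the two Gather-related options from this PR."""
    parser = argparse.ArgumentParser(description="Sketch of the Gather CLI wiring")
    parser.add_argument(
        "--gather_block_size",
        type=int,
        default=32,
        help="Block size for INT4 quantization of Gather nodes",
    )
    parser.add_argument(
        "--gather_quantize_axis",
        type=int,
        default=None,
        help="Quantization axis for INT4 quantization of Gather nodes",
    )
    return parser


def quantize_int4_stub(model_path, gather_block_size=32, gather_quantize_axis=None):
    """Hypothetical stand-in for the quantization call; it only records settings."""
    return {
        "model_path": model_path,
        "gather_block_size": gather_block_size,
        "gather_quantize_axis": gather_quantize_axis,
    }


def run(argv):
    """Parse CLI arguments and forward the Gather options as keyword arguments."""
    args = build_parser().parse_args(argv)
    return quantize_int4_stub(
        "model.onnx",  # placeholder path, not taken from the PR
        gather_block_size=args.gather_block_size,
        gather_quantize_axis=args.gather_quantize_axis,
    )


if __name__ == "__main__":
    print(run(["--gather_quantize_axis=1", "--gather_block_size=64"]))
```

Passing the values as keyword arguments keeps the call readable and leaves the defaults in one place, the parser.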

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I twitch my ears at Gather’s call,
New knobs to tweak the blocks so small—
An axis set, the bits align,
INT4 crumbs in tidy line.
I thump, I hop, the graphs comply,
README squeaks, “Here’s how to try!”
Quant dreams baked—carrot-pie. 🥕✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title clearly and concisely summarizes the core change by indicating that new command-line options are being added to the Windows llm-ptq example for Gather node INT4 ONNX quantization, reflecting the main modifications without extraneous detail and using specific, descriptive terms that help reviewers quickly understand the purpose of the pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc54694 and 4808114.

📒 Files selected for processing (2)

examples/windows/onnx_ptq/genai_llm/README.md (1 hunks)
examples/windows/onnx_ptq/genai_llm/quantize.py (2 hunks)

🔇 Additional comments (3)

examples/windows/onnx_ptq/genai_llm/quantize.py (2)

558-559: LGTM: Improved clarity for block_size parameter.

The updated help text clearly distinguishes this parameter as specific to MatMul/Gemm nodes, differentiating it from the new gather_block_size parameter.

444-445: LGTM: Parameters correctly passed to quantize_int4.

The new Gather quantization parameters are properly threaded through to the quantize_int4 function call, following the established pattern for other arguments.

examples/windows/onnx_ptq/genai_llm/README.md (1)

61-62: LGTM: Documentation accurately describes the new options.

The table entries correctly document the defaults and provide clear guidance on when to use these parameters. The description properly notes that gather_block_size is only relevant when gather_quantize_axis is enabled.

coderabbitai · 2025-10-09T15:40:26Z

examples/windows/onnx_ptq/genai_llm/quantize.py

+    parser.add_argument(
+        "--gather_block_size",
+        type=int,
+        default=32,
+        help="Block size for INT4 quantization of Gather nodes",
+    )
+    parser.add_argument(
+        "--gather_quantize_axis",
+        type=int,
+        default=None,
+        help="Quantization axis for INT4 quantization of Gather nodes",
    )


⚠️ Potential issue | 🟠 Major

Add input validation for the new Gather quantization parameters.

The new CLI arguments lack validation that could prevent runtime errors:

gather_quantize_axis: The README specifies "choose 0 or 1", but no validation enforces this constraint. Invalid values (e.g., 2, -1, 100) could be passed to quantize_int4, potentially causing unexpected behavior or errors.

gather_block_size: No validation ensures the value is positive. Zero or negative values could cause issues during quantization.

Apply this diff to add validation:

     parser.add_argument(
         "--gather_quantize_axis",
         type=int,
         default=None,
         help="Quantization axis for INT4 quantization of Gather nodes",
     )
+    args = parser.parse_args()
+
+    # Validate gather quantization parameters
+    if args.gather_quantize_axis is not None and args.gather_quantize_axis not in [0, 1]:
+        parser.error("--gather_quantize_axis must be 0 or 1")
+    if args.gather_block_size <= 0:
+        parser.error("--gather_block_size must be positive")
+    main(args)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

    parser.add_argument(
        "--gather_block_size",
        type=int,
        default=32,
        help="Block size for INT4 quantization of Gather nodes",
    )
    parser.add_argument(
        "--gather_quantize_axis",
        type=int,
        default=None,
        help="Quantization axis for INT4 quantization of Gather nodes",
    )

    parser.add_argument(
        "--gather_block_size",
        type=int,
        default=32,
        help="Block size for INT4 quantization of Gather nodes",
    )
    parser.add_argument(
        "--gather_quantize_axis",
        type=int,
        default=None,
        help="Quantization axis for INT4 quantization of Gather nodes",
    )
    args = parser.parse_args()

    # Validate gather quantization parameters
    if args.gather_quantize_axis is not None and args.gather_quantize_axis not in [0, 1]:
        parser.error("--gather_quantize_axis must be 0 or 1")
    if args.gather_block_size <= 0:
        parser.error("--gather_block_size must be positive")

    main(args)

🤖 Prompt for AI Agents

In examples/windows/onnx_ptq/genai_llm/quantize.py around lines 560 to 571, the new CLI args lack validation: enforce that gather_quantize_axis is either None or 0/1 and that gather_block_size is a positive integer (>0). Implement validation either by using argparse choices for gather_quantize_axis (choices=[0,1]) and type=int for gather_block_size with a custom positive-int type, or by checking after parse_args and calling parser.error with a clear message if args.gather_quantize_axis not in (None,0,1) or args.gather_block_size <= 0; ensure the script exits with a helpful error message on invalid input.
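The first alternative described above (argparse `choices` plus a custom positive-integer type) could look roughly like the sketch below. It is not the change that was merged: `positive_int` and `build_parser` are hypothetical names introduced for this example, and only the option names, defaults, and help strings come from the diff in this thread.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type callable that accepts only integers greater than zero."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"{text!r} is not an integer")
    if value <= 0:
        raise argparse.ArgumentTypeError(f"{text!r} must be a positive integer")
    return value


def build_parser() -> argparse.ArgumentParser:
    """Parser whose Gather options reject invalid values at parse time."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--gather_block_size",
        type=positive_int,  # rejects zero, negatives, and non-integers
        default=32,
        help="Block size for INT4 quantization of Gather nodes",
    )
    parser.add_argument(
        "--gather_quantize_axis",
        type=int,
        choices=[0, 1],  # omitting the flag keeps the default of None
        default=None,
        help="Quantization axis for INT4 quantization of Gather nodes",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["--gather_quantize_axis", "1"]))
```

With this approach argparse prints the usage message and exits on bad input, so no separate checks are needed after `parse_args()`. The allowed axis values also appear automatically in the `--help` output.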

ynankani

LGTM

codecov · 2025-10-09T15:58:48Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.36%. Comparing base (39f81e5) to head (4808114).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #418   +/-   ##
=======================================
  Coverage   73.36%   73.36%           
=======================================
  Files         180      180           
  Lines       17919    17919           
=======================================
+ Hits        13146    13147    +1     
+ Misses       4773     4772    -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Add command-line options for Gather nodes' INT4 quantization

4808114

Signed-off-by: vipandya <[email protected]>

vishalpandya1990 self-assigned this Oct 9, 2025

vishalpandya1990 requested a review from a team as a code owner October 9, 2025 15:38

vishalpandya1990 requested a review from ynankani October 9, 2025 15:38

coderabbitai bot reviewed Oct 9, 2025

ynankani approved these changes Oct 9, 2025

vishalpandya1990 merged commit ee19a7e into main Oct 10, 2025
27 checks passed

vishalpandya1990 deleted the vipandya/update_gather_example branch October 10, 2025 05:45

hthadicherla pushed a commit that referenced this pull request Oct 14, 2025

Add command-line options in Windows' llm-ptq example for Gather nodes…

7f30fdc

…' INT4 ONNX quantization (#418) Signed-off-by: vipandya <[email protected]> Signed-off-by: Hrishith Thadicherla <[email protected]>
