-
Notifications
You must be signed in to change notification settings - Fork 190
Add command-line options in Windows' llm-ptq example for Gather nodes' INT4 ONNX quantization #418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: vipandya <[email protected]>
WalkthroughAdds new CLI options for INT4 Gather-node quantization in the Windows ONNX PTQ example, threads them through the argument parser, updates the quantization call to pass new parameters, and documents the options in the README. Also clarifies help text for the existing block_size option. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant CLI as quantize.py (CLI)
participant Q as Quantization Config
participant INT4 as modelopt.onnx.quantization.int4.quantize_int4
participant M as ONNX Model
U->>CLI: Run with args (--gather_block_size, --gather_quantize_axis, ...)
CLI->>Q: Parse args into config
note right of Q: New fields:<br/>gather_block_size<br/>gather_quantize_axis
Q->>INT4: quantize_int4(model, ..., gather_block_size, gather_quantize_axis)
INT4->>M: Apply INT4 quantization<br/>incl. Gather-specific settings
INT4-->>CLI: Quantized model artifact
CLI-->>U: Save/print completion
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
examples/windows/onnx_ptq/genai_llm/README.md(1 hunks)examples/windows/onnx_ptq/genai_llm/quantize.py(2 hunks)
🔇 Additional comments (3)
examples/windows/onnx_ptq/genai_llm/quantize.py (2)
558-559: LGTM: Improved clarity for block_size parameter.The updated help text clearly distinguishes this parameter as specific to MatMul/Gemm nodes, differentiating it from the new
gather_block_sizeparameter.
444-445: LGTM: Parameters correctly passed to quantize_int4.The new Gather quantization parameters are properly threaded through to the
quantize_int4function call, following the established pattern for other arguments.examples/windows/onnx_ptq/genai_llm/README.md (1)
61-62: LGTM: Documentation accurately describes the new options.The table entries correctly document the defaults and provide clear guidance on when to use these parameters. The description properly notes that
gather_block_sizeis only relevant whengather_quantize_axisis enabled.
| parser.add_argument( | ||
| "--gather_block_size", | ||
| type=int, | ||
| default=32, | ||
| help="Block size for INT4 quantization of Gather nodes", | ||
| ) | ||
| parser.add_argument( | ||
| "--gather_quantize_axis", | ||
| type=int, | ||
| default=None, | ||
| help="Quantization axis for INT4 quantization of Gather nodes", | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add input validation for the new Gather quantization parameters.
The new CLI arguments lack validation that could prevent runtime errors:
-
gather_quantize_axis: The README specifies "choose 0 or 1", but no validation enforces this constraint. Invalid values (e.g., 2, -1, 100) could be passed toquantize_int4, potentially causing unexpected behavior or errors. -
gather_block_size: No validation ensures the value is positive. Zero or negative values could cause issues during quantization.
Apply this diff to add validation:
parser.add_argument(
"--gather_quantize_axis",
type=int,
default=None,
help="Quantization axis for INT4 quantization of Gather nodes",
)
+
args = parser.parse_args()
+
+ # Validate gather quantization parameters
+ if args.gather_quantize_axis is not None and args.gather_quantize_axis not in [0, 1]:
+ parser.error("--gather_quantize_axis must be 0 or 1")
+ if args.gather_block_size <= 0:
+ parser.error("--gather_block_size must be positive")
+
main(args)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| parser.add_argument( | |
| "--gather_block_size", | |
| type=int, | |
| default=32, | |
| help="Block size for INT4 quantization of Gather nodes", | |
| ) | |
| parser.add_argument( | |
| "--gather_quantize_axis", | |
| type=int, | |
| default=None, | |
| help="Quantization axis for INT4 quantization of Gather nodes", | |
| ) | |
| parser.add_argument( | |
| "--gather_block_size", | |
| type=int, | |
| default=32, | |
| help="Block size for INT4 quantization of Gather nodes", | |
| ) | |
| parser.add_argument( | |
| "--gather_quantize_axis", | |
| type=int, | |
| default=None, | |
| help="Quantization axis for INT4 quantization of Gather nodes", | |
| ) | |
| args = parser.parse_args() | |
| # Validate gather quantization parameters | |
| if args.gather_quantize_axis is not None and args.gather_quantize_axis not in [0, 1]: | |
| parser.error("--gather_quantize_axis must be 0 or 1") | |
| if args.gather_block_size <= 0: | |
| parser.error("--gather_block_size must be positive") | |
| main(args) |
🤖 Prompt for AI Agents
In examples/windows/onnx_ptq/genai_llm/quantize.py around lines 560 to 571, the
new CLI args lack validation: enforce that gather_quantize_axis is either None
or 0/1 and that gather_block_size is a positive integer (>0). Implement
validation either by using argparse choices for gather_quantize_axis
(choices=[0,1]) and type=int for gather_block_size with a custom positive-int
type, or by checking after parse_args and calling parser.error with a clear
message if args.gather_quantize_axis not in (None,0,1) or args.gather_block_size
<= 0; ensure the script exits with a helpful error message on invalid input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #418 +/- ##
=======================================
Coverage 73.36% 73.36%
=======================================
Files 180 180
Lines 17919 17919
=======================================
+ Hits 13146 13147 +1
+ Misses 4773 4772 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…' INT4 ONNX quantization (#418) Signed-off-by: vipandya <[email protected]> Signed-off-by: Hrishith Thadicherla <[email protected]>
What does this PR do?
Type of change: Windows' example update.
Overview: Support for ONNX INT4 quantization of Gather nodes have already been added. In this PR, updating the Windows' GenAI llm-ptq example to add options for this support.
Usage
use command-line options like
--gather_quantize_axis=1--gather_block_size=64for GenAI LLM quantization. Full command-line example and useful command-line options is explained in the associated readme of this example.Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit