Add int8_sq back to auto_quant support list #345
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough

Adds "int8_sq" to the qformat_list in auto_quantize within examples/llm_ptq/hf_ptq.py, enabling the int8_smoothquant configuration to be selected through the auto-quantize flow. No other logic, control flow, or API signatures changed.
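For orientation, a minimal sketch of what the re-added entry enables, assuming the QUANT_CFG_CHOICES mapping quoted later in this review; the dict contents below are illustrative, though INT8_SMOOTHQUANT_CFG and the other config constants are real modelopt names:

```python
import modelopt.torch.quantization as mtq

# Hypothetical excerpt: QUANT_CFG_CHOICES maps CLI format names to modelopt
# quantization configs. Only the "int8_sq" entry is implied by this PR's
# walkthrough ("int8_smoothquant configuration"); the rest are illustrative.
QUANT_CFG_CHOICES = {
    "fp8": mtq.FP8_DEFAULT_CFG,
    "int8_sq": mtq.INT8_SMOOTHQUANT_CFG,  # selectable through auto-quantize again
    "int4_awq": mtq.INT4_AWQ_CFG,
}

# With "int8_sq" back in the allow-list, it can be passed as a candidate:
qformat_list = ["fp8", "int8_sq"]
quantization_formats = [QUANT_CFG_CHOICES[fmt] for fmt in qformat_list]
```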
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Pre-merge checks: ❌ failed checks (1 warning), ✅ passed checks (2 passed).
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/llm_ptq/hf_ptq.py (2)
Lines 92-106: int8_sq re‑added to auto-quant support: LGTM; tighten wording and minor var rename

Change looks good. Two tiny cleanups to reduce confusion and shadowing:
- Clarify the assertion message (this path may export TRT‑LLM checkpoints when int8_sq is used).
- Avoid shadowing the parameter name in the comprehension.
```diff
-assert all(
-    qformat
+assert all(
+    fmt
     in [
         "fp8",
         "int8_sq",
         "int4_awq",
         "nvfp4",
         "nvfp4_awq",
         "w4a8_awq",
         "fp8_pb_wo",
         "w4a8_mxfp4_fp8",
         "nvfp4_mlp_only",
     ]
-    for qformat in qformat_list
-), "One or more quantization formats provided are not supported for unified checkpoint export"
+    for fmt in qformat_list
+), "One or more quantization formats provided are not supported by the auto-quantize export path"
```

Is the exclusion of "fp8_pc_pt" from this allow-list (while present in QUANT_CFG_CHOICES) intentional for auto-quant? If yes, consider a brief comment above to prevent future regressions.
Lines 120-121: Avoid shadowing the built-in name `format` in list comprehension

Minor readability nit: don't shadow Python's built-in format().

```diff
-        quantization_formats=[QUANT_CFG_CHOICES[format] for format in qformat_list],
+        quantization_formats=[QUANT_CFG_CHOICES[fmt] for fmt in qformat_list],
```
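To make the nit concrete, a small self-contained illustration (not code from hf_ptq.py) of the difference between comprehension and loop scope when reusing the name `format`:

```python
# In Python 3 a comprehension variable does not leak into the enclosing
# scope, so shadowing `format` inside one is only a readability hazard;
# a plain for-loop, by contrast, would keep shadowing the built-in afterwards.
def resolve(choices: dict, qformat_list: list) -> list:
    configs = [choices[format] for format in qformat_list]  # reads like a call to format()
    assert format(0.5, ".0%") == "50%"  # built-in still intact out here
    return configs

print(resolve({"fp8": "FP8_CFG", "int8_sq": "SQ_CFG"}, ["fp8", "int8_sq"]))
```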
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/llm_ptq/hf_ptq.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: linux
- GitHub Check: wait-checks / wait
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
examples/llm_ptq/hf_ptq.py (1)
Lines 588-596: Export behavior consistent with PR description

Condition explicitly routes int8_sq to TensorRT‑LLM checkpoint export. Matches the PR statement about the final checkpoint format. No action needed.
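For readers without the file open, a minimal sketch of the routing described above; the helper names come from modelopt.torch.export, but the wrapper, variable names, and exact condition in hf_ptq.py are assumptions:

```python
from modelopt.torch.export import (
    export_hf_checkpoint,            # unified Hugging Face checkpoint exporter
    export_tensorrt_llm_checkpoint,  # TensorRT-LLM checkpoint exporter
)

def export_quantized(model, qformat: str, decoder_type: str, export_path: str):
    """Hypothetical wrapper: int8_sq has no unified-HF export path, so it is
    routed to the TensorRT-LLM checkpoint exporter instead."""
    if qformat == "int8_sq":
        export_tensorrt_llm_checkpoint(model, decoder_type, export_dir=export_path)
    else:
        export_hf_checkpoint(model, export_dir=export_path)
```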
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #345   +/-   ##
=======================================
  Coverage   73.83%   73.83%
=======================================
  Files         172      172
  Lines       17453    17453
=======================================
  Hits        12887    12887
  Misses       4566     4566
```

☔ View full report in Codecov by Sentry.
What does this PR do?

Type of change: minor change

Overview: Add int8_sq back to the auto_quant support list. We will just export the final checkpoint as the tensorrt_llm checkpoint.