
Conversation

@cjluo-nv cjluo-nv commented Sep 19, 2025

What does this PR do?

minor change

Overview: ?

Add int8_sq back to auto_quant support list. We will just export the final checkpoint as the tensorrt_llm checkpoint.

Summary by CodeRabbit

  • New Features
    • Added support for INT8 SmoothQuant in the auto-quantization workflow, enabling selection of the int8_sq format for model compression. This expands quantization options and can improve performance and memory efficiency on compatible hardware. No changes to public APIs or user-facing workflows; existing configurations continue to work as before, with the new format available as an additional choice.

@cjluo-nv cjluo-nv requested a review from a team as a code owner September 19, 2025 05:42
@cjluo-nv cjluo-nv requested a review from Edwardf0t1 September 19, 2025 05:42

coderabbitai bot commented Sep 19, 2025

Walkthrough

Adds "int8_sq" to the qformat_list in auto_quantize within examples/llm_ptq/hf_ptq.py, enabling the int8_smoothquant configuration to be selected through the auto-quantize flow. No other logic, control flow, or API signatures changed.

Changes

Cohort / File(s) Summary
Quantization option update
examples/llm_ptq/hf_ptq.py
Include "int8_sq" in qformat_list within auto_quantize to allow selecting INT8_SMOOTHQUANT_CFG via the auto-quantize path.
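The change described above can be sketched as a small allow-list check. The names `qformat_list` and the format strings come from the PR discussion; `SUPPORTED_AUTO_QUANT_FORMATS` and `validate_qformats` are illustrative placeholders, not the actual code in `examples/llm_ptq/hf_ptq.py` (which uses an inline `assert` over a literal list):

```python
# Illustrative sketch only: the constant and helper below are hypothetical
# names standing in for the inline assert in hf_ptq.py's auto_quantize path.
SUPPORTED_AUTO_QUANT_FORMATS = [
    "fp8",
    "int8_sq",  # re-added by this PR
    "int4_awq",
    "nvfp4",
    "nvfp4_awq",
    "w4a8_awq",
    "fp8_pb_wo",
    "w4a8_mxfp4_fp8",
    "nvfp4_mlp_only",
]

def validate_qformats(qformat_list):
    """Reject any requested format outside the auto-quantize allow-list."""
    unsupported = [fmt for fmt in qformat_list if fmt not in SUPPORTED_AUTO_QUANT_FORMATS]
    if unsupported:
        raise ValueError(f"Unsupported quantization formats: {unsupported}")
    return qformat_list
```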

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A nibble of bits, a hop to queue,
I toggled a switch to let INT8 through.
SmoothQuant whispers, “I’m in the mix!”
The carrots compile, no extra tricks.
With one small string, we bound and flew—
Byte-sized dreams in int8 hue. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title "Add int8_sq back to auto_quant support list" is a short, single sentence that directly summarizes the primary change (restoring int8_sq to the auto_quant support list), so it accurately reflects the changeset and is clear to a teammate scanning history.


@cjluo-nv cjluo-nv requested a review from realAsma September 19, 2025 05:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/llm_ptq/hf_ptq.py (2)

92-106: int8_sq re‑added to auto-quant support: LGTM; tighten wording and minor var rename

Change looks good. Two tiny cleanups to reduce confusion and shadowing:

  • Clarify the assertion message (this path may export TRT‑LLM checkpoints when int8_sq is used).
  • Avoid shadowing the parameter name in the comprehension.
-    assert all(
-        qformat
+    assert all(
+        fmt
         in [
             "fp8",
             "int8_sq",
             "int4_awq",
             "nvfp4",
             "nvfp4_awq",
             "w4a8_awq",
             "fp8_pb_wo",
             "w4a8_mxfp4_fp8",
             "nvfp4_mlp_only",
         ]
-        for qformat in qformat_list
-    ), "One or more quantization formats provided are not supported for unified checkpoint export"
+        for fmt in qformat_list
+    ), "One or more quantization formats provided are not supported by the auto-quantize export path"

Is exclusion of "fp8_pc_pt" from this allow-list (while present in QUANT_CFG_CHOICES) intentional for auto-quant? If yes, consider a brief comment above to prevent future regressions.


120-121: Avoid shadowing the built-in name format in list comprehension

Minor readability nit: don’t shadow Python’s built-in format().

-        quantization_formats=[QUANT_CFG_CHOICES[format] for format in qformat_list],
+        quantization_formats=[QUANT_CFG_CHOICES[fmt] for fmt in qformat_list],
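Worth noting that the shadowing flagged here is a readability concern rather than a correctness bug: in Python 3 a comprehension's loop variable is scoped to the comprehension itself, so the built-in format() is untouched outside it, whereas a plain for loop does leak the name into the enclosing scope. A standalone demonstration (not code from the PR):

```python
# Comprehension loop variables do not escape in Python 3, so the built-in
# format() remains usable afterward; a plain for loop rebinds the name
# in the enclosing function scope.
def comprehension_does_not_leak():
    _ = [format for format in ["fp8", "int8_sq"]]
    return format(255, "x")  # built-in still resolves -> "ff"

def for_loop_leaks():
    for format in ["fp8", "int8_sq"]:
        pass
    return format  # now the string "int8_sq", not the built-in
```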
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c36abe and e54ce4e.

📒 Files selected for processing (1)
  • examples/llm_ptq/hf_ptq.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
examples/llm_ptq/hf_ptq.py (1)

588-596: Export behavior consistent with PR description

Condition explicitly routes int8_sq to TensorRT‑LLM checkpoint export. Matches the PR statement about final checkpoint format. No action needed.
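The routing the reviewer confirms can be sketched as a simple dispatch. `choose_export_path` and the returned labels are hypothetical stand-ins for the actual export condition in hf_ptq.py:

```python
def choose_export_path(qformat: str) -> str:
    """Hypothetical dispatcher mirroring the PR description: int8_sq
    checkpoints are exported in the TensorRT-LLM checkpoint format,
    other formats go through the unified (Hugging Face) export."""
    if qformat == "int8_sq":
        return "tensorrt_llm"
    return "unified_hf"
```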


codecov bot commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.83%. Comparing base (4c36abe) to head (e54ce4e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #345   +/-   ##
=======================================
  Coverage   73.83%   73.83%           
=======================================
  Files         172      172           
  Lines       17453    17453           
=======================================
  Hits        12887    12887           
  Misses       4566     4566           

☔ View full report in Codecov by Sentry.
@realAsma realAsma merged commit 5a3fd29 into main Sep 19, 2025
27 checks passed
@realAsma realAsma deleted the cjluo-nv-patch-1 branch September 19, 2025 13:04