
Conversation

@cjluo-nv cjluo-nv commented Sep 19, 2025

What does this PR do?

minor change

Overview: ?

Add int8_sq back to auto_quant support list. We will just export the final checkpoint as the tensorrt_llm checkpoint.

Summary by CodeRabbit

  • New Features
    • Added support for INT8 SmoothQuant in the auto-quantization workflow, enabling selection of the int8_sq format for model compression. This expands quantization options and can improve performance and memory efficiency on compatible hardware. No changes to public APIs or user-facing workflows; existing configurations continue to work as before, with the new format available as an additional choice.

@cjluo-nv cjluo-nv requested a review from a team as a code owner September 19, 2025 05:42
@cjluo-nv cjluo-nv requested a review from Edwardf0t1 September 19, 2025 05:42

coderabbitai bot commented Sep 19, 2025

Walkthrough

Adds "int8_sq" to the qformat_list in auto_quantize within examples/llm_ptq/hf_ptq.py, enabling the int8_smoothquant configuration to be selected through the auto-quantize flow. No other logic, control flow, or API signatures changed.

Changes

Cohort / File(s) Summary
Quantization option update
examples/llm_ptq/hf_ptq.py
Include "int8_sq" in qformat_list within auto_quantize to allow selecting INT8_SMOOTHQUANT_CFG via the auto-quantize path.
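The change described above can be sketched as a small allow-list check. The names `qformat_list` and the format strings come from the PR discussion; `SUPPORTED_AUTO_QUANT_FORMATS` and `validate_qformats` are illustrative placeholders, not the actual code in `examples/llm_ptq/hf_ptq.py` (which uses an inline `assert` over a literal list):

```python
# Illustrative sketch only: the constant and helper below are hypothetical
# names standing in for the inline assert in hf_ptq.py's auto_quantize path.
SUPPORTED_AUTO_QUANT_FORMATS = [
    "fp8",
    "int8_sq",  # re-added by this PR
    "int4_awq",
    "nvfp4",
    "nvfp4_awq",
    "w4a8_awq",
    "fp8_pb_wo",
    "w4a8_mxfp4_fp8",
    "nvfp4_mlp_only",
]

def validate_qformats(qformat_list):
    """Reject any requested format outside the auto-quantize allow-list."""
    unsupported = [fmt for fmt in qformat_list if fmt not in SUPPORTED_AUTO_QUANT_FORMATS]
    if unsupported:
        raise ValueError(f"Unsupported quantization formats: {unsupported}")
    return qformat_list
```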

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A nibble of bits, a hop to queue,
I toggled a switch to let INT8 through.
SmoothQuant whispers, “I’m in the mix!”
The carrots compile, no extra tricks.
With one small string, we bound and flew—
Byte-sized dreams in int8 hue. 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title "Add int8_sq back to auto_quant support list" is a short, single sentence that directly summarizes the primary change (restoring int8_sq to the auto_quant support list), so it accurately reflects the changeset and is clear to a teammate scanning history.


@cjluo-nv cjluo-nv requested a review from realAsma September 19, 2025 05:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
examples/llm_ptq/hf_ptq.py (2)

92-106: int8_sq re‑added to auto-quant support: LGTM; tighten wording and minor var rename

Change looks good. Two tiny cleanups to reduce confusion and shadowing:

  • Clarify the assertion message (this path may export TRT‑LLM checkpoints when int8_sq is used).
  • Avoid shadowing the parameter name in the comprehension.
-    assert all(
-        qformat
+    assert all(
+        fmt
         in [
             "fp8",
             "int8_sq",
             "int4_awq",
             "nvfp4",
             "nvfp4_awq",
             "w4a8_awq",
             "fp8_pb_wo",
             "w4a8_mxfp4_fp8",
             "nvfp4_mlp_only",
         ]
-        for qformat in qformat_list
-    ), "One or more quantization formats provided are not supported for unified checkpoint export"
+        for fmt in qformat_list
+    ), "One or more quantization formats provided are not supported by the auto-quantize export path"

Is exclusion of "fp8_pc_pt" from this allow-list (while present in QUANT_CFG_CHOICES) intentional for auto-quant? If yes, consider a brief comment above to prevent future regressions.


120-121: Avoid shadowing the built-in name format in list comprehension

Minor readability nit: don’t shadow Python’s built-in format().

-        quantization_formats=[QUANT_CFG_CHOICES[format] for format in qformat_list],
+        quantization_formats=[QUANT_CFG_CHOICES[fmt] for fmt in qformat_list],
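Worth noting that the shadowing flagged here is a readability concern rather than a correctness bug: in Python 3 a comprehension's loop variable is scoped to the comprehension itself, so the built-in format() is untouched outside it, whereas a plain for loop does leak the name into the enclosing scope. A standalone demonstration (not code from the PR):

```python
# Comprehension loop variables do not escape in Python 3, so the built-in
# format() remains usable afterward; a plain for loop rebinds the name
# in the enclosing function scope.
def comprehension_does_not_leak():
    _ = [format for format in ["fp8", "int8_sq"]]
    return format(255, "x")  # built-in still resolves -> "ff"

def for_loop_leaks():
    for format in ["fp8", "int8_sq"]:
        pass
    return format  # now the string "int8_sq", not the built-in
```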
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c36abe and e54ce4e.

📒 Files selected for processing (1)
  • examples/llm_ptq/hf_ptq.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: linux
  • GitHub Check: wait-checks / wait
  • GitHub Check: build-docs
  • GitHub Check: code-quality
🔇 Additional comments (1)
examples/llm_ptq/hf_ptq.py (1)

588-596: Export behavior consistent with PR description

Condition explicitly routes int8_sq to TensorRT‑LLM checkpoint export. Matches the PR statement about final checkpoint format. No action needed.
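The routing the reviewer confirms can be sketched as a simple dispatch. `choose_export_path` and the returned labels are hypothetical stand-ins for the actual export condition in hf_ptq.py:

```python
def choose_export_path(qformat: str) -> str:
    """Hypothetical dispatcher mirroring the PR description: int8_sq
    checkpoints are exported in the TensorRT-LLM checkpoint format,
    other formats go through the unified (Hugging Face) export."""
    if qformat == "int8_sq":
        return "tensorrt_llm"
    return "unified_hf"
```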


codecov bot commented Sep 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.83%. Comparing base (4c36abe) to head (e54ce4e).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #345   +/-   ##
=======================================
  Coverage   73.83%   73.83%           
=======================================
  Files         172      172           
  Lines       17453    17453           
=======================================
  Hits        12887    12887           
  Misses       4566     4566           

☔ View full report in Codecov by Sentry.
@realAsma realAsma merged commit 5a3fd29 into main Sep 19, 2025
27 checks passed
@realAsma realAsma deleted the cjluo-nv-patch-1 branch September 19, 2025 13:04