Reinstate int8_sq support for vlm_example. #333
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough
Removed a deprecation note for int8_sq in CHANGELOG. Updated the VLM PTQ Hugging Face example script to accept int8_sq, improved the unknown-arg error message, and added inference tensor/pipeline parallel flags when invoking the quantization script.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor U as User
participant SH as huggingface_example.sh
participant PTQ as hf_ptq.py
participant BE as Inference Backend
U->>SH: Run with QFORMAT and TP/PP
alt QFORMAT recognized (e.g., int8, int8_sq, w4a16, ...)
note right of SH: Includes new int8_sq option
SH->>PTQ: Invoke with --qformat QFORMAT<br/>--inference_tensor_parallel=$TP<br/>--inference_pipeline_parallel=$PP
PTQ->>BE: Configure model + quantization
BE-->>PTQ: Ready
PTQ-->>SH: Quantization artifacts
SH-->>U: Success
else Unknown QFORMAT
SH-->>U: Error listing accepted values (incl. int8_sq)
end
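In shell terms, the call the diagram describes amounts to roughly the sketch below. This is illustrative only: hf_ptq.py also takes model and output arguments that are elided here, and their names are not taken from this PR.

python hf_ptq.py \
    --qformat "$QFORMAT" \
    --inference_tensor_parallel="$TP" \
    --inference_pipeline_parallel="$PP"
# ...plus the model and output-path arguments hf_ptq.py requires (omitted here; assumed)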
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10–15 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
examples/vlm_ptq/scripts/huggingface_example.sh (2)
92-100: Guard TP/PP flags to avoid passing empty values. If TP/PP are unset, argparse may treat the empty "=" value as invalid. Default to 1, or append the flags only when they are set.
Apply one of these:
Option A (defaults):
- --inference_tensor_parallel=$TP \
- --inference_pipeline_parallel=$PP \
+ --inference_tensor_parallel=${TP:-1} \
+ --inference_pipeline_parallel=${PP:-1} \

Option B (conditional append earlier, keeping this block unchanged):
 PTQ_ARGS=""
+if [ -n "$TP" ]; then PTQ_ARGS+=" --inference_tensor_parallel $TP "; fi
+if [ -n "$PP" ]; then PTQ_ARGS+=" --inference_pipeline_parallel $PP "; fi
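For reference, a standalone sketch of the ${VAR:-default} expansion used in Option A above (not part of the script; it just demonstrates the shell behavior):

# When TP is unset or empty, ${TP:-1} falls back to 1; otherwise it uses TP's value.
unset TP
echo "--inference_tensor_parallel=${TP:-1}"   # prints --inference_tensor_parallel=1
TP=4
echo "--inference_tensor_parallel=${TP:-1}"   # prints --inference_tensor_parallel=4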
106-118: nvfp4 GPU check is unreachable due to the earlier non-fp8 early-exit. The block at lines 111-118 never executes when QFORMAT=nvfp4. Fold it into the non-fp8 branch so users still get the Blackwell warning.
Apply:
-if [[ "$QFORMAT" != "fp8" ]]; then - echo "For quant format $QFORMAT, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to $SAVE_PATH." - exit 0 -fi - -if [[ "$QFORMAT" == *"nvfp4"* ]] || [[ "$KV_CACHE_QUANT" == *"nvfp4"* ]]; then - cuda_major=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader -i 0 | cut -d. -f1) - - if [ "$cuda_major" -lt 10 ]; then - echo "Please deploy the NVFP4 checkpoint on a Blackwell GPU. Checkpoint export_path: $SAVE_PATH" - exit 0 - fi -fi +if [[ "$QFORMAT" != "fp8" ]]; then + if [[ "$QFORMAT" == *"nvfp4"* ]] || [[ "$KV_CACHE_QUANT" == *"nvfp4"* ]]; then + cuda_major=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader -i 0 | cut -d. -f1) + if [ "$cuda_major" -lt 10 ]; then + echo "Please deploy the NVFP4 checkpoint on a Blackwell GPU. Checkpoint export_path: $SAVE_PATH" + fi + fi + echo "For quant format $QFORMAT, please refer to the TensorRT-LLM documentation for deployment. Checkpoint saved to $SAVE_PATH." + exit 0 +fi
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- CHANGELOG.rst (0 hunks)
- examples/vlm_ptq/scripts/huggingface_example.sh (2 hunks)
💤 Files with no reviewable changes (1)
- CHANGELOG.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: code-quality
- GitHub Check: build-docs
 case $QFORMAT in
-    fp8|int4_awq|w4a8_awq|nvfp4)
+    fp8|int8_sq|int4_awq|w4a8_awq|nvfp4)
         ;;
     *)
-        echo "Unknown quant argument: Expected one of: [fp8, int4_awq, w4a8_awq, nvfp4]" >&2
+        echo "Unknown quant argument: Expected one of: [fp8, int8_sq, int4_awq, w4a8_awq, nvfp4]" >&2
         exit 1
🧩 Analysis chain
Re‑enablement of int8_sq looks good.
Case arm and error message now include int8_sq; matches the PR goal.
Run to confirm downstream accepts this qformat:
🏁 Script executed:
#!/bin/bash
# Verify hf_ptq.py accepts int8_sq for --qformat and exposes the new inference flags
fd -t f hf_ptq.py
rg -nP -C2 -- 'add_argument\((["'\''])--qformat\1.*choices=.*int8_sq' $(fd -t f hf_ptq.py)
rg -nP -C2 -- 'add_argument\((["'\''])--inference_tensor_parallel\1' $(fd -t f hf_ptq.py)
rg -nP -C2 -- 'add_argument\((["'\''])--inference_pipeline_parallel\1' $(fd -t f hf_ptq.py)
# Check parser provides TP/PP defaults
fd -t f parser.sh
rg -n -C2 '\b(TP|PP)=' $(fd -t f parser.sh)
Length of output: 1012
Add int8_sq to the --qformat choices in examples/llm_ptq/hf_ptq.py: the argparse definition for --qformat does not include int8_sq (no match found). The --inference_tensor_parallel and --inference_pipeline_parallel flags are present, and examples/llm_ptq/scripts/parser.sh sets TP=1 and PP=1.
🤖 Prompt for AI Agents
In examples/vlm_ptq/scripts/huggingface_example.sh around lines 37-42, the script accepts "int8_sq" in the case switch, but examples/llm_ptq/hf_ptq.py's argparse for --qformat does not include "int8_sq", causing a mismatch. Update the argparse definition in examples/llm_ptq/hf_ptq.py to include "int8_sq" among the allowed choices for --qformat (keeping the other existing options intact), and ensure the help text reflects this addition so the CLI and the example script stay consistent; TP/PP handling already comes from parser.sh.
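Once the suggested change is applied, a quick check along the lines of the analysis-chain script above could confirm it. This is a sketch only: it assumes hf_ptq.py parses --qformat with argparse, that its import-time dependencies are installed, and that the commands run from the repository root.

# The new choice should show up in the CLI help once added to the argparse choices.
python examples/llm_ptq/hf_ptq.py --help 2>&1 | grep "int8_sq"
# A plain text search of the file should also find the new choice.
rg -n "int8_sq" examples/llm_ptq/hf_ptq.py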
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
## main #333 +/- ##
==========================================
- Coverage 73.82% 73.82% -0.01%
==========================================
Files 172 172
Lines 17438 17438
==========================================
- Hits 12874 12873 -1
- Misses 4564 4565 +1
☔ View full report in Codecov by Sentry.
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
Revert removal of int8_sq support for vlm_ptq example.
Overview:
Note that int8_sq for vlm_ptq will still generate a TensorRT-LLM checkpoint, and for the language model only. Users need to use TensorRT-LLM's TRT backend together with a separately built image network (also via TRT) for inference; the torch runtime is not available.
Summary by CodeRabbit
New Features
Documentation