[NVBug: 5525758] Update VLM-PTQ readme #339
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough
Adds release notes for new quantized HF checkpoint export support and restructures the examples/vlm_ptq README: updates the model support table and labels, revises footnotes and links, consolidates Hugging Face script usage into a single generic invocation, and adds information about pre-quantized checkpoints and deployment targets.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (5)
CHANGELOG.rst (1)
Line 18: Call out VLM scope and sync with README.
Consider clarifying that this adds quantized HF checkpoint export for VLM variants (vision module typically excluded from quant), and ensure the Supported Models table in examples/vlm_ptq/README.md reflects the exact same models/labels.
Apply:
    - Support Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in ``examples/vlm_ptq``.
    + Support Phi-4-multimodal and Qwen2.5‑VL quantized Hugging Face checkpoint export in ``examples/vlm_ptq`` (VLM variants; vision encoders typically excluded from quant).
examples/vlm_ptq/README.md (4)
Lines 39-46: Tighten model labels; avoid ambiguity.
- “Qwen2, 2.5‑VL” mixes an LLM family with a VLM; prefer “Qwen2‑VL, Qwen2.5‑VL”.
- If this row includes both instruct/base variants, say so or keep generic.
- “Gemma3” likely needs “Gemma3‑vision” if this is the VLM. Otherwise, readers may assume LLM coverage.
Apply:
    -| Qwen2, 2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ |
    +| Qwen2-VL, Qwen2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ |
    -| Gemma3 | ✅ | - | - | - | - |
    +| Gemma3-vision | ✅ | - | - | - | - |
Lines 47-50: Modernize backend/version wording in footnotes.
- Prefer “PyTorch backend” phrasing used by TRT‑LLM docs.
- “TRT‑LLM v0.17 or later” is out of sync with 1.x versions referenced elsewhere (e.g., 1.0.0rc6, 1.1.0rc2). Recommend “1.0+”.
Apply:
    -> *<sup>1.</sup>Only TensorRT-LLM checkpoint export is supported. Not compatible with the TensorRT-LLM torch backend* \
    +> *<sup>1.</sup>Only TensorRT-LLM checkpoint export is supported; not compatible with the TensorRT-LLM PyTorch backend.* \
    -> *<sup>3.</sup>A selective set of the popular models are internally tested. The actual model support list may be longer. NVFP4 inference requires Blackwell GPUs and TensorRT-LLM v0.17 or later.*
    +> *<sup>3.</sup>A selective set of popular models are internally tested; the actual list may be longer. NVFP4 inference requires NVIDIA Blackwell GPUs and TensorRT-LLM 1.0+.*
Please confirm the minimum TRT‑LLM version needed for NVFP4 on Blackwell (if a specific 1.0.x/1.1.x is required).
Lines 63-65: Provide a concrete, copy‑pasteable example.
Add one real invocation to reduce friction, keeping the generic form as reference.
Apply:
    scripts/huggingface_example.sh --model <Hugging Face model card or checkpoint> --quant [fp8|nvfp4|int8_sq|int4_awq|w4a8_awq]
    +#
    +# Example:
    +# scripts/huggingface_example.sh --model Qwen/Qwen2.5-VL-7B-Instruct --quant fp8
Confirm the script defaults (task, GPU count, export steps) so the example succeeds as‑is, or add required flags here.
Lines 69-71: Qualify framework deployability by precision.
vLLM/SGLang do not support every quant format uniformly (e.g., NVFP4). Add a qualifier to prevent over‑promising.
Apply:
    -- Deployable on [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang)
    +- Deployable on [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM); selected precisions/formats are supported by [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang) — check each framework’s precision support matrix.
If you have specific mappings (e.g., FP8/INT8 supported in vLLM/SGLang; NVFP4 via TensorRT‑LLM), consider adding a small table.
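If such a table is added, a minimal sketch might look like the following. It uses only the example mappings mentioned above; the "?" cells are unverified placeholders and every entry should be confirmed against each framework's precision support documentation before landing in the README.

```markdown
| Quant format | TensorRT-LLM | vLLM | SGLang |
| ------------ | ------------ | ---- | ------ |
| FP8          | ✅           | ✅   | ✅     |
| INT8 (SmoothQuant) | ✅     | ✅   | ✅     |
| NVFP4        | ✅           | ?    | ?      |
```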
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.rst (1 hunks)
examples/vlm_ptq/README.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (1)
CHANGELOG.rst (1)
Lines 4-5: Replace placeholder date before release.
Update “2025-09-xx” to the actual release date before tagging.
> *For detailed TensorRT-LLM torch backend multimodal support, please refer to [this doc](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/models/supported-models.md#multimodal-feature-support-matrix-pytorch-backend)*
🛠️ Refactor suggestion
🧩 Analysis chain
Verify external link and naming consistency.
Confirm the anchor “multimodal‑feature‑support‑matrix‑pytorch‑backend” still exists and that “PyTorch backend” casing matches upstream docs.
🌐 Web query:
Does the TensorRT-LLM repo contain the "Multimodal feature support matrix — PyTorch backend" section at docs/source/models/supported-models.md with that anchor, and what exact section title/casing do they use?
💡 Result:
Short answer: No — there is no section with the exact anchor/title "Multimodal feature support matrix — PyTorch backend" at docs/source/models/supported-models.md. The repository/site uses the heading "Models (PyTorch Backend)". [1][2]
Sources:
- TensorRT-LLM Support Matrix (shows "Models (PyTorch Backend)"). [1]
- GitHub copy of support-matrix.md in the repo (shows "Support Matrix" / "Models" headings). [2]
Update README link and 'PyTorch Backend' casing to match upstream docs. The anchor "multimodal‑feature‑support‑matrix‑pytorch‑backend" does not exist; upstream uses the heading "Models (PyTorch Backend)". Change examples/vlm_ptq/README.md (lines 51–52) to link to the correct section/anchor and match the "PyTorch Backend" casing.
🤖 Prompt for AI Agents
In examples/vlm_ptq/README.md around lines 51–52, the README links to a
non-existent anchor and uses incorrect casing "torch backend"; update the link
anchor to match the upstream heading "Models (PyTorch Backend)" (use
.../supported-models.md#models-pytorch-backend) and change the displayed text to
"PyTorch Backend" (and ensure the surrounding phrase matches upstream casing).
Change LGTM. qq, have you tested Phi-4-multimodal and Qwen2.5-VL HF checkpoint export?
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #339   +/-   ##
=======================================
  Coverage   73.82%   73.82%
=======================================
  Files         172      172
  Lines       17438    17438
=======================================
  Hits        12874    12874
  Misses       4564     4564
☔ View full report in Codecov by Sentry.
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
documentation
Overview: Update the VLM PTQ example doc to reflect the example script change and new model support.