[NVBug: 5525758] Update VLM-PTQ readme #339
Conversation
Signed-off-by: Chenjie Luo <[email protected]>
Walkthrough
Adds release notes for new quantized HF checkpoint export support and restructures the examples/vlm_ptq README: updates the model support table and labels, revises footnotes and links, consolidates Hugging Face script usage into a single generic invocation, and adds information about pre-quantized checkpoints and deployment targets.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (5)
CHANGELOG.rst (1)
Line 18: Call out VLM scope and sync with README.
Consider clarifying that this adds quantized HF checkpoint export for VLM variants (vision module typically excluded from quant), and ensure the Supported Models table in examples/vlm_ptq/README.md reflects the exact same models/labels.
Apply:
    - Support Phi-4-multimodal and Qwen2.5-VL quantized HF checkpoint export in ``examples/vlm_ptq``.
    + Support Phi-4-multimodal and Qwen2.5‑VL quantized Hugging Face checkpoint export in ``examples/vlm_ptq`` (VLM variants; vision encoders typically excluded from quant).
examples/vlm_ptq/README.md (4)
Lines 39-46: Tighten model labels; avoid ambiguity.
- “Qwen2, 2.5‑VL” mixes an LLM family with a VLM; prefer “Qwen2‑VL, Qwen2.5‑VL”.
- If this row includes both instruct/base variants, say so or keep generic.
- “Gemma3” likely needs “Gemma3‑vision” if this is the VLM. Otherwise, readers may assume LLM coverage.
Apply:
    -| Qwen2, 2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ |
    +| Qwen2-VL, Qwen2.5-VL | ✅ | ✅ | ✅ | ✅ | ✅ |
    -| Gemma3 | ✅ | - | - | - | - |
    +| Gemma3-vision | ✅ | - | - | - | - |
Lines 47-50: Modernize backend/version wording in footnotes.
- Prefer “PyTorch backend” phrasing used by TRT‑LLM docs.
- “TRT‑LLM v0.17 or later” is out of sync with 1.x versions referenced elsewhere (e.g., 1.0.0rc6, 1.1.0rc2). Recommend “1.0+”.
Apply:
    -> *<sup>1.</sup>Only TensorRT-LLM checkpoint export is supported. Not compatible with the TensorRT-LLM torch backend* \
    +> *<sup>1.</sup>Only TensorRT-LLM checkpoint export is supported; not compatible with the TensorRT-LLM PyTorch backend.* \
    -> *<sup>3.</sup>A selective set of the popular models are internally tested. The actual model support list may be longer. NVFP4 inference requires Blackwell GPUs and TensorRT-LLM v0.17 or later.*
    +> *<sup>3.</sup>A selective set of popular models are internally tested; the actual list may be longer. NVFP4 inference requires NVIDIA Blackwell GPUs and TensorRT-LLM 1.0+.*
Please confirm the minimum TRT‑LLM version needed for NVFP4 on Blackwell (if a specific 1.0.x/1.1.x is required).
Lines 63-65: Provide a concrete, copy‑pasteable example.
Add one real invocation to reduce friction, keeping the generic form as reference.
Apply:
    scripts/huggingface_example.sh --model <Hugging Face model card or checkpoint> --quant [fp8|nvfp4|int8_sq|int4_awq|w4a8_awq]
    +#
    +# Example:
    +# scripts/huggingface_example.sh --model Qwen/Qwen2.5-VL-7B-Instruct --quant fp8
Confirm the script defaults (task, GPU count, export steps) so the example succeeds as‑is, or add required flags here.
Lines 69-71: Qualify framework deployability by precision.
vLLM/SGLang do not support every quant format uniformly (e.g., NVFP4). Add a qualifier to prevent over‑promising.
Apply:
    -- Deployable on [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM), [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang)
    +- Deployable on [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM); selected precisions/formats are supported by [vLLM](https://github.com/vllm-project/vllm) and [SGLang](https://github.com/sgl-project/sglang) — check each framework’s precision support matrix.
If you have specific mappings (e.g., FP8/INT8 supported in vLLM/SGLang; NVFP4 via TensorRT‑LLM), consider adding a small table.
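If such a table is added, a minimal sketch might look like the following. It uses only the example mappings mentioned above; the "?" cells are unverified placeholders and every entry should be confirmed against each framework's precision support documentation before landing in the README.

```markdown
| Quant format | TensorRT-LLM | vLLM | SGLang |
| ------------ | ------------ | ---- | ------ |
| FP8          | ✅           | ✅   | ✅     |
| INT8 (SmoothQuant) | ✅     | ✅   | ✅     |
| NVFP4        | ✅           | ?    | ?      |
```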
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
CHANGELOG.rst (1 hunks)
examples/vlm_ptq/README.md (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: code-quality
- GitHub Check: build-docs
🔇 Additional comments (1)
CHANGELOG.rst (1)
Lines 4-5: Replace placeholder date before release.
Update “2025-09-xx” to the actual release date before tagging.
> *For detailed TensorRT-LLM torch backend multimodal support, please refer to [this doc](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/models/supported-models.md#multimodal-feature-support-matrix-pytorch-backend)*
🛠️ Refactor suggestion
🧩 Analysis chain
Verify external link and naming consistency.
Confirm the anchor “multimodal‑feature‑support‑matrix‑pytorch‑backend” still exists and that “PyTorch backend” casing matches upstream docs.
🌐 Web query:
Does the TensorRT-LLM repo contain the "Multimodal feature support matrix — PyTorch backend" section at docs/source/models/supported-models.md with that anchor, and what exact section title/casing do they use?
💡 Result:
Short answer: No — there is no section with the exact anchor/title "Multimodal feature support matrix — PyTorch backend" at docs/source/models/supported-models.md. The repository/site uses the heading "Models (PyTorch Backend)". [1][2]
Sources:
- TensorRT-LLM Support Matrix (shows "Models (PyTorch Backend)"). [1]
- GitHub copy of support-matrix.md in the repo (shows "Support Matrix" / "Models" headings). [2]
Update README link and 'PyTorch Backend' casing to match upstream docs. The anchor "multimodal‑feature‑support‑matrix‑pytorch‑backend" does not exist; upstream uses the heading "Models (PyTorch Backend)". Change examples/vlm_ptq/README.md (lines 51–52) to link to the correct section/anchor and match the "PyTorch Backend" casing.
🤖 Prompt for AI Agents
In examples/vlm_ptq/README.md around lines 51–52, the README links to a
non-existent anchor and uses incorrect casing "torch backend"; update the link
anchor to match the upstream heading "Models (PyTorch Backend)" (use
.../supported-models.md#models-pytorch-backend) and change the displayed text to
"PyTorch Backend" (and ensure the surrounding phrase matches upstream casing).
Change LGTM. qq, have you tested Phi-4-multimodal and Qwen2.5-VL HF checkpoint export?
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #339   +/-   ##
=======================================
  Coverage   73.82%   73.82%
=======================================
  Files         172      172
  Lines       17438    17438
=======================================
  Hits        12874    12874
  Misses       4564     4564
☔ View full report in Codecov by Sentry.
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: Ye Yu <[email protected]>
What does this PR do?
documentation
Overview: Update the VLM PTQ example doc to reflect the example script change and new model support.