Fix Qwen2.5-VL FP8 #272
Conversation
Could you update the model support table too from the original MR?
Walkthrough
Updates tokenizer setup so that the Qwen-specific pad/eos token assignment is skipped when the checkpoint path contains "vl", and revises the VLM Hugging Face example script to handle Qwen via visual-length parameters instead of PTQ arguments.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant U as Caller
    participant GU as get_tokenizer
    participant HF as HF Tokenizer
    U->>GU: get_tokenizer(ckpt_path)
    GU->>HF: load tokenizer
    alt Tokenizer is Qwen AND "vl" not in ckpt_path (case-insensitive)
        GU->>HF: set pad_token/eos_token to id=151643
    else Other models or "vl" in path
        GU->>HF: use existing logic (e.g., Vila, fallback padding)
    end
    GU-->>U: tokenizer
```
```mermaid
sequenceDiagram
    autonumber
    participant Sh as huggingface_example.sh
    participant Env as ENV/Args
    Sh->>Env: read MODEL_TYPE
    alt MODEL_TYPE in {vila, phi, llava}
        Sh->>Env: set VISUAL_FEATURE & VLM_ARGS (visual-length path)
    else MODEL_TYPE == qwen
        Note over Sh,Env: New behavior
        Sh->>Env: set VISUAL_FEATURE=1280
        Sh->>Env: VLM_ARGS uses BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE
        Sh-x Env: no PTQ_ARGS injection for qwen
    else other models
        Sh->>Env: existing handling
    end
```
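For concreteness, here is a minimal sketch of the case dispatch the second diagram describes. It assumes the script already defines MODEL_TYPE, BUILD_MAX_BATCH_SIZE, and PTQ_ARGS; the vila/phi/llava value is a placeholder, and only the qwen numbers come from this PR.

```bash
case "${MODEL_TYPE}" in
    "vila" | "phi" | "llava")
        # Existing visual-length path (576 is a placeholder, not the script's real value)
        VISUAL_FEATURE=576
        VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
        ;;
    "qwen")
        # New behavior: derive the multimodal length from the visual feature size
        # instead of appending kv-cache overrides to PTQ_ARGS.
        VISUAL_FEATURE=1280
        VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
        ;;
    *)
        # Other model types keep their existing handling.
        ;;
esac
```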
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/vlm_ptq/scripts/huggingface_example.sh (1)
Lines 148-151: Hardcoded VISUAL_FEATURE=1280 may drift; derive it from the HF preprocessor when available.
Optional: compute max_multimodal_len from the model's preprocessor_config (with a fallback to 1280) to track upstream changes.
Example:
```bash
# Replace the constant with a best-effort query (fallback to 1280)
VISUAL_FEATURE=$(python - <<'PY'
import json, os, sys
p=os.path.join(os.environ.get("MODEL_PATH",""),"preprocessor_config.json")
try:
    d=json.load(open(p))
    print(d.get("image_seq_length") or d.get("max_image_tokens") or 1280)
except Exception:
    print(1280)
PY
)
VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
```
📜 Review details
📒 Files selected for processing (2)
examples/llm_ptq/example_utils.py (1 hunks)
examples/vlm_ptq/scripts/huggingface_example.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
examples/vlm_ptq/scripts/huggingface_example.sh
[warning] 148-148: This pattern always overrides a later one on line 156.
(SC2221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
examples/llm_ptq/example_utils.py (1)
Lines 59-63: Detect Qwen-VL via AutoConfig; fall back to the path heuristic only if config loading fails.
Path substring checks are brittle: use AutoConfig.model_type/architectures and only use the ckpt_path heuristic when config loading fails.
- if "qwen" in type(tokenizer).__name__.lower() and "vl" not in ckpt_path.lower(): - # qwen use token id 151643 as pad and eos tokens - tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643) - tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643) + qwen_tok = "qwen" in type(tokenizer).__name__.lower() + is_vl = False + try: + cfg = AutoConfig.from_pretrained(ckpt_path, trust_remote_code=trust_remote_code) + mt = (getattr(cfg, "model_type", "") or "").lower() + archs = [a.lower() for a in (getattr(cfg, "architectures", []) or [])] + is_vl = ("vl" in mt) or any("vl" in a for a in archs) + except Exception: + # Fall back to path heuristic only if config is unavailable + is_vl = "vl" in ckpt_path.lower() + if qwen_tok and not is_vl: + # Qwen text-only uses token id 151643 as both pad and eos + tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643) + tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)Sanity-check here failed: ModuleNotFoundError: No module named 'transformers'. Re-run the provided Python snippet in an environment/CI with transformers installed to verify one text-only Qwen (both pad/eos → id 151643) and one Qwen‑VL (VL preserves original eos).
"qwen") | ||
VISUAL_FEATURE=1280 | ||
VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) " | ||
;; |
🧩 Analysis chain
Duplicate 'qwen' case—later branch is unreachable and conflicts with new path.
The first qwen branch (lines 148–151) shadows the second (lines 156–158). ShellCheck SC2221 flags this; remove the later block.
Apply this diff:
```diff
@@
         "mllama")
             PTQ_ARGS+=" --kv_cache_qformat none "
             VLM_ARGS=" --max_encoder_input_len=6404 --skip_run"
             ;;
-        "qwen")
-            PTQ_ARGS+=" --kv_cache_qformat none "
-            ;;
 esac
```
Run:
Also applies to: 156-158
🏁 Script executed:
```bash
#!/bin/bash
shellcheck -s bash examples/vlm_ptq/scripts/huggingface_example.sh
rg -nA1 -e '^\s*"qwen"\)' examples/vlm_ptq/scripts/huggingface_example.sh
```
Length of output: 13458
Remove or consolidate duplicate "qwen" case arms.
The "qwen") pattern appears multiple times in examples/vlm_ptq/scripts/huggingface_example.sh (line 148 and again at lines 156–158 and 224–225); the first occurrence (line 148) overrides the later ones, making them unreachable. Delete the later duplicate arms or merge their needed assignments (e.g., PTQ_ARGS addition and the TRT mapping) into the first "qwen" branch.
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 148-148: This pattern always overrides a later one on line 156.
(SC2221)
🤖 Prompt for AI Agents
In examples/vlm_ptq/scripts/huggingface_example.sh around lines 148 to 151 (and
duplicate arms at lines 156–158 and 224–225), there are multiple "qwen") case
arms where the first occurrence overrides later ones making them unreachable;
remove the duplicate "qwen") branches and consolidate their needed assignments
into the first "qwen") block—merge any PTQ_ARGS additions and TRT mapping
assignments from the later arms into the initial arm so all required variables
are set in one place, then delete the subsequent duplicate case arms.
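As one hedged illustration of that consolidation (a sketch, not the repository's actual code): merge the assignments visible in this review into a single arm, keep the PTQ_ARGS override only if it is still wanted for Qwen (the diff above instead drops it), and fold in whatever the TRT mapping arm at lines 224–225 sets, which is not shown here.

```bash
    "qwen")
        # Visual-length sizing from the first arm
        VISUAL_FEATURE=1280
        VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
        # From the formerly unreachable arm; keep only if still required for Qwen
        PTQ_ARGS+=" --kv_cache_qformat none "
        # TODO: also merge the TRT mapping assignment from lines 224-225 here
        # (its contents are not visible in this review).
        ;;
```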
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
- Coverage   73.85%   73.82%   -0.04%
==========================================
  Files         172      172
  Lines       17430    17438       +8
==========================================
  Hits        12873    12873
- Misses       4557     4565       +8
```
☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Overview:
Usage
```bash
bash vlm_ptq/scripts/huggingface_example.sh --type qwen --model $HF_PATH --quant fp8 --tp 1 --export_fmt hf
```
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit