Fix Qwen2.5-VL FP8 #272
Conversation
Could you update the model support table too from the original MR?
Walkthrough
Updates tokenizer setup so that the Qwen-specific pad/eos token assignment is skipped when the checkpoint path contains "vl", and revises the VLM Hugging Face example script to handle Qwen via visual-length parameters instead of PTQ arguments.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant U as Caller
    participant GU as get_tokenizer
    participant HF as HF Tokenizer
    U->>GU: get_tokenizer(ckpt_path)
    GU->>HF: load tokenizer
    alt Tokenizer is Qwen AND "vl" not in ckpt_path (case-insensitive)
        GU->>HF: set pad_token/eos_token to id=151643
    else Other models or "vl" in path
        GU->>HF: use existing logic (e.g., Vila, fallback padding)
    end
    GU-->>U: tokenizer
```
```mermaid
sequenceDiagram
    autonumber
    participant Sh as huggingface_example.sh
    participant Env as ENV/Args
    Sh->>Env: read MODEL_TYPE
    alt MODEL_TYPE in {vila, phi, llava}
        Sh->>Env: set VISUAL_FEATURE & VLM_ARGS (visual-length path)
    else MODEL_TYPE == qwen
        Note over Sh,Env: New behavior
        Sh->>Env: set VISUAL_FEATURE=1280
        Sh->>Env: VLM_ARGS uses BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE
        Sh-x Env: no PTQ_ARGS injection for qwen
    else other models
        Sh->>Env: existing handling
    end
```
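For concreteness, here is a minimal sketch of the case dispatch the second diagram describes. It assumes the script already defines MODEL_TYPE, BUILD_MAX_BATCH_SIZE, and PTQ_ARGS; the vila/phi/llava value is a placeholder, and only the qwen numbers come from this PR.

```bash
case "${MODEL_TYPE}" in
    "vila" | "phi" | "llava")
        # Existing visual-length path (576 is a placeholder, not the script's real value)
        VISUAL_FEATURE=576
        VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
        ;;
    "qwen")
        # New behavior: derive the multimodal length from the visual feature size
        # instead of appending kv-cache overrides to PTQ_ARGS.
        VISUAL_FEATURE=1280
        VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
        ;;
    *)
        # Other model types keep their existing handling.
        ;;
esac
```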
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/vlm_ptq/scripts/huggingface_example.sh (1)
Lines 148-151: Hardcoded VISUAL_FEATURE=1280 may drift; derive it from the HF preprocessor when available.
Optional: compute max_multimodal_len from the model's preprocessor_config (with a fallback to 1280) to track upstream changes.
Example:
```bash
# Replace the constant with a best-effort query (fallback to 1280)
VISUAL_FEATURE=$(python - <<'PY'
import json, os, sys
p=os.path.join(os.environ.get("MODEL_PATH",""),"preprocessor_config.json")
try:
    d=json.load(open(p))
    print(d.get("image_seq_length") or d.get("max_image_tokens") or 1280)
except Exception:
    print(1280)
PY
)
VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
```
📜 Review details
📒 Files selected for processing (2)
examples/llm_ptq/example_utils.py (1 hunks)
examples/vlm_ptq/scripts/huggingface_example.sh (1 hunks)
🧰 Additional context used
🪛 Shellcheck (0.10.0)
examples/vlm_ptq/scripts/huggingface_example.sh
[warning] 148-148: This pattern always overrides a later one on line 156.
(SC2221)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: linux
- GitHub Check: build-docs
- GitHub Check: code-quality
🔇 Additional comments (1)
examples/llm_ptq/example_utils.py (1)
Lines 59-63: Detect Qwen-VL via AutoConfig; fall back to the path heuristic only if config loading fails.
Path substring checks are brittle: use AutoConfig.model_type/architectures and only use the ckpt_path heuristic when config loading fails.
- if "qwen" in type(tokenizer).__name__.lower() and "vl" not in ckpt_path.lower(): - # qwen use token id 151643 as pad and eos tokens - tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643) - tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643) + qwen_tok = "qwen" in type(tokenizer).__name__.lower() + is_vl = False + try: + cfg = AutoConfig.from_pretrained(ckpt_path, trust_remote_code=trust_remote_code) + mt = (getattr(cfg, "model_type", "") or "").lower() + archs = [a.lower() for a in (getattr(cfg, "architectures", []) or [])] + is_vl = ("vl" in mt) or any("vl" in a for a in archs) + except Exception: + # Fall back to path heuristic only if config is unavailable + is_vl = "vl" in ckpt_path.lower() + if qwen_tok and not is_vl: + # Qwen text-only uses token id 151643 as both pad and eos + tokenizer.pad_token = tokenizer.convert_ids_to_tokens(151643) + tokenizer.eos_token = tokenizer.convert_ids_to_tokens(151643)Sanity-check here failed: ModuleNotFoundError: No module named 'transformers'. Re-run the provided Python snippet in an environment/CI with transformers installed to verify one text-only Qwen (both pad/eos → id 151643) and one Qwen‑VL (VL preserves original eos).
"qwen") | ||
VISUAL_FEATURE=1280 | ||
VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) " | ||
;; |
🧩 Analysis chain
Duplicate 'qwen' case—later branch is unreachable and conflicts with new path.
The first qwen branch (lines 148–151) shadows the second (lines 156–158). ShellCheck SC2221 flags this; remove the later block.
Apply this diff:
```diff
@@
         "mllama")
             PTQ_ARGS+=" --kv_cache_qformat none "
             VLM_ARGS=" --max_encoder_input_len=6404 --skip_run"
             ;;
-        "qwen")
-            PTQ_ARGS+=" --kv_cache_qformat none "
-            ;;
 esac
```
Run:
Also applies to: 156-158
🏁 Script executed:
```bash
#!/bin/bash
shellcheck -s bash examples/vlm_ptq/scripts/huggingface_example.sh
rg -nA1 -e '^\s*"qwen"\)' examples/vlm_ptq/scripts/huggingface_example.sh
```
Length of output: 13458
Remove or consolidate duplicate "qwen" case arms.
The "qwen") pattern appears multiple times in examples/vlm_ptq/scripts/huggingface_example.sh (line 148 and again at lines 156–158 and 224–225); the first occurrence (line 148) overrides the later ones, making them unreachable. Delete the later duplicate arms or merge their needed assignments (e.g., PTQ_ARGS addition and the TRT mapping) into the first "qwen" branch.
🧰 Tools
🪛 Shellcheck (0.10.0)
[warning] 148-148: This pattern always overrides a later one on line 156.
(SC2221)
🤖 Prompt for AI Agents
In examples/vlm_ptq/scripts/huggingface_example.sh around lines 148 to 151 (and
duplicate arms at lines 156–158 and 224–225), there are multiple "qwen") case
arms where the first occurrence overrides later ones making them unreachable;
remove the duplicate "qwen") branches and consolidate their needed assignments
into the first "qwen") block—merge any PTQ_ARGS additions and TRT mapping
assignments from the later arms into the initial arm so all required variables
are set in one place, then delete the subsequent duplicate case arms.
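As one hedged illustration of that consolidation (a sketch, not the repository's actual code): merge the assignments visible in this review into a single arm, keep the PTQ_ARGS override only if it is still wanted for Qwen (the diff above instead drops it), and fold in whatever the TRT mapping arm at lines 224–225 sets, which is not shown here.

```bash
    "qwen")
        # Visual-length sizing from the first arm
        VISUAL_FEATURE=1280
        VLM_ARGS=" --max_multimodal_len=$((BUILD_MAX_BATCH_SIZE * VISUAL_FEATURE)) "
        # From the formerly unreachable arm; keep only if still required for Qwen
        PTQ_ARGS+=" --kv_cache_qformat none "
        # TODO: also merge the TRT mapping assignment from lines 224-225 here
        # (its contents are not visible in this review).
        ;;
```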
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
```
@@            Coverage Diff             @@
##             main     #272      +/-   ##
==========================================
- Coverage   73.85%   73.82%   -0.04%
==========================================
  Files         172      172
  Lines       17430    17438       +8
==========================================
  Hits        12873    12873
- Misses       4557     4565       +8
```
☔ View full report in Codecov by Sentry.
What does this PR do?
Type of change: Bug fix
Overview:
Usage
```bash
bash vlm_ptq/scripts/huggingface_example.sh --type qwen --model $HF_PATH --quant fp8 --tp 1 --export_fmt hf
```
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit