Make vlm detection more robust in ptq workflow #286

Conversation
**Walkthrough**

Adds a multimodal-model detection utility and updates the PTQ example to use it: when a model is detected as a VLM, the script saves the original model config and attempts to save an AutoProcessor config (guarded); the non-VLM export/quantization flow is unchanged.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    actor U as User
    participant S as hf_ptq.py
    participant M as model_utils.is_multimodal_model
    participant T as Transformers (AutoConfig/AutoProcessor)
    participant FS as Filesystem
    U->>S: Run PTQ export script
    S->>S: Load model (full_model) and config
    S->>M: is_multimodal_model(full_model)
    M-->>S: True / False
    alt is_vlm == true
        S->>T: AutoConfig.from_pretrained(model_id)
        T-->>S: AutoConfig
        S->>FS: Save model config (save_pretrained)
        S->>T: AutoProcessor.from_pretrained(model_id) (try)
        alt processor load success
            T-->>S: AutoProcessor
            S->>FS: Save processor config (save_pretrained)
        else processor load fails
            T-->>S: Exception
            S->>S: Log warning and continue
        end
    else is_vlm == false
        Note over S: Proceed with existing non-VLM export/quantization flow
    end
    S-->>U: Complete export
```
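For readers skimming the diagram, a minimal Python sketch of the guarded save step it describes is shown below. This is not the PR's exact code: the helper name `save_multimodal_configs` and the `model_id`/`export_path` parameters are hypothetical, while `is_multimodal_model`, `AutoConfig`, and `AutoProcessor` are the names the walkthrough mentions.

```python
# Illustrative sketch of the flow in the diagram above, not the PR's exact code.
# `save_multimodal_configs`, `model_id`, and `export_path` are hypothetical names.
import warnings

from transformers import AutoConfig, AutoProcessor

from modelopt.torch.export.model_utils import is_multimodal_model


def save_multimodal_configs(full_model, model_id: str, export_path: str) -> None:
    if not is_multimodal_model(full_model):
        # Non-VLM models follow the existing export/quantization flow unchanged.
        return

    # Always save the original model config for VLMs.
    AutoConfig.from_pretrained(model_id).save_pretrained(export_path)

    # The processor save is guarded: some VLM architectures ship no AutoProcessor.
    try:
        AutoProcessor.from_pretrained(model_id).save_pretrained(export_path)
    except Exception as exc:  # broad catch mirrors the "log warning and continue" branch
        warnings.warn(f"Could not save processor config: {exc}")
```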
**Estimated code review effort:** 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/llm_ptq/hf_ptq.py (2)
**572-581**: VLM detection is better; broaden heuristics slightly and avoid broad try/except.

Some VLMs expose `vision_tower` rather than `vision_config`. You can avoid exceptions via safe `getattr` and include a couple more cheap signals (without making it brittle).

```diff
-# Check for VLMs by looking for vision_config in model config or language_model attribute
-is_vlm = False
-try:
-    is_vlm = hasattr(full_model.config, "vision_config") or hasattr(
-        full_model, "language_model"
-    )
-except Exception:
-    # Fallback to the original check if config access fails
-    is_vlm = hasattr(full_model, "language_model")
+# Check for VLMs via a few robust signals
+cfg = getattr(full_model, "config", None)
+is_vlm = any(
+    [
+        hasattr(full_model, "language_model"),
+        hasattr(full_model, "vision_tower"),
+        hasattr(cfg, "vision_config") if cfg is not None else False,
+        hasattr(cfg, "vision_tower") if cfg is not None else False,
+        ("vl" in (model_type or "")) or (model_type in {"mllama", "phi4mm"}),
+    ]
+)
```
**582-595**: Saving configs for VLMs: confirm intent for TensorRT-LLM exports; optionally gate by export_fmt and reuse the existing processor.

Mixing HF artifacts into a TensorRT-LLM export directory can be fine, but double-check it won't confuse downstream tooling. Also, if a processor is already loaded (e.g., mllama/whisper), reuse it to avoid reloading.

- Confirm that placing config.json/preprocessor_config.json in a TRT-LLM export dir doesn't interfere with trtllm tooling.
- If undesired, gate the save under args.export_fmt == "hf".

```diff
-if is_vlm:
+if is_vlm:
     # Save original model config and the preprocessor config to the export path for VLMs.
-
-    print(f"Saving original model and processor configs to {export_path}")
-
-    AutoConfig.from_pretrained(
-        args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-    ).save_pretrained(export_path)
-
-    AutoProcessor.from_pretrained(
-        args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-    ).save_pretrained(export_path)
+    if args.export_fmt == "hf":
+        print(f"Saving original model and processor configs to {export_path}")
+        # Lazy import to avoid hard dependency at module import time
+        from transformers import AutoConfig, AutoProcessor
+        AutoConfig.from_pretrained(
+            args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+        ).save_pretrained(export_path)
+        # Prefer the already-loaded processor if available
+        proc = processor
+        if proc is None:
+            proc = AutoProcessor.from_pretrained(
+                args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+            )
+        proc.save_pretrained(export_path)
```

Note that saving both an AutoProcessor and a tokenizer later may duplicate tokenizer artifacts; it's harmless but a tiny bit noisy. If desired, skip tokenizer.save_pretrained when is_vlm and a processor was saved.
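A tiny sketch of that last suggestion follows; the helper name `maybe_save_tokenizer` and the `processor_saved` flag are hypothetical and only illustrate where the skip would go.

```python
# Hypothetical guard: skip the redundant tokenizer save when a processor was exported.
def maybe_save_tokenizer(tokenizer, export_path: str, is_vlm: bool, processor_saved: bool) -> None:
    if is_vlm and processor_saved:
        return  # the saved AutoProcessor already bundles the tokenizer artifacts
    tokenizer.save_pretrained(export_path)
```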
📒 Files selected for processing (1)
- examples/llm_ptq/hf_ptq.py (2 hunks)
🔇 Additional comments (1)
examples/llm_ptq/hf_ptq.py (1)
**27-34**: AutoProcessor import safe under pinned transformers>=4.48

setup.py enforces `transformers>=4.48,<5.0` and AutoProcessor was introduced in v4.13.0, so the top-level import won't fail; a lazy import is not required. Ensure example docs/install scripts install transformers>=4.48.
Actionable comments posted: 0
♻️ Duplicate comments (1)
examples/llm_ptq/hf_ptq.py (1)
**594-602**: Narrow the exception scope and prefer saving only the image processor when present.

Aligns with unified export behavior and answers the earlier "why try/except" concern: some models don't provide `AutoProcessor`. Catch only expected exceptions.

```diff
-try:
-    print(f"Saving processor config to {export_path}")
-    AutoProcessor.from_pretrained(
-        args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-    ).save_pretrained(export_path)
-except Exception as e:
-    print(f"Warning: Could not save processor config: {e}")
-    print("This is normal for some VLM architectures that don't use AutoProcessor")
+try:
+    processor = AutoProcessor.from_pretrained(
+        args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+    )
+    if hasattr(processor, "image_processor"):
+        print(f"Saving image processor config to {export_path}")
+        processor.image_processor.save_pretrained(export_path)
+    else:
+        print(f"Saving processor config to {export_path}")
+        processor.save_pretrained(export_path)
+except (OSError, ValueError, ImportError) as e:
+    warnings.warn(
+        f"Could not save processor config: {e}. Some multimodal architectures don't provide AutoProcessor."
+    )
```
🧹 Nitpick comments (2)
examples/llm_ptq/hf_ptq.py (2)
**572-583**: Tighten the VLM detection heuristic to avoid false positives and cover common patterns.

Suggestions:

- Add a check for `hasattr(full_model, "vision_model")` (common in many VLM wrappers).
- Drop `hasattr(config, "audio_processor")`; processors aren't typically stored on config and may yield false positives.

Apply:

```diff
-    is_vlm = (
-        hasattr(config, "vision_config")  # Standard vision config (e.g., Qwen2.5-VL)
-        or hasattr(full_model, "language_model")  # Language model attribute (e.g., LLaVA)
-        or getattr(config, "model_type", "") == "phi4mm"  # Phi-4 multimodal
-        or hasattr(config, "vision_lora")  # Vision LoRA configurations
-        or hasattr(config, "audio_processor")  # Audio processing capabilities
-        or (
-            hasattr(config, "embd_layer") and hasattr(config.embd_layer, "image_embd_layer")
-        )  # Image embedding layers
-    )
+    is_vlm = (
+        hasattr(config, "vision_config")  # Standard vision config (e.g., Qwen2.5-VL)
+        or hasattr(full_model, "language_model")  # Language model attribute (e.g., LLaVA)
+        or hasattr(full_model, "vision_model")  # Explicit vision backbone on the model
+        or getattr(config, "model_type", "") == "phi4mm"  # Phi-4 multimodal
+        or hasattr(config, "vision_lora")  # Vision LoRA configurations
+        or (
+            hasattr(config, "embd_layer") and hasattr(config.embd_layer, "image_embd_layer")
+        )  # Image embedding layers
+    )
```
**589-591**: Prefer saving the in-memory config to avoid an extra network call and preserve local tweaks.

`AutoConfig.from_pretrained` may re-download; saving `full_model.config` is faster and captures any in-process edits.

```diff
-    AutoConfig.from_pretrained(
-        args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-    ).save_pretrained(export_path)
+    full_model.config.save_pretrained(export_path)
```
📒 Files selected for processing (1)
- examples/llm_ptq/hf_ptq.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
- examples/llm_ptq/hf_ptq.py (1)
  - modelopt/torch/export/unified_export_megatron.py: save_pretrained (298-462)
🔇 Additional comments (2)
examples/llm_ptq/hf_ptq.py (2)
**28-31**: AutoConfig/AutoProcessor imports are OK. These are required for saving configs/processors later. No issues.

**577-577**: Phi-4 MM special-casing looks good. Explicitly checking `model_type == "phi4mm"` addresses the earlier concern that not all VLMs expose `vision_config`.
Actionable comments posted: 0
🧹 Nitpick comments (2)
modelopt/torch/export/model_utils.py (1)
**74-111**: Simplify the `model_type` check and guard against missing configs

- Change the signature to `def is_multimodal_model(model) -> bool` and retrieve the config with `config = getattr(model, "config", None)`; return `False` if `config` is `None`.
- Normalize `model_type` with `mt = str(getattr(config, "model_type", "")).lower()` and compare directly to `"phi4mm"` (the canonical value) instead of a broader alias set.
- Handle `embd_layer` being either an object or a dict:

  ```python
  embd_layer = getattr(config, "embd_layer", None)
  has_image_emb = (
      hasattr(embd_layer, "image_embd_layer")
      or (isinstance(embd_layer, dict) and "image_embd_layer" in embd_layer)
  )
  ```

- Ensure attribute-based heuristics verify that attributes aren't just present but non-`None` (e.g., `getattr(config, "vision_config", None) is not None`).
- Retain checks for `vision_tower`, `vision_model`, `multi_modal_projector`/`mm_projector`, `language_model`, `vision_lora`, and `audio_processor` to cover other multimodal cases.

A consolidated sketch applying these suggestions appears after the comments below.

examples/llm_ptq/hf_ptq.py (1)
**573-593**: Make config/processor export resilient and align logging.

Avoid hard failures when offline/missing processor and keep messages consistent with `warnings.warn`. Also rename the local flag for clarity.

Apply:

```diff
-    # Check if the model is a multimodal/VLM model
-    is_vlm = is_multimodal_model(full_model)
+    # Check if the model is a multimodal (VLM or audio-text) model
+    is_multimodal = is_multimodal_model(full_model)

-    if is_vlm:
+    if is_multimodal:
         # Save original model config and the processor config to the export path for VLMs.
-        print(f"Saving original model config to {export_path}")
-
-        AutoConfig.from_pretrained(
-            args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-        ).save_pretrained(export_path)
+        try:
+            print(f"Saving original model config to {export_path}")
+            AutoConfig.from_pretrained(
+                args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+            ).save_pretrained(export_path)
+        except (OSError, ValueError) as e:
+            warnings.warn(f"Could not save original model config: {e}")

         # Try to save processor config if available
-        try:
-            print(f"Saving processor config to {export_path}")
-            AutoProcessor.from_pretrained(
-                args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-            ).save_pretrained(export_path)
-        except Exception as e:
-            print(f"Warning: Could not save processor config: {e}")
-            print("This is normal for some VLM architectures that don't use AutoProcessor")
+        try:
+            proc = AutoProcessor.from_pretrained(
+                args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+            )
+            print(f"Saving processor config to {export_path}")
+            proc.save_pretrained(export_path)
+        except (OSError, ValueError, ImportError) as e:
+            warnings.warn(
+                f"Could not save processor config: {e}. Some multimodal architectures don't use AutoProcessor."
+            )
```

If you prefer to keep the `is_vlm` name, consider updating user-facing prints to say "multimodal" for audio-text models (e.g., Whisper).
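Pulling the model_utils suggestions together, a consolidated sketch of what `is_multimodal_model` could look like is shown below; it is illustrative only and may differ from the function actually merged in modelopt/torch/export/model_utils.py.

```python
# Illustrative consolidation of the review suggestions above; the merged function in
# modelopt/torch/export/model_utils.py may differ.
def is_multimodal_model(model) -> bool:
    """Best-effort heuristic for VLM / audio-text models based on config and module attributes."""
    config = getattr(model, "config", None)
    if config is None:
        return False

    model_type = str(getattr(config, "model_type", "")).lower()

    # embd_layer may be a config object or a plain dict depending on the architecture.
    embd_layer = getattr(config, "embd_layer", None)
    has_image_embd = hasattr(embd_layer, "image_embd_layer") or (
        isinstance(embd_layer, dict) and "image_embd_layer" in embd_layer
    )

    return any(
        (
            getattr(config, "vision_config", None) is not None,  # e.g. Qwen2.5-VL
            getattr(config, "vision_lora", None) is not None,  # vision LoRA configurations
            hasattr(model, "language_model"),  # wrapper models such as LLaVA
            hasattr(model, "vision_tower") or hasattr(model, "vision_model"),
            model_type == "phi4mm",  # Phi-4 multimodal
            has_image_embd,  # image embedding layers
        )
    )
```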
📒 Files selected for processing (2)
- examples/llm_ptq/hf_ptq.py (3 hunks)
- modelopt/torch/export/model_utils.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
- examples/llm_ptq/hf_ptq.py (2)
  - modelopt/torch/export/model_utils.py: is_multimodal_model (74-111)
  - modelopt/torch/export/unified_export_megatron.py: save_pretrained (298-462)
🔇 Additional comments (3)
modelopt/torch/export/model_utils.py (1)
**63-63**: Public API export updated correctly. Adding `is_multimodal_model` to `__all__` is appropriate.

examples/llm_ptq/hf_ptq.py (2)

**28-31**: Imports for HF config/processor look good. Needed for exporting original configs for multimodal models.

**44-45**: Centralized detection via util is a good move. Using `is_multimodal_model` avoids scattering heuristics.
Force-pushed from befce1e to 1fd1154.
**Codecov Report**

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #286   +/-   ##
=======================================
  Coverage   73.94%   73.95%
=======================================
  Files         172      172
  Lines       17405    17405
=======================================
+ Hits        12870    12871      +1
+ Misses       4535     4534      -1
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Zhiyu Cheng <[email protected]>
Signed-off-by: Zhiyu Cheng <[email protected]>
Signed-off-by: Zhiyu Cheng <[email protected]>
Force-pushed from 1fd1154 to 2e75ee3.
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/llm_ptq/hf_ptq.py (2)
**576-583**: Avoid clobbering HF export config; write the original config to a subdir (or gate by export_fmt).

When `export_fmt == "hf"`, `export_hf_checkpoint` will likely write its own `config.json`, overwriting the "original" one just saved here. Save originals under a subdirectory to preserve both, or only save originals for the TRT-LLM path.

Apply:

```diff
-    if is_vlm:
-        # Save original model config and the processor config to the export path for VLMs.
-        print(f"Saving original model config to {export_path}")
+    if is_vlm:
+        # Preserve the original HF configs without clobbering HF export outputs.
+        cfg_dir = export_path if args.export_fmt == "tensorrt_llm" else f"{export_path}/original_hf_config"
+        print(f"Saving original model config to {cfg_dir}")

-        AutoConfig.from_pretrained(
-            args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-        ).save_pretrained(export_path)
+        AutoConfig.from_pretrained(
+            args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+        ).save_pretrained(cfg_dir)
```
**584-593**: Narrow the exception scope and prefer warnings for non-fatal save failures.

Catching `Exception` is broad. Align with other exporters and only catch expected errors; emit a warning.

Apply:

```diff
-        # Try to save processor config if available
-        try:
-            print(f"Saving processor config to {export_path}")
-            AutoProcessor.from_pretrained(
-                args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
-            ).save_pretrained(export_path)
-        except Exception as e:
-            print(f"Warning: Could not save processor config: {e}")
-            print("This is normal for some VLM architectures that don't use AutoProcessor")
+        # Try to save processor config if available
+        try:
+            print(f"Saving processor config to {cfg_dir}")
+            AutoProcessor.from_pretrained(
+                args.pyt_ckpt_path, trust_remote_code=args.trust_remote_code
+            ).save_pretrained(cfg_dir)
+        except (OSError, ValueError, ImportError) as e:
+            warnings.warn(
+                f"Could not save processor config: {e}. "
+                "Some VLM architectures don't use AutoProcessor."
+            )
```
📒 Files selected for processing (2)
- examples/llm_ptq/hf_ptq.py (3 hunks)
- modelopt/torch/export/model_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/torch/export/model_utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
- examples/llm_ptq/hf_ptq.py (2)
  - modelopt/torch/export/model_utils.py: is_multimodal_model (74-111)
  - modelopt/torch/export/unified_export_megatron.py: save_pretrained (298-462)
🔇 Additional comments (3)
examples/llm_ptq/hf_ptq.py (3)
**27-34**: Imports for AutoConfig/AutoProcessor look good. These are used in the new VLM handling block; no issues.

**44-45**: Good move: centralize VLM detection via util. Using `is_multimodal_model` aligns with earlier feedback to factor this out and should reduce false negatives across VLM families.

**573-575**: Correct detection scope. Checking `full_model` (not the possibly narrowed `model`) is the right choice for robust VLM detection.
What does this PR do?
Type of change: Minor code change
Overview: Make VLM detection more robust in the PTQ workflow
Usage
Testing
Additional Information
Summary by CodeRabbit

- New Features
- Refactor
- Chores