[quantization] Introduce wrapper for Qwen3VLVisionModel #536
mhs4670go merged 1 commit into Samsung:main from
Conversation
Force-pushed 18b9abd to 4a8bb88
Reference Code of
Force-pushed 4a8bb88 to 97d5ec3
When I ran the example script with transformers 4.57.3, I got the errors below.
```python
def _get_vision_grid_thw(qcfg: Optional[PTQConfig]) -> torch.Tensor:
    """Extract vision_grid_thw from config for precomputing RoPE embeddings"""
    if qcfg and hasattr(qcfg, "vision_grid_thw"):
        grid_thw = torch.tensor([getattr(qcfg, "vision_grid_thw")])
```
I'm not sure if a PTQConfig attribute is the right place to store vision_grid_thw, since the latter has nothing to do with quantization, but I couldn't come up with a better idea.
That's right! I was going to work on resolving this after merging this PR.
> right place to store vision_grid_thw as the latter has nothing to do with quantization,
I think it's okay to give this to PTQConfig, because the information is needed to wrap the module.
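For illustration, the defensive-lookup pattern the diff excerpt uses can be sketched roughly as follows. This is a minimal, self-contained sketch with hypothetical names: the `PTQConfig` here is a stand-in for the real class in tico, and `get_vision_grid_thw` mirrors the excerpt's `_get_vision_grid_thw` without the torch dependency.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical stand-in for tico's PTQConfig; the real class has more fields.
@dataclass
class PTQConfig:
    # Optional, model-specific metadata; absent for non-vision modules.
    vision_grid_thw: Optional[Tuple[int, int, int]] = None

def get_vision_grid_thw(qcfg: Optional[PTQConfig]) -> Optional[Tuple[int, int, int]]:
    """Read vision_grid_thw defensively: the config may be None or lack the field."""
    if qcfg is not None and getattr(qcfg, "vision_grid_thw", None) is not None:
        return qcfg.vision_grid_thw
    return None

print(get_vision_grid_thw(PTQConfig(vision_grid_thw=(1, 16, 16))))  # (1, 16, 16)
print(get_vision_grid_thw(None))                                    # None
```

The upside of this shape is that callers that never touch vision models pay no cost: the attribute simply stays `None` and the wrapper skips the precomputation.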
Force-pushed 059685e to 7eb7aae
Hi @mhs4670go
Thank you for the great work. But I think the old/new style version branching looks fragile here. We already use runtime capability detection in torch_compat.py, so it would be better to follow the same pattern for transformers as well. In this case, the wrapper does not really care about an "old" or "new" version. What it actually needs to know is whether the symbol it depends on is available:

```python
@functools.lru_cache(maxsize=None)
def qwen3_vl_has_deepstack_model_output() -> bool:
    try:
        module = importlib.import_module(
            "transformers.models.qwen3_vl.modeling_qwen3_vl"
        )
    except ImportError:
        return False
    return hasattr(module, "BaseModelOutputWithDeepstackFeatures")
```

Then in the wrapper:

```python
self.has_deepstack_model_output = qwen3_vl_has_deepstack_model_output()
```

and later:

```python
if self.has_deepstack_model_output:
    ...
else:
    ...
```

This keeps the code aligned with our existing compatibility policy:
I would also recommend placing this in a new file such as: so future transformers-specific probes can live in one place. Here's the full version that I implemented. Please include this in the PR.

```python
"""
Runtime capability-detection helpers for Hugging Face `transformers`.

Instead of branching on specific package versions such as
`transformers >= 5.x`, use these helpers to detect whether the exact
symbol or behavior required by the code is available at runtime.

Each probe is cached once per process with `functools.lru_cache`,
so repeated checks have negligible overhead.
"""
import functools
import importlib


@functools.lru_cache(maxsize=None)
def qwen3_vl_has_deepstack_model_output() -> bool:
    """
    Return whether Qwen3-VL exposes
    `BaseModelOutputWithDeepstackFeatures` in its modeling module.

    This wrapper only needs to know whether the structured return type is
    available. Using feature detection keeps the code resilient to
    backports, forward ports, and non-linear package versioning.

    Returns
    -------
    bool
        ``True`` if
        `transformers.models.qwen3_vl.modeling_qwen3_vl`
        defines `BaseModelOutputWithDeepstackFeatures`,
        otherwise ``False``.
    """
    try:
        module = importlib.import_module(
            "transformers.models.qwen3_vl.modeling_qwen3_vl"
        )
    except ImportError:
        return False
    return hasattr(module, "BaseModelOutputWithDeepstackFeatures")
```
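The same probe pattern can be demonstrated without transformers installed. The sketch below generalizes the helper above into a hypothetical `module_has_symbol` probe (not part of the PR) and exercises it against the standard library, to show how the pattern degrades gracefully for both missing modules and missing symbols.

```python
import functools
import importlib

@functools.lru_cache(maxsize=None)
def module_has_symbol(module_name: str, symbol: str) -> bool:
    """Generic capability probe: True iff `module_name` imports and defines `symbol`.

    Cached per process, so repeated checks cost one dict lookup.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, symbol)

# Missing modules and missing symbols both yield False; no version parsing needed.
print(module_has_symbol("json", "dumps"))           # True
print(module_has_symbol("json", "no_such_name"))    # False
print(module_has_symbol("no_such_module_xyz", "x")) # False
```

Because the cache key is the argument tuple, one cached function can serve many probes, whereas one-probe-per-function (as in the PR) keeps each call site self-documenting; either shape fits the capability-detection policy.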
Force-pushed 793bc21 to 08ab2ea
This change introduces QuantQwen3VLVisionModel wrapper to support post-training quantization of Qwen3VLVisionModel operation. TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
Force-pushed 08ab2ea to c832fad
👍 Done
This change introduces the QuantQwen3VLVisionModel wrapper to support post-training quantization of Qwen3VLVisionModel.

Why?

The Qwen3VLVisionModel module is used in the image encoder part of VLMs. Trying to quantize Qwen3VLVisionModel via PTQ raises the exception `PTQQuantizer: no quantization wrapper for Qwen3VLVisionModel`.

What

This change introduces:
- class QuantQwen3VLVisionModel (tico/quantization/wrapq/wrappers/qwen_vl/quant_vision_model.py)
- an entry in _CORE_MODULES (tico/quantization/wrapq/wrappers/registry.py)
- class TestQuantQwen3VLVisionModel (test/quantization/wrapq/wrappers/nn/test_quant_vision_model.py)
- an example script for Qwen3VLVisionModel quantization and conversion to Circle (tico/quantization/wrapq/examples/qwen/quantize_vision_model.py)

Design

- The wrapper follows tico/quantization/wrapq/wrappers/llama/quant_decoder_layer_prefill.py, namely, the precomputation of position embeddings beforehand to avoid computing them during inference: pos_embed_template, rope_cos_template, and rope_sin_template.
- The QuantQwen3VLVisionModel implementation deliberately uses numerous static methods (independent of the self object). This brings the benefits of functional programming, making the dependencies and the data flow explicit, and makes the code more unit-testable.

Unit Tests
Unit test results with coverage information:
Coverage info (irrelevant files skipped):
Example Script
PEIR depends on the number of Qwen3VLVisionBlocks (Qwen3VLVisionConfig.depth), so below I'm providing several script runs for different depths.
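As a closing note, the position-embedding precomputation mentioned in the Design section can be sketched roughly as follows. This is a simplified, torch-free illustration with a hypothetical `precompute_rope_templates` helper: the actual wrapper builds pos_embed_template, rope_cos_template, and rope_sin_template as tensors from the model's rotary embedding, but the idea is the same: compute the trig tables once, then only index into them at inference time.

```python
import math
from typing import List, Tuple

def precompute_rope_templates(
    seq_len: int, head_dim: int, base: float = 10000.0
) -> Tuple[List[List[float]], List[List[float]]]:
    """Precompute cos/sin rotary-embedding tables of shape
    [seq_len, head_dim // 2] once, so inference avoids per-call trig."""
    # One inverse frequency per pair of embedding dimensions.
    inv_freq = [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    cos_t = [[math.cos(pos * f) for f in inv_freq] for pos in range(seq_len)]
    sin_t = [[math.sin(pos * f) for f in inv_freq] for pos in range(seq_len)]
    return cos_t, sin_t

cos_t, sin_t = precompute_rope_templates(seq_len=4, head_dim=8)
print(len(cos_t), len(cos_t[0]))  # 4 4
print(cos_t[0])                   # position 0 -> all ones
```

Baking these tables into the wrapped module also helps the Circle conversion, since the exported graph then contains plain lookups instead of trigonometric ops.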