[quantization] Introduce wrapper for Qwen3VLTextModel #572

mhs4670go merged 1 commit into Samsung:main
Conversation
```python
from transformers.cache_utils import Cache
    ...
def apply_interleaved_mrope(self, freqs, mrope_section):
```
Note for reviewers

This function replaces the original `Qwen3VLTextRotaryEmbedding.apply_interleaved_mrope` implementation, which uses `slice(offset, length, 3)` and therefore emits a `slice_scatter` operator with `step=3` when the model is exported. See this comment for details.
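As a rough illustration of the step-3 strided writes this comment refers to (a minimal sketch with made-up shapes; the toy T/H/W rows below are not the model's real rotary frequencies):

```python
# Toy stand-in for rotary frequency rows: one row each for T, H, and W.
t_row = [0, 1, 2, 3, 4, 5]
h_row = [6, 7, 8, 9, 10, 11]
w_row = [12, 13, 14, 15, 16, 17]

# Step-3 strided slice assignments: in a traced PyTorch graph, writes of
# the form out[1::3] = ... are what lower to slice_scatter with step=3.
out = list(t_row)          # start from the T row
out[1::3] = h_row[1::3]    # H values at positions 1, 4, ...
out[2::3] = w_row[2::3]    # W values at positions 2, 5, ...
# out is now interleaved as T, H, W, T, H, W
```

Avoiding these strided writes in the wrapper sidesteps the step-3 `slice_scatter` at export time.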
```python
# Convert to quantized version
quantized_model = tico.quantization.convert(prepared_model, inplace=True)

# Compute PEIR (Peak Error-to-Input Ratio) between quantized model and original model
```
FYI, Peak Error-to-Interval Ratio.
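For context, a minimal sketch of how such a metric could be computed, assuming the common definition of PEIR as peak absolute error over the reference output's value interval (`peir` here is a hypothetical helper, not part of the tico API; plain lists stand in for tensors):

```python
def peir(ref, quant):
    # Hypothetical helper: Peak Error-to-Interval Ratio, i.e. the peak
    # absolute error between reference and quantized outputs, divided by
    # the value interval (max - min) of the reference output.
    peak_err = max(abs(r - q) for r, q in zip(ref, quant))
    interval = max(ref) - min(ref)
    return peak_err / interval

ratio = peir([0.0, 1.0, 2.0, 4.0], [0.0, 1.0, 2.0, 3.0])  # 1.0 / 4.0
```

Normalizing by the interval (rather than by the input magnitude) is what makes "Interval", not "Input", the right expansion of the acronym.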
```python
h_w_bands.append(freqs_bands)

# Now we need to build the interleaved output
# Original T dimension has indices 0-63
```
Maybe we can replace this line with something like:
"Original T dimension indices range from 0 to (head_dim // 2 - 1)"
```python
if deepstack_visual_embeds is not None and layer_idx in range(
    len(deepstack_visual_embeds)
):
    deepstack_visual_embeds = self._fq(
```
Does this work? This line assigns a single value to the whole list. It should be:

`deepstack_visual_embeds[layer_idx] = self._fq(..)`

👍 Yes, indeed. Thanks for catching that. Strangely enough, this bug didn't lead to unit test failures. I'll investigate and try to cover that case.
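The bug is easy to reproduce in isolation (a minimal sketch; the `_fq` below is a made-up placeholder, while the real `_fq` fake-quantizes tensors):

```python
def _fq(x):
    # Placeholder for fake-quantization: round each value.
    return [round(v) for v in x]

deepstack_visual_embeds = [[0.4, 1.6], [2.2, 2.9]]
layer_idx = 0

# Buggy version: rebinds the name to the quantized element, so the list
# of per-layer embeddings is silently replaced by a single entry.
buggy = _fq(deepstack_visual_embeds[layer_idx])

# Fixed version: only the current layer's entry is updated in place.
deepstack_visual_embeds[layer_idx] = _fq(deepstack_visual_embeds[layer_idx])
```

Because the buggy rebinding still produces a value of a plausible shape for the current layer, downstream code can keep running, which may explain why no unit test failed.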
```python
# Original T dimension has indices 0-63
# We want to replace specific indices with H/W bands

# The interleaving pattern: T0, H1, W2, T3, T4, H5, W6, T7, ...
```
IIUC, this example pattern is misleading because the fallback to T is not permanent. Even after multiple T positions (due to missing H/W bands), later indices may still produce valid H/W values, resulting in a non-monotonic and unintuitive layout.
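To illustrate the reviewer's point, here is a hypothetical sketch of one plausible interleaving scheme (the function name and exact band layout are assumptions, not the PR's actual code): when one axis has fewer bands than another, a W slot can still appear after positions have already fallen back to T.

```python
def interleave_pattern(mrope_section):
    # Hypothetical layout: mrope_section = [t, h, w] gives the number of
    # rotary dims per axis. H occupies positions 1, 4, 7, ... and W
    # positions 2, 5, 8, ... only while that axis still has bands left;
    # every remaining position falls back to T.
    t, h, w = mrope_section
    pattern = ["T"] * sum(mrope_section)
    for i in range(h):
        pattern[3 * i + 1] = "H"
    for i in range(w):
        pattern[3 * i + 2] = "W"
    return pattern

# With fewer H bands than W bands, W reappears after a run of T's:
layout = interleave_pattern([3, 1, 2])  # ['T', 'H', 'W', 'T', 'T', 'W']
```

Note how position 5 is W even though positions 3 and 4 already fell back to T, i.e. the fallback is not permanent.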
This change introduces `QuantQwen3VLTextModel` wrapper to support post-training quantization of the `Qwen3VLTextModel` module.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
mhs4670go
left a comment
LGTM
I think it's a good starting point as a basic wrapper. Let's revise the export approach after we decide how to run inference on the accelerator, as we do with llama (prefill/decode).
This change introduces `QuantQwen3VLTextModel` wrapper to support post-training quantization of the `Qwen3VLTextModel` module.

Why?

`Qwen3VLTextModel` is an essential part of the Qwen model. Trying to quantize `Qwen3VLTextModel` via PTQ raises the exception `PTQQuantizer: no quantization wrapper for Qwen3VLTextModel`.

What

This change introduces:

- `QuantQwen3VLTextModel` (`tico/quantization/wrapq/wrappers/qwen_vl/quant_text_model.py`).
- `class TestQuantQwen3VLTextModel` (`test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py`), skipped if the `transformers` package is not installed.
- An entry in `_CORE_MODULES` (`tico/quantization/wrapq/wrappers/registry.py`).
- An example of `Qwen3VLTextModel` quantization and conversion to Circle (`tico/quantization/wrapq/examples/qwen/quantize_text_model.py`).

Unit Tests
Unit test results with coverage information:
Coverage info (irrelevant files skipped):
Script for testing quantization and conversion to Circle