
[quantization] Introduce wrapper for Qwen3VLTextModel#572

Merged
mhs4670go merged 1 commit into Samsung:main from dvsav:quant_text_model
Mar 26, 2026

Conversation

Contributor

@dvsav dvsav commented Mar 23, 2026

This change introduces a QuantQwen3VLTextModel wrapper to support post-training quantization (PTQ) of the Qwen3VLTextModel module.

Why?

Qwen3VLTextModel is an essential part of the Qwen3-VL model.
Trying to quantize Qwen3VLTextModel via PTQ raises the exception PTQQuantizer: no quantization wrapper for Qwen3VLTextModel.

What?

This change introduces:

  • Class QuantQwen3VLTextModel (tico/quantization/wrapq/wrappers/qwen_vl/quant_text_model.py).
  • Unit tests: class TestQuantQwen3VLTextModel (test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py) - skipped if transformers package is not installed.
  • New entry in _CORE_MODULES (tico/quantization/wrapq/wrappers/registry.py).
  • Example of Qwen3VLTextModel quantization and conversion to Circle (tico/quantization/wrapq/examples/qwen/quantize_text_model.py).
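Conceptually, the registry lookup that previously failed (and that the new _CORE_MODULES entry fixes) works like this. This is a hypothetical, simplified sketch; only the error message text comes from the PR, and all class and function names here are illustrative stand-ins for the real TICO implementation.

```python
# Minimal sketch of a PTQ wrapper registry, assuming a dict keyed by the
# target module's class name. `_CORE_MODULES`, `register_wrapper`, and
# `wrap_for_ptq` are simplified stand-ins, not TICO's actual API.

_CORE_MODULES = {}  # maps target module class name -> quantization wrapper class


def register_wrapper(target_name):
    """Register a wrapper class for the given target module class name."""
    def decorator(wrapper_cls):
        _CORE_MODULES[target_name] = wrapper_cls
        return wrapper_cls
    return decorator


def wrap_for_ptq(module):
    """Look up the wrapper for `module`; raise if none is registered."""
    name = type(module).__name__
    if name not in _CORE_MODULES:
        raise RuntimeError(f"PTQQuantizer: no quantization wrapper for {name}")
    return _CORE_MODULES[name](module)


@register_wrapper("Qwen3VLTextModel")
class QuantQwen3VLTextModel:
    def __init__(self, wrapped):
        self.wrapped = wrapped


class Qwen3VLTextModel:  # stand-in for the real transformers module
    pass


wrapped = wrap_for_ptq(Qwen3VLTextModel())
print(type(wrapped).__name__)  # -> QuantQwen3VLTextModel
```

Before this PR, the lookup for Qwen3VLTextModel would miss and raise the RuntimeError shown in the "Why?" section above.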

Unit Tests

Unit test results with coverage information:

$ coverage run -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py -v
================================================================================================================ test session starts ===========
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 15 items                                                                                                                                                                                                                                  

test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_deepstack_injection             PASSED [  6%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_batch_sizes           PASSED [ 13%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_sequence_lengths      PASSED [ 20%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_embedding_layer_quantization    PASSED [ 26%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_forward_diff                    PASSED [ 33%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_inputs_embeds_path              PASSED [ 40%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_layers_wrapped                  PASSED [ 46%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_mode_transitions                PASSED [ 53%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_no_cache_mode                   PASSED [ 60%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_norm_wrapped                    PASSED [ 66%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_observer_count                  PASSED [ 73%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_output_shape                    PASSED [ 80%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_per_module_override             PASSED [ 86%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_registration_in_registry        PASSED [ 93%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_rotary_emb_not_wrapped          PASSED [100%]

========================================================= 15 passed, 2 warnings in 8.35s =======================================================

Coverage info (irrelevant files skipped):

$ coverage report -m
Name                                                                    Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------------------------------------
...
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_attn.py               135      5    96%   196-197, 201-203
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_decoder_layer.py       42      0   100%
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_mlp.py                 43      0   100%
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_model.py               96      7    93%   150, 156-158, 183-184, 187-188
...
-----------------------------------------------------------------------------------------------------
TOTAL                                                                   11374   7195    37%

Script for testing quantization and conversion to Circle

$ python tico/quantization/wrapq/examples/qwen/quantize_text_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.132488
│ PEIR       : 8.692164 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 4.0┤                                            │
    │                                    ••••••  │
 2.5┤                                ••••••••    │
    │                            •••••••••       │
 1.0┤                        ••••••••••          │
    │                     •••••••••              │
-0.6┤                 ••••••••••                 │
    │              •••••••••                     │
-2.1┤           ••••••••                         │
    │        •••••••                             │
-3.6┤     •••••                                  │
    │  •                                         │
-5.1┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.9       -0.6       1.7       4.0 

Circle model saved as 'qwen3vl_text_model.q.circle'

@dvsav dvsav changed the title Quant text model [quantization] Introduce wrapper for Qwen3VLTextModel Mar 23, 2026
@dvsav dvsav force-pushed the quant_text_model branch 6 times, most recently from 5b853b7 to fd824f2 Compare March 24, 2026 14:17
@dvsav dvsav force-pushed the quant_text_model branch 2 times, most recently from 67d178d to 396a338 Compare March 25, 2026 12:56
from transformers.cache_utils import Cache


def apply_interleaved_mrope(self, freqs, mrope_section):
Contributor Author

@dvsav dvsav Mar 25, 2026


Note for reviewers

This function replaces the original Qwen3VLTextRotaryEmbedding.apply_interleaved_mrope implementation, which uses slice(offset, length, 3) and therefore emits a slice_scatter operator with step=3 when the model is exported. See this comment for details.
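For context, the strided-write pattern and an export-friendlier alternative can be illustrated with plain Python lists. This is a conceptual sketch only; the real code operates on tensors, and the exact band sizes and reshaping in the PR may differ.

```python
# The original implementation writes every third position with a strided
# slice; on tensors this lowers to a slice_scatter op with step=3 on export.
head = 12
t = list(range(head))               # T band initially fills all positions
h = [f"H{i}" for i in range(head // 3)]
w = [f"W{i}" for i in range(head // 3)]

strided = list(t)
strided[1::3] = h                   # strided write -> slice_scatter(step=3)
strided[2::3] = w

# Alternative: build the interleaved layout with a per-position index map
# and plain gathers, avoiding the strided scatter entirely.
gathered = []
for i in range(head):
    if i % 3 == 1:
        gathered.append(h[i // 3])
    elif i % 3 == 2:
        gathered.append(w[i // 3])
    else:
        gathered.append(t[i])

print(strided == gathered)  # -> True
```

On tensors, the same idea can be expressed as a gather over a precomputed index tensor (or a stack-and-reshape), which avoids emitting slice_scatter with a non-unit step.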

@dvsav dvsav marked this pull request as ready for review March 25, 2026 13:06
# Convert to quantized version
quantized_model = tico.quantization.convert(prepared_model, inplace=True)

# Compute PEIR (Peak Error-to-Input Ratio) between quantized model and original model
Contributor


FYI, PEIR stands for Peak Error-to-Interval Ratio, not Peak Error-to-Input Ratio.
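Under that name, the metric can be sketched as the peak absolute error normalized by the reference output's value interval, reported as a percentage. This is an assumed definition based on the name alone; TICO's exact formula may differ.

```python
def peir(reference, quantized):
    """Peak Error-to-Interval Ratio (percent): largest absolute difference
    between outputs, normalized by the reference value range (max - min).
    Assumed definition; the real TICO implementation may differ."""
    peak_error = max(abs(r - q) for r, q in zip(reference, quantized))
    interval = max(reference) - min(reference)
    return 100.0 * peak_error / interval


# Illustrative values only (not the PR's actual model outputs).
ref = [-5.1, -2.9, -0.6, 1.7, 4.0]
qnt = [-5.0, -2.8, -0.7, 1.6, 4.2]
print(round(peir(ref, qnt), 3))
```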

h_w_bands.append(freqs_bands)

# Now we need to build the interleaved output
# Original T dimension has indices 0-63
Contributor


Maybe we can replace this line with something like below.

"Original T dimension indices range from 0 to (head_dim // 2 - 1)"

if deepstack_visual_embeds is not None and layer_idx in range(
    len(deepstack_visual_embeds)
):
    deepstack_visual_embeds = self._fq(
Contributor


Does this work? This line assigns a single value to the whole list.

deepstack_visual_embeds[layer_idx] = self._fq(..)

Contributor Author


👍Yes, indeed. Thanks for catching that. Strangely enough, this bug didn't lead to unit test failures. I'll investigate and try to cover that case.
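The bug the reviewer caught can be reproduced in isolation. This is a simplified sketch: self._fq is modeled as a fake-quant function that tags its input, and the embedding values are placeholders.

```python
def fake_quant(x):
    # stand-in for self._fq: tag the value so fake-quantized entries are visible
    return ("fq", x)


embeds = ["e0", "e1", "e2"]  # placeholder deepstack visual embeddings
layer_idx = 1

# Buggy version: rebinds the whole list name to a single fake-quant result,
# so later layers see one tagged value instead of a list of embeddings.
buggy = fake_quant(embeds[layer_idx])
print(buggy)    # -> ('fq', 'e1')

# Fixed version: fake-quantize only the current layer's entry in place,
# keeping the list structure intact for the remaining layers.
fixed = list(embeds)
fixed[layer_idx] = fake_quant(fixed[layer_idx])
print(fixed)    # -> ['e0', ('fq', 'e1'), 'e2']
```

This also hints at why the unit tests missed it: if the tests only check the final hidden states on inputs where the deepstack path contributes little, the rebinding can go unnoticed.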

# Original T dimension has indices 0-63
# We want to replace specific indices with H/W bands

# The interleaving pattern: T0, H1, W2, T3, T4, H5, W6, T7, ...
Contributor


IIUC, this example pattern is misleading because fallback to T is not permanent.

Even after multiple T positions (due to missing H/W bands), later indices may still produce valid H/W values, resulting in a non-monotonic and unintuitive layout.

@dvsav dvsav force-pushed the quant_text_model branch from 396a338 to 84b8b70 Compare March 26, 2026 07:31
This change introduces QuantQwen3VLTextModel wrapper to support post-training quantization of Qwen3VLTextModel operation.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
@dvsav dvsav force-pushed the quant_text_model branch from 84b8b70 to bb5856d Compare March 26, 2026 07:45
Contributor

@mhs4670go mhs4670go left a comment


LGTM

I think it's a good starting point as a basic wrapper. Let's revise the export path after we decide how to run inference on the accelerator, like we do with llama (prefill/decode).

@mhs4670go mhs4670go merged commit 404c9a4 into Samsung:main Mar 26, 2026
7 checks passed
