
[quantization] Introduce wrapper for Qwen3VLTextModel#572

Merged
mhs4670go merged 1 commit into Samsung:main from dvsav:quant_text_model
Mar 26, 2026

Conversation

Contributor

@dvsav dvsav commented Mar 23, 2026

This change introduces a QuantQwen3VLTextModel wrapper to support post-training quantization (PTQ) of the Qwen3VLTextModel module.

Why?

Qwen3VLTextModel is an essential part of the Qwen3-VL model.
Trying to quantize Qwen3VLTextModel via PTQ raises the exception PTQQuantizer: no quantization wrapper for Qwen3VLTextModel.

What?

This change introduces:

  • Class QuantQwen3VLTextModel (tico/quantization/wrapq/wrappers/qwen_vl/quant_text_model.py).
  • Unit tests: class TestQuantQwen3VLTextModel (test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py) - skipped if transformers package is not installed.
  • New entry in _CORE_MODULES (tico/quantization/wrapq/wrappers/registry.py).
  • Example of Qwen3VLTextModel quantization and conversion to Circle (tico/quantization/wrapq/examples/qwen/quantize_text_model.py).
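Conceptually, the registry lookup that previously failed (and that the new _CORE_MODULES entry fixes) works like this. This is a hypothetical, simplified sketch; only the error message text comes from the PR, and all class and function names here are illustrative stand-ins for the real TICO implementation.

```python
# Minimal sketch of a PTQ wrapper registry, assuming a dict keyed by the
# target module's class name. `_CORE_MODULES`, `register_wrapper`, and
# `wrap_for_ptq` are simplified stand-ins, not TICO's actual API.

_CORE_MODULES = {}  # maps target module class name -> quantization wrapper class


def register_wrapper(target_name):
    """Register a wrapper class for the given target module class name."""
    def decorator(wrapper_cls):
        _CORE_MODULES[target_name] = wrapper_cls
        return wrapper_cls
    return decorator


def wrap_for_ptq(module):
    """Look up the wrapper for `module`; raise if none is registered."""
    name = type(module).__name__
    if name not in _CORE_MODULES:
        raise RuntimeError(f"PTQQuantizer: no quantization wrapper for {name}")
    return _CORE_MODULES[name](module)


@register_wrapper("Qwen3VLTextModel")
class QuantQwen3VLTextModel:
    def __init__(self, wrapped):
        self.wrapped = wrapped


class Qwen3VLTextModel:  # stand-in for the real transformers module
    pass


wrapped = wrap_for_ptq(Qwen3VLTextModel())
print(type(wrapped).__name__)  # -> QuantQwen3VLTextModel
```

Before this PR, the lookup for Qwen3VLTextModel would miss and raise the RuntimeError shown in the "Why?" section above.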

Unit Tests

Unit test results with coverage information:

$ coverage run -m pytest test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py -v
================================================================================================================ test session starts ===========
platform linux -- Python 3.10.12, pytest-8.4.0, pluggy-1.6.0 -- /home/d.savchenkov/myenv/bin/python3
cachedir: .pytest_cache
rootdir: /home/d.savchenkov/TICO
configfile: pyproject.toml
plugins: anyio-4.12.0, mock-3.15.1, xdist-3.7.0, cov-6.2.1
collected 15 items                                                                                                                                                                                                                                  

test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_deepstack_injection             PASSED [  6%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_batch_sizes           PASSED [ 13%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_different_sequence_lengths      PASSED [ 20%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_embedding_layer_quantization    PASSED [ 26%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_forward_diff                    PASSED [ 33%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_inputs_embeds_path              PASSED [ 40%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_layers_wrapped                  PASSED [ 46%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_mode_transitions                PASSED [ 53%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_no_cache_mode                   PASSED [ 60%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_norm_wrapped                    PASSED [ 66%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_observer_count                  PASSED [ 73%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_output_shape                    PASSED [ 80%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_per_module_override             PASSED [ 86%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_registration_in_registry        PASSED [ 93%]
test/quantization/wrapq/wrappers/qwen_vl/test_quant_text_model.py::TestQuantQwen3VLTextModel::test_rotary_emb_not_wrapped          PASSED [100%]

========================================================= 15 passed, 2 warnings in 8.35s =======================================================

Coverage info (irrelevant files skipped):

$ coverage report -m
Name                                                                    Stmts   Miss  Cover   Missing
-----------------------------------------------------------------------------------------------------
...
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_attn.py               135      5    96%   196-197, 201-203
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_decoder_layer.py       42      0   100%
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_mlp.py                 43      0   100%
tico/quantization/wrapq/wrappers/qwen_vl/quant_text_model.py               96      7    93%   150, 156-158, 183-184, 187-188
...
-----------------------------------------------------------------------------------------------------
TOTAL                                                                   11374   7195    37%

Script for testing quantization and conversion to Circle

$ python tico/quantization/wrapq/examples/qwen/quantize_text_model.py
┌───────────── Quantization Error Summary ─────────────
│ Mean |diff|: 0.132488
│ PEIR       : 8.692164 %
└──────────────────────────────────────────────────────
    ┌────────────────────────────────────────────┐
 4.0┤                                            │
    │                                    ••••••  │
 2.5┤                                ••••••••    │
    │                            •••••••••       │
 1.0┤                        ••••••••••          │
    │                     •••••••••              │
-0.6┤                 ••••••••••                 │
    │              •••••••••                     │
-2.1┤           ••••••••                         │
    │        •••••••                             │
-3.6┤     •••••                                  │
    │  •                                         │
-5.1┤                                            │
    └┬──────────┬──────────┬─────────┬──────────┬┘
   -5.1       -2.9       -0.6       1.7       4.0 

Circle model saved as 'qwen3vl_text_model.q.circle'

@dvsav dvsav changed the title Quant text model [quantization] Introduce wrapper for Qwen3VLTextModel Mar 23, 2026
@dvsav dvsav force-pushed the quant_text_model branch 6 times, most recently from 5b853b7 to fd824f2 Compare March 24, 2026 14:17
@dvsav dvsav force-pushed the quant_text_model branch 2 times, most recently from 67d178d to 396a338 Compare March 25, 2026 12:56
from transformers.cache_utils import Cache


def apply_interleaved_mrope(self, freqs, mrope_section):
Contributor Author

@dvsav dvsav Mar 25, 2026


Note for reviewers

This function replaces the original Qwen3VLTextRotaryEmbedding.apply_interleaved_mrope implementation, which uses slice(offset, length, 3) and therefore emits a slice_scatter operator with step=3 when the model is exported. See this comment for details.
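For context, the strided-write pattern and an export-friendlier alternative can be illustrated with plain Python lists. This is a conceptual sketch only; the real code operates on tensors, and the exact band sizes and reshaping in the PR may differ.

```python
# The original implementation writes every third position with a strided
# slice; on tensors this lowers to a slice_scatter op with step=3 on export.
head = 12
t = list(range(head))               # T band initially fills all positions
h = [f"H{i}" for i in range(head // 3)]
w = [f"W{i}" for i in range(head // 3)]

strided = list(t)
strided[1::3] = h                   # strided write -> slice_scatter(step=3)
strided[2::3] = w

# Alternative: build the interleaved layout with a per-position index map
# and plain gathers, avoiding the strided scatter entirely.
gathered = []
for i in range(head):
    if i % 3 == 1:
        gathered.append(h[i // 3])
    elif i % 3 == 2:
        gathered.append(w[i // 3])
    else:
        gathered.append(t[i])

print(strided == gathered)  # -> True
```

On tensors, the same idea can be expressed as a gather over a precomputed index tensor (or a stack-and-reshape), which avoids emitting slice_scatter with a non-unit step.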

@dvsav dvsav marked this pull request as ready for review March 25, 2026 13:06
# Convert to quantized version
quantized_model = tico.quantization.convert(prepared_model, inplace=True)

# Compute PEIR (Peak Error-to-Input Ratio) between quantized model and original model
Contributor


FYI, PEIR stands for Peak Error-to-Interval Ratio, not Peak Error-to-Input Ratio.
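Under that name, the metric can be sketched as the peak absolute error normalized by the reference output's value interval, reported as a percentage. This is an assumed definition based on the name alone; TICO's exact formula may differ.

```python
def peir(reference, quantized):
    """Peak Error-to-Interval Ratio (percent): largest absolute difference
    between outputs, normalized by the reference value range (max - min).
    Assumed definition; the real TICO implementation may differ."""
    peak_error = max(abs(r - q) for r, q in zip(reference, quantized))
    interval = max(reference) - min(reference)
    return 100.0 * peak_error / interval


# Illustrative values only (not the PR's actual model outputs).
ref = [-5.1, -2.9, -0.6, 1.7, 4.0]
qnt = [-5.0, -2.8, -0.7, 1.6, 4.2]
print(round(peir(ref, qnt), 3))
```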

h_w_bands.append(freqs_bands)

# Now we need to build the interleaved output
# Original T dimension has indices 0-63
Contributor


Maybe we can replace this line with something like below.

"Original T dimension indices range from 0 to (head_dim // 2 - 1)"

if deepstack_visual_embeds is not None and layer_idx in range(
    len(deepstack_visual_embeds)
):
    deepstack_visual_embeds = self._fq(
Contributor


Does this work? This line assigns a single value to the whole list.

deepstack_visual_embeds[layer_idx] = self._fq(..)

Contributor Author


👍Yes, indeed. Thanks for catching that. Strangely enough, this bug didn't lead to unit test failures. I'll investigate and try to cover that case.
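The bug the reviewer caught can be reproduced in isolation. This is a simplified sketch: self._fq is modeled as a fake-quant function that tags its input, and the embedding values are placeholders.

```python
def fake_quant(x):
    # stand-in for self._fq: tag the value so fake-quantized entries are visible
    return ("fq", x)


embeds = ["e0", "e1", "e2"]  # placeholder deepstack visual embeddings
layer_idx = 1

# Buggy version: rebinds the whole list name to a single fake-quant result,
# so later layers see one tagged value instead of a list of embeddings.
buggy = fake_quant(embeds[layer_idx])
print(buggy)    # -> ('fq', 'e1')

# Fixed version: fake-quantize only the current layer's entry in place,
# keeping the list structure intact for the remaining layers.
fixed = list(embeds)
fixed[layer_idx] = fake_quant(fixed[layer_idx])
print(fixed)    # -> ['e0', ('fq', 'e1'), 'e2']
```

This also hints at why the unit tests missed it: if the tests only check the final hidden states on inputs where the deepstack path contributes little, the rebinding can go unnoticed.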

# Original T dimension has indices 0-63
# We want to replace specific indices with H/W bands

# The interleaving pattern: T0, H1, W2, T3, T4, H5, W6, T7, ...
Contributor


IIUC, this example pattern is misleading because fallback to T is not permanent.

Even after multiple T positions (due to missing H/W bands), later indices may still produce valid H/W values, resulting in a non-monotonic and unintuitive layout.

@dvsav dvsav force-pushed the quant_text_model branch from 396a338 to 84b8b70 Compare March 26, 2026 07:31
This change introduces QuantQwen3VLTextModel wrapper to support post-training quantization of Qwen3VLTextModel operation.

TICO-DCO-1.0-Signed-off-by: d.savchenkov <d.savchenkov@partner.samsung.com>
@dvsav dvsav force-pushed the quant_text_model branch from 84b8b70 to bb5856d Compare March 26, 2026 07:45
Contributor

@mhs4670go mhs4670go left a comment


LGTM

I think it's a good starting point as a basic wrapper. Let's revise the export path after we decide how to run inference on the accelerator, like we do with llama (prefill/decode).

@mhs4670go mhs4670go merged commit 404c9a4 into Samsung:main Mar 26, 2026
7 checks passed
