[quantization] Fallback for complex models #583

Merged: mhs4670go merged 1 commit into Samsung:main from stamalakhov:vlm_gptq_pr on Mar 26, 2026
Conversation

@stamalakhov
Contributor

This PR:

  1. quantizes the whole model (e.g. a VLM) as a single layer in GPTQ, and
  2. adds an example script for general VLM models.
Log of `python tico/quantization/wrapq/examples/quantize_full_vlm_model_with_gptq.py --model HuggingFaceTB/SmolVLM-256M-Instruct --nsamples_for_qcalibration 20 --eval_tasks vqav2 --gptq_mse mse`:

```
Namespace(model='HuggingFaceTB/SmolVLM-256M-Instruct', device='cuda', dtype='float32', seed=42, trust_remote_code=False, hf_token=None, cache_dir='/mnt/storage/transformers_cache', nsamples_for_qcalibration=20, nsamples_for_evaluation=50, calib_seq_len=2048, max_seq_len=2048, linear_weight_bits=4, gptq_mse='mse', eval_tasks='vqav2', sensitivity_path=None)
=== Config ===
Model            : HuggingFaceTB/SmolVLM-256M-Instruct
Device           : cuda
DType            : float32
Calib seq len    : 2048
Max seq len      : 2048

Loading FP model …
Evaluating original model
vqav2: EM=0.5600  (n=50)
Original EM: 0.5600  (dataset=vqav2, n=50)
Resolving data files: 100%|█████████████████████████████████████████████████████████████| 68/68 [00:00<00:00, 7662.06it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████| 36/36 [00:00<00:00, 6735.73it/s]
Resolving data files: 100%|██████████████████████████████████████████████████████████| 143/143 [00:00<00:00, 10294.45it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████| 68/68 [00:00<00:00, 6965.07it/s]
Resolving data files: 100%|█████████████████████████████████████████████████████████████| 36/36 [00:00<00:00, 7293.38it/s]
Resolving data files: 100%|██████████████████████████████████████████████████████████| 143/143 [00:00<00:00, 13307.27it/s]
[info] Loaded dataset: lmms-lab/VQAv2 (testdev, streaming), size=20
Applying GPTQ …
Quantizing layers: 100%|████████████████████████████████████████████████████████████████| 1/1 [02:02<00:00, 122.53s/layer]
Evaluating quantized model
vqav2: EM=0.5600  (n=50)
```
| model | vqav2 EM |
|---|---|
| original | 0.5600 |
| quantized | 0.5600 |
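For context, the "whole model as a single layer" fallback in item 1 can be sketched roughly as follows. This is an illustrative round-to-nearest stand-in, not the repository's actual GPTQ implementation (real GPTQ also uses a calibration Hessian and error compensation per layer); the function names and the plain-dict weight representation are hypothetical.

```python
import numpy as np

def fake_quant_4bit(w: np.ndarray, qmax: int = 7) -> np.ndarray:
    """Per-row symmetric 4-bit round-to-nearest fake-quantization.
    An illustrative stand-in for GPTQ's per-layer weight update."""
    scale = np.abs(w).max(axis=-1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def quantize_whole_model(weights: dict) -> dict:
    """Fallback idea: treat the entire model as one GPTQ 'layer' and
    run a single quantization pass over every linear weight, instead
    of the usual per-decoder-block traversal that complex module
    graphs (e.g. VLMs mixing vision and language towers) can break."""
    return {name: fake_quant_4bit(w) for name, w in weights.items()}
```

The trade-off is the usual one for a fallback path: a single pass is simpler and works for arbitrary module graphs, at the cost of any block-by-block error compensation.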

Please see other results in #559
Draft: #559
Related: #548

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>

@stamalakhov stamalakhov self-assigned this Mar 25, 2026

@Torrero Torrero left a comment

LGTM

Contributor

@mhs4670go mhs4670go left a comment

LGTM

@mhs4670go mhs4670go merged commit 04b1927 into Samsung:main Mar 26, 2026
7 checks passed
@stamalakhov stamalakhov deleted the vlm_gptq_pr branch March 26, 2026 04:47