
[quantization] [draft] GPTQ for VLM#559

Closed
stamalakhov wants to merge 1 commit into Samsung:main from stamalakhov:gptq_forVLM

Conversation

@stamalakhov
Contributor

@stamalakhov stamalakhov commented Mar 17, 2026

This PR is a first attempt at full quantization of a VLM model with GPTQ+PTQ.

TODO:

  1. maybe make it less resource-intensive (right now it runs inference through the whole model rather than layer by layer)
  2. support PTQ quantization
  3. synchronize GPTQ/PTQ Conv3d quantization
  4. support convert to circle
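Regarding TODO item 1: a common way to make GPTQ layerwise is to run the calibration data through the model once, caching each target layer's inputs with forward hooks, so each layer can then be quantized in isolation. The sketch below is illustrative only (names like `collect_layer_inputs` are hypothetical, not from this PR):

```python
# Hypothetical sketch of layerwise GPTQ calibration: cache each target
# layer's inputs during a single calibration pass, instead of re-running
# full-model inference per layer. Not the PR's actual implementation.
import torch
import torch.nn as nn


def collect_layer_inputs(model, layer_names, samples):
    """Run calibration samples once, caching every target layer's inputs."""
    cache = {name: [] for name in layer_names}
    hooks = []
    for name, module in model.named_modules():
        if name in layer_names:
            def make_hook(layer_name):
                def hook(_module, inputs, _output):
                    cache[layer_name].append(inputs[0].detach().cpu())
                return hook
            hooks.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        for x in samples:
            model(x)
    for h in hooks:
        h.remove()
    return cache  # each layer can now be GPTQ-quantized from its cached inputs
```

The cached activations give GPTQ the per-layer Hessian statistics it needs without holding the whole model's forward pass live for every layer.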
| model | orig_accuracy_vqav2 | minmax_quantize_accuracy_vqav2 | GPTQ_mse_accuracy_vqav2 | GPTQ_smse_accuracy_vqav2 |
| --- | --- | --- | --- | --- |
| Qwen2_2B | 0.8900 | 0.8260 | 0.8450 | 0.8740 |
| Qwen3_2B | 0.8570 | 0.7970 | 0.8470 | 0.8390 |
| Qwen3_4B | 0.8950 | 0.8450 | 0.8910 | 0.8820 |
Some details:

All models above were quantized using GPTQ+mse / GPTQ+smse:

  1. weights of torch.nn.Linear, torch.nn.Conv2d, torch.nn.Conv1d, and torch.nn.ConvTranspose2d were quantized to 4 bits,
  2. activations were left in float32,
  3. accuracy was computed on the first 1000 samples of vqav2.
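The module selection in point 1 could be sketched as follows (an illustrative snippet, not this PR's code; note the PR's "Conv1D" may refer to the Hugging Face transformers `Conv1D` wrapper rather than `torch.nn.Conv1d`):

```python
# Illustrative sketch: gather the weight-bearing modules named above as
# 4-bit GPTQ targets, leaving all activations in float32.
import torch.nn as nn

# Module types whose weights get quantized (assumption: torch-native classes).
QUANT_TYPES = (nn.Linear, nn.Conv1d, nn.Conv2d, nn.ConvTranspose2d)


def quant_targets(model):
    """Return {qualified_name: module} for every 4-bit quantization target."""
    return {
        name: m
        for name, m in model.named_modules()
        if isinstance(m, QUANT_TYPES)
    }
```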

With 256 quantization samples:

| model | vqav2_on_1000_samples |
| --- | --- |
| Qwen2_2B_original | 0.8900 |
| Qwen2_2B_GPTQ_mse_256_qsamples | 0.8630 |
| Qwen2_2B_GPTQ_smse_256_qsamples | 0.8780 |
| Qwen3_2B_original | 0.8570 |
| Qwen3_2B_GPTQ_mse_256_qsamples | 0.8430 |
| Qwen3_2B_GPTQ_smse_256_qsamples | 0.8520 |

Related: #548

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>

@stamalakhov stamalakhov self-assigned this Mar 17, 2026
@stamalakhov stamalakhov force-pushed the gptq_forVLM branch 7 times, most recently from 156b2e2 to 44de20c Compare March 20, 2026 08:45
This PR is the first try-out for full quantization of VLM model by GPTQ+PTQ.

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
@stamalakhov
Contributor Author

I believe we can close this draft. Everything was merged.
