
Conversation

@anzr299 (Collaborator) commented Jan 12, 2026

Changes

The approach here is quite straightforward: _quantize_weights works as usual for 2D weights. The difference is in _calculate_hessian, where the Hessian is 3D in both the 2D and 3D weight cases. By default the Hessian has the shape (1, hidden_dim, hidden_dim), whereas before it was just (hidden_dim, hidden_dim). For 3D weights it is (num_experts/batch, hidden_dim, hidden_dim).

This 3D, or "batched", Hessian is then looped over: for each batch the corresponding 2D weight is extracted and passed to the old _quantize_weights function as usual, which returns the scale and zero point. These per-batch scales and zero points are stacked together in a collector variable. In the 2D case the result is flattened; in the 3D case the stacked scale and zero point are returned.
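A minimal NumPy sketch of this flow, for illustration only: quantize_2d_weights below is a hypothetical stand-in for NNCF's _quantize_weights (its dummy scale/zero-point math is not the real GPTQ logic), and only the batching, looping, and stacking mirror the description above.

```python
import numpy as np


def quantize_2d_weights(weight_2d, hessian_2d):
    # Hypothetical stand-in: the real _quantize_weights runs GPTQ column by
    # column using the Hessian; here we just return a dummy per-row scale/zp.
    scale = np.abs(weight_2d).max(axis=-1, keepdims=True) / 7.0  # INT4 SYM range
    zero_point = np.zeros_like(scale)
    return scale, zero_point


def quantize_weights(weight, hessian):
    # The Hessian always arrives batched: (1, H, H) for 2D weights,
    # (num_experts, H, H) for 3D weights.
    batched_weight = weight if weight.ndim == 3 else weight[None, ...]
    scales, zero_points = [], []
    for batch_idx in range(hessian.shape[0]):
        s, zp = quantize_2d_weights(batched_weight[batch_idx], hessian[batch_idx])
        scales.append(s)
        zero_points.append(zp)
    scale, zero_point = np.stack(scales), np.stack(zero_points)
    if weight.ndim == 2:
        # 2D case: squeeze out the artificial batch dimension again.
        scale, zero_point = scale[0], zero_point[0]
    return scale, zero_point
```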

NOTE: Scale Estimation + GPTQ support is not added for 3D weights yet

Reason for changes

Support 3D weights in GPTQ for models such as MoE.

Related tickets

175789 & 175212

Tests

Model: Qwen/Qwen3-30B-A3B
NNCF Backend: OpenVINO
Higher is better.
Task: gsm8k
Limit: 100
Max New Tokens: 10000
OpenVINO version: 2026.0.0.dev20251111 (with WA for 176465)
n-shots: 5 (default)

| Precision Type | Filter | Value |
|---|---|---|
| INT4 SYM GS128 (with GPTQ) Calibrated on GSM8k with 128 samples | flexible-extract | 0.79 |
| | strict-match | 0.64 |
| INT4 SYM GS128 (with GPTQ after bug fix commit dc355fe) Calibrated on GSM8k with 128 samples | flexible-extract | 0.78 |
| | strict-match | 0.57 |
| INT4 SYM GS128 | flexible-extract | 0.55 |
| | strict-match | 0.29 |
| FP32 | flexible-extract | 0.92 |
| | strict-match | 0.82 |

Comparison of accuracy for meta-llama/Llama-3.2-1B-Instruct between develop and this branch:

| Variant | bits_per_byte | byte_perplexity | word_perplexity |
|---|---|---|---|
| This Branch (GPTQ) | 0.7965 | 1.7368 | 19.1466 |
| develop (GPTQ) | 0.7965 | 1.7368 | 19.1466 |

@anzr299 anzr299 requested a review from a team as a code owner January 12, 2026 08:35
@anzr299 anzr299 marked this pull request as draft January 12, 2026 08:35
@github-actions github-actions bot added the NNCF OpenVINO Pull requests that updates NNCF OpenVINO label Jan 12, 2026
@anzr299 anzr299 marked this pull request as ready for review January 19, 2026 11:52
import math
from typing import Optional, TypeVar

import numpy as np
Collaborator:
@AlexanderDokuchaev Is it possible to use np here?

Collaborator Author:
If not, I can use the same approach as:

reduce(mul, shape[:act_ch_axis] + shape[act_ch_axis % len(shape) + 1 :], 1) for shape in stats.shape_values

Collaborator Author:
Done

wc_params.node_with_weight, wc_params.weight_port_id, model, graph
)
weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)
if len(hessian.shape) == 3 and hessian.shape[0] == 1:
Collaborator:
In which model does this happen?

Collaborator Author:
For now this is just a safety check, since in the 3D case we also pass a 2D Hessian to this function. I added it when the older test called this function manually. Would it be better to remove it?

@ljaljushkin ljaljushkin requested a review from Copilot January 20, 2026 18:40
Copilot AI (Contributor) left a comment:
Pull request overview

This PR adds support for 3D weights in the GPTQ algorithm to enable quantization of models with 3D weight tensors, such as Mixture-of-Experts (MoE) models. The implementation extends the existing GPTQ algorithm to handle batched Hessian matrices.

Changes:

  • Extended _calculate_hessian to support 3D weight tensors by creating batched Hessian matrices (see the sketch after this list)
  • Refactored _quantize_weights to accept tensors directly instead of fetching them, enabling batch processing
  • Added loop-based quantization in the apply method to process each batch of 3D weights separately
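A hedged sketch of what the batched Hessian accumulation could look like (illustrative only, not the actual NNCF _calculate_hessian; the function name calculate_batched_hessian and the assumed activation layout are hypothetical):

```python
import numpy as np


def calculate_batched_hessian(activation_batches, num_batches, hidden_dim):
    # One GPTQ-style Hessian per batch/expert; for 2D weights num_batches == 1,
    # which gives the default (1, hidden_dim, hidden_dim) shape.
    hessian = np.zeros((num_batches, hidden_dim, hidden_dim), dtype=np.float32)
    n_samples = 0
    for x in activation_batches:
        # x is assumed to be shaped (num_batches, n_tokens, hidden_dim).
        n_tokens = x.shape[1]
        # Running-average update, mirroring the standard GPTQ step
        # H <- H * n / (n + m) + (2 / (n + m)) * X X^T.
        hessian *= n_samples / (n_samples + n_tokens)
        n_samples += n_tokens
        x = np.sqrt(2.0 / n_samples) * x.astype(np.float32)
        # Per-batch outer product: (B, hidden, tokens) @ (B, tokens, hidden).
        hessian += np.matmul(x.transpose(0, 2, 1), x)
    return hessian
```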

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| src/nncf/quantization/algorithms/weight_compression/gptq.py | Implements core GPTQ algorithm changes to support 3D weights through batched Hessian calculation and iterative quantization |
| tests/openvino/native/quantization/test_gptq.py | Adds parameterized test coverage for both 2D and 3D weight cases with reference implementation validation |


@ljaljushkin (Contributor) left a comment:

Nice addition!

anzr299 and others added 3 commits January 21, 2026 12:18
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@MaximProshin MaximProshin merged commit 61ea196 into openvinotoolkit:develop Jan 21, 2026
18 checks passed