Support 3D Weights in GPTQ Algorithm #3835
Conversation
```python
import math
from typing import Optional, TypeVar

import numpy as np
```
@AlexanderDokuchaev Is it possible to use `np` here?
If not, I can use the same approach as:

```python
reduce(mul, shape[:act_ch_axis] + shape[act_ch_axis % len(shape) + 1 :], 1) for shape in stats.shape_values
```
Done
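To make the discussed expression concrete, here is a hedged, self-contained sketch comparing the quoted `reduce`-based size computation with a `numpy` equivalent. The shapes and the `act_ch_axis` value are hypothetical stand-ins for `stats.shape_values`, not actual NNCF data:

```python
# Hypothetical sketch: count the elements of each statistic tensor while
# excluding the activation channel axis, first with functools.reduce
# (as in the quoted snippet) and then with numpy.
from functools import reduce
from operator import mul

import numpy as np

shapes = [(2, 4, 8), (3, 4, 5)]  # stand-ins for stats.shape_values
act_ch_axis = 1

# reduce-based variant from the quoted snippet
sizes_reduce = [
    reduce(mul, shape[:act_ch_axis] + shape[act_ch_axis % len(shape) + 1 :], 1)
    for shape in shapes
]

# equivalent numpy variant: delete the channel axis, multiply the rest
sizes_np = [
    int(np.prod(np.delete(np.array(shape), act_ch_axis % len(shape))))
    for shape in shapes
]

assert sizes_reduce == sizes_np  # (2*8, 3*5) -> [16, 15]
```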
```python
    wc_params.node_with_weight, wc_params.weight_port_id, model, graph
)
weight_tensor = fns.astype(weight_tensor, TensorDataType.float32)
if len(hessian.shape) == 3 and hessian.shape[0] == 1:
```
In which model does this happen?
For now this is just a safety check, since even in the 3D case we pass a 2D Hessian to this function. I added it when the older test called this function manually. Would it be better to remove it?
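A minimal sketch of the safety check under discussion, using numpy in place of NNCF's tensor functions (the shapes are hypothetical): a Hessian that arrives with a leading singleton batch dimension is squeezed back to 2D before the per-matrix path runs.

```python
# Hypothetical shapes: a (1, hidden_dim, hidden_dim) Hessian is reduced
# to (hidden_dim, hidden_dim) when the leading batch dimension is 1.
import numpy as np

hidden_dim = 4
hessian = np.eye(hidden_dim)[None, ...]  # shape (1, hidden_dim, hidden_dim)

if hessian.ndim == 3 and hessian.shape[0] == 1:
    hessian = hessian[0]  # drop the singleton batch dimension

assert hessian.shape == (hidden_dim, hidden_dim)
```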
Pull request overview
This PR adds support for 3D weights in the GPTQ algorithm to enable quantization of models with 3D weight tensors, such as Mixture-of-Experts (MoE) models. The implementation extends the existing GPTQ algorithm to handle batched Hessian matrices.
Changes:
- Extended `_calculate_hessian` to support 3D weight tensors by creating batched Hessian matrices
- Refactored `_quantize_weights` to accept tensors directly instead of fetching them, enabling batch processing
- Added loop-based quantization in the `apply` method to process each batch of 3D weights separately
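To illustrate the first bullet, here is a hedged numpy sketch of what a batched Hessian calculation could look like. In standard GPTQ the Hessian for a layer is proportional to `X^T X` over the calibration inputs; the batched version simply computes one such matrix per expert/batch. The shapes and the factor of 2 follow the usual GPTQ formulation, but this is an illustration, not NNCF's actual `_calculate_hessian`:

```python
# Sketch: per-batch GPTQ-style Hessian H_b = 2 * X_b^T X_b,
# giving a (batch, hidden_dim, hidden_dim) "batched" Hessian.
import numpy as np

rng = np.random.default_rng(0)
batch, n_samples, hidden_dim = 2, 16, 4
inputs = rng.random((batch, n_samples, hidden_dim))  # stand-in activations

# sum over samples: hessian[b, h, k] = 2 * sum_s X[b, s, h] * X[b, s, k]
hessian = 2 * np.einsum("bsh,bsk->bhk", inputs, inputs)

assert hessian.shape == (batch, hidden_dim, hidden_dim)
```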
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/nncf/quantization/algorithms/weight_compression/gptq.py | Implements core GPTQ algorithm changes to support 3D weights through batched Hessian calculation and iterative quantization |
| tests/openvino/native/quantization/test_gptq.py | Adds parameterized test coverage for both 2D and 3D weight cases with reference implementation validation |
ljaljushkin left a comment:
Nice addition!
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Changes
The approach here is quite straightforward: `_quantize_weights` works as usual for 2D weights. The difference is in the Hessian calculation, where the Hessian is now 3D in both the 2D and 3D weight cases. By default the Hessian has shape (1, hidden_dim, hidden_dim); before, it was just (hidden_dim, hidden_dim). For 3D weights it is (num_experts/batch, hidden_dim, hidden_dim).

This 3D, or "batched", Hessian is then looped over: each 2D weight slice is extracted and passed to the old `_quantize_weights` function as usual, which returns the scale and zero point. These scales and zero points are stacked in a collector variable. For the 2D case the result is flattened; for the 3D case the stacked scale and zero point are returned.

NOTE: Scale Estimation + GPTQ support is not added for 3D weights yet.
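The batched flow described above can be sketched as follows. Note this is a hedged illustration: `quantize_weights_2d` is a hypothetical stand-in for the existing 2D `_quantize_weights` path, not NNCF API, and the shapes are made up:

```python
# Sketch: loop over the batched (3D) Hessian, quantize each 2D weight
# slice with the existing 2D path, and stack the per-batch scales/zps.
import numpy as np

def quantize_weights_2d(weight, hessian):
    # placeholder for the existing 2D quantization path:
    # returns a per-output-channel scale and zero point
    scale = np.abs(weight).max(axis=-1, keepdims=True) / 7.0
    zero_point = np.zeros_like(scale)
    return scale, zero_point

rng = np.random.default_rng(0)
batch, out_dim, hidden_dim = 2, 3, 4
weight = rng.random((batch, out_dim, hidden_dim))
hessian = np.stack([np.eye(hidden_dim)] * batch)  # (batch, hidden, hidden)

scales, zero_points = [], []
for b in range(batch):
    s, zp = quantize_weights_2d(weight[b], hessian[b])
    scales.append(s)
    zero_points.append(zp)

scale = np.stack(scales)        # (batch, out_dim, 1) in the 3D case
zero_point = np.stack(zero_points)
# in the 2D case (batch == 1) the stacked result would be flattened instead
assert scale.shape == (batch, out_dim, 1)
```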
Reason for changes
Support 3D weights for models like MoE in GPTQ
Related tickets
175789 & 175212
Tests
Model: Qwen/Qwen3-30B-A3B
NNCF Backend: OpenVINO
Higher is better.
Task: gsm8k
Limit: 100
Max New Tokens: 10000
OpenVINO version: 2026.0.0.dev20251111 (with WA for 176465)
n-shots: 5 (default)
Comparison of accuracy with meta-llama/Llama-3.2-1B-Instruct on Develop and this branch