[GPU] Add u2 weight quantization backend support #33243
base: master
Conversation
Part of openvinotoolkit#32716. Defines UINT2/INT2 enums, updates parameter key bitfields to support 2-bit types, and implements JIT constant generation for packed 2-bit integer types (emulated via int32).
Part of openvinotoolkit#32716.
- Adds `int2_utils.cl` for bit-wise unpacking of u2/i2 values.
- Updates `fully_connected_gpu_bfyx_ref` to support on-the-fly decompression.
- Implements `ReorderWeightsKernelInt2` for 2-bit weight formatting.
- Enables u2 graph pattern matching in `compressed_weights_pattern.hpp`.
Part of openvinotoolkit#32716. Adds 'shared_matmul_weights_decompression_u2' validating end-to-end MatMul inference with 2-bit unsigned integer weights on GPU.
Greetings @isanghao and @ljaljushkin, this PR focuses on backend enablement with reference kernels; performance optimization is intentionally deferred to keep the change reviewable and low-risk. I would appreciate your reviews if you don't have any constraints, and I'm happy to make any necessary changes. Thanks!
Hi @ruskaruma, thanks for the PR. Could you share the target model or end-user scenario for this feature? I'm trying to understand how it will benefit end users in the long term.
Hi @isanghao, thank you for taking the time to review this PR. That's a very valid question. At the moment, there aren't any production-ready 2-bit models in OpenVINO. NNCF doesn't support 2-bit quantization yet, and GPTQ and AWQ at 2 bits are still experimental. CPU already has u2 support via oneDNN, but without this change the GPU backend would fall back or implicitly expand weights. The primary purpose of this PR is backend parity and correctness.

My intent has been to establish the minimal infrastructure needed so future optimization or tooling work can be done incrementally, without coupling everything into a single large change. I intentionally kept this PR narrowly scoped and correctness-focused, since combining backend enablement, optimized kernels, and quantization tooling in one submission would be harder to review and riskier to merge. For now, the goal is simply to keep CPU and GPU behavior aligned.

As mentioned in the PR description, this work is part of a broader, staged approach. I would be very interested to hear your thoughts on whether this direction makes sense, and I appreciate the perspective behind the question.
Also, I noticed that some of the CI checks are currently failing. I've already identified the underlying issue and am working on resolving it.
Summary
This PR enables unsigned 2-bit (u2) quantized weight support in the Intel GPU plugin, aligning GPU behavior with the CPU plugin's existing u2 implementation. The change is intentionally limited to backend enablement and correctness using reference kernels; performance optimizations and tooling are deferred to follow-up work.
Background
Support for u2 compressed weights was added to the CPU plugin in September 2024 via oneDNN integration. Since the GPU plugin uses a different execution backend (kernel_selector + OpenCL), equivalent support requires a separate implementation.
Related Issues
Implements:
- 2-bit type support in the kernel selector type system (`Datatype`, `WeightsType`, `ParamsKey`)
- Weight reordering for linear layouts (`oiyx`, `ioyx`)

Intentionally deferred to future work:
- Signed 2-bit models (pending Core `element::i2` support)

All changes are gated under `#ifdef COMPRESSED_WEIGHTS_INT2` and do not affect existing paths.

The design follows the CPU implementation strategy: prioritize correctness with a reference kernel, restrict support to linear layouts to keep the change minimal, and include i2 infrastructure early to avoid future type-system churn once Core support lands.

Testing was done with 16 functional cases (`smoke_MatMulSharedCompressedWeightsU2`); IR serialization and binary size were verified to confirm true 2-bit storage, and no regressions were observed, with all changes guarded under `#ifdef COMPRESSED_WEIGHTS_INT2`.

Note: This PR mirrors the CPU enablement strategy: minimal, correct, and isolated. Performance work and broader coverage are intentionally split out to keep this change safe and reviewable.