[CPU] FullyConnected acceleration with u2 weights decompression #31467
@maxnick Hi Maksim, could you please take a look?
Pull Request Overview
This PR introduces FullyConnected acceleration with u2 (2-bit unsigned) weights decompression, adding support for a new precision type to improve performance in weight-compressed neural networks.
- Added u2 element type support across the CPU plugin infrastructure
- Extended FullyConnected operations to handle u2 weights with decompression
- Added comprehensive test coverage for u2 precision conversion and matrix multiplication
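To illustrate what "u2 weights decompression" computes, here is a scalar reference sketch. It is not the oneDNN kernel from this PR. It assumes weights of shape [N, K] packed four values per byte, with element 0 in the two least-significant bits, plus a per-output-channel `scale` and zero point `zp`. Each weight is dequantized as `(w - zp) * scale` before the dot product. The names `load_u2` and `fc_u2_reference` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads element `idx` from a packed u2 buffer.
// Assumption: four values per byte, element 0 in the two least-significant bits.
inline uint8_t load_u2(const std::vector<uint8_t>& packed, std::size_t idx) {
    const unsigned shift = static_cast<unsigned>((idx % 4) * 2);
    return static_cast<uint8_t>((packed[idx / 4] >> shift) & 0x3u);
}

// Hypothetical reference (non-accelerated) FullyConnected with u2 weight decompression:
//   y[n] = sum_k x[k] * (w[n][k] - zp[n]) * scale[n]
// Weights are [N x K] row-major; per-output-channel scale and zero point are assumed.
std::vector<float> fc_u2_reference(const std::vector<float>& x,          // [K]
                                   const std::vector<uint8_t>& w_packed,  // ceil(N*K/4) bytes
                                   const std::vector<float>& scale,       // [N]
                                   const std::vector<uint8_t>& zp,        // [N]
                                   std::size_t N,
                                   std::size_t K) {
    std::vector<float> y(N, 0.0f);
    for (std::size_t n = 0; n < N; ++n) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) {
            // Decompress one weight: subtract the zero point, then apply the scale.
            const float w = (static_cast<float>(load_u2(w_packed, n * K + k)) - zp[n]) * scale[n];
            acc += x[k] * w;
        }
        y[n] = acc;
    }
    return y;
}
```

An accelerated kernel would typically avoid materializing the dequantized weights and would vectorize the unpacking. This scalar form serves only as a correctness reference.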
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/plugins/intel_cpu/thirdparty/onednn | Updated oneDNN submodule to support u2 operations |
| src/plugins/intel_cpu/src/plugin.cpp | Added u2 to supported precision types |
| src/plugins/intel_cpu/src/nodes/fullyconnected.cpp | Extended FullyConnected to support u2 compressed weights |
| src/plugins/intel_cpu/src/nodes/executors/type_mask.hpp | Added u2 type mask definition |
| src/plugins/intel_cpu/src/nodes/executors/fullyconnected_implementations.cpp | Updated type mappings to include u2 support |
| src/plugins/intel_cpu/src/nodes/executors/dnnl/dnnl_fullyconnected_primitive.cpp | Enhanced DNNL primitive to handle u2 weights decompression |
| src/plugins/intel_cpu/src/utils/plain_tensor.hpp | Added u2 pointer handling with 4x sub-byte multiplier |
| src/plugins/intel_cpu/src/nodes/common/cpu_convert.cpp | Implemented u2 to other types conversion functions |
| src/plugins/intel_cpu/src/dnnl_extension_utils.cpp | Added u2 data type mapping utilities |
| src/tests/functional/plugin/shared/src/subgraph/weights_decompression_builders.cpp | Updated test builders to handle u2 precision ranges |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/x64/matmul_weights_decompression.cpp | Added u2-specific test cases for matrix multiplication |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/classes/matmul_weights_decompression.cpp | Enhanced test case naming to include fusion flag |
| src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/instances/common/conversion.cpp | Added u2 conversion test instances |
| src/plugins/intel_cpu/tests/functional/custom/single_layer_tests/classes/conversion.cpp | Extended ARM64 precision checks to include u2 |
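The `cpu_convert.cpp` and test-builder rows above cover converting u2 data to other types and handling the u2 value range. A minimal scalar sketch of both directions is shown below: unpacking a u2 buffer to f32, and packing integers into u2 after clamping them to [0, 3]. `convert_u2_to_f32` and `pack_u2` are hypothetical names. The bit order (element 0 in the lowest two bits) is an assumption and is not stated in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical u2 -> f32 conversion over a packed buffer of `count` elements.
// Assumption: element i lives in bits [2*(i%4), 2*(i%4)+1] of byte i/4.
std::vector<float> convert_u2_to_f32(const uint8_t* src, std::size_t count) {
    std::vector<float> dst(count);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned shift = static_cast<unsigned>((i % 4) * 2);
        dst[i] = static_cast<float>((src[i / 4] >> shift) & 0x3u);
    }
    return dst;
}

// Hypothetical inverse, e.g. for building test inputs: clamps each value
// to the u2 range [0, 3] and packs four values per byte (same assumed bit order).
std::vector<uint8_t> pack_u2(const std::vector<int>& values) {
    std::vector<uint8_t> dst((values.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const int v = values[i] < 0 ? 0 : (values[i] > 3 ? 3 : values[i]);
        dst[i / 4] |= static_cast<uint8_t>(v << ((i % 4) * 2));
    }
    return dst;
}
```

Out-of-range inputs such as `-1` or `7` are clamped rather than wrapped, so a round trip through `pack_u2` and `convert_u2_to_f32` yields values in [0, 3].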
```cpp
if (any_of(m_dt, ov::element::i4, ov::element::u4)) {
    return 2;
}
if (m_dt == ov::element::u2) {
    return 4;
}
return 1;
```
Copilot (AI), Sep 8, 2025:
The sub-byte multiplier logic should be consolidated into a single conditional statement to avoid multiple if statements checking related precision types. Consider combining these checks or using a switch statement for better maintainability.
Suggested change:

```cpp
switch (m_dt) {
case ov::element::i4:
case ov::element::u4:
    return 2;
case ov::element::u2:
    return 4;
default:
    return 1;
}
```
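The sketch below shows how such a sub-byte multiplier is typically consumed. It applies the switch-based form to a stand-in enum (`ElemType` is hypothetical and replaces `ov::element::Type`) and uses the result to size a packed buffer. Under this scheme, element `i` of a sub-byte tensor lives in byte `i / multiplier`. `packed_bytes` is an illustrative helper and is not part of the plugin.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ov::element::Type, for illustration only.
enum class ElemType { u8, i4, u4, u2 };

// Number of logical elements stored in one byte (the "sub-byte multiplier"),
// written in the switch-based form suggested in the review.
inline std::size_t sub_byte_multiplier(ElemType t) {
    switch (t) {
    case ElemType::i4:
    case ElemType::u4:
        return 2;
    case ElemType::u2:
        return 4;
    default:
        return 1;
    }
}

// Hypothetical helper: bytes needed to hold `count` elements of a type whose
// element size is at most one byte (rounds up for a partially filled last byte).
inline std::size_t packed_bytes(ElemType t, std::size_t count) {
    const std::size_t m = sub_byte_multiplier(t);
    return (count + m - 1) / m;
}
```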
```cpp
if (zpPtr && none_of(zpPtr->getDesc().getPrecision(),
                     ov::element::u8,
                     ov::element::u4,
                     ov::element::u2,
                     ov::element::dynamic)) {
```
Copilot (AI), Sep 8, 2025:
The precision check list is becoming long and scattered across multiple lines. Consider extracting this into a helper function or constant to improve readability and maintainability.
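Here is one way to follow this suggestion, sketched with a stand-in `Prec` enum instead of `ov::element::Type`. The accepted zero-point precisions live in a single named constant, and a small predicate queries it. Both `kSupportedZpPrecisions` and `is_supported_zp_precision` are hypothetical names. With them, the condition above would read `zpPtr && !is_supported_zp_precision(precision)`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical stand-in for ov::element::Type, for illustration only.
enum class Prec { u8, u4, u2, i8, f32, dynamic };

// Hypothetical single source of truth for zero-point precisions accepted
// by the weights-decompression path (mirrors the list in the reviewed check).
constexpr std::array<Prec, 4> kSupportedZpPrecisions = {Prec::u8, Prec::u4, Prec::u2, Prec::dynamic};

inline bool is_supported_zp_precision(Prec p) {
    return std::find(kSupportedZpPrecisions.begin(), kSupportedZpPrecisions.end(), p) !=
           kSupportedZpPrecisions.end();
}
```

Adding a future sub-byte type then only requires extending the constant, not every call site.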
```cpp
[[maybe_unused]] static uint8_t get_u2(uint8_t val, uint8_t shift) {
    return static_cast<uint8_t>((val & (0x3 << shift)) >> shift);
}
```
Copilot (AI), Sep 8, 2025:
The magic number 0x3 should be replaced with a named constant or documented comment explaining it represents the 2-bit mask for extracting u2 values.
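A possible rewrite names the mask while keeping the original shift-and-mask expression. `kU2Mask` and `unpack_u2_byte` are hypothetical names, and the field order (lowest bits first) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Mask selecting one 2-bit (u2) field: binary 0b11.
constexpr uint8_t kU2Mask = 0x3;

// Same computation as the reviewed get_u2(), with the mask named.
// `shift` is the bit offset of the field: 0, 2, 4 or 6.
[[maybe_unused]] static uint8_t get_u2(uint8_t val, uint8_t shift) {
    return static_cast<uint8_t>((val & (kU2Mask << shift)) >> shift);
}

// Hypothetical helper: splits one byte into its four u2 fields,
// assuming the lowest two bits hold the first field.
inline std::array<uint8_t, 4> unpack_u2_byte(uint8_t val) {
    std::array<uint8_t, 4> out{};
    for (uint8_t i = 0; i < 4; ++i) {
        out[i] = get_u2(val, static_cast<uint8_t>(i * 2));
    }
    return out;
}
```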
maxnick left a comment:
In general LGTM.
Commit messages:
- Update initMatMulDecompressionSubgraph and modify weight decompression kernels
- Apply clang format
- Fix issues revealed by ci
- Fix issue on avx2 platform
- Revert common transformation related changes
- Align subgraph tests with the new model pattern
- Fix issue on loading u2 zero points
- Support dynamic quantization for u2
- Extend subgraph test cases and fix issues
- Apply review comments
- Update sub_byte_data_type_multiplier
### Details:
- *FullyConnected acceleration with u2 weights decompression.*
- *OneDNN PR: openvinotoolkit/oneDNN#289*

### Tickets:
- *[CVS-169357](https://jira.devtools.intel.com/browse/CVS-169357)*