Fix weight layout detection for MatMul with transpose in OpenVINO backend #3958

Open
naraen-ram wants to merge 1 commit into openvinotoolkit:develop from naraen-ram:fix-matmul-transpose-compression

Conversation

@naraen-ram

Fixes #3230

Description

Fix incorrect weight layout detection for MatMul layers when the transpose is applied by a separate node in the OpenVINO graph rather than through the MatMul's constant attributes.

Previously, constant_layer_attrs["transpose"] did not reflect graph-level transpose nodes, which could lead to incorrect channel axis detection during weight compression.

This change checks the input_attributes metadata to determine the transpose state correctly before computing the layout.
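
To show why the transpose state matters for layout, here is a minimal sketch in plain Python. The helper name `weight_reduction_axis` is hypothetical and is not NNCF's API; it only encodes the standard MatMul rule that the contraction dimension sits on the last axis of input A and the second-to-last axis of input B, and that a transpose flag swaps the two.

```python
def weight_reduction_axis(weight_ndim: int, weight_port_id: int, transpose: bool) -> int:
    """Return the non-negative axis of the weight tensor that MatMul sums over.

    Hypothetical helper for illustration only (not part of NNCF or OpenVINO).
    For A[M, K] @ B[K, N] the contraction dimension is K.
    """
    if weight_ndim < 2:
        raise ValueError("a MatMul weight needs at least 2 dimensions")
    if weight_port_id == 0:
        # Input A: K is the last axis, or the second-to-last if A is stored transposed.
        axis = -2 if transpose else -1
    elif weight_port_id == 1:
        # Input B: K is the second-to-last axis, or the last if B is stored transposed.
        axis = -1 if transpose else -2
    else:
        raise ValueError("MatMul has only input ports 0 and 1")
    return axis % weight_ndim


# Weight on port 1 stored as [out, in] (transpose flag set): reduce over axis 1.
print(weight_reduction_axis(2, 1, transpose=True))
# Same weight stored as [in, out] (flag unset): reduce over axis 0.
print(weight_reduction_axis(2, 1, transpose=False))
```

If the flag passed in does not match how the tensor is actually stored, the function returns the other axis, which is the kind of mismatch this description refers to.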

Testing

Reproduced issue using custom Transpose → MatMul OpenVINO model.

Verified using:
pytest tests/openvino/native/quantization/test_weights_compression.py -k matmul

Result:
12 passed, 0 failed

Impact

Fixes weight compression correctness for:

  • AWQ
  • Mixed precision
  • Scale estimation
  • LoRA correction

@naraen-ram
Author

Hi @ljaljushkin,

Just a follow-up on this PR. I’d appreciate a review whenever convenient.

This contribution is part of my preparation for GSoC 2026, so getting feedback would help me move forward. Happy to make any requested changes.

Thanks!

@daniil-lyakhov daniil-lyakhov self-assigned this Mar 9, 2026
@daniil-lyakhov daniil-lyakhov self-requested a review March 9, 2026 17:56
@daniil-lyakhov
Collaborator

Hi @naraen-ram, thank you for your contribution.

  1. Please describe a use case in which this causes model compression/quantization to fail.
  2. Layer attributes should not reflect neighboring nodes in any way. The transpose layer attribute is an OpenVINO MatMul-specific parameter; it does not correlate with anything else in the graph.

@naraen-ram
Author

Hi @daniil-lyakhov,

During my investigation of the issue, I reproduced a scenario where the transpose appears as a separate Transpose → MatMul pattern in the OpenVINO graph rather than through the transpose_a / transpose_b attributes of the MatMul operator. In this case, the graph contains a Transpose node feeding into MatMul, while the MatMul attributes still report transpose_a=False. My initial implementation attempted to account for this situation during layout detection.
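
A toy illustration of the two forms in plain Python (not OpenVINO code; `transpose` and `matmul` are hypothetical helpers written only for this sketch). Both forms produce the same values, but only the second one records the transposition on the MatMul itself:

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Swap the two axes of a 2D matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a: Matrix, b: Matrix, transpose_a: bool = False, transpose_b: bool = False) -> Matrix:
    """2D matrix product with MatMul-style transpose flags."""
    if transpose_a:
        a = transpose(a)
    if transpose_b:
        b = transpose(b)
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # activation stored as [K=3, M=2]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # weight stored as [K=3, N=2]

# Form 1: separate Transpose node feeding MatMul; the MatMul flag stays False.
via_node = matmul(transpose(x), w)
# Form 2: the transposition is carried by the transpose_a attribute.
via_attr = matmul(x, w, transpose_a=True)

print(via_node == via_attr)
```

In form 1 the tensor arriving at the MatMul already has its channels on the last axis, so a MatMul-level flag of False is accurate for that input.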

After revisiting the issue description, I realized that the original task refers specifically to supporting MatMul(transpose_a=True) cases, not a transpose implemented as a separate node.

I'll reproduce a model that produces transpose_a=True in the MatMul attributes and adjust the implementation accordingly.

Thanks!

Development

Successfully merging this pull request may close these issues.

[Good First Issue][NNCF]: Support transposed input for data-aware weight compression methods

2 participants