[CPU] Add Support For GatedDeltaNet #34447

Open
zhangYiIntel wants to merge 26 commits into openvinotoolkit:master from zhangYiIntel:yi3/support_gdn
Conversation

@zhangYiIntel
Contributor

Details:

  • Add Internal Op GatedDeltaNet
  • Add GatedDeltaNet Fusion
  • Add GatedDeltaNet CPU kernel

Tickets:

AI Assistance:

  • AI assistance used: no
  • If yes, summarize how AI was used and what human validation was performed (build/tests/manual checks).

@github-actions github-actions bot added labels: category: Core OpenVINO Core (aka ngraph), category: IE Tests OpenVINO Test: plugins and common, category: CPU OpenVINO CPU plugin, category: build OpenVINO cmake script / infra, category: Python API OpenVINO Python bindings, category: transformations OpenVINO Runtime library - Transformations — Mar 3, 2026
@yuxu42 yuxu42 requested a review from Copilot March 4, 2026 03:28
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces CPU support for the GatedDeltaNet operation, a recurrent linear attention variant. It adds:

  1. An internal OpenVINO GatedDeltaNet op (core) with a new ov::op::GatedDeltaNet class, shape inference, and Python bindings.
  2. A pattern-matching fusion transformation (GatedDeltaNetFusion) that replaces a Loop-based subgraph with the fused op.
  3. A CPU kernel (recurrent_linear_attn) with AVX512F/AVX2/scalar code paths, a CPU node wrapper, and functional tests.

Changes:

  • New internal op GatedDeltaNet with validation, shape inference, config struct, and Python binding
  • New GatedDeltaNetFusion transformation that fuses the Loop-based GDN subgraph into the internal op
  • New CPU node and recurrent_linear_attn kernel (AVX512F + fallback), along with functional smoke tests
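For orientation, the recurrence this op computes can be sketched as a scalar reference. This follows the published Gated DeltaNet formulation, S_t = α_t · S_{t-1} · (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t; it is a hedged sketch, not the PR's AVX512F kernel, and the state layout S[dv][dk] and function name are assumptions for illustration:

```cpp
// Scalar reference for one gated delta-rule step (sketch, not the actual
// recurrent_linear_attn kernel). State layout S[dv][dk] is assumed.
#include <cassert>
#include <cmath>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;  // S[dv][dk]

// Updates S in place:  S_t = alpha * S_{t-1} * (I - beta * k k^T) + beta * v k^T
// Returns o_t = S_t * q.
Vec gdn_step(Mat& S, const Vec& q, const Vec& k, const Vec& v,
             float alpha, float beta) {
    const size_t dv = S.size(), dk = S[0].size();
    // Sk = S_{t-1} * k
    Vec Sk(dv, 0.f);
    for (size_t i = 0; i < dv; ++i)
        for (size_t j = 0; j < dk; ++j)
            Sk[i] += S[i][j] * k[j];
    // S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T
    for (size_t i = 0; i < dv; ++i)
        for (size_t j = 0; j < dk; ++j)
            S[i][j] = alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j];
    // o_t = S_t * q
    Vec o(dv, 0.f);
    for (size_t i = 0; i < dv; ++i)
        for (size_t j = 0; j < dk; ++j)
            o[i] += S[i][j] * q[j];
    return o;
}
```

The fused op replaces a Loop node that runs this step once per token; the kernel vectorizes the inner loops over the head dimensions.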

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 8 comments.

File Description
src/core/dev_api/openvino/op/gated_delta_net.hpp New internal op declaration with Config struct
src/core/src/op/gated_delta_net.cpp Op implementation: validation, shape inference, clone_with_new_inputs
src/common/transformations/include/transformations/common_optimizations/fuse_gated_delta_net.hpp Fusion pass declaration
src/common/transformations/src/transformations/common_optimizations/fuse_gated_delta_net.cpp Fusion pass implementation (Loop → GatedDeltaNet)
src/plugins/intel_cpu/src/nodes/gated_delta_net.h CPU node header
src/plugins/intel_cpu/src/nodes/gated_delta_net.cpp CPU node execute logic
src/plugins/intel_cpu/src/nodes/kernels/linear_attn/recurrent_linear_attn.hpp Kernel function declaration
src/plugins/intel_cpu/src/nodes/kernels/linear_attn/recurrent_linear_attn.cpp AVX512F/AVX2/scalar kernel implementation
src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp Registers fusion pass in PreLpt pipeline
src/plugins/intel_cpu/src/nodes_factory.cpp Registers GatedDeltaNet CPU node (x64 only)
src/plugins/intel_cpu/src/graph_optimizer.cpp Excludes GatedDeltaNet from tail precision optimization
src/plugins/intel_cpu/src/cpu_types.h Adds GatedDeltaNet to Type enum
src/plugins/intel_cpu/src/cpu_types.cpp Adds type name mapping for GatedDeltaNet
src/plugins/intel_cpu/src/extension.cpp Registers op extension for serialization
src/plugins/intel_cpu/CMakeLists.txt Adds cross-compiled build entry for kernel
src/bindings/python/src/pyopenvino/graph/ops/gated_delta_net.hpp Python binding header
src/bindings/python/src/pyopenvino/graph/ops/gated_detla_net.cpp Python binding implementation (typo in filename)
src/bindings/python/src/pyopenvino/pyopenvino.cpp Registers Python binding class
src/bindings/python/src/openvino/op/__init__.pyi Exports _GatedDeltaNet to Python stubs
src/bindings/python/src/openvino/_pyopenvino/op/__init__.pyi Adds Python stub for _GatedDeltaNet
src/tests/functional/plugin/shared/include/shared_test_classes/subgraph/gated_delta_net.hpp Test class declaration
src/tests/functional/plugin/shared/include/subgraph_tests/gated_delta_net.hpp Test body with CompareWithRefs
src/tests/functional/plugin/shared/src/subgraph/gated_delta_net.cpp Test setup (reference Loop model + fused model)
src/plugins/intel_cpu/tests/functional/shared_tests_instances/subgraph_tests/gated_delta_net.cpp CPU-specific test instantiation


@zhangYiIntel zhangYiIntel force-pushed the yi3/support_gdn branch 2 times, most recently from bd6fe37 to b8641b9 Compare March 4, 2026 08:39
@zhangYiIntel zhangYiIntel force-pushed the yi3/support_gdn branch 2 times, most recently from 60a0acf to 9a550e8 Compare March 6, 2026 01:50
@zhangYiIntel zhangYiIntel marked this pull request as ready for review March 6, 2026 02:38
@zhangYiIntel zhangYiIntel requested review from a team as code owners March 6, 2026 02:38
@mlukasze mlukasze requested a review from Copilot March 6, 2026 05:50
return scale_node.get_node_shared_ptr();
}
} else {
return ov::op::v0::Constant::create(default_scale_type, ov::Shape{}, {1.0});
Contributor

When is this possible? The scale_node pattern is any_input(), so the match should never be empty.
Maybe it would be better to retrieve the scale node outside the helper function and check that it is not nullptr. For example, in that case we would not need to pass default_scale_type to the function.
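The refactor being suggested — hoist the presence check out of the helper so the helper never fabricates a default — can be sketched with stand-in types (the Node struct, map-based pattern map, and function names here are hypothetical placeholders, not OpenVINO's actual matcher API):

```cpp
// Sketch: caller-side null check instead of an in-helper default branch.
// Node, resolve_scale, and the string-keyed pattern map are stand-ins.
#include <cassert>
#include <map>
#include <memory>
#include <string>

struct Node { float value; };  // stand-in for an ov::Node

// Helper no longer needs a default element type: it only runs on a real node.
std::shared_ptr<Node> process_scale(const std::shared_ptr<Node>& scale) {
    return std::make_shared<Node>(Node{scale->value * 2.f});
}

// The caller retrieves the matched node first, checks it, and supplies the
// default itself when the pattern produced nothing.
std::shared_ptr<Node> resolve_scale(
        const std::map<std::string, std::shared_ptr<Node>>& pattern_map) {
    auto it = pattern_map.find("scale");
    std::shared_ptr<Node> scale =
        (it == pattern_map.end()) ? nullptr : it->second;
    if (!scale)
        return std::make_shared<Node>(Node{1.f});  // default scale constant
    return process_scale(scale);
}
```

The benefit is that the helper's signature shrinks (no default element type parameter) and the unreachable branch disappears from it.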

Contributor Author

fixed


} // namespace

TEST_F(TransformationTestsF, GatedDeltaNetFusion_BuildLoopedGDNMode) {
Contributor

I would suggest adding tests without Convert layers, with different Transposes on the inputs and outputs, and with different input and output shapes (where the input or output rank can change).

Contributor Author

The loop body is coupled with the transposes, and the combinations are not limited. Unlike the SDPA case, where the QK matmul has a fixed order no matter whether q/k/v are transposed, GDN's loop body is very diverse. I think in this release it's better to focus on the current Qwen3Next pattern.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

"The head size in key and query should be the same, but got ",
k_head_size,
" and ",
v_head_size,
Copilot AI Mar 11, 2026

The head-size mismatch validation compares k_head_size vs q_head_size, but the error message prints v_head_size as the second value (line 107). This will mislead users when debugging invalid models.

Please update the message to report q_head_size as the second operand.

Suggested change:

-    v_head_size,
+    q_head_size,
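The general rule this comment applies — a validation message must print the same operands the check compares — can be shown with a tiny self-contained sketch (the function name and message text mimic the PR's check; the free-function form is an illustration, not OpenVINO's NODE_VALIDATION_CHECK macro):

```cpp
// Sketch of the corrected check: k_head_size is compared with q_head_size,
// so the message must report those two values, not v_head_size.
#include <sstream>
#include <string>

std::string check_head_sizes(int k_head_size, int q_head_size) {
    if (k_head_size == q_head_size)
        return "";  // valid: no error message
    std::ostringstream oss;
    oss << "The head size in key and query should be the same, but got "
        << k_head_size << " and " << q_head_size;  // q_head_size, not v_head_size
    return oss.str();
}
```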

@zhangYiIntel zhangYiIntel force-pushed the yi3/support_gdn branch 2 times, most recently from fca11d0 to de40b3b Compare March 12, 2026 01:15

Labels

category: build OpenVINO cmake script / infra, category: Core OpenVINO Core (aka ngraph), category: CPU OpenVINO CPU plugin, category: IE Tests OpenVINO Test: plugins and common, category: transformations OpenVINO Runtime library - Transformations, Code Freeze

9 participants