[CPU] Add Support For GatedDeltaNet #34447
zhangYiIntel wants to merge 26 commits into openvinotoolkit:master
Pull request overview
This PR introduces CPU support for the GatedDeltaNet operation, a recurrent linear attention variant. It adds:
- An internal OpenVINO `GatedDeltaNet` op (core) with a new `ov::op::GatedDeltaNet` class, shape inference, and Python bindings.
- A pattern-matching fusion transformation (`GatedDeltaNetFusion`) that replaces a Loop-based subgraph with the fused op.
- A CPU kernel (`recurrent_linear_attn`) with AVX512F/AVX2/scalar code paths, a CPU node wrapper, and functional tests.
Changes:
- New internal op `GatedDeltaNet` with validation, shape inference, config struct, and Python binding
- New `GatedDeltaNetFusion` transformation that fuses the Loop-based GDN subgraph into the internal op
- New CPU node and `recurrent_linear_attn` kernel (AVX512F + fallback), along with functional smoke tests
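
For readers unfamiliar with the operation, the recurrence behind a gated delta rule can be sketched as a scalar, single-head reference. This is only an illustration of the rule as it is commonly written for Gated DeltaNet, not the PR's kernel: the function name, the row-major `[k_dim x v_dim]` state layout, the `1/sqrt(k_dim)` query scale, and treating `g` as a log-space decay are all assumptions made here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-head reference; names and memory layout are illustrative only.
// q, k: [T x k_dim]; v: [T x v_dim]; g, beta: [T].
// state: row-major [k_dim x v_dim], updated in place. Returns out: [T x v_dim].
std::vector<float> gated_delta_rule_ref(const std::vector<float>& q,
                                        const std::vector<float>& k,
                                        const std::vector<float>& v,
                                        const std::vector<float>& g,
                                        const std::vector<float>& beta,
                                        std::vector<float>& state,
                                        std::size_t T,
                                        std::size_t k_dim,
                                        std::size_t v_dim) {
    std::vector<float> out(T * v_dim, 0.0f);
    // Assumed query scale; the fused op may take the scale from the graph instead.
    const float scale = 1.0f / std::sqrt(static_cast<float>(k_dim));
    for (std::size_t t = 0; t < T; ++t) {
        const float* qt = &q[t * k_dim];
        const float* kt = &k[t * k_dim];
        const float* vt = &v[t * v_dim];
        // 1. Gate: decay the whole state (g is assumed to be a log-space decay).
        const float decay = std::exp(g[t]);
        for (float& s : state) {
            s *= decay;
        }
        // Value channels are independent, so each column j is handled on its own.
        for (std::size_t j = 0; j < v_dim; ++j) {
            // 2. Read what the state currently predicts for key k_t.
            float kv_mem = 0.0f;
            for (std::size_t i = 0; i < k_dim; ++i) {
                kv_mem += state[i * v_dim + j] * kt[i];
            }
            // 3. Delta rule: write back the beta-weighted prediction error.
            const float delta = (vt[j] - kv_mem) * beta[t];
            for (std::size_t i = 0; i < k_dim; ++i) {
                state[i * v_dim + j] += kt[i] * delta;
            }
            // 4. Query the updated state.
            float acc = 0.0f;
            for (std::size_t i = 0; i < k_dim; ++i) {
                acc += state[i * v_dim + j] * qt[i];
            }
            out[t * v_dim + j] = acc * scale;
        }
    }
    return out;
}
```

A vectorized kernel would typically parallelize over batch and heads and vectorize the inner loops over `i` or `j`; the sequential dependency over `t` is what makes the op recurrent.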
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/core/dev_api/openvino/op/gated_delta_net.hpp` | New internal op declaration with Config struct |
| `src/core/src/op/gated_delta_net.cpp` | Op implementation: validation, shape inference, clone_with_new_inputs |
| `src/common/transformations/include/transformations/common_optimizations/fuse_gated_delta_net.hpp` | Fusion pass declaration |
| `src/common/transformations/src/transformations/common_optimizations/fuse_gated_delta_net.cpp` | Fusion pass implementation (Loop → GatedDeltaNet) |
| `src/plugins/intel_cpu/src/nodes/gated_delta_net.h` | CPU node header |
| `src/plugins/intel_cpu/src/nodes/gated_delta_net.cpp` | CPU node execute logic |
| `src/plugins/intel_cpu/src/nodes/kernels/linear_attn/recurrent_linear_attn.hpp` | Kernel function declaration |
| `src/plugins/intel_cpu/src/nodes/kernels/linear_attn/recurrent_linear_attn.cpp` | AVX512F/AVX2/scalar kernel implementation |
| `src/plugins/intel_cpu/src/transformations/transformation_pipeline.cpp` | Registers fusion pass in PreLpt pipeline |
| `src/plugins/intel_cpu/src/nodes_factory.cpp` | Registers GatedDeltaNet CPU node (x64 only) |
| `src/plugins/intel_cpu/src/graph_optimizer.cpp` | Excludes GatedDeltaNet from tail precision optimization |
| `src/plugins/intel_cpu/src/cpu_types.h` | Adds GatedDeltaNet to Type enum |
| `src/plugins/intel_cpu/src/cpu_types.cpp` | Adds type name mapping for GatedDeltaNet |
| `src/plugins/intel_cpu/src/extension.cpp` | Registers op extension for serialization |
| `src/plugins/intel_cpu/CMakeLists.txt` | Adds cross-compiled build entry for kernel |
| `src/bindings/python/src/pyopenvino/graph/ops/gated_delta_net.hpp` | Python binding header |
| `src/bindings/python/src/pyopenvino/graph/ops/gated_detla_net.cpp` | Python binding implementation (typo in filename) |
| `src/bindings/python/src/pyopenvino/pyopenvino.cpp` | Registers Python binding class |
| `src/bindings/python/src/openvino/op/__init__.pyi` | Exports _GatedDeltaNet to Python stubs |
| `src/bindings/python/src/openvino/_pyopenvino/op/__init__.pyi` | Adds Python stub for _GatedDeltaNet |
| `src/tests/functional/plugin/shared/include/shared_test_classes/subgraph/gated_delta_net.hpp` | Test class declaration |
| `src/tests/functional/plugin/shared/include/subgraph_tests/gated_delta_net.hpp` | Test body with CompareWithRefs |
| `src/tests/functional/plugin/shared/src/subgraph/gated_delta_net.cpp` | Test setup (reference Loop model + fused model) |
| `src/plugins/intel_cpu/tests/functional/shared_tests_instances/subgraph_tests/gated_delta_net.cpp` | CPU-specific test instantiation |
bd6fe37 to b8641b9
60a0acf to 9a550e8
src/common/transformations/src/transformations/common_optimizations/fuse_gated_delta_net.cpp
Outdated

                return scale_node.get_node_shared_ptr();
            }
        } else {
            return ov::op::v0::Constant::create(default_scale_type, ov::Shape{}, {1.0});

When is this possible? The `scale_node` pattern is `any_input()`.
Maybe it would be better to retrieve the scale node outside the helper function and check that it is not `nullptr`? In that case, for example, we would not need to pass the `default_scale_type` matcher to the function.
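
The shape of this suggestion can be sketched with stand-in types. `Node`, `PatternMap`, the `"scale"` key, and both function names are hypothetical and are not OpenVINO's pattern-matching API; the point is only that the lookup returns `nullptr` when nothing matched, and the caller decides on the default.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for a graph node and a matcher's pattern map.
struct Node {
    std::string name;
};
using PatternMap = std::unordered_map<std::string, std::shared_ptr<Node>>;

// Lookup only: returns the matched scale node, or nullptr if none was matched.
std::shared_ptr<Node> find_scale(const PatternMap& pattern_map) {
    const auto it = pattern_map.find("scale");
    return it == pattern_map.end() ? nullptr : it->second;
}

// The caller checks for nullptr and creates the default itself,
// so the lookup helper needs no extra "default type" argument.
std::shared_ptr<Node> scale_or_default(const PatternMap& pattern_map) {
    if (auto scale = find_scale(pattern_map)) {
        return scale;
    }
    return std::make_shared<Node>(Node{"default_scale_1.0"});
}
```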

    }  // namespace

    TEST_F(TransformationTestsF, GatedDeltaNetFusion_BuildLoopedGDNMode) {

I would suggest adding tests without Convert layers, with different Transposes on inputs and outputs, and with different input and output shapes (for cases where the input or output rank can change).
The loop body is coupled with the transposes, and the number of possible combinations is unbounded. Unlike the SDPA case, which has a fixed order in the QK matmul regardless of whether q/k/v are transposed, GDN's loop body is very diverse. I think for this release it is better to focus on the current Qwen3Next pattern.
c05ae61 to a0e8bd0
src/core/src/op/gated_delta_net.cpp
Outdated

    "The head size in key and query should be the same, but got ",
    k_head_size,
    " and ",
    v_head_size,

The head-size mismatch validation compares k_head_size vs q_head_size, but the error message prints v_head_size as the second value (line 107). This will mislead users when debugging invalid models.
Please update the message to report q_head_size as the second operand.

    - v_head_size,
    + q_head_size,
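
As a self-contained illustration of the corrected check, the sketch below reports the two operands that are actually compared. It uses a plain exception rather than OpenVINO's validation macros, and the function name `check_head_sizes` is hypothetical.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the op's validation step.
// The message prints the same two values the condition compares.
void check_head_sizes(std::int64_t k_head_size, std::int64_t q_head_size) {
    if (k_head_size != q_head_size) {
        std::ostringstream msg;
        msg << "The head size in key and query should be the same, but got "
            << k_head_size << " and " << q_head_size;
        throw std::invalid_argument(msg.str());
    }
}
```

Keeping the compared values and the printed values identical means a failing model reports the real mismatch, for example key 64 versus query 128, instead of an unrelated value head size.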
fca11d0 to de40b3b
de40b3b to 2d55899
745088c to 50993ae
50993ae to 0f3d64d
Details:
Tickets:
AI Assistance: