
@ewykric ewykric commented Dec 3, 2025

Description:

Implements and optimizes the gated linear attention (GATED_LINEAR_ATTN) operator in the CANN backend. The main changes are:

  1. Added the required header (aclnnop/aclnn_mv.h) to support matrix-vector multiplication
  2. Implemented the complete ggml_cann_gated_linear_attn function, providing efficient gated linear attention computation on Huawei Ascend AI processors
  3. The implementation strictly follows the ggml API, ensuring seamless integration with the existing framework
    This enables the CANN backend to efficiently run models that use the gated linear attention mechanism, such as modern large language models like Llama-3.1-Nemotron.

Testing:

The implementation was fully verified with the official test framework:
[2501213363@cninfer04 llama.cpp]$ ./build/bin/test-backend-ops test -b CANN0 -o GATED_LINEAR_ATTN
Test environment:

  • Hardware platform: Ascend 310P3

  • Device memory: 44280 MB (43087 MB free)

  • Backend: CANN0
    Test results:

  • GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=1,n_seqs=1): OK

  • GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=32,n_seqs=1): OK

  • GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=32,n_seqs=4): OK

  • GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=128,n_seqs=4): OK

  • 11821/11821 tests passed

  • Backend CANN0: OK

  • 9/9 backends passed
    The tests cover combinations of batch sizes and sequence lengths, verifying the correctness and robustness of the implementation. All test cases passed, showing that the CANN backend's GATED_LINEAR_ATTN implementation matches the expected behavior.

Notes:

Core implementation details

  • Implements the complete three-step gated linear attention computation:
    1. Compute the k*v outer product
    2. Apply the gate and update the state matrix
    3. Compute the final output (a matrix-vector product of the transposed state matrix and the query vector)
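The three steps can be sketched for a single head and token in plain C++ (a minimal illustration of the recurrence described above; the names, row-major layout, and the fused gate/outer-product loop are assumptions for clarity, not the actual CANN kernel, which runs batched over sequences and heads through ACL operators):

```cpp
#include <cassert>
#include <vector>

// Single-head, single-token sketch:
//   1. outer = k_t * v_t^T             (D x D outer product)
//   2. S     = diag(g_t) * S + outer   (gated state update)
//   3. o_t   = scale * S^T * q_t       (matrix-vector product)
std::vector<float> gla_step(std::vector<float> &S,        // state, row-major S[i*D+j]
                            const std::vector<float> &q,
                            const std::vector<float> &k,
                            const std::vector<float> &v,
                            const std::vector<float> &g,  // per-row gate
                            float scale, int D) {
    // steps 1 and 2 fused: S[i][j] = g[i] * S[i][j] + k[i] * v[j]
    for (int i = 0; i < D; ++i)
        for (int j = 0; j < D; ++j)
            S[i * D + j] = g[i] * S[i * D + j] + k[i] * v[j];

    // step 3: o[j] = scale * sum_i S[i][j] * q[i]   (i.e. S^T * q)
    std::vector<float> o(D, 0.0f);
    for (int j = 0; j < D; ++j) {
        float acc = 0.0f;
        for (int i = 0; i < D; ++i)
            acc += S[i * D + j] * q[i];
        o[j] = scale * acc;
    }
    return o;
}
```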

Performance optimizations

  1. Pre-allocated buffers: avoids repeated memory allocation inside the loop, reducing allocation overhead
  2. Reusable tensors: creates fixed buffer tensors that are reused across iterations
  3. Pre-created parameter arrays: builds parameter arrays such as repeat patterns up front, avoiding repeated construction
  4. Prompt resource release: temporary tensors and resources are released immediately after use, improving memory usage
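The buffer-reuse pattern behind items 1-3 can be sketched generically (a hypothetical helper, not the actual implementation, which reuses CANN/ACL tensor handles rather than std::vector; gating is omitted for brevity):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of the pre-allocation pattern: the outer-product buffer and
// the repeat-pattern parameter array are created once, outside the
// per-token loop, and reused every iteration instead of being
// reallocated per token.
void gla_accumulate(std::vector<float> &S, const float *k, const float *v,
                    int n_tokens, int D) {
    std::vector<float> outer(D * D);           // pre-allocated buffer (items 1-2)
    const std::vector<int64_t> repeats{D, 1};  // pre-created parameter array (item 3)
    (void) repeats;                            // consumed by the real operator calls

    for (int t = 0; t < n_tokens; ++t) {
        const float *kt = k + t * D;
        const float *vt = v + t * D;
        for (int i = 0; i < D; ++i)            // reuse `outer`, no per-token allocation
            for (int j = 0; j < D; ++j)
                outer[i * D + j] = kt[i] * vt[j];
        for (int i = 0; i < D * D; ++i)        // ungated state accumulation
            S[i] += outer[i];
    }
}
```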

Technical characteristics

  • Supports arbitrary batch size (B), sequence length (L), number of attention heads (H), and head dimension (D)
  • Correctly handles tensor memory layout and offset computation
  • Implements an efficient state-update mechanism that avoids unnecessary data copies
  • Uses matrix transposition and matrix-vector multiplication to improve computational efficiency

Compatibility

  • Fully compatible with the existing ggml GATED_LINEAR_ATTN operator interface
  • Follows the same input/output specification as the CUDA, SYCL, and other backends
  • Applies the same scaling-factor logic, ensuring consistent numerical results
    This implementation enables llama.cpp to run modern large language models that use gated linear attention efficiently on Huawei Ascend AI processors, extending the framework's hardware support.

@ewykric ewykric changed the title Feature/gatedlinearattn CANN: GATED_LINEAR_ATTN Dec 3, 2025
YushengZhao pushed a commit to YushengZhao/llama.cpp that referenced this pull request Dec 6, 2025
…rg#17764)

* Squashed commit of the following:

commit b3c6bf4
Author: Abhijit Ramesh <[email protected]>
Date:   Mon Dec 1 18:29:00 2025 -0800

    ggml webgpu: fix xielu parameter passing (noemotiovon#11)

    The XIELU operation was incorrectly using static_cast to convert
    float parameters to uint32_t, which converted numeric values instead
    of preserving IEEE 754 bit patterns. This caused incorrect values
    to be interpreted by the GPU shader.

    * Use reinterpret_cast to preserve float bit patterns when passing
      through uint32_t params buffer
    * Update WGSL shader parameter types from u32 to f32
    * Re-enable XIELU support (was disabled due to numerical issues)

    Fixes NMSE test failures for XIELU operation on WebGPU backend.
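    The parameter-passing bug that commit describes can be demonstrated in isolation (std::memcpy is used here as the strict-aliasing-safe way to do the bit-level copy the commit achieves with reinterpret_cast):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// static_cast converts the numeric value, so 1.5f becomes 1u and the
// fractional part is lost before the value ever reaches the shader.
uint32_t pack_f32_value_cast(float x) {
    return static_cast<uint32_t>(x);  // WRONG for bit-exact transport
}

// A bit-level copy preserves the IEEE 754 pattern, which the WGSL
// shader can then reinterpret back into an f32.
uint32_t pack_f32_bits(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}
```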

commit 5ca9b5e
Author: neha-ha <[email protected]>
Date:   Tue Nov 18 12:17:00 2025 -0800

    Refactored pipelines and workgroup calculations (noemotiovon#10)

    * refactored pipelines

    * refactored workgroup calculation

    * removed commented out block of prior maps

    * Clean up ceiling division pattern

    ---------

    Co-authored-by: Neha Abbas <[email protected]>
    Co-authored-by: Reese Levine <[email protected]>
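    The ceiling-division pattern mentioned in that commit is the standard way to compute how many workgroups are needed to cover all elements (a generic sketch, not the actual dispatch code):

```cpp
#include <cassert>
#include <cstdint>

// Ceiling division: the smallest num_groups such that
// num_groups * workgroup_size >= n_elements.
uint32_t ceil_div(uint32_t n_elements, uint32_t workgroup_size) {
    return (n_elements + workgroup_size - 1) / workgroup_size;
}
```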

Author: James Contini <[email protected]>
Date:   Wed Oct 29 23:13:06 2025 -0700

    formatted embed wgsl and ggml-webgpu.cpp

commit e1f6bae
Author: James Contini <[email protected]>
Date:   Wed Oct 29 23:08:37 2025 -0700

    implemented REPL_Template support and removed bug in unary operators kernel

commit 8c70b8f
Author: James Contini <[email protected]>
Date:   Wed Oct 15 16:14:20 2025 -0700

    responded and dealt with PR comments

commit f9282c6
Author: James Contini <[email protected]>
Date:   Sun Oct 12 13:41:41 2025 -0700

    removed unnecessary checking if node->src[1] exists for unary operators

commit 4cf28d7
Author: James Contini <[email protected]>
Date:   Sun Oct 12 13:32:45 2025 -0700

    All operators (including xielu) working

commit 74c6add
Author: James Contini <[email protected]>
Date:   Fri Oct 10 13:16:48 2025 -0700

    fixed autoconfig

commit 3627499
Author: James Contini <[email protected]>
Date:   Fri Oct 10 13:10:46 2025 -0700

    removed vestigial files

commit cb08583
Author: James Contini <[email protected]>
Date:   Fri Oct 10 12:59:32 2025 -0700

    abides by editor-config

commit 5360e28
Author: James Contini <[email protected]>
Date:   Fri Oct 10 12:45:57 2025 -0700

    rms_norm double declaration bug atoned

commit 7b09baa
Merge: 8a6ec84 74b8fc1
Author: James Contini <[email protected]>
Date:   Fri Oct 10 11:50:03 2025 -0700

    resolving merge conflicts

commit 8a6ec84
Author: James Contini <[email protected]>
Date:   Wed Oct 8 18:06:47 2025 -0700

    unary operators pass ggml tests

commit c3ae382
Author: James Contini <[email protected]>
Date:   Wed Oct 1 16:22:40 2025 -0700

    neg passes backend test

commit aa1c9b2
Author: James Contini <[email protected]>
Date:   Tue Sep 30 23:55:27 2025 -0700

    neg f16xf32xip builds and runs, havent actually ran a model that uses neg kernel yet though

Co-authored-by: James Contini <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Abhijit Ramesh <[email protected]>

* Remove extra code and format

* Add ops documentation (finally)

* Update ggml/src/ggml-webgpu/wgsl-shaders/embed_wgsl.py

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: James Contini <[email protected]>
Co-authored-by: Neha Abbas <[email protected]>
Co-authored-by: Abhijit Ramesh <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>