CANN: GATED_LINEAR_ATTN #11
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description:
在CANN后端实现并优化门控线性注意力(GATED_LINEAR_ATTN)操作。本次实现包含以下主要内容:
该功能的实现使CANN后端能够高效处理使用门控线性注意力机制的模型,如Llama-3.1-Nemotron等现代大语言模型。
Testing:
使用官方测试框架对实现进行了全面验证:
[2501213363@cninfer04 llama.cpp]$ ./build/bin/test-backend-ops test -b CANN0 -o GATED_LINEAR_ATTN
测试环境:
硬件平台:Ascend310P3
设备内存:44280 MB (43087 MB free)
后端:CANN0
测试结果:
GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=1,n_seqs=1): OK
GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=32,n_seqs=1): OK
GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=32,n_seqs=4): OK
GATED_LINEAR_ATTN(type=f32,head_count=32,head_size=64,n_seq_tokens=128,n_seqs=4): OK
11821/11821 tests passed
Backend CANN0: OK
9/9 backends passed
测试覆盖了不同批次大小、序列长度的组合,验证了实现的正确性和鲁棒性。所有测试用例均成功通过,表明CANN后端的GATED_LINEAR_ATTN实现与预期行为完全一致。
Notes:
核心实现细节
性能优化措施
技术特点
兼容性
此实现将使llama.cpp在华为昇腾AI处理器上能够高效运行使用门控线性注意力机制的现代大语言模型,拓展了框架的硬件支持范围。