
yael-works (Contributor)

New Attention Mechanism: SparseK Attention (CPU Backend)

This PR introduces a new attention mechanism called SparseK Attention, implemented from scratch as a new operator within the GGML framework, currently with CPU backend support.


Overview

SparseK Attention is a selective and efficient attention mechanism inspired by Flash Attention that introduces additional sparsity through three filters (a minimal sketch of how they might compose follows the list):

  • Top-K filtering – keeps only the k_top strongest attention weights per query.
  • Local windowing – limits attention to a configurable local context of win_local tokens.
  • Global stride – adds periodic global connections every stride_global tokens.
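
To make the three filters concrete, here is a minimal C sketch, not taken from the PR, of how they might compose under a causal layout: a key j is a candidate for query i if it falls inside the local window or on the global stride, and Top-K filtering then keeps only the strongest of those candidates. The function name and exact semantics are illustrative assumptions.

```c
#include <stdbool.h>

// Hypothetical sketch (not the PR's code): is key j a candidate for
// query i before Top-K filtering? Assumes a causal layout where a
// query may only attend to itself and earlier positions.
static bool sparsek_is_candidate(int i, int j, int win_local, int stride_global) {
    if (j > i) {
        return false;                  // causal mask: no future keys
    }
    if (i - j < win_local) {
        return true;                   // local window around the query
    }
    if (stride_global > 0 && j % stride_global == 0) {
        return true;                   // periodic global connection
    }
    return false;                      // everything else is pruned
}
```

Under this reading, the effective pattern per query is the k_top strongest scores drawn from the local window plus the strided global tokens.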

Implementation Details

  • Added new operator: GGML_OP_SPARSEK_ATTN defined in ggml.h and ggml.c.
  • Implemented the construction function ggml_sparsek_attn(), which creates a computation node parameterized by (k_top, win_local, stride_global); a hypothetical usage sketch follows this list.
  • Added full CPU backend implementation in:
    • ggml-cpu/ops.h
    • ggml-cpu/ops.cpp
    • ggml-cpu.c
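
As a usage illustration, the snippet below builds a SparseK node with the new operator. The argument order and exact signature of ggml_sparsek_attn() are assumptions inferred from the description above; the declaration in ggml.h in this PR is authoritative.

```c
#include "ggml.h"

// Hypothetical usage sketch: the signature of ggml_sparsek_attn() is
// assumed from the PR description, not copied from the patch.
struct ggml_tensor * build_sparsek_node(struct ggml_context * ctx,
                                        struct ggml_tensor * q,
                                        struct ggml_tensor * k,
                                        struct ggml_tensor * v) {
    const int32_t k_top         = 32;   // keep the 32 strongest weights per query
    const int32_t win_local     = 64;   // local window of 64 tokens
    const int32_t stride_global = 128;  // plus every 128th token globally
    return ggml_sparsek_attn(ctx, q, k, v, k_top, win_local, stride_global);
}
```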

The CPU version includes the following stages (a plain-C sketch follows the list):

  • Scaled dot-product computation QKᵀ / √d
  • Dynamic Top-K filtering
  • Softmax normalization
  • Multiplication with V
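
For reviewers who want the dataflow at a glance, here is a self-contained, unoptimized C sketch of the four stages for a single query row. It is not the PR's ggml-cpu code (which operates on ggml tensors and may differ in layout and Top-K selection); the buffer names and the qsort-based Top-K are illustrative.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

// Sort helper: descending order of float values.
static int cmp_desc(const void * a, const void * b) {
    const float x = *(const float *) a;
    const float y = *(const float *) b;
    return (x < y) - (x > y);
}

// One query row of SparseK attention. q: [d], K,V: [n x d] row-major,
// out: [d]; scores and tmp are caller-provided scratch of length n.
void sparsek_attn_row(const float * q, const float * K, const float * V,
                      float * out, float * scores, float * tmp,
                      int n, int d, int k_top) {
    // 1) scaled dot product: scores[j] = (q . k_j) / sqrt(d)
    const float scale = 1.0f / sqrtf((float) d);
    for (int j = 0; j < n; j++) {
        float s = 0.0f;
        for (int t = 0; t < d; t++) {
            s += q[t] * K[(size_t) j*d + t];
        }
        scores[j] = s * scale;
    }

    // 2) Top-K filtering: find the k_top-th largest score, mask the rest
    memcpy(tmp, scores, (size_t) n * sizeof(float));
    qsort(tmp, (size_t) n, sizeof(float), cmp_desc);
    const float thresh = tmp[k_top < n ? k_top - 1 : n - 1];
    float maxv = -INFINITY;
    for (int j = 0; j < n; j++) {
        if (scores[j] < thresh) {
            scores[j] = -INFINITY;      // pruned by Top-K
        } else if (scores[j] > maxv) {
            maxv = scores[j];
        }
    }

    // 3) softmax over the surviving scores (max-subtracted for stability)
    float sum = 0.0f;
    for (int j = 0; j < n; j++) {
        scores[j] = (scores[j] == -INFINITY) ? 0.0f : expf(scores[j] - maxv);
        sum += scores[j];
    }

    // 4) multiplication with V: out = sum_j softmax(scores)[j] * v_j
    for (int t = 0; t < d; t++) {
        out[t] = 0.0f;
    }
    for (int j = 0; j < n; j++) {
        if (scores[j] == 0.0f) {
            continue;                   // masked (or underflowed) weight
        }
        const float w = scores[j] / sum;
        for (int t = 0; t < d; t++) {
            out[t] += w * V[(size_t) j*d + t];
        }
    }
}
```

The qsort-based threshold is O(n log n) per row and chosen purely for clarity; a selection algorithm or a small heap would avoid the full sort.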

Next Steps

Our next goal is to extend SparseK Attention to the SYCL (GPU) backend in order to:

  • Measure and compare performance between CPU and GPU implementations.
  • Optimize kernel execution for sparse attention patterns.
  • Validate correctness and scaling on Intel GPUs.

We are submitting this initial CPU implementation first to ensure review, integration, and baseline correctness before introducing GPU acceleration.


Co-Authors

Co-authored-by: Yael Shuker ([email protected])
Co-authored-by: Gitty Burstein ([email protected])

GittyBurstein (Contributor) commented Oct 28, 2025

Hi @CISC and @NeoZhangJianyu,

We’d appreciate it if you could review our PR implementing the new SparseK Attention operator.
We ran internal validation tests that we wrote ourselves, and all of them passed.

This contribution was developed jointly by both of us (@yael-works and @GittyBurstein).
Please make sure the PR credits both contributors; if needed, we can adjust the commit authors accordingly.

Thanks in advance for your time and feedback!

CISC (Collaborator) commented Oct 28, 2025

We are talking about this SparseK, right?

yael-works (Contributor, Author) commented Oct 28, 2025

yes! @CISC

github-actions bot added the labels testing (Everything test related) and ggml (changes relating to the ggml tensor library for machine learning) on Oct 28, 2025
CISC (Collaborator) commented Oct 30, 2025

You need to rebase to fix the Server CI failures; also, please fix the whitespace issues:
https://github.com/ggml-org/llama.cpp/actions/runs/18935125175/job/54060021809

Gitty Burstein and others added 4 commits October 30, 2025 13:35
GittyBurstein and others added 3 commits October 31, 2025 01:56
Co-authored-by: Sigbjørn Skjæret <[email protected]>