
[SYCL] fix llama_kv_cache hang when kv_cache is huge: 5GB#21283

Merged

ggerganov merged 1 commit into ggml-org:master from arthw:fix_buffer_clear on Apr 2, 2026


Conversation

@arthw (Contributor) commented Apr 2, 2026

In llama_kv_cache, when the cache size is huge (e.g. 5 GB), the code hangs.
The root cause is that memset() cannot handle sizes larger than 4 GB.
Verified on Arc770.

@github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) on Apr 2, 2026
@ggerganov ggerganov merged commit 4888137 into ggml-org:master Apr 2, 2026
45 of 46 checks passed
