UPSTREAM PR #21283: [SYCL] fix llama_kv_cache hang when kv_cache is huge: 5GB#1326

Open
loci-dev wants to merge 1 commit into main from loci/pr-21283-fix_buffer_clear

Conversation


@loci-dev loci-dev commented Apr 2, 2026

Note

Source pull request: ggml-org/llama.cpp#21283

In llama_kv_cache, when the cache size is very large (e.g. 5 GB), the code hangs.
The root cause is that memset() cannot handle more than 4 GB in a single call.
Verified on an Intel Arc A770.
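A minimal sketch of the chunking idea behind the fix: instead of one memset() over the whole buffer, the buffer is cleared in pieces that each stay below the 4 GB limit. The helper name `clear_in_chunks` and the exact chunk size are illustrative assumptions, not taken from the PR itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (name is illustrative): clear a large buffer in
// pieces so that no single memset() call exceeds the 4 GB limit that
// caused the hang. The real fix lives in the SYCL backend's buffer
// clear path and may use a different chunk size.
static void clear_in_chunks(void * data, size_t size) {
    // Keep each call strictly below 4 GB (assumed safe upper bound).
    const size_t max_chunk = 4ull * 1024 * 1024 * 1024 - 1;
    uint8_t * ptr = static_cast<uint8_t *>(data);
    while (size > 0) {
        const size_t n = size < max_chunk ? size : max_chunk;
        std::memset(ptr, 0, n);  // each call handles < 4 GB
        ptr  += n;
        size -= n;
    }
}
```

The loop is a no-op change for buffers under 4 GB (a single memset() call), so small KV caches behave exactly as before.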


loci-review bot commented Apr 2, 2026

No meaningful performance changes were detected across 123,165 analyzed functions in the following binaries: build.bin.libllama.so, build.bin.llama-bench, build.bin.libmtmd.so, build.bin.llama-cvector-generator, build.bin.llama-tts, build.bin.llama-quantize, build.bin.llama-qwen2vl-cli, build.bin.llama-tokenize, build.bin.llama-gemma3-cli, build.bin.llama-gguf-split, build.bin.llama-llava-cli, build.bin.llama-minicpmv-cli, build.bin.libggml-base.so, build.bin.libggml-cpu.so, build.bin.libggml.so.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev
