Commit 5d884e6

kv-cache : pad the size of the small SWA cache for performance
Parent: 22c8c3c

File tree

1 file changed: +3, -1 lines


src/llama-kv-cache-iswa.cpp

Lines changed: 3 additions & 1 deletion
@@ -45,7 +45,9 @@ llama_kv_cache_iswa::llama_kv_cache_iswa(

     const uint32_t size_base = kv_size;

-    uint32_t size_swa = std::min(size_base, GGML_PAD(hparams.n_swa*(unified ? n_seq_max : 1) + n_ubatch, n_pad));
+    // note: the SWA cache is always padded to 256 for performance
+    //       https://github.com/ggml-org/llama.cpp/issues/17037
+    uint32_t size_swa = std::min(size_base, GGML_PAD(hparams.n_swa*(unified ? n_seq_max : 1) + n_ubatch, 256));

     // when using full-size SWA cache, we set the SWA cache size to be equal to the base cache size
     if (swa_full) {
