1 parent d2c30c6 commit 940af8d
src/llama-kv-cache-iswa.cpp
@@ -47,7 +47,7 @@ llama_kv_cache_iswa::llama_kv_cache_iswa(
 
     // note: the SWA cache is always padded to 256 for performance
     // https://github.com/ggml-org/llama.cpp/issues/17037
-    uint32_t size_swa = std::min(size_base, GGML_PAD(hparams.n_swa*(unified ? n_seq_max : 1) + n_ubatch, 256));
+    uint32_t size_swa = GGML_PAD(std::min(size_base, hparams.n_swa*(unified ? n_seq_max : 1) + n_ubatch), 256);
 
     // when using full-size SWA cache, we set the SWA cache size to be equal to the base cache size
     if (swa_full) {