Commit 39b27f0

(revert) kv-cache : do not quantize SWA KV cache (#21332)
This reverts commit 17193cc.
1 parent f49e917 commit 39b27f0

1 file changed: +1 -2 lines changed

src/llama-kv-cache-iswa.cpp

Lines changed: 1 addition & 2 deletions
@@ -66,9 +66,8 @@ llama_kv_cache_iswa::llama_kv_cache_iswa(
 
     LLAMA_LOG_INFO("%s: creating SWA KV cache, size = %u cells\n", __func__, size_swa);
 
-    // note: the SWA cache is never quantized because it is relatively small
     kv_swa = std::make_unique<llama_kv_cache>(
-            model, GGML_TYPE_F16, GGML_TYPE_F16,
+            model, type_k, type_v,
             v_trans, offload, unified, size_swa, n_seq_max, n_pad,
             hparams.n_swa, hparams.swa_type, filter_swa, reuse);
 }
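
For a rough sense of what the removed comment means by "relatively small", the sketch below compares the storage for one K (or V) tensor of the SWA cache in F16 and in Q8_0. It is not code from this commit: `cache_bytes`, `swa_bytes_f16`, `swa_bytes_q8_0`, and the dimensions `N_CELLS` and `N_EMBD_KV` are hypothetical names and values chosen for illustration. The only ggml facts assumed are the storage sizes: F16 uses 2 bytes per value, and a Q8_0 block holds 32 values in 34 bytes (a 2-byte scale plus 32 one-byte quants).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of llama.cpp: bytes needed to store n_elems
// values when every block of block_elems values occupies block_bytes bytes.
static size_t cache_bytes(size_t n_elems, size_t block_elems, size_t block_bytes) {
    assert(n_elems % block_elems == 0); // quantized rows must be whole blocks
    return (n_elems / block_elems) * block_bytes;
}

// Illustrative dimensions only (assumed, not taken from the commit).
static const size_t N_CELLS   = 4096; // number of SWA cache cells
static const size_t N_EMBD_KV = 1024; // K (or V) values per cell, per layer

// F16: each value is its own 2-byte "block".
static size_t swa_bytes_f16() {
    return cache_bytes(N_CELLS * N_EMBD_KV, 1, 2);
}

// Q8_0: 32 values share one 34-byte block.
static size_t swa_bytes_q8_0() {
    return cache_bytes(N_CELLS * N_EMBD_KV, 32, 34);
}
```

With these assumed dimensions, one tensor takes 8388608 bytes in F16 and 4456448 bytes in Q8_0 per layer, so passing `type_k` and `type_v` through lets a quantized cache type shrink the SWA cache as well as the base cache.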
