1 parent 7f14ac1 commit 714bef0
src/llama-kv-cache.cpp
@@ -441,6 +441,12 @@ void llama_kv_cache_unified::defrag_sched(float thold) {
void llama_kv_cache_unified::set_full() {
    n = size;
+
+    // when simulating a full KV cache, the specific value of the "head" pointer is not important because we are not
+    // going to write any data - we just want to measure the memory needed by the graph in such state.
+    // we should only guarantee that the head position won't cause out-of-bounds view of the K, V tensors, so
+    // setting it to 0 is the simplest way to achieve that
+    // ref: https://github.com/ggml-org/llama.cpp/issues/13359
    head = 0;
}
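The reasoning in the new comment can be illustrated with a small standalone sketch. This is not the actual llama.cpp code; the struct, field names, and the cell count below are simplified stand-ins. The point it demonstrates: a view over the K/V cells starts at `head` and spans `n` cells, so it stays in bounds only if `head + n <= size`, and with `n == size` (simulated full cache) the only safe value for `head` is 0.

```cpp
// Minimal sketch (hypothetical types, not the llama.cpp implementation):
// why head must be 0 when the cache is treated as full.
#include <cassert>
#include <cstdint>

struct kv_cache_sketch {
    uint32_t size = 4096; // total number of KV cells (illustrative value)
    uint32_t head = 0;    // first cell of the active view into the K/V tensors
    uint32_t n    = 0;    // number of cells covered by the view

    void set_full() {
        n    = size;
        // Any non-zero head would make the view [head, head + n) extend past
        // the end of the K/V buffers, so 0 is the simplest in-bounds choice.
        head = 0;
        assert(head + n <= size);
    }
};

int main() {
    kv_cache_sketch kv;
    kv.set_full();
    // A graph built over this view now spans the whole cache, which is what
    // the worst-case memory measurement needs; no data is actually written.
    return 0;
}
```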