
Conversation

@ggerganov commented Jun 3, 2025

cont #13746 (comment)

Overview

  • Remove virtual llama_kv_cache::update()
  • Remove virtual llama_kv_cache::defrag_sched()
  • Add virtual llama_kv_cache::init_update()
  • llama_kv_cache_unified::defrag_prepare() is now const
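
For orientation, here is a minimal sketch of the resulting virtual interface, using the names from the bullet list above. This is not the real header: the actual class has many more members, and the return type of init_update() (a unique_ptr to the state) is an assumption for this sketch.

#include <memory>

struct llama_context;      // opaque here
struct llama_memory_state; // sketched further below

struct llama_kv_cache {
    virtual ~llama_kv_cache() = default;

    // plan any pending updates (shift, defrag, ...) without mutating the cache;
    // replaces the removed update() and defrag_sched() virtuals
    virtual std::unique_ptr<llama_memory_state> init_update(llama_context * lctx, bool optimize) = 0;
};

The key design point is that the cache no longer decides *and* applies updates itself; it only produces a plan, and the caller decides when to apply it.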

The logic for shifting and defragmenting the KV cache is now implemented via a memory state (i.e. llama_memory_state), for consistency with the decoding states introduced in #13746. The idea is that calling init_update() checks whether any updates need to be performed, without mutating the KV cache (a.k.a. the memory). The created memory update state can then be applied to perform the necessary updates:

const auto kv_state = kv_self->init_update(this, optimize);

if (kv_state->get_status() == LLAMA_MEMORY_STATUS_NO_UPDATE) {
    // no updates need to be performed
    return false;
}

if (!kv_state->apply()) {
    LLAMA_LOG_ERROR("%s: failed to apply memory update\n", __func__);
}

This change generalizes the concept of updating the memory module: so far the updates have been KV cache shifts and defrags, but in the future additional operations can be performed through the same mechanism.
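
Concretely, the pattern only needs a small read-then-apply surface. A minimal sketch of the state interface, assuming the names from the snippet above (the real definition in llama.cpp differs in detail, and the SUCCESS status name is an assumption):

enum llama_memory_status {
    LLAMA_MEMORY_STATUS_SUCCESS,   // assumed name: there is work to apply
    LLAMA_MEMORY_STATUS_NO_UPDATE, // from the snippet above: nothing to do
};

struct llama_memory_state {
    virtual ~llama_memory_state() = default;

    // non-mutating: report what init_update() decided
    virtual llama_memory_status get_status() const = 0;

    // mutating: perform the planned updates; returns false on failure
    virtual bool apply() = 0;
};

Splitting the read-only plan (get_status()) from the mutation (apply()) is what lets callers bail out early, exactly as in the snippet above.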

We also start to avoid the explicit "defrag" term, as it is too specific to the unified KV cache. Instead, the init_update() method takes a bool optimize flag that can mean different things depending on the underlying memory implementation.
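
As a hypothetical illustration of that flexibility, building on the sketch above (this is not the actual llama_kv_cache_unified implementation): a unified-cache style state could fold both pending shifts and, when optimize is set, defragmentation into one plan, while another memory implementation could map optimize to something else entirely.

struct kv_update_state_sketch : llama_memory_state {
    bool need_shift  = false; // a pending RoPE K-shift
    bool need_defrag = false; // set only when `optimize` was requested and
                              // fragmentation crossed some threshold

    llama_memory_status get_status() const override {
        return (need_shift || need_defrag) ? LLAMA_MEMORY_STATUS_SUCCESS
                                           : LLAMA_MEMORY_STATUS_NO_UPDATE;
    }

    bool apply() override {
        // a real implementation would build and run the graphs that shift
        // the K cache and/or move cells to close the gaps
        return true;
    }
};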

Next PRs

@ggerganov requested a review from slaren June 4, 2025 07:33
@ggerganov force-pushed the gg/kv-cache-refactor-update branch from ddc998b to 503dda2 June 4, 2025 07:34
@ggerganov merged commit 3e63a58 into master Jun 4, 2025
52 checks passed
@ggerganov deleted the gg/kv-cache-refactor-update branch June 4, 2025 15:58