
@ggerganov (Member) commented Jun 4, 2025

Continues #13988.

  • Merge llama_kv_cache into llama_memory_i
  • llama_kv_cache_unified now implements llama_memory_i
  • llama_kv_cache_recurrent now implements llama_memory_i
  • Add new llama_memory_ public API to libllama
  • The old llama_kv_self_* public API now simply routes to the new llama_memory_ API; it will be deprecated in the next PR
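The routing described in these bullets can be illustrated with a small sketch. Everything in it is hypothetical and simplified: `memory_i`, `kv_cache_unified`, `kv_cache_recurrent`, `memory_t`, `context`, `memory_clear` and `kv_self_clear` only mimic the roles of `llama_memory_i`, the two KV cache classes, `llama_memory_t`, `llama_context`, a new llama_memory_ function and an old llama_kv_self_* function. The real libllama signatures are not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for llama_memory_i: the general "LLM memory" interface.
struct memory_i {
    virtual ~memory_i() = default;
    virtual void clear() = 0;
    virtual std::string kind() const = 0;
};

// Hypothetical stand-ins for the unified and recurrent KV caches.
// Both implement the interface directly; there is no intermediate base class.
struct kv_cache_unified : memory_i {
    int n_used = 0;  // number of occupied cells (illustrative)
    void clear() override { n_used = 0; }
    std::string kind() const override { return "unified"; }
};

struct kv_cache_recurrent : memory_i {
    int n_seqs = 0;  // number of tracked sequence states (illustrative)
    void clear() override { n_seqs = 0; }
    std::string kind() const override { return "recurrent"; }
};

// Handle type in the spirit of llama_memory_t: a plain pointer to the interface.
using memory_t = memory_i *;

// Hypothetical owner of the memory, in the spirit of llama_context.
struct context {
    std::unique_ptr<memory_i> memory;
    memory_t get_memory() const { return memory.get(); }
};

// New-style entry point: works on any memory implementation.
void memory_clear(memory_t mem) {
    assert(mem != nullptr);
    mem->clear();
}

// Old-style entry point: kept for compatibility, it only forwards to the new API.
void kv_self_clear(context * ctx) {
    memory_clear(ctx->get_memory());
}
```

With a `kv_cache_unified` stored in `ctx.memory`, calling the old-style `kv_self_clear(&ctx)` reaches `kv_cache_unified::clear()` through the interface, so existing callers keep working while new code can target `memory_clear()` directly and accept either cache type.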

TODO

  • Implement the new llama_memory_ public API

Next PRs

  • Deprecate the llama_kv_self_* public API in favor of the new llama_memory_ API

Base automatically changed from gg/kv-cache-refactor-update to master June 4, 2025 15:58
@ggerganov force-pushed the gg/llama-memory-public branch from fe4b1b3 to bca2671 June 5, 2025 06:16
@ggerganov force-pushed the gg/llama-memory-public branch from bca2671 to f149a8e June 5, 2025 06:36
@ggerganov marked this pull request as ready for review June 5, 2025 06:36

// general concept of LLM memory
// the KV cache is a type of LLM memory, but there can be other types
struct llama_memory_i {
Member Author

Changed this from class to struct to be compatible with the C-header declaration.
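Why the class-key matters: a C header can only forward-declare the type with `struct`, so the C++ definition uses the same keyword. The sketch below uses hypothetical names (`demo_memory_i`, `demo_memory_t`, `demo_kv_cache`, `demo_memory_n_items`) and is not the actual llama.h content. Mixing `class` and `struct` for the same type is legal C++, but MSVC warns about it (C4099) and encodes the class-key in C++ mangled names, which is a plausible reason to keep the two declarations identical.

```cpp
#include <cassert>

// ---- C-compatible part (what a C header could contain) ----
#ifdef __cplusplus
extern "C" {
#endif

// Opaque forward declaration: C code only ever sees a pointer to it.
struct demo_memory_i;
typedef struct demo_memory_i * demo_memory_t;

// Hypothetical C-callable function taking the opaque handle.
int demo_memory_n_items(demo_memory_t mem);

#ifdef __cplusplus
}
#endif

// ---- C++ implementation side ----
// Defined with the same class-key ("struct") as the forward declaration above.
// "class" would also compile, but MSVC would warn (C4099) about the mismatch.
struct demo_memory_i {
    virtual ~demo_memory_i() = default;
    virtual int n_items() const = 0;
};

// A concrete implementation hidden behind the opaque handle.
struct demo_kv_cache : demo_memory_i {
    int n = 0;
    int n_items() const override { return n; }
};

// Definition inherits C linkage from the earlier extern "C" declaration.
int demo_memory_n_items(demo_memory_t mem) {
    assert(mem != nullptr);
    return mem->n_items();
}
```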

@ggerganov requested a review from slaren June 5, 2025 06:38
llama_kv_cache * kv_self = static_cast<llama_kv_cache *>(memory.get());
return kv_self;
llama_memory_t llama_context::get_memory() const {
return static_cast<llama_memory_t>(memory.get());
Member

This cast shouldn't be necessary.
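One way to see why the cast is redundant, assuming (as `memory.get()` suggests) that the context owns its memory through a `std::unique_ptr` to the interface type and that the handle type is a plain pointer to that interface: `get()` already returns exactly the handle type. The names below are hypothetical stand-ins, not the llama.cpp classes.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical interface, implementation, and handle typedef.
struct memory_i {
    virtual ~memory_i() = default;
};
struct kv_cache : memory_i {};

using memory_t = memory_i *;

struct context {
    // Assumption: the owner stores the memory as std::unique_ptr<memory_i>.
    std::unique_ptr<memory_i> memory;

    // get() already yields memory_i *, which is memory_t, so no cast is needed.
    memory_t get_memory() const { return memory.get(); }
};

// Compile-time check that the returned type matches the handle type exactly.
static_assert(std::is_same<decltype(std::declval<const context &>().memory.get()), memory_t>::value,
              "unique_ptr<memory_i>::get() already returns memory_t");
```

If the `static_cast` were removed from a function like `get_memory()`, the code would compile to the same thing; the cast only adds noise.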

// deprecated
llama_kv_cache * llama_get_kv_self(llama_context * ctx) {
return ctx->get_kv_self();
return static_cast<llama_kv_cache *>(ctx->get_memory());
Member

I think this is not a safe cast, so it should be checked with dynamic_cast.
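A sketch of the checked alternative suggested here, using hypothetical types rather than the real llama.cpp classes: `dynamic_cast` inspects the object's dynamic type and returns `nullptr` when the memory is not the requested implementation, whereas a `static_cast` to the wrong derived type is undefined behavior.

```cpp
#include <cassert>

// Hypothetical hierarchy: one interface, two unrelated implementations.
struct memory_i {
    virtual ~memory_i() = default;  // polymorphic base, required for dynamic_cast
};
struct kv_cache_unified   : memory_i { int n_cells = 0; };
struct kv_cache_recurrent : memory_i { int n_seqs  = 0; };

// static_cast<kv_cache_unified *>(mem) would compile even when mem points to a
// kv_cache_recurrent, and using the result would be undefined behavior.
// dynamic_cast checks the dynamic type at run time and yields nullptr on a
// mismatch (and for a null input).
kv_cache_unified * as_unified(memory_i * mem) {
    return dynamic_cast<kv_cache_unified *>(mem);
}
```

A deprecated accessor written this way can return `nullptr` for a memory that is not the expected cache type, and callers can test for it. The cost is a small runtime check and a dependency on RTTI being enabled (GCC and Clang reject `dynamic_cast` downcasts under `-fno-rtti`).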

@ggerganov merged commit 7f37b6c into master Jun 5, 2025
49 of 52 checks passed
@ggerganov deleted the gg/llama-memory-public branch June 5, 2025 12:29
furyhawk pushed a commit to furyhawk/llama.cpp that referenced this pull request Jun 6, 2025
…ml-org#14006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci
shefben added a commit to shefben/llama.cpp that referenced this pull request Jun 6, 2025
gabe-l-hart added a commit to gabe-l-hart/ollama that referenced this pull request Jun 24, 2025
The kv cache hierarchy was squashed so that now all of the llama-kv-cache-*
implementations inherit directly from llama_memory_i and there is no
intermediary llama_kv_cache base class.

ggml-org/llama.cpp#14006

The llava.* tool files were migrated to mtmd.* files

ggml-org/llama.cpp#13460

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <[email protected]>
gabe-l-hart added a commit to gabe-l-hart/ollama that referenced this pull request Jun 25, 2025
The kv cache hierarchy was squashed so that now all of the llama-kv-cache-*
implementations inherit directly from llama_memory_i and there is no
intermediary llama_kv_cache base class.

ggml-org/llama.cpp#14006

Branch: GraniteFour

Signed-off-by: Gabe Goodhart <[email protected]>
2 participants