
Conversation

@MikeLP commented Jul 12, 2025

Replace llama_kv_cache_clear -> llama_kv_self_clear.
This can be reverted once the llama_kv_cache_clear function is fixed.

okaris and others added 2 commits July 4, 2025 18:08
Replace llama_kv_cache_clear -> llama_kv_self_clear
Copilot AI review requested due to automatic review settings July 12, 2025 11:57
Copilot AI left a comment


Pull Request Overview

This PR fixes an embedding generation error by replacing calls to the wrong cache-clear function with the correct one.

  • Calls to llama_kv_cache_clear have been updated to llama_kv_self_clear in the embedding flow.
  • Ensures the context cache is properly cleared before and after decoding batches.
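In code, the pattern these bullets describe looks roughly like the sketch below (a minimal illustration against the llama-cpp-python low-level bindings; this decode_batch is simplified, and the embedding-fetch step is elided):

    import llama_cpp

    def decode_batch(ctx, batch):
        # Clear the self-attention KV cache before decoding so state from the
        # previous batch cannot bleed into the next batch's embeddings.
        llama_cpp.llama_kv_self_clear(ctx)
        if llama_cpp.llama_decode(ctx, batch) != 0:
            raise RuntimeError("llama_decode failed")
        # ... fetch embeddings, e.g. via llama_cpp.llama_get_embeddings_seq ...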
Comments suppressed due to low confidence (1)

llama_cpp/llama.py:982

  • Add or update unit tests for the embed function to verify that embeddings are generated correctly with llama_kv_self_clear and that the cache is fully cleared before and after decoding.
        data: Union[List[List[float]], List[List[List[float]]]] = []

In llama_cpp/llama.py, the inline comment below is attached to this hunk:

    # decode and fetch embeddings
    data: Union[List[List[float]], List[List[List[float]]]] = []

    def decode_batch(seq_sizes: List[int]):
Copilot AI commented Jul 12, 2025

[nitpick] Consider adding a brief comment explaining why llama_kv_self_clear is used here instead of the previous llama_kv_cache_clear, to clarify the intended cache-clearing behavior for future maintainers.

Suggested change:

    def decode_batch(seq_sizes: List[int]):
        # Clear the self-attention key-value cache to prepare for decoding the next batch.
        # `llama_kv_self_clear` is used here instead of `llama_kv_cache_clear` because it specifically
        # clears the cache for self-attention mechanisms, which is required for accurate embedding generation.
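Relatedly, the suppressed review note above asks for test coverage of embed. A minimal sketch of such a test (the model path is hypothetical, and numerical stability across repeated calls is an assumption that holds only if the cache is fully cleared between batches):

    import pytest
    from llama_cpp import Llama

    def test_embed_is_stable_across_calls():
        # Hypothetical small GGUF model; any embedding-capable model works.
        llm = Llama(model_path="models/example.gguf", embedding=True, verbose=False)
        first = llm.embed("hello world")
        second = llm.embed("hello world")
        # If the KV cache is cleared between batches, repeated inputs
        # should produce (numerically) identical embeddings.
        assert first == pytest.approx(second)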


@MikeLP (Author) commented Jul 13, 2025

We can close this PR once the llama_kv_cache_clear error is fixed.
