Update llama.py - Fix embedding generation error #5
base: main
Conversation
Fix embedding generation error in Gemma3ChatHandler

Replace `llama_kv_cache_clear` -> `llama_kv_self_clear`
Pull Request Overview
This PR fixes an embedding generation error by replacing calls to the wrong cache-clear function with the correct one.
- Calls to `llama_kv_cache_clear` have been updated to `llama_kv_self_clear` in the embedding flow, as sketched after this list.
- Ensures the context cache is properly cleared before and after decoding batches.
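For orientation, here is a minimal sketch of where the renamed call sits in `Llama.embed()`'s batch-decoding helper. The surrounding structure (`self._ctx`, `self._batch`) follows llama-cpp-python's internals as shown in the diff hunk further down; exact attribute names are assumptions, not verified against this revision.

```python
import llama_cpp

def decode_batch(self, seq_sizes):
    # Previously: llama_cpp.llama_kv_cache_clear(self._ctx.ctx)
    # Clearing the self-attention KV cache here keeps state from the
    # previous batch from leaking into the embeddings of the next one.
    llama_cpp.llama_kv_self_clear(self._ctx.ctx)
    self._ctx.decode(self._batch)
    self._batch.reset()
    # ...embeddings for each sequence in seq_sizes are fetched here...
```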
Comments suppressed due to low confidence (1)
llama_cpp/llama.py:982
- Add or update unit tests for the `embed` function to verify that embeddings are generated correctly with `llama_kv_self_clear` and that the cache is fully cleared before and after decoding (see the test sketch below).
```python
data: Union[List[List[float]], List[List[List[float]]]] = []
```
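A hypothetical pytest-style sketch of such a test follows; the model path and inputs are placeholders, and the shape assertions assume the default pooled output (one flat float vector per input), none of which is taken from this PR:

```python
import llama_cpp

def test_embed_returns_one_vector_per_input():
    # Placeholder model path: any GGUF embedding model would do here.
    llm = llama_cpp.Llama(
        model_path="models/embedding-model.gguf",
        embedding=True,
        n_ctx=512,
    )
    vectors = llm.embed(["hello world", "goodbye world"])
    # One embedding per input string...
    assert len(vectors) == 2
    # ...and, with pooled output, each is a flat float vector of equal size.
    assert all(isinstance(x, float) for v in vectors for x in v)
    assert len(set(len(v) for v in vectors)) == 1
```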
```python
# decode and fetch embeddings
data: Union[List[List[float]], List[List[List[float]]]] = []

def decode_batch(seq_sizes: List[int]):
```
Copilot AI · Jul 12, 2025
[nitpick] Consider adding a brief comment explaining why `llama_kv_self_clear` is used here instead of the previous `llama_kv_cache_clear`, to clarify the intended cache-clearing behavior for future maintainers.
Suggested change:

```diff
 def decode_batch(seq_sizes: List[int]):
+    # Clear the self-attention key-value cache to prepare for decoding the next batch.
+    # `llama_kv_self_clear` is used here instead of `llama_kv_cache_clear` because it specifically
+    # clears the cache for self-attention mechanisms, which is required for accurate embedding generation.
```
We can close this PR if …
Force-pushed from ef28569 to a096d51
Replace `llama_kv_cache_clear` -> `llama_kv_self_clear`.
Revert until the `llama_kv_cache_clear` function is fixed.
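Since the branch now reverts to the old call until `llama_kv_cache_clear` is fixed upstream, one defensive option (an illustration only, not part of this PR) is to resolve whichever symbol the installed binding exposes:

```python
import llama_cpp

# Prefer the newer self-attention clear; fall back to the legacy name if the
# installed llama_cpp build predates it. Both names appear in this PR's
# discussion; the helper itself is hypothetical.
_kv_clear = getattr(llama_cpp, "llama_kv_self_clear", None) or getattr(
    llama_cpp, "llama_kv_cache_clear"
)

def clear_kv_cache(ctx) -> None:
    # Clears the context's KV cache with whichever symbol exists.
    _kv_clear(ctx)
```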