Cache hit total_cost injection breaks downstream cache keys for multi-turn conversations #35308

@aunitt

Description

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI

# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))

# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(llm, get_session_history)

# Run 1 (cold cache): Both calls go to API
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - cached
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - cached (history includes Call 1 response)

# Run 2 (warm cache): Clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)        # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config)    # Call 2 - CACHE MISS! History differs due to total_cost

Error Message and Stack Trace (if applicable)

No error: the second call silently misses the cache and makes an unnecessary API call.

Description

PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:

# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )

The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation will fail because the serialised prompt now differs from what was originally cached.
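The effect on the serialised prompt can be reproduced with plain dicts and `json.dumps` (a simulation of the serialisation, not LangChain's actual internals; the key names are illustrative):

```python
import json

# Simulated usage_metadata as returned by the API (no total_cost key).
api_usage = {
    "input_tokens": 10,
    "output_tokens": 4,
    "total_tokens": 14,
}

# What _convert_cached_generations produces on a cache hit:
# the same metadata plus an injected "total_cost": 0.
cached_usage = {**api_usage, "total_cost": 0}

# When these messages land in conversation history, the serialised
# prompt for the *next* call differs, so its cache key differs too.
prompt_run1 = json.dumps({"role": "ai", "usage_metadata": api_usage}, sort_keys=True)
prompt_run2 = json.dumps({"role": "ai", "usage_metadata": cached_usage}, sort_keys=True)

print(prompt_run1 == prompt_run2)                 # -> False
print(len(prompt_run2) - len(prompt_run1))        # -> 17, the '"total_cost": 0, ' delta
```

The 17-byte delta matches the prompt-size difference observed in the SQLite cache below.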

The cascade

  1. Run 1 (cold cache): Call 1 goes to API → response has usage_metadata with 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.

  2. Run 2 (warm cache): Call 1 → cache hit → _convert_cached_generations injects total_cost: 0 (now 6 keys). The modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes ("total_cost": 0, ) → cache MISS.

  3. Run 3: Same injection, but Call 2 now matches the Run 2 cached entry → cache hit again.
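The cascade can be demonstrated with a toy dict-backed cache keyed on the serialised history (all names here are illustrative, not LangChain's real internals):

```python
cache = {}

def mutate_on_hit(response):
    # Mimics _convert_cached_generations injecting total_cost on cache hits.
    return {**response, "usage_metadata": {**response["usage_metadata"], "total_cost": 0}}

def call_llm(history):
    key = repr(history)  # stand-in for the serialised-prompt cache key
    if key in cache:
        return mutate_on_hit(cache[key]), "HIT"
    response = {"text": "4", "usage_metadata": {"total_tokens": 14}}
    cache[key] = response
    return response, "MISS"

def run():
    # Two-turn conversation; the assistant reply goes into the history
    # that forms the second call's cache key.
    history = ["What is 2+2?"]
    r1, s1 = call_llm(history)
    history = history + [r1, "Now multiply by 3"]
    _, s2 = call_llm(history)
    return s1, s2

results = [run() for _ in range(3)]
print(results)
# Run 1: ('MISS', 'MISS') -- cold cache
# Run 2: ('HIT', 'MISS')  -- mutated reply in history breaks Call 2's key
# Run 3: ('HIT', 'HIT')   -- Call 2 now matches the entry Run 2 stored
```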

Evidence

We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache: one from Run 1 (72,301 bytes prompt) and one from Run 2 (72,318 bytes prompt). The only difference is the "total_cost": 0, string (17 bytes).

Note on existing id normalisation

The codebase already handles a similar issue with the id field: it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.

Suggested fix

Either:

  1. Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith)
  2. Strip/normalise usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled
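Option 2 could look something like the sketch below (a hypothetical helper, not existing LangChain API, operating on a plain-dict view of a message): strip volatile usage_metadata fields before serialising history into the cache key, mirroring the existing id-stripping behaviour.

```python
# Fields that vary between API responses and cache hits and should not
# influence cache keys (assumed set; total_cost is the one at issue here).
VOLATILE_USAGE_FIELDS = {"total_cost"}

def normalise_for_cache_key(message: dict) -> dict:
    """Return a copy of the message with volatile usage fields removed."""
    usage = message.get("usage_metadata")
    if not usage:
        return message
    cleaned = {k: v for k, v in usage.items() if k not in VOLATILE_USAGE_FIELDS}
    return {**message, "usage_metadata": cleaned}

msg = {"role": "ai", "usage_metadata": {"total_tokens": 14, "total_cost": 0}}
print(normalise_for_cache_key(msg))
# usage_metadata no longer contains total_cost, so Run 1 and Run 2
# histories would serialise to identical cache keys.
```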

System Info

langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0
