Checked other resources
- I added a very descriptive title to this issue.
- I searched the LangChain documentation with the integrated search.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
from langchain_community.cache import SQLiteCache
from langchain_core.globals import set_llm_cache
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_google_genai import ChatGoogleGenerativeAI

# Set up SQLite cache
set_llm_cache(SQLiteCache(database_path=".cache.db"))

# Build a chain with message history
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
store = {}

def get_session_history(session_id):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

chain_with_history = RunnableWithMessageHistory(llm, get_session_history)

# Run 1 (cold cache): both calls go to the API
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)       # Call 1 - cached
chain_with_history.invoke("Now multiply by 3", config=config)  # Call 2 - cached (history includes Call 1 response)

# Run 2 (warm cache): clear in-memory history, re-run
store.clear()
config = {"configurable": {"session_id": "test"}}
chain_with_history.invoke("What is 2+2?", config=config)       # Call 1 - CACHE HIT, but response now has total_cost: 0
chain_with_history.invoke("Now multiply by 3", config=config)  # Call 2 - CACHE MISS! History differs due to total_cost
Error Message and Stack Trace (if applicable)
No error: the second call silently misses the cache and makes an unnecessary API call.
Description
PR #32437 introduced code in _convert_cached_generations that injects "total_cost": 0 into usage_metadata of AIMessages on cache hits:
# libs/core/langchain_core/language_models/chat_models.py, lines 669-678
if hasattr(gen, "message") and isinstance(gen.message, AIMessage):
    # We zero out cost on cache hits
    gen.message = gen.message.model_copy(
        update={
            "usage_metadata": {
                **(gen.message.usage_metadata or {}),
                "total_cost": 0,
            }
        }
    )
The original API response does NOT include total_cost in usage_metadata. When the modified AIMessage (with total_cost: 0) is added to conversation history, any subsequent cache lookup for a follow-up message in the same conversation fails, because the serialised prompt now differs from what was originally cached.
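The mechanism can be reproduced without LangChain at all. The sketch below is a minimal model, assuming (as the cache behaviour implies) that the cache key is a deterministic serialisation of the message list; the dict shapes and the cache_key helper are illustrative stand-ins, not the actual LangChain internals.

```python
import json
from copy import deepcopy

def cache_key(messages):
    # Stand-in for LangChain's prompt serialisation: a deterministic dump
    # of the message list, which includes prior AI messages from history.
    return json.dumps(messages, sort_keys=True)

# Run 1: the API response's usage_metadata carries no total_cost key.
original = {"role": "ai", "content": "4",
            "usage_metadata": {"input_tokens": 5, "output_tokens": 1}}

# Run 2: the cached copy has total_cost: 0 injected on the cache hit.
mutated = deepcopy(original)
mutated["usage_metadata"]["total_cost"] = 0

# The follow-up prompt embeds the AI message from history, so the keys differ.
followup = {"role": "human", "content": "Now multiply by 3"}
key_run1 = cache_key([original, followup])
key_run2 = cache_key([mutated, followup])
print(key_run1 == key_run2)  # False -> the Run 2 follow-up misses the cache
```

Any mutation of a cached message that later re-enters the prompt has this effect; total_cost is just the specific field introduced here.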
The cascade
- Run 1 (cold cache): Call 1 goes to the API; the response's usage_metadata has 5 keys (no total_cost). This AIMessage goes into history. Call 2 is cached with this history.
- Run 2 (warm cache): Call 1 is a cache hit; _convert_cached_generations injects total_cost: 0 (now 6 keys). The modified AIMessage goes into history. Call 2's prompt now differs by 17 bytes ("total_cost": 0, ), so Call 2 is a cache MISS.
- Run 3: Same injection, but Call 2 now matches the entry cached in Run 2, so it is a cache hit again.
Evidence
We confirmed this in a real pipeline processing 100 papers. For the same paper, two refine_coreferences entries exist in the SQLite cache: one from Run 1 (72,301-byte prompt) and one from Run 2 (72,318-byte prompt). The only difference is the "total_cost": 0, string (17 bytes).
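The reported byte counts are internally consistent: the injected fragment, as it appears in the serialised prompt, is exactly 17 bytes, which accounts for the full delta between the two cached entries.

```python
# The fragment that appears in the Run 2 prompt but not the Run 1 prompt
# (key, colon, value, trailing comma and space, per the serialised form).
fragment = '"total_cost": 0, '
print(len(fragment))              # 17
print(72301 + len(fragment))      # 72318, matching the Run 2 prompt size
```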
Note on existing id normalisation
The codebase already handles a similar issue with the id field: it strips id from messages before computing cache keys (lines 1151-1158). However, usage_metadata is not similarly normalised, so the total_cost injection pollutes downstream cache keys.
Suggested fix
Either:
- Don't inject total_cost into the AIMessage's usage_metadata (track it separately for LangSmith), or
- Strip/normalise usage_metadata fields (like total_cost) from AIMessages in conversation history before computing cache keys, similar to how id is already handled.
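A sketch of the second option, mirroring the existing id-stripping. All names here (normalise_for_cache, CACHE_IGNORED_USAGE_FIELDS, the plain-dict message shape) are illustrative assumptions, not the actual LangChain internals, which operate on AIMessage objects.

```python
from copy import deepcopy

# Fields that cache-hit bookkeeping may inject and that must not
# influence the cache key (assumption: only total_cost today).
CACHE_IGNORED_USAGE_FIELDS = {"total_cost"}

def normalise_for_cache(message: dict) -> dict:
    """Drop cache-polluting usage_metadata fields before key computation,
    analogous to how `id` is already stripped from messages."""
    msg = deepcopy(message)
    usage = msg.get("usage_metadata")
    if usage:
        msg["usage_metadata"] = {
            k: v for k, v in usage.items()
            if k not in CACHE_IGNORED_USAGE_FIELDS
        }
    return msg

# With this normalisation, a Run 1 message and its Run 2 cached copy
# (which carries the injected total_cost: 0) produce identical keys.
run1_msg = {"content": "4", "usage_metadata": {"output_tokens": 1}}
run2_msg = {"content": "4", "usage_metadata": {"output_tokens": 1, "total_cost": 0}}
print(normalise_for_cache(run1_msg) == normalise_for_cache(run2_msg))  # True
```

The first option (keeping total_cost out of the message entirely) avoids the normalisation step altogether, at the cost of threading the value to LangSmith through a separate channel.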
System Info
langchain-core==1.2.13
langchain-google-genai==2.1.5
Python 3.10.13
macOS Darwin 24.6.0