High token usage due to Prompt Cache in agents (large cached fixed context) – bug or expected behavior? #11615
Replies: 1 comment 1 reply
I can confirm this issue with some benchmark data. Using an agent with […]:

- Evidence (consecutive turns, same conversation): only 16 seconds apart, well within Anthropic's 5-minute cache TTL, yet the second turn […]
- Within-turn caching DOES work — when tool calls create multiple sub-turns […]
- Cost impact: I benchmarked the same conversation with caching ON vs OFF. For simple turns, caching is 25% MORE expensive than no caching — you […]. Overall, cache ON still wins (~$1.85 vs ~$2.19 for a full conversation).
- The root cause is likely the agent framework reconstructing the message […]

Could you provide some insights on this issue, please? Or what am I missing here? Many thanks!

Pieter
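The "25% more expensive" figure lines up with Anthropic's published pricing multipliers (cache writes billed at 1.25× the base input rate, cache reads at 0.1×). A back-of-envelope sketch, with an illustrative price and token count, shows why a write-only turn loses but a conversation with real cache hits still wins:

```python
# Back-of-envelope cost model for Anthropic prompt caching.
# Assumes the published multipliers: cache writes cost 1.25x the base
# input price, cache reads 0.1x. Price and token counts are illustrative.

BASE = 3.00 / 1_000_000  # $/input token (illustrative Sonnet-class price)
WRITE_MULT = 1.25        # multiplier for cache_creation_input_tokens
READ_MULT = 0.10         # multiplier for cache_read_input_tokens

def turn_cost(read_tokens, write_tokens, plain_tokens):
    """Input-side cost of one turn, ignoring output tokens."""
    return BASE * (plain_tokens
                   + WRITE_MULT * write_tokens
                   + READ_MULT * read_tokens)

context = 25_000  # fixed agent context (system prompt, tools, RAG, ...)

# Caching OFF: the full context is billed at the base rate every turn.
off = turn_cost(0, 0, context)
# Caching ON, cache miss (the behavior reported above): full write.
on_miss = turn_cost(0, context, 0)
# Caching ON, cache hit: full read.
on_hit = turn_cost(context, 0, 0)

print(f"off:     ${off:.4f}")
print(f"on miss: ${on_miss:.4f}  ({on_miss / off:.0%} of no-cache)")
print(f"on hit:  ${on_hit:.4f}  ({on_hit / off:.0%} of no-cache)")
```

A miss is exactly 125% of the uncached cost, matching the benchmark, while a hit is 10% — which is why caching only pays off if the second turn actually reads the entry written by the first instead of rewriting it.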
We are observing unexpectedly high token usage when using agents in LibreChat, even for very simple queries (e.g. “What time is it?”).
Specifically, we see metrics like:
promptTokens.input: very low (e.g. 6 tokens)
promptTokens.write: very high (20k–30k tokens)
promptTokens.read: 0
completionTokens: normal
This suggests that a very large Prompt Cache is being written, corresponding to some fixed cached context (system prompt, tools, RAG, model wrappers, etc.), rather than the user’s actual input.
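For reference, this pattern corresponds to the `usage` object the Anthropic Messages API returns; the mapping of those fields onto the `promptTokens` names above is my assumption about LibreChat's accounting, not confirmed from its code:

```python
# Hypothetical mapping from an Anthropic API `usage` object onto the
# promptTokens fields reported above. The LibreChat-side field names
# are assumed for illustration; the Anthropic field names are real.

usage = {
    "input_tokens": 6,                     # only the new user text
    "cache_creation_input_tokens": 25_000, # fixed agent context, written to cache
    "cache_read_input_tokens": 0,          # nothing read back from cache
}

prompt_tokens = {
    "input": usage["input_tokens"],
    "write": usage["cache_creation_input_tokens"],
    "read": usage["cache_read_input_tokens"],
}

# A healthy second turn within the cache TTL should instead report
# write == 0 and read == 25_000 for the same fixed context.
total_billed = sum(prompt_tokens.values())
print(prompt_tokens, total_billed)
```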
Some relevant observations:
The issue only happens with agents, not with simple chat sessions.
Modifying the agent’s instructions does not reduce promptTokens.write.
Changing the model (e.g. from Claude Sonnet to another model) significantly reduces the cache size, which suggests the prompt is assembled differently depending on the model.
With OpenAI models (e.g. GPT-4o-mini), no prompt caching (write/read) is observed, although the prompt itself is still large.
We could not find a clear way to “clear” or invalidate the prompt cache, other than indirect changes or waiting for the TTL to expire.
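On the invalidation point: Anthropic's cache has no explicit "clear" operation — entries simply expire after the TTL, and the cache key is the exact token prefix up to each `cache_control` breakpoint. So any byte-level change to the system prompt or tool definitions produces a new entry rather than a hit. A sketch of the request shape (no network call; the model id and prompt text are illustrative):

```python
# Builds a Messages API payload with prompt caching enabled, to show
# that the cache is keyed on the exact prefix up to the cache_control
# breakpoint. Model id and prompt text are illustrative.

def build_request(system_prompt, tools, user_text):
    return {
        "model": "claude-sonnet-4-20250514",
        "max_tokens": 1024,
        "tools": tools,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Marks everything up to this point as a cacheable prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }

a = build_request("You are a helpful agent.", [], "What time is it?")
b = build_request("You are a helpful agent!", [], "What time is it?")

# One changed character in the system prompt means turn `b` writes a
# brand-new cache entry instead of reading the one written by turn `a`.
print(a["system"][0]["text"] == b["system"][0]["text"])  # False
```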
Our main questions are:
Is this behavior expected, or is it a known bug?
Which components exactly are included in this cached fixed context for agents?
Is there a recommended way to:
clear or invalidate the prompt cache,
prevent certain blocks (RAG, tools, global system prompts) from being cached,
or disable prompt caching per agent or globally?
Does this behavior apply only to agents, or should it also be expected in other modes?
Have there been other reports of unexpectedly high promptTokens.write usage in similar setups?
The goal is to understand how to control or reduce the cached fixed context, especially for agents that are also used for simple conversational queries, in order to avoid unnecessary token costs.