Hey there,
I feel like I might be missing a critical piece of the puzzle when working with long contexts, multiple agents, and structured outputs, so I wanted to bring this up.
Here's the situation: assume I have a very large document (>30k tokens) that needs to be provided as context to multiple agents. Each agent performs a specialized task and uses Instructor to produce a structured output. Due to the nature of the tasks, all agents need access to the entire document, so typical RAG approaches aren’t an option.
Prompt caching feels like the perfect fit here—providing me with faster and cheaper agent interactions. But there's a problem: each agent uses a different tool, and as implemented by the major providers (OpenAI, Anthropic, Gemini), prompt caching wouldn’t work in this setup.
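For reference, Anthropic's caching requires an explicit cache breakpoint on a content block. Below is a sketch of the request body only (no API call is made; the model name is a placeholder), marking the shared document as cacheable so every agent's request could, in principle, reuse it:

```python
# Request-body sketch for Anthropic-style prompt caching (no API call).
# The "cache_control" breakpoint marks everything up to and including
# the document block as cacheable; per-agent content comes after it.
LARGE_DOC = "<the shared >30k-token document>"

def build_request(agent_task: str) -> dict:
    return {
        "model": "<model-name>",  # placeholder
        "max_tokens": 1024,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": LARGE_DOC,
                        "cache_control": {"type": "ephemeral"},
                    },
                    {"type": "text", "text": agent_task},
                ],
            }
        ],
    }

req = build_request("Summarize the document.")
```

The catch, as described next, is that tool definitions are serialized before the messages, so this breakpoint only helps if the tools (and system prompt) are identical across requests.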
The issue stems from how Instructor (and, in fact, most libraries and the providers themselves) enforces structured outputs: either via TOOLS mode (tool calling) or JSON mode (placing the schema in the system prompt). Prompt caching, as implemented by these providers, is fundamentally prefix-based, and since the tool definitions and system prompt sit at the very beginning of that prefix, any variation in either of them completely breaks caching.
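To make the prefix-breaking concrete, here's a small pure-Python sketch (no real API calls; the serialization order is a simplification of how providers lay out requests): two requests sharing the same large document still share almost no cacheable prefix once their tool definitions differ, because tools are serialized before the messages.

```python
import json

def serialize_request(tools, system, messages) -> str:
    # Simplified model: providers serialize tools first, then the
    # system prompt, then the messages; a cache hit requires an
    # exact match on a prefix of this serialized form.
    return json.dumps({"tools": tools, "system": system, "messages": messages})

def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

big_doc = "<the full >30k-token document>"
common_messages = [{"role": "user", "content": big_doc}]

# Two agents, identical document, different tools (names are made up):
req_a = serialize_request([{"name": "extract_entities"}], "You are agent A.", common_messages)
req_b = serialize_request([{"name": "summarize"}], "You are agent B.", common_messages)

# The shared prefix ends inside the tool definitions -- long before
# the document -- so none of the document tokens are cache hits.
print(shared_prefix_len(req_a, req_b))
```

Even though the document is byte-identical in both requests, it sits *after* the point of divergence, so the cache never reaches it.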
This feels to me like a very common use case, yet I haven't found much discussion about it. The only straightforward solution I see is giving up on the stronger structural guarantees of tool calling and instead instructing the model via user messages to produce structured output (something like plain JSON or MD_JSON). However, correct me if I'm wrong, Instructor doesn't seem to support this out of the box for these providers: with Anthropic, for instance, ANTHROPIC_TOOLS uses tools, whereas ANTHROPIC_JSON places the schema in the system prompt.
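The workaround I have in mind would look roughly like the sketch below (assuming all agents also share an identical, or empty, system prompt and send no tool definitions): keep the shared document at the front of the user message and append the per-agent task and schema after it, where variation no longer invalidates the cached prefix. The schema is hand-written here for illustration; in practice it could come from a Pydantic model's `model_json_schema()`.

```python
import json

# Hand-written JSON Schema for illustration only.
ENTITIES_SCHEMA = {
    "type": "object",
    "properties": {
        "people": {"type": "array", "items": {"type": "string"}},
        "places": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["people", "places"],
}

def build_user_message(schema: dict, task: str, document: str) -> str:
    # Shared document first, so a prefix cache can cover it; the
    # per-agent task and schema go after it, where they can vary
    # between agents without breaking the cached prefix.
    return (
        f"{document}\n\n{task}\n"
        f"Respond only with JSON matching this schema:\n"
        f"{json.dumps(schema)}"
    )

msg = build_user_message(ENTITIES_SCHEMA, "Extract all entities.", "<full document>")

# Parsing a (simulated) model reply; a real implementation would
# also validate it against the schema and retry on failure.
reply = '{"people": ["Ada Lovelace"], "places": ["London"]}'
parsed = json.loads(reply)
```

This trades the provider-enforced guarantees of tool calling for plain prompt-level instruction, which is exactly the trade-off described above.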
Am I missing something here?