
[FEATURE] Allow users to easily set CachePoints within an Agent's execution #1015

@dbschmigelski

Description


Problem Statement

Developers currently face cost/latency challenges when building applications that require sequential LLM operations on the same context.

When an agent processes a large amount of data and needs to perform additional operations like structured output formatting, the SDK forces a complete reprocessing of the entire message history. For example, if an agent analyzes 100,000 tokens of content and then needs to generate a structured output, the system must process those same 100,000 tokens again, doubling the token consumption and associated costs.
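For concreteness, here is a minimal sketch of the pattern described above, assuming the Strands Agents Python SDK's `Agent` and `structured_output` API; the schema, prompts, and file path are illustrative only:

```python
from pydantic import BaseModel
from strands import Agent


class AnalysisSummary(BaseModel):
    """Hypothetical schema for the structured output step."""
    key_findings: list[str]
    risk_level: str


# Placeholder for ~100,000 tokens of source material.
large_report_text = open("report.txt").read()

agent = Agent()

# First call: the agent ingests and analyzes the large document.
agent(f"Analyze the following report:\n\n{large_report_text}")

# Second call: structured output replays the full message history through
# the model, so the same ~100,000 tokens are processed (and billed) again.
summary = agent.structured_output(AnalysisSummary, "Summarize your analysis.")
```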

This makes it difficult or impossible to build cost-effective applications that require both long-running analysis and structured outputs, as there is no built-in way to cache or efficiently reuse the already processed context.

Message caching through CachePoint is already supported. However, this only addresses caching at the start of a message. If an agent generates a large number of tokens during its execution, there is no easy way to set cache points within the run. It is technically possible with Hooks (see the sketch below), but this is a common enough scenario that the SDK should make it easier.
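For reference, a rough sketch of the Hooks workaround, assuming the SDK's `HookProvider`/`HookRegistry` interface and a `MessageAddedEvent` that exposes the appended message; the event and field names are illustrative and may not match the SDK exactly:

```python
from strands import Agent
from strands.hooks import HookProvider, HookRegistry, MessageAddedEvent


class CachePointHook(HookProvider):
    """Appends a cachePoint content block to each message added mid-run."""

    def register_hooks(self, registry: HookRegistry) -> None:
        registry.add_callback(MessageAddedEvent, self._add_cache_point)

    def _add_cache_point(self, event: MessageAddedEvent) -> None:
        content = event.message.get("content", [])
        # Mark the end of this message as a cache point so later requests
        # (e.g. a follow-up structured_output call) can reuse the prefix.
        if content and "cachePoint" not in content[-1]:
            content.append({"cachePoint": {"type": "default"}})


agent = Agent(hooks=[CachePointHook()])
```

This works, but it requires users to understand the hooks system and the provider-level cachePoint content format, which is the friction this issue calls out.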

Proposed Solution

No response

Use Case

This is useful when structured output is needed after an Agent has generated many tokens during its run.

Alternative Solutions

No response

Additional Context

No response

Labels

    area-structured-output: Related to the structured output API
    enhancement: New feature or request
    to-refine: Issue needs to be discussed with the team and the team has come to an effort estimate consensus
