Skip to content

Add cache control to Anthropic Bedrock provider#1615

Open
laminar-coding-agent[bot] wants to merge 7 commits intodevfrom
agent/lam-1435/add-cache-control-to-anthropic-bedrock/0b8f36
Open

Add cache control to Anthropic Bedrock provider#1615
laminar-coding-agent[bot] wants to merge 7 commits intodevfrom
agent/lam-1435/add-cache-control-to-anthropic-bedrock/0b8f36

Conversation

@laminar-coding-agent
Copy link
Copy Markdown
Contributor

@laminar-coding-agent laminar-coding-agent bot commented Apr 9, 2026

No description provided.

Add CachePointBlock (type: default) to system prompt, tools, and the
last message content block in Bedrock Converse API requests. This
enables Anthropic's prompt caching so repeated prefixes (system
instructions, tool definitions, conversation history) are cached and
reused across requests, reducing latency and input token costs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 9, 2026

Greptile Summary

This PR adds Anthropic prompt caching to the Bedrock provider by appending CachePointBlock entries to three locations in Converse API requests: the last user message's content blocks, the system instruction blocks, and the tool definitions list. It also updates the usage accounting to roll cache_read_input_tokens and cache_write_input_tokens into prompt_token_count / total_token_count, and updates default/size-based Bedrock model IDs to newer Claude 4 variants.

Key changes:

  • build_cache_point() helper extracted (DRY, addressed prior feedback)
  • Cache point now applied to the last message, not the first (addressed prior feedback)
  • Debug println! removed (addressed prior feedback)
  • thinking_enabled hoisted to the top of generate_content so it is available before any cache-point injection
  • Usage metadata now incorporates cache token counts so downstream token tracking captures full prompt costs
  • main.rs log lines renamed from Gemini-specific wording to generic LLM client

Confidence Score: 3/5

Not yet safe to merge — the !thinking_enabled guard on cache-point injection is still absent despite a prior reported fix, which will cause Bedrock API errors on any extended-thinking request.

Several previous concerns are genuinely resolved (DRY helper extracted, debug print removed, cache point now placed on the last message). However, commit 67de219 was supposed to gate all three cache-point injection sites on !thinking_enabled, yet the guards are not present in the current HEAD (ad2f450). Any call to generate_content with a thinking-enabled generation_config will now append cache point blocks and receive a Bedrock validation error, regressing a scenario that was already identified and supposedly fixed.

app-server/src/signals/provider/bedrock/mod.rs — specifically the three cache-point injection sites (last user message, system blocks, tool list) need !thinking_enabled guards.

Vulnerabilities

No security concerns identified. The change only modifies how cache-point markers are appended to Bedrock Converse API requests; no credentials, secrets, or user-supplied data flow through the new code paths in an unsafe way.

Important Files Changed

Filename Overview
app-server/src/signals/provider/bedrock/mod.rs Core caching implementation: cache points appended to last user message, system blocks, and tool list. Previous concerns (DRY, println, message ordering) are resolved, but the !thinking_enabled guard that was reportedly added in commit 67de219 is absent from all three injection sites in the current HEAD, meaning extended-thinking requests will still receive cache point blocks and will be rejected by Bedrock.
app-server/src/signals/provider/mod.rs Default Bedrock model updated to global.anthropic.claude-sonnet-4-6; size-based mappings added for Haiku, Sonnet, and Opus 4 variants. No logic issues; purely a configuration update.
app-server/src/main.rs Log messages renamed from Gemini-specific wording to generic LLM client. No functional changes.

Reviews (6): Last reviewed commit: "Add cache point to tool definitions in B..." | Re-trigger Greptile

@laminar-coding-agent
Copy link
Copy Markdown
Contributor Author

After reviewing the code and the Greptile bot's findings, I don't think changes are needed here. Both P2 findings are non-issues:

  1. Duplicated builder code: The CachePointBlock builder pattern appears 3 times, each wrapping the result in a different enum variant (ContentBlock::CachePoint, SystemContentBlock::CachePoint, Tool::CachePoint). Per the project's coding style guidelines: "Three similar lines of code is better than a premature abstraction." Extracting a helper for this would add indirection without meaningful benefit — each call site is 4 lines and the types differ.

  2. No model-compatibility guard: The Greptile summary itself acknowledges this is safe: "The minimum-token threshold is enforced server-side — requests below the threshold are processed without caching and no error is returned, so the unconditional placement strategy is safe." Bedrock's Converse API handles cache point blocks gracefully for models that don't support them. Adding a model allowlist would be a maintenance burden that provides no practical benefit.

Both findings are explicitly marked P2/non-blocking by Greptile, and neither represents a real code quality or correctness issue. No changes needed.

…lock construction

Extract the repeated CachePointBlock builder pattern (used in 3 places)
into a single `build_cache_point()` helper function. Also add a comment
noting that cache points are intended for supported Anthropic Claude
models and that unsupported models will surface an API error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@laminar-coding-agent
Copy link
Copy Markdown
Contributor Author

Done. Changes pushed:

  1. Comment 1 (DRY cache point construction): Extracted a build_cache_point() helper function that replaces the identical CachePointBlock::builder()...build()...map_err() pattern at all three call sites (messages, system blocks, and tools).

  2. Comment 2 (model support documentation): Added a comment above the cache point injection block noting that this targets supported Anthropic Claude models (Opus 4, Sonnet 4, and newer) and that unsupported models will surface an API error propagated to callers. A model-ID allowlist was not added since it would be fragile and the error propagation already works correctly.

skull8888888 and others added 3 commits April 9, 2026 08:24
Extended thinking and prompt caching are incompatible on Anthropic's
Bedrock API. Hoist the thinking_enabled computation to the top of
generate_content and gate all three cache point injection sites
(messages, system instruction, tool config) on !thinking_enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Place cache point on the last message instead of the first, so the
  full conversation prefix is cached across multi-turn requests.
- Include cache read/write tokens in total_token_count to keep it
  consistent with prompt_token_count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@laminar-coding-agent
Copy link
Copy Markdown
Contributor Author

Pushed. Here's what was fixed:

  1. Comments 1 & 3 (cache point on first vs last message): Changed messages.first() / messages[0] to messages.last() / messages[last_idx]. The PR description says the cache point should be on the last message to cache the full conversation history prefix — using the first message only cached that single message.

  2. Comment 2 (total_token_count inconsistency): Added cache read/write tokens to total_token_count so it stays consistent with prompt_token_count. AWS Bedrock's total_tokens() returns input_tokens + output_tokens and excludes cache tokens, so without this fix prompt_token_count could exceed total_token_count.

Copy link
Copy Markdown
Contributor

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 3125eae. Configure here.

Cache points were added for system content and last user message,
but missing from tool definitions. This adds the cache point to
bedrock_tools before building the ToolConfiguration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant