feat: add prompt caching support for LiteLLM (#5791) #6074
Conversation
- Add `litellmUsePromptCache` configuration option to provider settings
- Implement cache control headers in the LiteLLM handler when enabled
- Add a UI checkbox for enabling prompt caching (only shown for supported models)
- Track cache read/write tokens in usage data
- Add a comprehensive test for prompt caching functionality
- Reuse existing translation keys for consistency across languages

This allows LiteLLM users to benefit from prompt caching with supported models like Claude 3.7, reducing costs and improving response times.
```typescript
expect(createCall.messages[lastUserIdx]).toMatchObject({
	cache_control: { type: "ephemeral" },
})
```
Consider adding an assertion for the second-to-last user message as well, to fully verify that cache control headers are applied to both of the last two user messages.
- Convert system message to structured format with cache_control
- Handle both string and array content types for user messages
- Apply cache_control to content items, not just message level
- Update tests to match new message structure

This ensures prompt caching works correctly for all messages in a conversation, not just the initial system prompt and first user message.
Use a type assertion to handle the cache_control property, which is not in the OpenAI types
Updated PR with improved prompt caching implementation

Based on feedback from issue #5791, I've updated the prompt caching implementation to properly handle multi-turn conversations.

Changes made:

Technical details: The key issue was that LiteLLM needs cache control to be applied more granularly, specifically to content items within messages, similar to how Anthropic's native API handles it. Spreading cache control at the message level doesn't work correctly for subsequent messages in a conversation. This implementation now follows the content-level pattern used by Cline, which ensures proper caching throughout multi-turn conversations. All tests are passing and the linter issues have been resolved.
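As a rough illustration of the content-level approach described above (a minimal standalone sketch; `applyCacheControlToContent` is a hypothetical helper, not the actual handler code):

```typescript
type TextPart = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type UserMessage = { role: "user"; content: string | TextPart[] };

// Normalize string content to the structured array form, then mark the
// final content item with an ephemeral cache breakpoint. Marking the
// content item (not the message) mirrors Anthropic-style caching.
function applyCacheControlToContent(msg: UserMessage): UserMessage {
	const parts: TextPart[] =
		typeof msg.content === "string" ? [{ type: "text", text: msg.content }] : msg.content;

	const content = parts.map((p, i) =>
		i === parts.length - 1 ? { ...p, cache_control: { type: "ephemeral" as const } } : p,
	);
	return { ...msg, content };
}
```

Applied to the last two user messages of each request, the breakpoints move forward as the conversation grows, so earlier turns remain inside the cached prefix.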
daniel-lxs
left a comment
LGTM
daniel-lxs
left a comment
Hey @MuriloFP, can you confirm this follows the documentation at https://docs.litellm.ai/docs/proxy/caching#usage? I think I see some inconsistencies; LiteLLM doesn't seem to follow the usual OpenAI-style caching, but I might be wrong.
I believe https://docs.litellm.ai/docs/proxy/caching#usage refers to caching in the LiteLLM proxy when it receives identical requests: in that case no request is forwarded to the underlying model, and LiteLLM's cached response is returned. This PR instead adds caching directives to the messages that LiteLLM proxies on to the model.
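To make that distinction concrete, here is a hypothetical request body of the kind this PR produces; LiteLLM forwards the `cache_control` directive to the upstream provider rather than serving a cached response itself (the model name and text are illustrative):

```typescript
// Hypothetical payload proxied by LiteLLM to the upstream model.
// cache_control here is a provider-level prompt-caching directive,
// unrelated to LiteLLM's own response cache for identical requests.
const requestBody = {
	model: "claude-3-7-sonnet",
	messages: [
		{
			role: "system",
			content: [
				{
					type: "text",
					text: "You are a helpful assistant.",
					// Ask the provider to cache everything up to this point.
					cache_control: { type: "ephemeral" },
				},
			],
		},
		{ role: "user", content: "Summarize the design doc." },
	],
};
```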
daniel-lxs
left a comment
Thank you @MuriloFP and @steve-gore-snapdocs for the clarification!
feat: add prompt caching support for LiteLLM (RooCodeInc#5791) (RooCodeInc#6074)

* feat: add prompt caching support for LiteLLM (RooCodeInc#5791)

- Add litellmUsePromptCache configuration option to provider settings
- Implement cache control headers in LiteLLM handler when enabled
- Add UI checkbox for enabling prompt caching (only shown for supported models)
- Track cache read/write tokens in usage data
- Add comprehensive test for prompt caching functionality
- Reuse existing translation keys for consistency across languages

This allows LiteLLM users to benefit from prompt caching with supported models like Claude 3.7, reducing costs and improving response times.

* fix: improve LiteLLM prompt caching to work for multi-turn conversations

- Convert system message to structured format with cache_control
- Handle both string and array content types for user messages
- Apply cache_control to content items, not just message level
- Update tests to match new message structure

This ensures prompt caching works correctly for all messages in a conversation, not just the initial system prompt and first user message.

* fix: resolve TypeScript linter error for cache_control property

Use type assertion to handle cache_control property that's not in OpenAI types
Related GitHub Issue
Closes: #5791
Roo Code Task Context (Optional)
No Roo Code task context for this PR
Description
This PR implements prompt caching support for LiteLLM, allowing users to benefit from reduced costs and improved response times when using models that support prompt caching (like Claude 3.7).
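As a minimal sketch of the gating this implies (field and type names besides `litellmUsePromptCache` and `supportsPromptCache` are assumptions, not the exact RooCode types):

```typescript
// Illustrative settings fragment; only litellmUsePromptCache is the
// field added by this PR, the other fields are assumed for context.
interface LiteLLMProviderSettings {
	litellmBaseUrl?: string;
	litellmApiKey?: string;
	litellmModelId?: string;
	/** Opt-in prompt caching; undefined means disabled, for back-compat. */
	litellmUsePromptCache?: boolean;
}

// Caching directives are emitted only when the user opted in AND the
// selected model reports prompt-cache support in its model info.
function shouldUsePromptCache(
	settings: LiteLLMProviderSettings,
	modelInfo: { supportsPromptCache?: boolean },
): boolean {
	return settings.litellmUsePromptCache === true && modelInfo.supportsPromptCache === true;
}
```

Making the flag optional keeps existing configurations valid, and requiring both conditions means the checkbox can never enable caching for a model that does not support it.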
Key implementation details:
- Added a `litellmUsePromptCache` boolean option to the provider settings schema
- Reused existing translation keys (`enablePromptCaching` and `enablePromptCachingTitle`) to maintain consistency across all supported languages

Design choices:
Test Procedure
Automated Testing:
Added a comprehensive test in `src/api/providers/__tests__/lite-llm.spec.ts` that verifies cache control headers are applied when `litellmUsePromptCache` is enabled.

Manual Testing Steps:
Test Command:
Pre-Submission Checklist
Screenshots / Videos
Before: The LiteLLM settings page shows only Base URL, API Key, and Model selection.
After: When a model that supports prompt caching is selected, an additional "Enable prompt caching" checkbox appears with a description.
Note: The checkbox only appears for models that have `supportsPromptCache: true` in their model info.

Documentation Updates
The feature is self-explanatory through the UI, using existing translation keys that are already documented.
Additional Notes
This implementation follows the same approach as the referenced Cline commit but adapts it to RooCode's architecture. The main difference is that we reuse existing translation keys instead of creating new ones, which ensures all languages are supported without additional translation work.
Get in Touch
@MuriloFP
Important
Adds prompt caching support for LiteLLM, including schema updates, handler modifications, UI changes, and tests.
- Adds a `litellmUsePromptCache` boolean to the provider settings schema in `provider-settings.ts`.
- Updates `LiteLLMHandler` in `lite-llm.ts` to add cache control headers to the system message and the last two user messages when caching is enabled.
- Tracks cache read/write tokens in `LiteLLMHandler`.
- Adds a UI checkbox in `LiteLLM.tsx`, visible only for models that support caching.
- Updates `lite-llm.spec.ts` to verify cache control headers and token tracking when caching is enabled.
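The handler change summarized above can be sketched as a standalone function (a simplified version under assumed message shapes, not the actual `LiteLLMHandler` code):

```typescript
type Part = { type: "text"; text: string; cache_control?: { type: "ephemeral" } };
type Message = { role: "system" | "user" | "assistant"; content: string | Part[] };

// Mark the system message and the last two user messages with ephemeral
// cache breakpoints, converting string content to the structured form so
// cache_control can live on the content item.
function addCacheBreakpoints(messages: Message[], usePromptCache: boolean): Message[] {
	if (!usePromptCache) return messages;

	const lastTwoUserIdxs = messages
		.map((m, i) => (m.role === "user" ? i : -1))
		.filter((i) => i !== -1)
		.slice(-2);

	return messages.map((m, i) => {
		if (m.role !== "system" && !lastTwoUserIdxs.includes(i)) return m;
		const parts: Part[] =
			typeof m.content === "string" ? [{ type: "text", text: m.content }] : m.content;
		return {
			...m,
			content: parts.map((p, j) =>
				j === parts.length - 1 ? { ...p, cache_control: { type: "ephemeral" as const } } : p,
			),
		};
	});
}
```

Because the breakpoints are recomputed on every request, they advance with the conversation, so each turn extends the cached prefix instead of invalidating it.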