feat: add prompt_cache_key #8352
Conversation
I found some issues that need attention. See inline comments for details.
messages: [{ role: "system", content: systemPrompt }, ...convertToOpenAiMessages(messages)],
stream: true,
stream_options: { include_usage: true },
prompt_cache_key: metadata?.taskId,
P0: Official OpenAI endpoints may reject unknown request args (e.g., 'Unrecognized request argument: prompt_cache_key'). Please gate 'prompt_cache_key' and 'safety_identifier' so they’re only sent to endpoints that accept them (OpenRouter, vLLM/sglang gateways, etc.). Otherwise this can cause 400s for users on api.openai.com.
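As a minimal sketch of that gating (the provider allow-list and the supportsPromptCacheKey helper are illustrative assumptions, not part of this PR or the SDK):

```ts
import OpenAI from "openai"

// Illustrative allow-list of providers assumed to accept the extra fields.
const PROMPT_CACHE_KEY_PROVIDERS = new Set(["openrouter", "vllm", "sglang"])

function supportsPromptCacheKey(providerName: string): boolean {
	return PROMPT_CACHE_KEY_PROVIDERS.has(providerName)
}

async function createChatStream(
	client: OpenAI,
	providerName: string,
	modelId: string,
	messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
	metadata?: { taskId?: string; safetyIdentifier?: string },
) {
	// Only spread the optional fields when the target endpoint is expected to accept them,
	// so requests to other endpoints keep their original shape.
	const cacheParams = supportsPromptCacheKey(providerName)
		? { prompt_cache_key: metadata?.taskId, safety_identifier: metadata?.safetyIdentifier }
		: {}

	return client.chat.completions.create({
		model: modelId,
		messages,
		stream: true,
		stream_options: { include_usage: true },
		...cacheParams,
	})
}
```

Whether the extra fields appear in the SDK's param types depends on the installed version; that typing question is the separate concern raised in the next comment.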
const response = await this.client.chat.completions.create({
	model: modelId,
	messages: [{ role: "user", content: prompt }],
	prompt_cache_key: metadata?.taskId,
P1: The OpenAI SDK param types may not permit extra fields. If you keep these fields, ensure types allow them or route via a supported 'extra body' mechanism. Otherwise TS type-check or runtime validation could fail depending on the SDK/version.
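A sketch of one type-safe path, assuming the installed SDK's param types do not include these fields (the withExtraFields helper and ExtraCacheFields type are illustrative, not existing APIs):

```ts
import OpenAI from "openai"

type ExtraCacheFields = {
	prompt_cache_key?: string
	safety_identifier?: string
}

// Illustrative helper: widen the documented params with optional extra fields in one place,
// instead of sprinkling `as any` casts across every provider.
function withExtraFields<T extends object>(params: T, extra: ExtraCacheFields): T & ExtraCacheFields {
	return { ...params, ...extra }
}

async function completePrompt(client: OpenAI, modelId: string, prompt: string, taskId?: string) {
	const params: OpenAI.Chat.Completions.ChatCompletionCreateParamsNonStreaming = {
		model: modelId,
		messages: [{ role: "user", content: prompt }],
	}

	// T & ExtraCacheFields remains assignable to the SDK's param type, so this compiles
	// whether or not the SDK knows about the extra fields; at runtime the Node SDK
	// generally forwards the whole params object as the request body.
	return client.chat.completions.create(withExtraFields(params, { prompt_cache_key: taskId }))
}
```

The Python SDK exposes an explicit extra_body argument for this case; on the Node side the question is mostly about preserving type checking rather than about transport.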
stream: true as const,
...(isGrokXAI ? {} : { stream_options: { include_usage: true } }),
...(reasoning && reasoning),
prompt_cache_key: metadata?.taskId,
P0: Same gating concern as above — adding 'prompt_cache_key' and 'safety_identifier' to official OpenAI Chat Completions can trigger 'unrecognized argument' errors. Please conditionally include based on baseURL/provider capability (or behind a feature flag).
stream: false, // Non-streaming for completePrompt
store: false, // Don't store prompt completions
prompt_cache_key: metadata?.taskId,
safety_identifier: metadata?.safetyIdentifier,
P0/P1: Non-streaming request includes 'prompt_cache_key' and 'safety_identifier'. Official OpenAI endpoints may reject unknown fields and the SDK types may not allow extra props. Suggest gating by endpoint or providing a type-safe extension path.
I am very sorry for the late response to this! This PR does not meet contribution standards. Specific issues:
Please open a tracked issue that clearly defines the problem before resubmitting. Each PR must focus on a single, well-scoped change with tests and documentation updates where applicable.
Related GitHub Issue
None
Roo Code Task Context (Optional)
Description
Regardless of whether the service provider exposes an explicit prompt-cache option, sending prompt_cache_key helps the LLM provider dispatch the request to a cache-friendly server, so the prefix caching built into backends such as vLLM and SGLang can take effect.
This PR therefore adds prompt_cache_key to every LLM provider. In the long run, this should help LLM service providers schedule RooCode requests more efficiently, reducing TTFT and cost.
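For illustration, a hedged sketch of the intended multi-turn effect, assuming an OpenAI-compatible gateway that honors prompt_cache_key (the model name, URLs, environment variables, and message contents are placeholders):

```ts
import OpenAI from "openai"

// Placeholder endpoint; assumed to use prompt_cache_key for cache-aware routing.
const client = new OpenAI({ baseURL: process.env.GATEWAY_URL, apiKey: process.env.GATEWAY_API_KEY })

const taskId = "6ab1234d-c123-1234-12ab-aaaaaaaaaaaa" // one stable key per task, as in this PR

async function ask(history: OpenAI.Chat.Completions.ChatCompletionMessageParam[]) {
	const completion = await client.chat.completions.create({
		model: "gpt-5",
		messages: history,
		prompt_cache_key: taskId, // same key on every turn, so requests can land on the same cache-warm server
	})
	return completion.choices[0].message
}

// Turn 1 establishes the prefix; turn 2 resends it verbatim plus one new user message,
// which is the part a prefix cache (e.g., in vLLM or SGLang) can reuse.
const history: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
	{ role: "system", content: "You are Roo Code." },
	{ role: "user", content: "List the files in src/api/providers." },
]
history.push(await ask(history))
history.push({ role: "user", content: "Now summarize openai.ts." })
history.push(await ask(history))
```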
Test Procedure
Pre-Submission Checklist
Screenshots / Videos
POST /v1/chat/completions
{
  "model": "gpt-5",
  "temperature": 0,
  "messages": [
    { "role": "system" },
    { "role": "user" },
    { "role": "assistant" },
    { "role": "user" },
    { "role": "assistant" },
    { "role": "user" }
  ],
  "stream": true,
  "stream_options": { "include_usage": true },
  "prompt_cache_key": "6ab1234d-c123-1234-12ab-aaaaaaaaaaaa"
}
Documentation Updates
Does this PR necessitate updates to user-facing documentation?
Additional Notes
Get in Touch
Important
Adds prompt_cache_key to enhance caching across multiple LLM providers, optimizing request scheduling and reducing costs.
- Adds prompt_cache_key to completePrompt and createMessage functions across multiple LLM providers to enhance caching.
- Affected handlers include OpenAiHandler, OpenAiNativeHandler, OllamaHandler, QwenCodeHandler, and others.
- Updates completePrompt in mistral.ts, native-ollama.ts, ollama.ts, openai-native.ts, openai.ts, openrouter.ts, qwen-code.ts, requesty.ts, unbound.ts, vercel-ai-gateway.ts, vscode-lm.ts, and xai.ts to include prompt_cache_key.
- Updates createMessage in the same files to include prompt_cache_key.
- Also touches codebaseSearchTool.ts and c-sharp.ts.
This description was auto-generated for 54820c9. You can customize this summary. It will automatically update as commits are pushed.