
Conversation

Contributor

@CellenLee CellenLee commented Sep 28, 2025

Related GitHub Issue

None

Roo Code Task Context (Optional)

Description

Even when a service provider does not expose an explicit prompt-cache option, adding prompt_cache_key helps the LLM provider route requests to a cache-friendly server, so the prefix caching implemented by vLLM and SGLang can take effect.

I created this PR to add prompt_cache_key to every LLM provider. In the long run, this will help LLM service providers optimize the scheduling of Roo Code requests, reducing both time-to-first-token (TTFT) and cost.
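
For illustration, here is a minimal sketch of how such a key can be attached with the OpenAI Node SDK. The model name, environment variables, and taskId value are placeholders (in this PR the key is sourced from metadata?.taskId), and recent openai-node versions type prompt_cache_key directly:

import OpenAI from "openai"

const client = new OpenAI({ baseURL: process.env.LLM_BASE_URL, apiKey: process.env.LLM_API_KEY })

// One stable key per task: every turn sends the same key, so the
// growing conversation prefix keeps landing on a cache-warm worker.
const taskId = "6ab1234d-c123-1234-12ab-aaaaaaaaaaaa"

const stream = await client.chat.completions.create({
	model: "gpt-5",
	messages: [{ role: "user", content: "Hello" }],
	stream: true,
	stream_options: { include_usage: true },
	prompt_cache_key: taskId,
})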

Test Procedure

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

POST /v1/chat/completions

{
	"model": "gpt-5",
	"temperature": 0,
	"messages": [{
		"role": "system",
		"content": "…"
	}, {
		"role": "user",
		"content": "…"
	}, {
		"role": "assistant",
		"content": "…"
	}, {
		"role": "user",
		"content": "…"
	}, {
		"role": "assistant",
		"content": "…"
	}, {
		"role": "user",
		"content": "…"
	}],
	"stream": true,
	"stream_options": {
		"include_usage": true
	},
	"prompt_cache_key": "6ab1234d-c123-1234-12ab-aaaaaaaaaaaa"
}

Documentation Updates

Does this PR necessitate updates to user-facing documentation?

  • No documentation updates are required.
  • Yes, documentation updates are required. (Please describe what needs to be updated or link to a PR in the docs repository).

Additional Notes

Get in Touch


Important

Adds prompt_cache_key to enhance caching across multiple LLM providers, optimizing request scheduling and reducing costs.

  • Behavior:
    • Adds prompt_cache_key to completePrompt and createMessage functions across multiple LLM providers to enhance caching.
    • Affects providers like OpenAiHandler, OpenAiNativeHandler, OllamaHandler, QwenCodeHandler, and others.
    • Aims to optimize request scheduling and reduce costs by improving cache utilization.
  • Functions:
    • Updates completePrompt in mistral.ts, native-ollama.ts, ollama.ts, openai-native.ts, openai.ts, openrouter.ts, qwen-code.ts, requesty.ts, unbound.ts, vercel-ai-gateway.ts, vscode-lm.ts, and xai.ts to include prompt_cache_key.
    • Updates createMessage in the same files to include prompt_cache_key.
  • Misc:
    • Minor code style adjustments in codebaseSearchTool.ts and c-sharp.ts.

This description was created by Ellipsis for 54820c9.

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Sep 28, 2025
@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Sep 28, 2025
Contributor

@roomote roomote bot left a comment

I found some issues that need attention. See inline comments for details.

messages: [{ role: "system", content: systemPrompt }, ...convertToOpenAiMessages(messages)],
stream: true,
stream_options: { include_usage: true },
prompt_cache_key: metadata?.taskId,

P0: Official OpenAI endpoints may reject unknown request args (e.g., 'Unrecognized request argument: prompt_cache_key'). Please gate 'prompt_cache_key' and 'safety_identifier' so they’re only sent to endpoints that accept them (OpenRouter, vLLM/sglang gateways, etc.). Otherwise this can cause 400s for users on api.openai.com.
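
One possible shape for that gating, sketched with an assumed base-URL check (cacheHintFor, the URL test, and the OpenRouter example are illustrative, not the PR's code):

import OpenAI from "openai"

// Sketch: build the cache hint only for endpoints known to accept it.
// The base-URL test and the taskId parameter are illustrative assumptions.
function cacheHintFor(client: OpenAI, taskId?: string): { prompt_cache_key?: string } {
	const isOfficialOpenAi = (client.baseURL ?? "").startsWith("https://api.openai.com")
	return !isOfficialOpenAi && taskId ? { prompt_cache_key: taskId } : {}
}

// Spread into the request so nothing extra is sent when gated off.
const client = new OpenAI({ baseURL: "https://openrouter.ai/api/v1", apiKey: process.env.OPENROUTER_API_KEY })
const stream = await client.chat.completions.create({
	model: "openai/gpt-4o",
	messages: [{ role: "user", content: "Hello" }],
	stream: true,
	...cacheHintFor(client, "task-123"),
})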

const response = await this.client.chat.completions.create({
model: modelId,
messages: [{ role: "user", content: prompt }],
prompt_cache_key: metadata?.taskId,

P1: The OpenAI SDK param types may not permit extra fields. If you keep these fields, ensure types allow them or route via a supported 'extra body' mechanism. Otherwise TS type-check or runtime validation could fail depending on the SDK/version.
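
If the installed openai-node version does not type the field, the SDK's documented escape hatch for undocumented request params is a // @ts-expect-error on the extra property; the value is still sent as-is at runtime. A sketch:

import OpenAI from "openai"

const client = new OpenAI()

const response = await client.chat.completions.create({
	model: "gpt-4o",
	messages: [{ role: "user", content: "Hello" }],
	// @ts-expect-error: prompt_cache_key is absent from this SDK version's types
	prompt_cache_key: "task-123",
})

On SDK versions whose types already include prompt_cache_key, the suppression itself becomes an unused @ts-expect-error error, so there the field can simply be passed directly.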

stream: true as const,
...(isGrokXAI ? {} : { stream_options: { include_usage: true } }),
...(reasoning && reasoning),
prompt_cache_key: metadata?.taskId,

P0: Same gating concern as above — adding 'prompt_cache_key' and 'safety_identifier' to official OpenAI Chat Completions can trigger 'unrecognized argument' errors. Please conditionally include based on baseURL/provider capability (or behind a feature flag).

stream: false, // Non-streaming for completePrompt
store: false, // Don't store prompt completions
prompt_cache_key: metadata?.taskId,
safety_identifier: metadata?.safetyIdentifier,

P0/P1: Non-streaming request includes 'prompt_cache_key' and 'safety_identifier'. Official OpenAI endpoints may reject unknown fields and the SDK types may not allow extra props. Suggest gating by endpoint or providing a type-safe extension path.

@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Oct 14, 2025
@hannesrudolph hannesrudolph added PR - Needs Preliminary Review and removed Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. labels Oct 14, 2025
@hannesrudolph
Collaborator

I am very sorry for the late response to this! This PR does not meet contribution standards.

Specific issues:

  • No linked issue or defined problem statement.
  • Scope creep: includes unrelated formatting changes.
  • Implementation causes instability and breaks existing functionality.

Please open a tracked issue that clearly defines the problem before resubmitting. Each PR must focus on a single, well-scoped change with tests and documentation updates where applicable.

@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 20, 2025
@github-project-automation github-project-automation bot moved this from PR [Needs Prelim Review] to Done in Roo Code Roadmap Oct 20, 2025
